It looks to me like events are conspiring to reverse the long trend towards more shader math ops per texture read. Anyone like to chime in and give a clearer understanding?
What I see is:
Increases in shader performance have outpaced memory bandwidth for a very long time. If it's easy to improve shaders and hard to improve fillrate, then GPUs will be made to provide more shader math capability and games will follow, making ever more complex shaders that incestuously utilize cache bandwidth without referencing the outside world (RAM) as much. As late as the ATi R400 series (e.g. the Radeon X850) there were equal numbers of pixel shaders, TMUs and ROPs, whereas the RX 480 has 2304 unified shaders to 144 TMUs and 32 ROPs. Some of those shaders will be cranking on vertices, but still, that's a card capable of roughly 72 math operations and 4.5 texture reads per output pixel.
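To make that ratio concrete, here's a back-of-the-envelope sketch. The X850 XT's 16/16/16 configuration is my recollection, and this ignores clock speeds and the fact that unified shaders also do vertex work:

```python
# Per-clock unit counts; dividing by ROPs gives math ops / texture reads
# nominally available per pixel written.
cards = {
    "Radeon X850 XT (R400 series)": dict(shaders=16, tmus=16, rops=16),
    "Radeon RX 480 (Polaris 10)":   dict(shaders=2304, tmus=144, rops=32),
}

for name, c in cards.items():
    print(f"{name}: {c['shaders'] / c['rops']:.1f} shader ops and "
          f"{c['tmus'] / c['rops']:.1f} texture reads per output pixel")
```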
3D stacks of memory (e.g. HBM) with extremely wide interfaces on a silicon interposer appear to make it possible to keep making memory interfaces wider and wider without eating so much power. HBM1 to HBM2 is roughly a factor-of-2 leap in per-stack bandwidth, and this seems like it could keep going for some time. How quickly does the memory controller grow in die area as the memory interface gets wider?
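Roughly, per-stack bandwidth is just bus width times per-pin data rate; HBM's trick is keeping the bus extremely wide (1024 bits per stack) at a low clock. A quick sketch with HBM1 at ~1 Gbps/pin, HBM2 at ~2 Gbps/pin and four stacks as on Fiji:

```python
# Per-stack bandwidth (GB/s) = bus width (bits) * per-pin rate (Gbit/s) / 8
def stack_bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8

print("HBM1 stack:", stack_bandwidth_gb_s(1024, 1.0), "GB/s")   # 128 GB/s
print("HBM2 stack:", stack_bandwidth_gb_s(1024, 2.0), "GB/s")   # 256 GB/s
print("4 x HBM1 (Fury X):", 4 * stack_bandwidth_gb_s(1024, 1.0), "GB/s")
```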
Each new process node seems to arrive further apart in time, and the transistor performance, cost and power improvements keep getting smaller. Geometric scaling started failing around 2000 and Dennard scaling has been almost dead since 2006. ITRS has basically given up and no longer offers a roadmap beyond 10 nm, though TSMC might try to keep going on their own to 5 nm or so; the end looks to be in sight for silicon.
High resolution displays and VR also seem to make fillrate relatively more important than it used to be. Many VR games now use forward renderers, as far as I understand mainly to reduce fillrate requirements compared to deferred shading. Foveated rendering is an interesting wildcard; it would keep the resolution required to match the human visual system quite modest, since you only need about 60 pixels per degree within a few degrees of the fovea and detail can be drastically stepped down further out. Framerate has also finally started to be recognized as important.
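To put rough numbers on the foveated rendering point, here's a crude per-eye pixel budget. All the parameters (square 110-degree FOV, a 5-degree eye-tracked foveal region, 15 ppd in the periphery, uniform angular resolution across the panel) are assumptions for the sake of the arithmetic:

```python
# Crude per-eye pixel budget: uniform 60 ppd over the whole FOV versus a
# small eye-tracked foveal region at full density plus a low-density rest.
fov_deg = 110        # assumed square field of view
full_ppd = 60        # ~limit of human foveal acuity
fovea_deg = 5        # assumed high-detail, eye-tracked region
periphery_ppd = 15   # assumed acceptable peripheral density

uniform  = (fov_deg * full_ppd) ** 2
foveated = (fovea_deg * full_ppd) ** 2 + (fov_deg * periphery_ppd) ** 2

print(f"uniform 60 ppd: {uniform / 1e6:.0f} Mpixels per eye")
print(f"foveated:       {foveated / 1e6:.1f} Mpixels per eye")
```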
I have a Vive and I don't regret it at all. Amazing experiences can make you overlook the various visual artefacts, but they are still present and things would be better without them. Among the artefacts and problems, I'd put resolution first: the pixels are blown up over a huge 110-degree FOV, which feels similar to sitting close to a 19-inch CRT at somewhere around 800x600 to 1024x768. Second is the optics; you can't get away from god rays with a Fresnel lens. Third is refresh rate: 90 Hz is OK, but nowhere near perfect, and strobing is what saves it from being outright bad (90 FPS on a CRT or on the Vive does look better than 144 FPS on a constantly lit, non-strobing LCD; your eyes keep moving while the pixel stays lit, which smears the object, a.k.a. persistence blur). Fourth, normal mapping only works well on small features; in VR you can't fool me, I can see that the door is perfectly flat and that the beveling isn't really there, so more polygons are very important, at least up close. A distant fifth is better shader quality; it's not very important for feeling present in this alien world, but it would surely look nicer. At current resolutions, supersampling and AA more generally make a huge difference in quality. Everything about the requirements of VR just screams fillrate to me.
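The persistence blur point can be put in numbers: while your eye tracks a moving object, a sample-and-hold display leaves the image in one place for the whole frame, so it smears across the retina by tracking speed times the time the pixel stays lit. A sketch with an assumed 30 deg/s tracking speed and an assumed ~2 ms strobe length:

```python
# Smear (degrees on the retina) = tracking speed (deg/s) * persistence (s)
def smear_deg(tracking_speed_deg_s, persistence_ms):
    return tracking_speed_deg_s * persistence_ms / 1000.0

speed = 30.0  # deg/s, assumed eye-tracking speed for a moving object
print("144 Hz sample-and-hold:", round(smear_deg(speed, 1000 / 144), 3), "deg")
print("90 Hz, ~2 ms strobe:   ", round(smear_deg(speed, 2.0), 3), "deg")
```

Under those assumptions the strobed 90 Hz image smears over a few times less angle than the 144 Hz sample-and-hold one, which matches what I see.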
What about the balance of TMUs to ROPs? Is there a reason for this to change? AFAIK, if a surface uses a bunch of different maps (e.g. specular map, specular sharpness map, normal map, colour map, etc.) you'd want quite a lot of texture reads for each output pixel. I don't see a reason for fewer TMUs per ROP, but perhaps more shader work can be rolled into textures instead of computing it all on the fly?
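As a rough sanity check of that ratio (hypothetical map list; filtering, shadow maps and light maps would all add more fetches):

```python
# How many TMU fetches a written pixel needs versus how many the RX 480's
# TMU : ROP ratio nominally gives it.
maps = ["colour", "normal", "specular", "spec_sharpness"]  # hypothetical material
tmus, rops = 144, 32
print(f"{len(maps)} fetches needed vs {tmus / rops} TMU slots per output pixel")
```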
Do you think we can expect a significant change in the ratios between shaders, ROPs and TMUs in the coming cards from Nvidia and AMD sporting HBM2?