I thought I would start a new thread about possible ways to continue the exponential growth in gfx processing power. Let me quickly set up the example situation.
Let's assume we have an nv30 running at 400 MHz. We can process an average of 50 assembler instructions / pixel in our application and still achieve real-time performance. This is supported by a peak rate of 4 instructions / pixel per pipe per cycle.
Now, we set a goal: we want the same performance, but with "photorealistic" 1024 assembler instructions / pixel. We also believe what nVidia is telling us, so we can expect this performance level in about 3.5 years. Spring 2006 is our release time.
How can we achieve this goal ?
For example, we can double the clock rate to 800 MHz. We can also double the pipelines to 16. Maybe we also have to make the pipelines independent of each other, so that we can process small, roughly one-pixel polygons. Memory bandwidth goes up as well.
BUT pixel shader performance is still roughly 20 / 4 = 5 times short of the goal: we need about 1024 / 50 ≈ 20 times more shader work per pixel, and doubling both clock and pipelines only buys 2 × 2 = 4 times. Now, our nv30 can do a peak of 4 instructions in parallel. This is about as good as the best _general_ superscalar processors can do today. And if our fantasy hw has fully Turing-complete pixel shaders, we effectively have 16 general superscalar processors...
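To make the budget explicit, here is a rough back-of-envelope sketch. It assumes performance scales linearly with clock rate and pipeline count, that the nv30 baseline has 8 pipes (implied by "double the pipelines to 16"), and that the peak-to-average instruction ratio stays the same; the function name is just for illustration.

```python
# Back-of-envelope scaling budget for the hypothetical Spring 2006 part.
# Assumptions: speed scales linearly with clock and pipe count, the nv30
# baseline has 8 pipes, and the peak/average instruction ratio is unchanged.

BASELINE_INSTR_PER_PIXEL = 50     # what nv30 runs in real time
TARGET_INSTR_PER_PIXEL = 1024     # the "photorealistic" goal
BASELINE_PEAK_PER_PIPE = 4        # peak instructions per pipe per cycle


def remaining_per_pipe_speedup(clock_gain: float, pipe_gain: float) -> float:
    """Speedup each pipe still needs after clock and pipeline scaling."""
    required = TARGET_INSTR_PER_PIXEL / BASELINE_INSTR_PER_PIXEL
    return required / (clock_gain * pipe_gain)


required_total = TARGET_INSTR_PER_PIXEL / BASELINE_INSTR_PER_PIXEL
clock_gain = 800 / 400   # 400 MHz -> 800 MHz
pipe_gain = 16 / 8       # 8 -> 16 pipes (assumed baseline of 8)
per_pipe = remaining_per_pipe_speedup(clock_gain, pipe_gain)
# Peak issue width per pipe if the peak/average ratio stays the same.
needed_peak_per_pipe = BASELINE_PEAK_PER_PIPE * per_pipe

print(f"total speedup needed:  {required_total:.2f}x")
print(f"from clock and pipes:  {clock_gain * pipe_gain:.0f}x")
print(f"still needed per pipe: {per_pipe:.2f}x")
print(f"peak instr/cycle/pipe: {needed_peak_per_pipe:.1f}")
```

Under these assumptions each pipe would need roughly 20 instructions per cycle at peak, which is far beyond the ~4-wide issue of today's best general superscalar CPUs.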
So, any insights / ideas about how the performance can be increased fivefold for each pixel pipe? All suggestions are welcome!