Where do you spend your gates?

Reverend
So... I was trading emails with a developer on this topic and it came down to "features vs performance".

He said (i.e., complained) that ATI and NVIDIA are spending massive amounts of gates to support new API iterations... at the cost of performance optimizations. I said that performance optimizations can actually be tougher to "come up with" than support for the newest API versions.

To the various IHV folks that visit our site/forums, here's a question (feel free to post anonymously :) ):

How does your company determine the amount/ratio of gates to be dedicated to performance optimizations compared to supporting new API features? Or is this question moot because of what I told the developer (above)?
 
Just a student, but I'll take a stab. It seems this question loses relevance as GPUs/VPUs become more general-purpose and programmable. Soon enough, all IHVs will have to do is intelligently arrange and schedule processing arrays capable of handling everything the APIs throw at them. They will have the ability to program software to add all the configurable optimizations needed. Maybe the question will become: how do IHVs arrange their drivers to exploit optimizations which accelerate the available functions?

What if IHVs made drivers for different purposes to exploit such conditions: one for offline rendering, one for generic multitexturing/combining, one for procedural shading/math, and one for a flexible combination of these?
 
You spend enough gates to get you to your required API level, and then you spend the rest on making it fast.
 
PSarge said:
You spend enough gates to get you to your required API level, and then you spend the rest on making it fast.
I agree. Supporting the next API level is important because that pushes the industry and drives the demand for more chips as much as performance. It doesn't matter how many pixels you can push if they don't look good.
 
Luminescent said:
Just a student, but I'll take a stab. It seems this question loses relevance as GPUs/VPUs become more general-purpose and programmable. Soon enough, all IHVs will have to do is intelligently arrange and schedule processing arrays capable of handling everything the APIs throw at them. They will have the ability to program software to add all the configurable optimizations needed. Maybe the question will become: how do IHVs arrange their drivers to exploit optimizations which accelerate the available functions?

What if IHVs made drivers for different purposes to exploit such conditions: one for offline rendering, one for generic multitexturing/combining, one for procedural shading/math, and one for a flexible combination of these?
I agree. This is definitely the way things seem to be right now and the path the industry will take in the future.
 
To expand on this a little more.....

It all comes down to bandwidth, and bandwidth STILL being the most "expensive" aspect of video cards.

GPU complexity gets cheaper much faster than memory bandwidth does.

And, as a grossly oversimplified general rule, bandwidth dictates "performance" and GPU complexity dictates features.

So as the two continue to diverge, the only way for companies to churn out new products that are "better" than their older products at similar price points is to "do more complex things" with the same performance, rather than increase the performance with the same complexity.
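To put some rough numbers on that, here's a back-of-the-envelope sketch; the bandwidth figure and per-pixel byte costs are hypothetical, not any particular card's specs:

```python
# Rough fill-rate ceiling for a bandwidth-limited GPU.
# All numbers are hypothetical, chosen only to illustrate the point.

mem_bandwidth = 19.8e9       # bytes/s, e.g. a 256-bit bus at ~310 MHz DDR
bytes_per_pixel = 4 + 4 + 4  # 32-bit color write + Z read + Z write

max_fill = mem_bandwidth / bytes_per_pixel
print(f"Bandwidth-limited fill rate: {max_fill / 1e6:.0f} Mpixels/s")  # ~1650

# No matter how many extra pixel pipes (gates) you add, you can't exceed
# this ceiling; past it, extra complexity only pays off if it buys
# features or saves bandwidth.
```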
 
Joe DeFuria said:
To expand on this a little more.....

It all comes down to bandwidth, and bandwidth STILL being the most "expensive" aspect of video cards.

GPU complexity gets cheaper much faster than memory bandwidth does.

And, as a grossly oversimplified general rule, bandwidth dictates "performance" and GPU complexity dictates features.

So as the two continue to diverge, the only way for companies to churn out new products that are "better" than their older products at similar price points is to "do more complex things" with the same performance, rather than increase the performance with the same complexity.

Pretty much accurate. But you've got to consider that memory-bandwidth-saving features, such as LMA and HyperZ, now take a very significant part of the die.
If you didn't have those, I'd guess there might be enough room to implement VS 3.0 and PS 3.0 on 0.15 µm.

So that reasoning is indeed oversimplified, although it's fundamentally true :)

Uttar
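For what it's worth, a crude sketch of why those bandwidth-saving gates earn their keep; the traffic mix and compression ratio below are assumptions for illustration, not measured figures:

```python
# Illustrative effect of a HyperZ/LMA-style Z-compression scheme.
# The 50% Z-traffic share and 4:1 ratio are assumed numbers.

raw_bw = 19.8    # GB/s of physical memory bandwidth (hypothetical)
z_share = 0.5    # assumed fraction of traffic that is Z reads/writes
z_ratio = 4.0    # assumed lossless Z compression ratio

effective_bw = raw_bw / (z_share / z_ratio + (1 - z_share))
print(f"Effective bandwidth: {effective_bw:.1f} GB/s")  # ~31.7 GB/s

# Under these assumptions the compression gates behave like ~60% more
# bus width, which is why they crowd other features off the die.
```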
 
Uttar said:
Pretty much accurate. But you've got to consider that memory-bandwidth-saving features, such as LMA and HyperZ, now take a very significant part of the die.

I agree; it would appear that, to a certain extent, transistors can be traded for [usable] bandwidth, either through more/better/smarter caches or visibility routines. I also think that although bandwidth is important, DX9 and beyond will see the paradigm shift in favor of computational ability over external bandwidth. The eventual merging of the shaders [fragment and vertex] at a low architectural level demonstrates this very fact. Architectures will most likely become much more efficient in terms of computational efficiency vs. transistor budget allocated per task. The days when a quarter of your die and transistor budget sits idle while you're limited in one aspect can't end soon enough.
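As an aside, that idle-die problem can be framed as an arithmetic-intensity balance point. Here's a toy calculation where the clock, unit count, and bandwidth are all invented for illustration:

```python
# Toy balance point between compute and external bandwidth.
# Every number here is hypothetical.

alu_ops = 0.4e9 * 8   # clock (Hz) * shader units -> ops/s
bandwidth = 19.8e9    # bytes/s of external memory bandwidth

balance = alu_ops / bandwidth
print(f"Balance: {balance:.2f} ALU ops per byte of memory traffic")

# Shaders below this ratio stall the ALUs (bandwidth-bound); shaders
# above it leave the bus idle (compute-bound). Merged vertex/fragment
# units help because one pool of ALUs can follow the bottleneck.
```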
 
I agree; it would appear that, to a certain extent, transistors can be traded for [usable] bandwidth, either through more/better/smarter caches or visibility routines. I also think that although bandwidth is important, DX9 and beyond will see the paradigm shift in favor of computational ability over external bandwidth. The eventual merging of the shaders [fragment and vertex] at a low architectural level demonstrates this very fact. Architectures will most likely become much more efficient in terms of computational efficiency vs. transistor budget allocated per task. The days when a quarter of your die and transistor budget sits idle while you're limited in one aspect can't end soon enough.

Most definitely, though this isn't new; it has been the intent of accelerators from the beginning, and it's why one uses all sorts of procedural and approximation techniques to keep storage to a minimum. The cost of storing and transmitting is too high, so it's just better to generate things procedurally. This is why I'm hoping we start seeing some real breakthroughs in the ability to cram more and more execution units into a chip.
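A quick illustration of that storage-vs-procedural trade; the resolution and the noise function are hypothetical stand-ins:

```python
# Storing vs. generating: a 3D noise volume as an example.
# The 256^3 resolution and hash constants are arbitrary choices.

res = 256
stored = res ** 3 * 4                              # RGBA bytes if stored
print(f"Stored volume: {stored / 2**20:.0f} MiB")  # 64 MiB

def noise3d(x: int, y: int, z: int) -> float:
    """Hash-based value noise: a few ALU ops, zero bytes of storage."""
    n = (x * 73856093) ^ (y * 19349663) ^ (z * 83492791)
    return (n & 0xFFFF) / 0xFFFF

# Generating each sample on the fly trades those 64 MiB of storage and
# bus traffic for a handful of execution-unit cycles.
```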
 