geo said:
What I still don't get with these threads is why G71, with its 50% more TMU power (clock x TMUs), isn't stomping R580 into the performance dirt and providing better quality filtering to boot?
Well arguably, with AA/AF off G71 does stomp R580. But we know what a stupid comparison that is.
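For anyone wanting to check the "50% more" figure, here is a quick sketch of the clock x TMUs arithmetic. The clocks and TMU counts are my assumptions for the top-end boards (7900 GTX and X1900 XTX at 650MHz core, with 24 and 16 TMUs respectively); they aren't stated in the quote, and this is peak theoretical rate only, not what a game sees.

```python
def texel_rate_mtexels(core_mhz, tmus):
    """Peak bilinear texel rate in MTexels/s: one texel per TMU per clock."""
    return core_mhz * tmus

# Assumed specs (not from the post): 7900 GTX and X1900 XTX.
g71 = texel_rate_mtexels(core_mhz=650, tmus=24)
r580 = texel_rate_mtexels(core_mhz=650, tmus=16)

# Relative advantage of G71 over R580 in peak texel rate.
advantage = g71 / r580 - 1.0

print(f"G71:  {g71} MTexels/s")
print(f"R580: {r580} MTexels/s")
print(f"G71 advantage: {advantage:.0%}")
```

With equal clocks the advantage is just the TMU ratio, 24/16, which is where the 50% comes from.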
Is TMU power a principal constraint with R580? I don't really know, though it wouldn't surprise me.
No, not when you turn on AA and AF. That's the point really.
Credit to Firingsquad, which nowadays normally only benches with AA/AF (unless there's HDR involved in games other than Lost Coast).
There is always a constraint, for both IHVs. How come we never talk about whatever the heck is holding G71 back from living up to its TMUs?
Yep, G71 looks pretty shit architecturally, particularly now it's running at the same or higher clocks as R580. A lot of its performance gain over G70 seems to be down to its ROPs. Its feature/technology gaps really do look quaint bearing in mind it's NVidia's 2nd/3rd generation of SM3. It makes a nice SM2a part.
If you go back and compare X700XT and X1600XT, you can just about discern that R5xx style TMUs are about 50% "faster" than previous generations of ATI TMUs (in real game tests, not synthetics). It's all a bit sketchy though as there's so little data.
Having said that, G7x TMUs also seem to be more sprightly than their predecessors (in games).
In general I think R520 is a big misdirection. I think its texturing is too easily ALU-limited and a lot of the performance benefits we see in R580 are due to the texturing being able to run at full speed. Put another way, I think the texturing architecture in R5xx is too efficient for the "1:1" architecture of R520 - the ALUs seemingly can't keep up. (A more careful analysis based on framerate minima would be so useful...)
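To put rough numbers on the "ALUs can't keep up" idea, here is a toy model, strictly a sketch. It assumes perfect latency hiding, one ALU op or one texture fetch per unit per clock, and a made-up shader with 12 ALU ops and 4 fetches; the function name and the `alu_per_tmu` parameter are purely illustrative and don't describe how the real schedulers work.

```python
def tmu_utilisation(alu_ops, tex_ops, alu_per_tmu):
    """Fraction of time the TMUs are busy in a toy throughput model.

    Assumes ALU and texture work overlap perfectly, so the slower of
    the two sets the pace (hypothetical simplification).
    """
    alu_cycles = alu_ops / alu_per_tmu  # ALU work spread over the ALUs per TMU
    tex_cycles = tex_ops                # one fetch per TMU per clock
    total_cycles = max(alu_cycles, tex_cycles)
    return tex_cycles / total_cycles

# Illustrative shader, not measured from any game: 3 ALU ops per fetch.
shader = dict(alu_ops=12, tex_ops=4)

r520_like = tmu_utilisation(**shader, alu_per_tmu=1)  # "1:1" arrangement
r580_like = tmu_utilisation(**shader, alu_per_tmu=3)  # 3 ALUs per TMU

print(f"1:1 TMU utilisation: {r520_like:.0%}")
print(f"3:1 TMU utilisation: {r580_like:.0%}")
```

In this model the 1:1 part leaves its TMUs idle two-thirds of the time on an ALU-heavy shader, while the 3:1 part keeps them fully fed. Real shaders and real latency hiding are messier, so treat it as an illustration of the argument rather than evidence for it.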
Sadly RV515 is too cut-down to make useful comparisons of texturing. Texturing performance ultimately isn't just how many and how fast the TMUs are.
I'm also fairly sure that a die photo of either RV530 or R580 would reveal that the texturing hardware is a lot bigger than everyone's expecting. And don't forget that if you have out-of-order scheduling, its complexity and die area are rather wasted on a 1:1 architecture: you're effectively wasting one hierarchical level of the embarrassingly parallel nature of pixel shading by calculating (ALU) just one quad at a time, when ALU pipes don't directly consume external memory bandwidth. 3 ALU quads per shader unit is such a sublimely fruitful use of that property.
Jawed