ATI's decision concerning TMUs

superguy · Apr 21, 2006

Also nice little X1800GTO review

http://www.firingsquad.com/hardware/powercolor_radeon_x1800_gto/page4.asp

The GTO is clocked 100 MHz slower than the X1600, and both have 12 shaders. Granted, the memory bandwidth is higher, but oh look, there's the 7600GT doing great with a 128-bit bus, so there goes that excuse.

Conclusion: TMUs!

So now ATI has to use a big die part instead of a small "built from the ground up to be mainstream" die!

ATI really needs to sort out these TMU deficits! It holds back R580 immensely as well. R580 would be administering a whooping rarely seen in graphics cards on the competition..if not for the crap architecture!

Tahir2 · Apr 21, 2006

superguy said:
R580 would be administering a whooping rarely seen in graphics cards on the competition..if not for the crap architecture!

The R580 does not have a crap architecture. The lack of TMUs, combined with the increase in shading power, makes it the most powerful and versatile graphics card on the market at the moment in a lot of different games.

The R580 also is doing more work than the equivalent G7x architecture, it has superior Anti-Aliasing (IMHO) and filtering as well as a much more competent Video Decoding engine.

Where is the crap architecture?

TehGoldenBee · Apr 21, 2006

I think he means just the TMU part.

Tahir2 · Apr 21, 2006

TehGoldenBee said:
I think he means just the TMU part.

And he thinks it is a crap architecture (both R520 and R580).

I strongly disagree.

DemoCoder · Apr 21, 2006

Tahir2 said:
The R580 also is doing more work than the equivalent G7x architecture, it has superior Anti-Aliasing (IMHO) and filtering as well as a much more competent Video Decoding engine.

Evidence? The last couple of times they were compared, NVidia's DVD decoder still produced better images than ATI's, and NVidia eventually did deliver HD decoders, proving at least some of the worth of having a programmable video processor onboard the chip. In fact, from a capability perspective alone (programmable video processor vs fixed function + pixel shaders), NVidia's looks better, from a tech-geek architectural perspective at least.

Tahir2 · Apr 21, 2006

http://www.beyond3d.com/previews/ati/avivo/awu/

Edit: the table is there.

superguy · Apr 21, 2006

Yes, by crap architecture I am referring to the TMU part only.

Another good example is that the X1800XT is nearly as good as the X1900XT. Why? Because both have 16 TMUs. You only need 16 pixel shaders to take advantage of the TMUs in most cases!

R580 only nets you maybe 10-15% most of the time despite triple shader power.

I would say X1800XT is a good architecture too, because the TMUs are properly matched with the shader power, giving it great performance, as we see in the review as well.

Like I said, if R580 actually could use its 48 shaders, we would likely be seeing a buttwhooping you rarely get the chance to see in graphics cards. As if the X800XT had 32 pipes instead of 16 back in the day.

And how many chances do you get to do that? Not very many. Well, there was one blown..
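The bottleneck argument in this post can be illustrated with a toy model. This is only a sketch: it assumes ALU and texture work overlap perfectly, that both chips run at the same clock, and the workload numbers are invented for illustration rather than measured from any game. The function names are hypothetical.

```python
def frame_time(alu_work: float, tex_work: float, alus: int, tmus: int) -> float:
    """Toy model: ALU and texture work overlap perfectly, so the slower side sets the time."""
    return max(alu_work / alus, tex_work / tmus)


def speedup(alu_work: float, tex_work: float) -> float:
    """X1900XT-style (48 ALUs, 16 TMUs) over X1800XT-style (16 ALUs, 16 TMUs), same clock assumed."""
    r520 = frame_time(alu_work, tex_work, alus=16, tmus=16)
    r580 = frame_time(alu_work, tex_work, alus=48, tmus=16)
    return r520 / r580


# Hypothetical workloads in arbitrary units (not measured from any game).
texture_heavy = speedup(alu_work=18, tex_work=16)  # ALU:TEX close to 1:1
shader_heavy = speedup(alu_work=96, tex_work=16)   # ALU:TEX of 6:1

print(f"texture-heavy: {texture_heavy:.3f}x, shader-heavy: {shader_heavy:.3f}x")
```

Under these assumptions the texture-heavy case gains only 12.5% from tripling the ALUs, because the 16 TMUs become the limit, while the shader-heavy case gains the full 3x.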

superguy · Apr 22, 2006

At least when the X1900PE comes out, if it's the alleged 750 MHz, you will see a straight performance increase of 15% due to the clocks.

Because it will be speeding up the 16 TMUs.
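The 15% figure follows from simple clock scaling. A minimal sketch, assuming a 650 MHz baseline (the X1900 XTX core clock, which the post does not state) and performance that scales linearly with core clock:

```python
def clock_uplift(new_mhz: float, old_mhz: float) -> float:
    """Fractional gain if performance scales linearly with core clock."""
    return new_mhz / old_mhz - 1


# 750 MHz is the rumoured clock from the post; 650 MHz is an assumed baseline.
uplift = clock_uplift(750, 650)
print(f"uplift: {uplift:.1%}")
```

Note that a core clock increase speeds up the ALUs and ROPs as well as the TMUs, so this number alone does not show which unit is the limit.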

Sobek · Apr 22, 2006

Crom laughs at your strange whining.

It's all a matter of patience. R580, regardless of how 'mainstream' it presently is, was designed for a slew of games that aren't really out yet. Your argument that TMUs are 'killing ATI' is valid, but only so far...with due time, the tripled shading power will come into its own. All IMHO of course...and again, IMO, this is hardly 'killing' ATI.

Geo · Apr 22, 2006

DemoCoder said:
Evidence? The last couple of times they were compared, NVidia's DVD decoder still produced better images than ATI's,

Yeah? Since Cat5.13? Linkage, please?

Geo · Apr 22, 2006

What I still don't get with these threads is why G71, with its 50% more TMU power (clock x TMUs), isn't stomping R580 into the performance dirt and providing better quality filtering to boot. Is TMU power a principal constraint with R580? I don't really know, tho it wouldn't surprise me. There is always a constraint, for both IHVs. How come we never talk about whatever the heck is holding G71 back from living up to its TMUs?
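For reference, the "clock x TMUs" figure can be worked out as below. This is a sketch using assumed launch specs that the thread itself does not state: a 650 MHz core with 24 TMUs for the GeForce 7900 GTX (G71) and a 650 MHz core with 16 TMUs for the Radeon X1900 XTX (R580).

```python
def texel_rate(clock_mhz: int, tmus: int) -> int:
    """Peak texel rate in megatexels per second: core clock (MHz) x number of TMUs."""
    return clock_mhz * tmus


# Assumed specs (not given in the thread): both parts at a 650 MHz core clock.
g71 = texel_rate(650, 24)   # GeForce 7900 GTX
r580 = texel_rate(650, 16)  # Radeon X1900 XTX

advantage = g71 / r580 - 1
print(f"G71: {g71} MT/s, R580: {r580} MT/s, G71 advantage: {advantage:.0%}")
```

This is a theoretical peak only; it says nothing about filtering quality or how often the TMUs are actually kept busy.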

superguy · Apr 22, 2006

geo said:
What I still don't get with these threads is why G71, with its 50% more TMU power (clock x TMUs), isn't stomping R580 into the performance dirt and providing better quality filtering to boot. Is TMU power a principal constraint with R580? I don't really know, tho it wouldn't surprise me. There is always a constraint, for both IHVs. How come we never talk about whatever the heck is holding G71 back from living up to its TMUs?

G71 has less shader power though. Much less.

It's simply not being bottlenecked like R580. Pretty simple concept..

And that's one reason it has a die nearly HALF the size.

If you have twice as big a die, you ought to be KICKING BUTT in performance, plain and simple. You can put double pipelines in double size.

If Nvidia had 32 pipes they'd probably be spanking R580 at still way less size. They just miscalculated their refresh cycle needs, which was a mistake they can't do anything about now, but which probably won't ever happen again. A missed opportunity for ATI that you do not get many of.

superguy · Apr 22, 2006

Sobek said:
It's all a matter of patience. R580, regardless of how 'mainstream' it presently is, was designed for a slew of games that aren't really out yet. Your argument that TMUs are 'killing ATI' is valid, but only so far...with due time, the tripled shading power will come into its own. All IMHO of course...and again, IMO, this is hardly 'killing' ATI.

Texture requirements, which are what is bottlenecking R580 currently, will go up in future games as well. So the nicest thing you can say is that R580 will slow down more slowly than the other guy (because it will be less bottlenecked by shaders). Not exactly a ringing endorsement.

And by then, R600 and G80 will be merrily doubling its performance anyway. You only get a short time where people actually care about R580, and that is now.

Tahir2 · Apr 22, 2006

geo said:
Yeah? Since Cat5.13? Linkage, please?

http://www.beyond3d.com/previews/ati/avivo/awu/index.php?p=03

Please don't make me cry.. or I will call Wavey!

(P.S. Yes me knows yuz bein sarcastic and somefellas been livin under a rock)

Geo · Apr 22, 2006

Of course G71 is being bottlenecked. Just not by the TMUs. It's easy for us to talk about missed opportunities on parts that had considerable lead times as if they should have known a year ago what we know today.

I could go off on NV for not having an 8 quad part that is still significantly smaller than R580, and therefore cheaper to produce, and kicks its performance butt. I find that pointless tho (and only mention it as an example that the coin you're wailing about has two sides).

I mean, what exactly is it you expect either IHV to do about their missed opportunities *now*?

So far as I can see, ATI has a features lead, much like NV did with NV40 vs R420. Rough performance parity is good enough for them right now, much as it was for NV then. Anything they "win" is gravy. NV is the one that has the onus of winning benchmarks in such a scenario, in my view, just as that onus was on ATI last time around.

trinibwoy · Apr 22, 2006

geo said:
What I still don't get with these threads is why G71, with its 50% more TMU power (clock x TMUs), isn't stomping R580 into the performance dirt and providing better quality filtering to boot. Is TMU power a principal constraint with R580? I don't really know, tho it wouldn't surprise me. There is always a constraint, for both IHVs. How come we never talk about whatever the heck is holding G71 back from living up to its TMUs?

Yeah, I've often wondered about that, but I figure it all comes down to memory bandwidth efficiency. Because what else could explain their refusal to do away with angle-dependent aniso, besides the TMUs themselves being less than stellar?

Jawed · Apr 22, 2006

geo said:
What I still don't get with these threads is why G71, with its 50% more TMU power (clock x TMUs), isn't stomping R580 into the performance dirt and providing better quality filtering to boot.

Well arguably, with AA/AF off G71 does stomp R580. But we know what a stupid comparison that is.

geo said:
Is TMU power a principal constraint with R580? I don't really know, tho it wouldn't surprise me.

No, not when you turn on AA and AF. That's the point really.

Credit to Firingsquad, which nowadays normally only benches with AA/AF (unless there's HDR involved in games other than Lost Coast).

geo said:
There is always a constraint, for both IHVs. How come we never talk about whatever the heck is holding G71 back from living up to its TMUs?

Yep, G71 looks pretty shit architecturally, particularly now it's running at the same or higher clocks as R580. A lot of its performance gain over G70 seems to be down to its ROPs. Its feature/technology gaps really do look quaint, bearing in mind it's NVidia's 2nd/3rd generation of SM3. It makes a nice SM2a part.

If you go back and compare X700XT and X1600XT, you can just about discern that R5xx style TMUs are about 50% "faster" than previous generations of ATI TMUs (in real game tests, not synthetics). It's all a bit sketchy though as there's so little data.

Having said that, G7x TMUs also seem to be more sprightly than their predecessors (in games).

In general I think R520 is a big misdirection. I think its texturing is too easily ALU-limited and a lot of the performance benefits we see in R580 are due to the texturing being able to run at full speed. Put another way, I think the texturing architecture in R5xx is too efficient for the "1:1" architecture of R520 - the ALUs seemingly can't keep up. (A more careful analysis based on framerate minima would be so useful...)

Sadly RV515 is too cut-down to make useful comparisons of texturing. Texturing performance ultimately isn't just how many and how fast the TMUs are.

I'm also fairly sure that a die photo of either RV530 or R580 would reveal just how much bigger than everyone's expecting the texturing is. And not forgetting that if you have out of order scheduling, then its complexity and size on die is rather wasted on a 1:1 architecture - effectively wasting one hierarchical level of the embarrassingly parallel nature of pixel shading, calculating (ALU) just one quad at a time, when ALU pipes don't directly consume external memory bandwidth. 3 ALU quads per shader unit is such a sublimely fruitful use of that property.

Jawed

Geo · Apr 22, 2006

Well, then that begs the question, Jawed: what is R580's performance constraint, if xbit is right and GDDR4 is only going to add 15%? (Which ain't bad for a refresh, but that's not the point I'm after here.)

Mintmaster · Apr 22, 2006

trinibwoy said:
Yeah, I've often wondered about that, but I figure it all comes down to memory bandwidth efficiency. Because what else could explain their refusal to do away with angle-dependent aniso, besides the TMUs themselves being less than stellar?

Well, die space would explain it (BTW, BW efficiency doesn't make much sense as an explanation IMO). NV40 and G71 have much higher shader and texture unit density than NV30, and you can't do that for free.

Texture units aren't cheap, and that's why ATI took the route they did. ATI's strategy is to increase the cheap math units to maximize usage of the texture units. NVidia's strategy is to double the address calculation part of the texture units as arbitrary MADD/DP3 units when not in use. Both strategies have pros and cons.

Geo · Apr 22, 2006

Tahir2 said:
http://www.beyond3d.com/previews/ati/avivo/awu/index.php?p=03

Please don't make me cry.. or I will call Wavey!

(P.S. Yes me knows yuz bein sarcastic and somefellas been livin under a rock )

I wasn't being sarcastic, I swear! I have a lot of respect for DemoCoder's interest and expertise in video-related matters.

So, two thoughts occurred to me.

1). I haven't seen him around as much lately, so maybe he is a bit behind the curve on keeping up with video-quality reviews such as Wavey's, FiringSquad's, etc. since 5.13. In which case my asking for a link post-5.13 would quickly get him up to speed when he tried to provide one.

2). I seem to recall NV swearing revenge, and further a note in recent driver release notes that they'd worked on PV IQ. . .so DC being DC, it seemed possible that he's seen even more recent comparative reviews on the matter than I have. In which case I get educated instead.

Either way, I'm kewl wid it, no sarcasm involved. . .
