Why did ATI ditch the R300 architecture?

ophirv

Looking back at the R300 architecture, including R350, R420 and so on, it looks like these chips gave great performance per transistor.

Let me explain: R420 had about 160M transistors and performed nicely compared to NV40, which had 220M transistors.

If ATI manufactured a card today with 360M transistors using the R300 architecture, it would be a card with 48 TMUs (not just ALUs like R580).

I figured this out by looking at R300 and R420, which had 110M and 160M transistors respectively. The difference between them is 8 TMUs and 50M transistors.

So if 50M transistors equals 8 TMUs, then by adding 200M transistors to R420 we would have a card much stronger than the R580, with 48 TMUs, ALUs and ROPs.
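Here's that back-of-the-envelope math spelled out (a rough sketch using my round numbers above, and assuming transistor cost scales linearly with TMU pipes, which is an approximation):

```python
# Rough TMU-scaling estimate from the transistor counts quoted above.
# Assumes transistor cost scales linearly with TMU pipes (an approximation).
r300_transistors, r300_tmus = 110e6, 8
r420_transistors, r420_tmus = 160e6, 16

# 50M extra transistors bought 8 extra TMUs => ~6.25M per TMU
per_tmu = (r420_transistors - r300_transistors) / (r420_tmus - r300_tmus)

budget = 360e6  # hypothetical R580-sized transistor budget
extra_tmus = (budget - r420_transistors) / per_tmu
print(f"~{per_tmu / 1e6:.2f}M transistors per TMU")
print(f"{budget / 1e6:.0f}M budget -> {r420_tmus + extra_tmus:.0f} TMUs")
```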

My feeling is that the R300 architecture was very efficient and ATI should have stuck with it.

And yes! I know this is only Pixel Shader 2.0, but I truly think that, given the added transistor count, Shader Model 3.0 doesn't live up to the hype.
 
The R520/R580 is based on the R300 architecture. It's been modified, obviously. But I'm not sure I understand your complaint.
 
Also consider the memory bandwidth. The part you describe has basically six times the "pipes" of R300, but could it have six times the BW?
 
I think what the original poster was saying is that there should have been more focus on other aspects, such as more "pipelines"...

Though he is forgetting that even if you add more, there's still a memory bandwidth limitation (and other limitations)... Also, you can't just "count" transistors when comparing different revisions of the same or a similar architecture, because particular features have been added or revised, such as 3Dc and angle-independent aniso.

He's also downplayed the significance of SM3 when, at this point, it is stupid not to include it. Of course, nothing significant has been done with SM3 yet (the performance gains are minor, and some people are still looking for particular "SM3-only" effects)... it does take time for developers to play around with it before "just using it"...

It's really not that simple.
 
Jawed said:
Compare the 100M transistors of RV515:

http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=103&orderby=release_date&order=Order&cname=

to the 107M transistors of R300

http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=14&orderby=release_date&order=Order&cname=

Hint: they perform about the same, even though the former has only four pipes.

Jawed

You forgot that the manufacturing process has gotten much better, and now we can get twice the core clock speed.

Today R300 would have twice the performance (when you are not memory limited) thanks to the 90nm manufacturing process.

It would have been such a great low-budget card (far stronger than RV515).
 
ERK's point is probably critical. Do you think you can feed 48 TMUs per clock with just 512 bits of bandwidth per clock? Don't forget a bunch of ROPs, too (48, if you really stuck to R300's architecture :LOL: ). Even if manufacturing lets you clock the core twice as high, is it cheap to provide 2x faster RAM?
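To put rough numbers on it (the clock and texel size below are assumptions for illustration, not real specs):

```python
# Rough feed-rate check for a hypothetical 48-TMU R300-style part.
# Assumed figures: 650 MHz core, one uncompressed RGBA8 fetch per TMU per
# clock, 256-bit bus with 1.55 GHz effective GDDR3. None are measured specs.
tmus = 48
bytes_per_texel = 4
core_clock_hz = 650e6

demand_gbs = tmus * bytes_per_texel * core_clock_hz / 1e9   # texture traffic alone
supply_gbs = (256 / 8) * 1.55e9 / 1e9                       # bus width * data rate

print(f"Texture demand (no caching, no ROP traffic): {demand_gbs:.0f} GB/s")
print(f"256-bit bus supply:                          {supply_gbs:.1f} GB/s")
```

Texture caches would absorb most of that traffic in practice, but the raw ratio shows how quickly pipes outrun pins.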

Deathlike notes efficiency, too. Apparently quite a few transistors were dedicated to R5x0's new memory controller and threading dispatcher, both of which serve to maximize available bandwidth. This would seemingly point to memory constraints, though I wonder if IHVs (also) do this to shift profit to their end (the GPU).

I can agree with you on the budget front (just look at how the X700 compares with the X1300P in Dave's penultimate review). But then we'll get to the situation with the GF4 and GF4MX, maintaining an older architecture at the low end for cost reasons while pushing new features at the high end. The problem is that you're going to sell more of the cheap stuff and therefore likely choke adoption of new, high-end-only features. This in turn limits the attraction of your new stuff.

I've only thought it out that far. :) Compare the 6600GT to the 9800P and X700, though, and you'll see that you can have both new features and top speed. In fact, a big selling point of the GF6 series over the rehashed RX series was the potential for better IQ and more efficiency in newer games: in essence, transistors as advertising. But, if you prefer speed over features, just buy the previous generation. X700s are still available, no?

BTW, I recall devs chafing at R300's instruction count pretty soon after NV30 was launched. R300 was good, but, after three years (!), it might be time to move on.

As for a 48 pipe card to compete with R580, I guess the 32 pipe G71 should be a good substitute. We'll see whether ATI aimed too far ahead in separating ALUs from TMUs to maximize transistor usage and memory bandwidth. I'm really curious to see how G71 and R580 will compare with the same bandwidth.
 
Well, if you look at pictures of the R520 die, you'll notice that a huge portion is taken up by the new memory controller. This new memory controller may well be one reason why the performance per clock per transistor in no-AA/AF scenarios seems rather low. But the memory controller does have its benefits. In particular, I believe it is responsible for the very small performance hit from enabling AA that the R5xx architecture enjoys.

So ATI made design decisions that reduce some aspects of performance while improving others. This is the way things progress. You stop worrying so much about performance in past games, as that performance is going to be high no matter what, and start focusing on future games. In light of the competition from nVidia, of course, it might have been better if they made some slightly different decisions, but I suppose that depends largely upon whether or not nVidia will increase availability of the GeForce 7800 GTX 512 and how far down the road the G71 actually is (and what it will be).
 
As long as performance and efficiency increase from older to current products, I really don't understand the logic behind the question.

However, if this is mostly targeted at mainstream offerings like RV530, I think the coming RV560 will answer most of these questions.
 
The memory BW problem is common to ATI and NVIDIA, and only a 512-bit bus will be the solution. The ring bus is just an optimization, and NV performs very well without it.
 
ophirv said:
The memory BW problem is common to ATI and NVIDIA, and only a 512-bit bus will be the solution. The ring bus is just an optimization, and NV performs very well without it.
Actually, erm, the ring bus is mostly about moving data around on the chip, and doesn't have much to do with memory bandwidth optimizations itself. From what I can tell, the memory bandwidth optimizations of ATI's new memory controller seem to mostly be tied to more caching (and potentially more intelligent caching).

Regardless, though, there is no solution to the memory bandwidth problem. Moving to a 512-bit bus will make graphics hardware much more expensive to manufacture, and I still doubt it will ever happen. We're doing pretty well with memory bandwidth as it stands, and with the hardware that is now available, games will be able to move towards more math-heavy shaders, which will reduce the relative requirements of memory bandwidth with respect to fillrate.
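A toy model of that shift, assuming one ALU op and one texture fetch can each issue per clock (illustrative numbers only):

```python
# As shaders get more math-heavy, the same texture traffic is spread
# over more clocks, so the bandwidth needed per clock falls.
def bytes_per_clock(alu_ops, tex_fetches, bytes_per_fetch=4):
    cycles = max(alu_ops, tex_fetches)  # whichever unit is the bottleneck
    return tex_fetches * bytes_per_fetch / cycles

for alu_ops in (4, 16, 64):
    print(f"{alu_ops:2d} ALU ops per 4 fetches -> "
          f"{bytes_per_clock(alu_ops, 4):.2f} bytes/clock")
```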
 
Before R300, no one thought a 256-bit bus MC would be possible.
Why is a 512-bit bus not an option today or in the near future?
 
But the R300 was FP24, and IIRC SirEric mentioned once that going FP32 would mean a rather large increase in transistor count per pipeline, which ATI felt was not worth it for the R300. But for SM3.0, FP32 is required.
 
ophirv said:
The memory BW problem is common to ATI and NVIDIA, and only a 512-bit bus will be the solution. The ring bus is just an optimization, and NV performs very well without it.
You really need to go read the detailed R5xx architecture descriptions as this is pure ignorance.

When you stop making such outlandish statements and ask sensible questions, perhaps people will take the time to explain the finer points to you.

Jawed
 
Corwin_B said:
But the R300 was FP24, and IIRC SirEric mentioned once that going FP32 would mean a rather large increase in transistor count per pipeline, which ATI felt was not worth it for the R300. But for SM3.0, FP32 is required.

My simple question is this:

If the move from FP24 to FP32 doesn't give much improved graphics to the naked eye, what is the point of adding so many transistors per pipeline?

I have seen a lot of SM2.0 vs SM3.0 screenshots from various games, and although there was a difference it wasn't so noticeable.

I just think that when dealing with realtime graphics, we should settle for less precision in order to get more performance.
 
One reason for moving to greater precision is that the API demands it. The justification for the API demanding it is that the API can handle very long shaders, and the longer the shader, the greater the chance for errors to occur at lower precision.
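A crude way to see this is to emulate a narrower mantissa by re-rounding after every operation (the mantissa widths below follow the usual FP24 16-bit and FP32 23-bit layouts; the emulation itself is a simplification):

```python
import math

def quantize(x, mantissa_bits):
    """Crudely round x to a given mantissa width (ignores exponent range)."""
    if x == 0.0:
        return 0.0
    exponent = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (mantissa_bits - exponent)
    return round(x * scale) / scale

def long_shader(steps, mantissa_bits):
    """Accumulate a small term repeatedly, re-rounding after every op,
    the way a fixed-precision pipeline would."""
    acc = 0.0
    for _ in range(steps):
        acc = quantize(acc + quantize(0.001, mantissa_bits), mantissa_bits)
    return acc

exact = 0.001 * 10_000
for bits, label in ((16, "FP24, 16-bit mantissa"), (23, "FP32, 23-bit mantissa")):
    print(f"{label}: error {abs(long_shader(10_000, bits) - exact):.2e}")
```

The longer the loop runs, the further the narrow-mantissa result drifts, which is exactly the long-shader concern.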
 
Dave Baumann said:
One reason for moving to greater precision is that the API demands it. The justification for the API demanding it is that the API can handle very long shaders, and the longer the shader, the greater the chance for errors to occur at lower precision.

No doubt about that; but since earlier architectures didn't have split precision for parts of the pipeline but only FP24, it must have been way more reasonable to just increase those parts to FP32 and call it a day. Yes or no?
 
ophirv said:
I have seen a lot of SM2.0 vs SM3.0 screenshots from various games, and although there was a difference it wasn't so noticeable.

The X3:Reunion Rolling Demo benchmark Anandtech talked about is a good example of comparing shader versions. There is a setting in there for shaders = low, medium and high and this equates to 1.1, 2.0 and 3.0. You should be able to see the difference quite well.

As an aside, when I ran it on 1.1 shaders, some tests had worse frame rates than 2.0 on my 6800GS SLI setup. Is this because it needed to do multiple passes for some parts?

I'd also like to say that this is a lovely benchmark to look at; big starships must be the ultimate boys' toys.

I ran the game at 1600x1200 4xAA at one point and I really couldn't see jaggies at all, even when the ships' antennae etc. were against a bright planet background. I'm still not convinced of the need for much more than 4xAA at 1600x1200, at least on my 22-inch Mitsubishi NF CRT. I'd still rather have the ability to run effects than the ultimate AA.
 
Ailuros said:
No doubt about that; but since earlier architectures didn't have split precision for parts of the pipeline but only FP24, it must have been way more reasonable to just increase those parts to FP32 and call it a day. Yes or no?
It probably was. But then again, this appears to be a design principle within ATI - Xenos, for instance, was developed from the ground up, with virtually no strings attached, and yet they went FP32-only again. In fact, when I asked about partial precision, they referred to it as "junk".
 