Why did ATI ditch the R300 architecture?

Jawed · Jan 27, 2006

I wonder how long FM spent working out which parts of the long shaders in 3DMk06 could afford to be PP.

Since PP isn't applied to all instructions by default, but sporadically - presumably because of the IQ impact - they must have invested that effort to improve NVidia performance.
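PP here is DirectX 9's partial-precision hint (the `_pp` instruction modifier), which allows an instruction to run at FP16 or better. NVidia's NV3x/NV4x/G7x can run such instructions at FP16, while ATI's R3xx/R4xx ignore the hint and compute at FP24. A rough sketch of the IQ risk mentioned above: round a value to the number of stored mantissa bits each format keeps (10 for FP16, 16 for ATI's FP24, 23 for FP32). Exponent range, denormals and each chip's actual rounding behaviour are deliberately ignored, so treat the numbers as indicative only.

```python
import math

# Stored mantissa bits per format; exponent range is ignored in this sketch.
FORMATS = {"FP16 (pp)": 10, "FP24 (R300)": 16, "FP32": 23}


def quantize(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with the given number of stored mantissa bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)   # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


if __name__ == "__main__":
    # A texture coordinate stored at reduced precision, then scaled to a
    # 2048-texel texture: the error is expressed in texels.
    u = 0.7
    for name, bits in FORMATS.items():
        texel = quantize(u, bits) * 2048
        print(f"{name:12s} u*2048 = {texel:.6f}  error = {abs(texel - u * 2048):.6f} texels")
```

With these assumptions the FP16 result lands about 0.4 texels off, FP24 under 0.01, and FP32 is negligible, which is why PP tends to be applied selectively rather than everywhere.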

Jawed

Neeyik · Jan 27, 2006

Which other IHV supports pp - S3 or XGI (or both)?

tEd · Jan 27, 2006

Neeyik said:
Which other IHV supports pp - S3 or XGI (or both)?

Some of the S3 cards, I think. The latest parts, though, are single precision.

Geo · Jan 27, 2006

The premise of the thread strikes me as techno-nostalgia run amuck.

inefficient · Jan 27, 2006

http://pc.watch.impress.co.jp/docs/2006/0127/kaigai01l.gif

Nice graph showing die size increases and transistor counts.

Interesting how the die size of the R580 at 90nm is roughly the same as the G70 at 110nm.

Pete · Jan 27, 2006

ophirv, two things.

1. I think Geo's pithy post most succinctly summarizes your questions.
2. I think Dave's R580 interview might help, in that it has an ATI engineer and devrel guy basically say what we've been saying. Especially note the discussion concerning bandwidth constraints, p.3: "Certainly having more memory BW would certainly help to hit the next bottleneck, but memory technology doesn't evolve at the same rate as our core gfx does."

If you think NV performs "very well" without ATI's memory controller optimizations, then you haven't seen enough AA benchmarks. If you think settling for less precision is acceptable for better performance, that argument won't fly at the high end when spending transistors on more precision and other things nets you improvements like these. 3dfx also tried to argue that less precision was an acceptable tradeoff for higher speed. I think the diverging paths of 3dfx and NV decided that pretty firmly.

And I think we've answered your concerns pretty comprehensively. Maybe read some of B3D's past reviews of new architectures (R580, R520, G70, NV40, NV30, R300) to see what spending transistors on features and IQ instead of just more old tech gets us.

Honestly, the best answer is how the GF6 generation did against the RX generation. NV sold more cards because they matched ATI in performance but spent more transistors on more features and greater precision.

And now I'll bring in the car analogy, for the sake of completeness. There's a reason car makers advertise more than just extra horsepower (safety, comfort, style): those added features help sell cars as well as, or better than, added basics.

Deathlike2 · Jan 27, 2006

ophirv said:
Before R300, no one thought a 256-bit bus memory controller would be possible.
Why isn't a 512-bit bus an option today or in the near future?

Chalnoth said:
Regardless, though, there is no solution to the memory bandwidth problem. Moving to a 512-bit bus will make graphics hardware much more expensive to manufacture, and I still doubt it will ever happen. We're doing pretty well with memory bandwidth as it stands, and with the hardware that is now available, games will be able to move towards more math-heavy shaders, which will reduce the relative requirements of memory bandwidth with respect to fillrate.

Chalnoth's post succinctly answers that. Until a 512-bit memory bus stops being a cost killer, it's not going to happen. I think ATI took a gamble: I believe they decided to spend money (and accept a smaller profit) in order to really differentiate themselves from the NV30. It seemed a very bold move at the time, but given how badly the NV30 turned out, it became a huge profit. I have yet to figure out whether the 1GHz of DDR2 (? not sure) memory on the NV30's 128-bit memory bus was doing well (I believe its memory controller was inefficient at the time, but please feel free to correct me if I'm wrong).

Pete · Jan 27, 2006

Seems inefficient. 6600 doubles the 5800's quads but keeps the same memory bus width, ROP count, and clocks, yet manages to be much faster with AA. They definitely tweaked something in between, b/c I don't believe 8x1 is necessarily superior to 4x2 in UT.

Hellbinder · Jan 27, 2006

Pete said:
If you think NV performs "very well" without ATI's memory controller optimizations, then you haven't seen enough AA benchmarks. If you think settling for less precision is acceptable for better performance, that argument won't fly at the high end when spending transistors on more precision and other things nets you improvements like these. 3dfx also tried to argue that less precision was an acceptable tradeoff for higher speed. I think the diverging paths of 3dfx and NV decided that pretty firmly.

One could argue that ATi chose "less precision" for its entire DX9 line of cards until this generation.

Geo · Jan 27, 2006

R580 is 50% bigger physically than R300. We're getting to a place where it might be technically doable at the tippy top for 512-bit. Go back and review the "we're not going to see 256-bit" threads pre-Parhelia --it is all the same old arguments.

The newest one of late is that even if we've temporarily come to a spot where the die might be big enough, we surely won't stay there as process technology improves, etc., etc., and you wouldn't want to make that kind of move and have to give it back later when your die is too small to support it.

Well, maybe. Certainly serious people I respect have put that forward. But when you take the 10,000 foot view at what's actually happening with die size the last couple years versus the hand-wringing that this can't keep up, you get a different view. And the move up in top-end price point has to come into the conversation as well, in my opinion, for whether it's practical. Maybe what wasn't practical at $399 looks a whole lot more reasonable at $650.

Xmas · Jan 27, 2006

ophirv said:
I figured this out by looking at R300 and R420, which had 110M and 160M transistors respectively. The difference between them is 8 TMUs and 50M transistors.

So if 50M transistors equals 8 TMUs, then by adding 200M transistors to the R420 we would have a much stronger card than the R580, with 48 TMUs, ALUs and ROPs.

The whole is greater than the sum of its parts. You cannot just add some more units of one kind and expect the performance to increase. You also have to provide all the necessary means to control and feed those units. And the complexity of some of those things is quadratic, not linear.
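The extrapolation being answered here is simple proportional arithmetic. The sketch below reproduces it with the figures quoted in the thread and contrasts it with the point that some control and routing logic grows quadratically. The quadratic term is purely illustrative (an all-to-all crossbar between N units and N destinations needs on the order of N*N connections); it assumes no real transistor figures for that logic.

```python
# Linear estimate using the figures quoted in the thread:
# R300 -> R420 added 8 TMUs for roughly 50M transistors.
TRANSISTORS_PER_8_UNITS = 50e6
BASE_UNITS = 16          # units in R420, per the thread's reasoning
EXTRA_BUDGET = 200e6     # extra transistors in the hypothetical part


def linear_units(base: int, budget: float) -> int:
    """Units reachable if every unit costs the same fixed number of transistors."""
    per_unit = TRANSISTORS_PER_8_UNITS / 8
    return base + int(budget // per_unit)


def crossbar_growth(old_units: int, new_units: int) -> float:
    """Growth factor of an all-to-all N x N interconnect (illustrative only)."""
    return (new_units / old_units) ** 2


units = linear_units(BASE_UNITS, EXTRA_BUDGET)
print(units)                                  # 48, matching the estimate
print(units / BASE_UNITS)                     # 3.0x the execution units...
print(crossbar_growth(BASE_UNITS, units))     # ...but 9.0x the crossbar wiring
```

Under these assumptions, tripling the units does not triple the cost: anything that has to connect every unit to every other resource grows much faster than the units themselves.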

And I can't see anyone preferring a GPU that would be only slightly faster (because it's totally bandwidth-starved) and much less capable over R580, which is fast enough for all older games and a far superior architecture for newer games.

ophirv · Jan 27, 2006

Thanks for answering some of my concerns about R580.

Although I have read Dave's R580 review and interview, I certainly have some more reading to do!

Moloch · Jan 27, 2006

Hellbinder said:
One could argue that ATi chose "less precision" for its entire DX9 line of cards until this generation.

...And over those cards' lifetime, there was never a bit of difference in quality.

Deathlike2 · Jan 27, 2006

geo said:
R580 is 50% bigger physically than R300. We're getting to a place where it might be technically doable at the tippy top for 512-bit. Go back and review the "we're not going to see 256-bit" threads pre-Parhelia --it is all the same old arguments.

Well... the Parhelia was the first to have a 256-bit memory bus for consumer use... sadly, it lacked the memory compression techniques it needed to be even remotely competitive.

geo said:
Well, maybe. Certainly serious people I respect have put that forward. But when you take the 10,000 foot view at what's actually happening with die size the last couple years versus the hand-wringing that this can't keep up, you get a different view. And the move up in top-end price point has to come into the conversation as well, in my opinion, for whether it's practical. Maybe what wasn't practical at $399 looks a whole lot more reasonable at $650.

Agreed. If it were possible in decent quantities (not like the X800 PE or the GTX512), sure... I can certainly see people going for such a product.

The_Wolf_Who_Cried_Boy · Jan 28, 2006

geo said:
R580 is 50% bigger physically than R300. We're getting to a place where it might be technically doable at the tippy top for 512-bit. Go back and review the "we're not going to see 256-bit" threads pre-Parhelia --it is all the same old arguments.

The newest one of late is that even if we've temporarily come to a spot where the die might be big enough, we surely won't stay there as process technology improves, etc., etc., and you wouldn't want to make that kind of move and have to give it back later when your die is too small to support it.

Well, maybe. Certainly serious people I respect have put that forward. But when you take the 10,000 foot view at what's actually happening with die size the last couple years versus the hand-wringing that this can't keep up, you get a different view. And the move up in top-end price point has to come into the conversation as well, in my opinion, for whether it's practical. Maybe what wasn't practical at $399 looks a whole lot more reasonable at $650.

Wouldn't it be relatively easier to implement quad or even octuple data transfers per clock to achieve your bandwidth goals, rather than doubling the bus width again?
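The trade-off in this question comes down to simple arithmetic: peak bandwidth is bus width in bytes times transfers per second, so doubling the transfers per clock buys the same peak as doubling the width. A minimal sketch; the 500 MHz clock is an illustrative figure, not a specific board's.

```python
def peak_bandwidth_gbs(bus_width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Illustrative 500 MHz memory clock, not a specific product.
print(peak_bandwidth_gbs(128, 500, 2))   # 16.0  (128-bit, DDR: "1GHz" effective)
print(peak_bandwidth_gbs(256, 500, 2))   # 32.0  (double the width)
print(peak_bandwidth_gbs(128, 500, 4))   # 32.0  (double the transfers per clock instead)
print(peak_bandwidth_gbs(512, 500, 2))   # 64.0
```

This only captures peak numbers. The hard part of the faster-signalling route, keeping the bus reliable at higher transfer rates, is not modelled at all.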

R520/580 is already at ~1500 pins; adding another 400-500, keeping the signals coherent, and still raising clocks for future growth on an already crowded PCB seems a daunting task.

By default, a 512-bit bus would also need 16 VRAM chips. Would a top-end board with a GB of memory still sell for US$649?
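The 16-chip figure follows if each memory device is 32 bits wide, which is typical of GDDR3 parts of this period (an assumption in this sketch). The snippet computes the chip count, the per-chip density a 1 GB board would then need, and the extra data lines a 512-bit bus adds over 256-bit (data lines only; address, command and power pins are ignored).

```python
CHIP_WIDTH_BITS = 32  # assumed x32 memory devices


def chips_needed(bus_width_bits: int, chip_width_bits: int = CHIP_WIDTH_BITS) -> int:
    """Minimum number of memory devices to populate the full bus width."""
    return -(-bus_width_bits // chip_width_bits)  # ceiling division


def mbits_per_chip(total_mb: int, chips: int) -> int:
    """Per-device density in megabits for a given board capacity in MB."""
    return total_mb * 8 // chips


n = chips_needed(512)
print(n)                        # 16
print(mbits_per_chip(1024, n))  # 512 (Mbit per chip for a 1 GB board)
print(512 - 256)                # 256 extra data lines vs a 256-bit bus
```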

I wonder more at what point wafer-bonding the whole frame buffer to the GPU in a multichip module becomes the most practical means of growth; it would certainly allow more freedom for both faster and wider buses.

ERK · Jan 28, 2006

Another potential path along the lines of the 512b memory bus is single board SLI/XF. I'm not sure which would cost more--probably about the same.
EDIT: for a given amount of processing power.

Mintmaster · Jan 28, 2006

ophirv, despite some of the seemingly demeaning comments in this thread, you brought up some very good points. While some of the other members brought up valid counter-arguments, they do not make up for the huge difference in performance per transistor.

Guys, consider this: If ATI made a part with the new memory controller, added FP blending, and used their old pipeline architecture, they'd have a faster performing part for the same die size.

All the other points in this thread are pretty moot.

-Regarding the memory controller:

Chalnoth said:
Well, if you look at pictures of the R520 die, you'll notice that a huge portion is taken up by the new memory controller. This new memory controller may well be one reason why the performance per clock per transistor in no AA/AF scenarios seems rather low. But, the memory controller does have its benefits. In particular, I believe it is responsible for the very small performance hit from enabling AA that the R5xx architecture enjoys.

We've been through this before. The memory controller is only 8% of the die. That's not a huge portion, and it could (AFAIK) easily have been used in a design that had R300 style pixel shaders.

-FP32 is not as expensive as you guys are making it out to be. ATI added 32 FP32 math units to R520 with only 60M transistors. True, maybe they culled out unneeded areas or tweaked some parts to be smaller, but clearly this shows that it's a small part of the transistor jump between R420 and R520.

-"SM3.0 shaders" in just about every current game amounts to FP blending. Not including this feature was far and away ATI's biggest mistake last generation. I think it'll take nearly a year before you see something in games that truly needs PS3.0 or runs notably faster with it. This holds even more so for NVidia. FP24 and SM2.0 have lots of room left to make prettier games, and this is the point that ophirv is getting at.

-The "dispatcher" which keeps getting mentioned was not implemented primarily to improve efficiency, as the cost greatly outweighs the benefit. The main reason for the new dispatcher is for good dynamic branching performance. This is where the majority of the die space was consumed. You can see that NVidia's design packs higher performance per transistor in normal pixel shading scenarios, even versus R580. However, G70 will easily be 1/2 or even 1/10th the speed of ATI when dynamic branching is involved. In these cases, ATI has the performance per transistor advantage.

This last point pretty much sums it up. If ATI weren't going for good dynamic branching performance, they could have come up with a much more compact design.
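One common explanation for the dynamic-branching gap described above is branch granularity: pixels are shaded in batches, and if pixels within one batch take different sides of a branch, the whole batch effectively pays for both. A toy model of that effect follows. The batch sizes (4x4 versus 32x32), the 64x64 screen and the per-path costs are hypothetical numbers chosen for illustration, not real R5xx or G70 figures, and real hardware has further costs (branch instruction overhead, latency hiding) that this ignores.

```python
from typing import Callable

Region = Callable[[int, int], bool]


def shade_cost(width: int, height: int, batch_w: int, batch_h: int,
               takes_branch: Region, cost_if: int, cost_else: int) -> int:
    """Total ALU cycles when each batch executes every path any of its pixels needs."""
    total = 0
    for by in range(0, height, batch_h):
        for bx in range(0, width, batch_w):
            pixels = [(x, y)
                      for y in range(by, min(by + batch_h, height))
                      for x in range(bx, min(bx + batch_w, width))]
            any_if = any(takes_branch(x, y) for x, y in pixels)
            any_else = not all(takes_branch(x, y) for x, y in pixels)
            # Every pixel in the batch occupies the ALUs for each path that was taken.
            per_pixel = (cost_if if any_if else 0) + (cost_else if any_else else 0)
            total += len(pixels) * per_pixel
    return total


def ideal_cost(width: int, height: int, takes_branch: Region, cost_if: int, cost_else: int) -> int:
    """Cost if every pixel only paid for its own path (per-pixel branching)."""
    return sum(cost_if if takes_branch(x, y) else cost_else
               for y in range(height) for x in range(width))


# Hypothetical scene: an expensive effect covers only the 8 leftmost columns.
def left_strip(x: int, y: int) -> bool:
    return x < 8


ARGS = (left_strip, 100, 1)
print(ideal_cost(64, 64, *ARGS))            # 54784
print(shade_cost(64, 64, 4, 4, *ARGS))      # 54784: small batches never straddle the edge
print(shade_cost(64, 64, 32, 32, *ARGS))    # 208896: large batches pay for both paths
```

A 1x1 batch reproduces the ideal cost exactly; the catch is that finer batches need more scheduling and bookkeeping logic, which is plausibly part of the die area being debated here.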

Anyways, ophirv, in the end I think you're right. ATI will pay for ditching its traditional architecture. If NVidia can get their AA/AF performance hit and quality up to ATI's level, then they will have a notable performance advantage with the same transistor count and clock speed. 90nm is the only thing saving ATI right now, and both R300/R420 and NV40/G70 have shader designs that make more business sense.

3dcgi · Jan 28, 2006

Jawed said:
You really need to go read the detailed R5xx architecture descriptions as this is pure ignorance.

When you stop making such outlandish statements and ask sensible questions, perhaps people will take the time to explain the finer points to you.

Jawed

I don't see what was so outlandish about ophirv's statement. It's valid to think of the ring bus as an optimization as a memory system can be implemented without a ring bus. There's no need to be harsh on a newcomer who's trying to learn.

Pete · Jan 28, 2006

Hellbinder said:
One could argue that ATi chose "less precision" for its entire DX9 line of cards until this generation.

I dunno. Relative to this gen, maybe, b/c I'm not sure we saw any major IQ or performance limitations b/w R420 and NV40 when it came to shaders. I was thinking more of Voodoo overplaying 16-bit.

Mint, far be it from ignorant me to say you make some good points, but you do. All in all, tho, while sticking to R300 would have made sense in many respects (including forcing devs to aim for FP24 and accommodate a more stable platform than NV's), surely GF6's sales were spurred by more than performance. Hype helped, particularly the lure of SM3 and all its attendant bullet points. ATI probably has to consider that as well when designing new parts, no? I mean, eventually they sent a 256-bit X800 to compete with a 128-bit 6600GT.

Mint, does the dispatcher's attendant improved dynamic branching lead us into Xenos and R600 territory? Again, is this a forward-looking architecture, not ideal but a compromise toward ATI's true intentions, their next big thrust?

Perhaps I'm making excuses for them, especially considering how amazingly NV turned things around from NV30 to NV40. But R520 and R580 seem pretty competitive vs. G70. I guess we wait till G71 to see whose eggs were better placed.

Subtlesnake · Jan 28, 2006

In the recent R580 interview, Eric said that "The new dispatcher was required to allow for a linearly scalable ALU architecture", so surely some modification was needed when going from the R4xx generation to the R580.

http://www.beyond3d.com/reviews/ati/r580/int/index.php?p=04
