Why did ATI ditch the R300 architecture?

Mintmaster said:
Guys, consider this: If ATI made a part with the new memory controller, added FP blending, and used their old pipeline architecture, they'd have a faster performing part for the same die size.
In some cases it would be faster, in some cases it would be slower, and in some cases it wouldn't perform at all.

-"SM3.0 shaders" in just about every current game amounts to FP blending. Not including this feature was far and away ATI's biggest mistake last generation. I think it'll take nearly a year before you see something in games that truly needs PS3.0 or runs notably faster with it. This holds even more so for NVidia. FP24 and SM2.0 has lots of room left to make prettier games, and this is the point that ophirv is getting at.
I'm pretty convinced it will take less than a year. Anyway, what matters is the public perception. And features happen to be a strong selling point.
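
(To make the FP blending point above a bit more concrete, here's a rough sketch with made-up values: HDR-style effects accumulate light contributions by blending into the framebuffer, and with an 8-bit fixed-point target the sum clamps at 1.0, while a floating-point target keeps it around for tone mapping later.)

```python
# Toy illustration of FP blending for HDR-style accumulation.
# Values are invented; the point is the clamping behaviour, not the numbers.
def blend_fixed8(dst, src):
    """Additive blend into an 8-bit fixed-point target: result clamps at 1.0."""
    return min(dst + src, 1.0)

def blend_fp16(dst, src):
    """Additive blend into a floating-point target: values above 1.0 survive."""
    return dst + src  # tone mapping happens later, in a separate pass

lights = [0.6, 0.8, 1.5]   # three bright contributions to one pixel
fixed = fp = 0.0
for l in lights:
    fixed = blend_fixed8(fixed, l)
    fp = blend_fp16(fp, l)

print(f"fixed-point framebuffer: {fixed:.2f}")    # 1.00 -> saturated, detail lost
print(f"floating-point framebuffer: {fp:.2f}")    # 2.90 -> tone mapper can recover it
```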

-The "dispatcher" which keeps getting mentioned was not implemented primarily to improve efficiency, as the cost greatly outweighs the benefit. The main reason for the new dispatcher is for good dynamic branching performance. This is where the majority of the die space was consumed. You can see that NVidia's design packs higher performance per transistor in normal pixel shading scenarios, even versus R580. However, G70 will easily be 1/2 or even 1/10th the speed of ATI when dynamic branching is involved. In these cases, ATI has the performance per transistor advantage.
And that case is becoming more likely with time. Plus, as Subtlesnake says (and as I pointed out earlier), scalability is an issue.

Anyways, ophirv, in the end I think you're right. ATI will pay for ditching its traditional architecture. If NVidia can get their AA/AF performance hit and quality up to ATI's level, then they will have a notable performance advantage with the same transistor count and clock speed. 90nm is the only thing saving ATI right now, and both R300/R420 and NV40/G70 have shader designs that make more business sense.
Bringing AA/AF performance up would cost additional transistors again.

I think the R5xx architecture is as sensible as R300 was more than 3 years ago. Game contents change. Expectations of consumers change. And engineers do not get more experience from slaying monsters at high frame rates.
 
Xmas said:
I think the R5xx architecture is as sensible as R300 was more than 3 years ago. Game contents change. Expectations of consumers change. And engineers do not get more experience from slaying monsters at high frame rates.

:LOL: That's great. So my level 49 "OpenGL Guy" will not win against a level 72 "David Kirk"? Nuts...
 
ophirv said:
Before R300, no one thought a 256-bit bus memory controller would be possible.
Why isn't a 512-bit bus an option today or in the near future?

The Matrox Parhelia was actually the first to utilize a 256-bit memory bus.
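
As for the 512-bit question itself, the peak-bandwidth math is trivial; the cost is in memory pads and board routing. A quick sketch (the clock figure below is arbitrary, just to show the scaling):

```python
# Back-of-the-envelope peak bandwidth: bus width (in bytes) times the
# effective data rate.
def peak_bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Theoretical peak bandwidth in GB/s."""
    return (bus_width_bits / 8) * effective_mhz * 1e6 / 1e9

for width in (128, 256, 512):
    print(f"{width:3d}-bit bus @ 1500 MHz effective: "
          f"{peak_bandwidth_gb_s(width, 1500):5.1f} GB/s")
# Doubling the width doubles the peak, but it also roughly doubles the memory
# pads on the die and the traces on the board, which is where the cost is.
```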
 
Pressure said:
The Matrox Parhelia was actually the first to utilize a 256-bit memory bus.

You are right, I forgot about that.

Anyway, all that being said, I agree that the R580 is still a great chip and we haven't seen even half of its true potential.
 
Pressure said:
The Matrox Parhelia was actually the first to utilize a 256-bit memory bus.

Unfortunately, the B3D chip table does not include die size info for Parhelia. :cry:
 
inefficient said:
Interesting how the die size of the R580 at 90nm is roughly the same as the G70 at 110nm.

Why is it interesting? It's simply what the math predicts.
 
ERK said:
Another potential path along the lines of the 512-bit memory bus is single-board SLI/CrossFire. I'm not sure which would cost more; probably about the same.
EDIT: for a given amount of processing power.

Look, Geo, there is another!
 
Xmas said:
In some cases it would be faster, in some cases it would be slower, and in some cases it wouldn't perform at all.
Even in shadermark R520 only has around a 25% advantage over R420. There are very few situations where a theoretical 32-pipe R420 would lose if it had the new memory controller.

I'm pretty convinced it will take less than a year. Anyway, what matters is the public perception. And features happen to be a strong selling point.
...
Plus, as Subtlesnake says, and I pointed it out earlier, too, scalability is an issue.
You're taking my suggestion too literally. I don't mean completely unaltered R300 pipes. It shouldn't take that many transistors to get functional PS3.0 with poor dynamic branching. Scaling things in different ways is not an enormous task, as adding another shading stage (like a third shading unit per pipe) is possible without this radical pipeline change. The big die-space consumer is that ATI's pixel shaders change state much more often. They have to duplicate a lot of resources to make the batch size so small, and juggle different sections of code for each batch.

This is what I'm talking about:
sireric said:
There are two reasonable ways to deal with that: You can either have large batch sizes of pixels, in which case you hide the latency of fetches, more or less, just by doing the same thing over and over on many pixels before going to the next thing. This would be an architecture that, say, executes the same pixel shader instruction on 1000's of pixels. This works well to hide latency, and is somewhat cheap, area-wise. However, it suffers granularity loss, since it has to work in large batches. This would make for a good SM2 type part. The new way is to make small batches, but have lots of them. So you execute one instruction on a small batch (say 16 pixels), then switch to another instruction and batch until the data for the first one returns. You need to have lots of live threads in this type of architecture, and you need lots of resources (i.e. area) for it to properly hide latency. But, its advantage is that it rules from a granularity standpoint and branching (prime feature of SM3) works perfectly. That's what we did for the R5xx. I believe that the first architecture is more popular for others.
Disregarding software semantics, I don't think PS3.0 without good branching is particularly useful for improving graphics, and ATI could have outright proved this last generation if they included FP blending. However, if you want the checkmark feature, then it's not too costly. The big cost is going from poor branching performance to good branching performance. That's where you need to thoroughly reorganize the rendering pipeline.
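
To put some made-up numbers on sireric's granularity point (a toy model only, not any real hardware): with a SIMD-style batch, every pixel in the batch pays for the expensive branch if any pixel in it takes that branch, so the batch size directly sets how much work a dynamic branch can actually save.

```python
# Toy model of the batch-granularity tradeoff sireric describes. All numbers
# are invented: roughly 10% of pixels take an expensive branch (100 extra
# instructions) on top of a cheap common path (10 instructions), clustered in
# small coherent blobs the way shadow/light regions tend to be. With SIMD-style
# batches, if any pixel in a batch takes the branch, the whole batch executes it.
import random

random.seed(1)
PIXELS = 256 * 256
BASE, BRANCH = 10, 100

takes_branch = [False] * PIXELS
for _ in range(100):                      # ~100 coherent blobs of 64 pixels each
    start = random.randrange(PIXELS - 64)
    for i in range(start, start + 64):
        takes_branch[i] = True

# Ideal cost: each pixel pays only for the path it actually takes.
ideal = sum(BASE + (BRANCH if t else 0) for t in takes_branch)

def batched_cost(batch_size):
    """Instructions issued when pixels are shaded in batches of batch_size."""
    cost = 0
    for start in range(0, PIXELS, batch_size):
        batch = takes_branch[start:start + batch_size]
        per_pixel = BASE + (BRANCH if any(batch) else 0)
        cost += len(batch) * per_pixel
    return cost

print(f"ideal (perfect granularity): {ideal}")
for size in (16, 64, 1024, 4096):
    c = batched_cost(size)
    print(f"batch of {size:4d}: {c:8d} instructions ({c / ideal:.2f}x ideal)")
```

Small batches stay near the ideal cost; huge batches end up paying for the branch almost everywhere. The catch, as sireric says, is that small batches need many of them in flight (lots of live threads, lots of area) to hide texture fetch latency.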

Bringing AA/AF performance up would cost additional transistors again.
True, but I doubt it would be that much. Like I said before, the "huge" memory controller is only 8% of the die. It's more a question of whether NVidia researchers can find the same magic formula that ATI did, not one of die space.

I think the R5xx architecture is as sensible as R300 was more than 3 years ago. Game contents change. Expectations of consumers change. And engineers do not get more experience from slaying monsters at high frame rates.
Well given how R300 was a whopping improvement over existing GPUs in every way (not just PS2.0), I disagree. Nonetheless, I'm personally glad ATI did what they did with R5xx. The architecture allows new things to be done, and will advance realtime graphics and GPGPU type work as well. However, when looking at things strictly from a business point of view (i.e. performance per dollar), ATI made a sub-optimal decision. I'm 100% convinced that it will hurt their bottom line.
 
Mintmaster said:
Even in shadermark R520 only has around a 25% advantage over R420. There are very few situations where a theoretical 32-pipe R420 would lose if it had the new memory controller.
Shadermark is precisely the situation that masks the differences between the two. Fundamentally, though, the dispatcher needed to be changed because R300's was fairly fixed-function with four levels of dependency, which would have been of no use for SM3.0.
 
Dave Baumann said:
Shadermark is precisely the situation that masks the differences between the two. Fundamentally, though, the dispatcher needed to be changed because R300's was fairly fixed-function with four levels of dependency, which would have been of no use for SM3.0.
I understand that, but resolving this issue does not require such a truly fundamental reworking of the pipeline. The point is that there really is only one reason for R520's huge transistor count relative to its per-clock performance: dynamic branching. All the other improvements of R5xx over R4xx - FP blending, PS3.0 support, fast FSAA, HQ AF, AVIVO - do not require small batches, and thus do not require gobs of die space.

Look at the quote from sireric. He is outlining this point very clearly.
 
Mintmaster said:
Well given how R300 was a whopping improvement over existing GPUs in every way (not just PS2.0), I disagree. Nonetheless, I'm personally glad ATI did what they did with R5xx. The architecture allows new things to be done, and will advance realtime graphics and GPGPU type work as well. However, when looking at things strictly from a business point of view (i.e. performance per dollar), ATI made a sub-optimal decision. I'm 100% convinced that it will hurt their bottom line.
R300 shone because NVidia didn't deliver and because its predecessor wasn't great either. Now R520 faced G70, which was executed far better than NV30 and was a refresh of the pioneering PS3.0 chip, NV40. That gave NVidia the technology-leader reputation. If ATI had just matched this with a cheap SM3.0 implementation on top of their old architecture (by the way, I'm not so sure it would have been cheap), I am rather convinced ATI would have more trouble selling it than they have with their current lineup.
 
R300 shone not only versus NV30, but was a huge improvement over the previous gen. It didn't even have twice the transistors of R200, but had like 3 times the practical shading power, in FP24 to boot, amazing AA, better AF, and the list goes on.

R520 had double the transistors of R420 and was maybe 30% faster overall.

You may think a cheap PS3.0 wouldn't sell well, but remember that it would perform much faster. That sells a lot more than claiming you have the most advanced design. Look at NV15 vs R100, or NV25 vs. R200. Performance sells way more than technology for the mass market.

Anyway, R580 improved the performance per transistor by a lot, so they won't be that far behind NVidia in this respect anymore. ATI's strides in memory efficiency may negate NVidia's advantage in shader/texture performance per transistor per clock. Still, I think ATI's low end will be hit the hardest once NVidia moves to 90nm, because I believe they'll be able to undercut ATI in price. AA is less important of a factor there, too.

We'll see what happens.
 
Mintmaster said:
R300 shone not only versus NV30, but was a huge improvement over the previous gen. It didn't even have twice the transistors of R200, but had like 3 times the practical shading power, in FP24 to boot, amazing AA, better AF, and the list goes on.

R520 had double the transistors of R420 and was maybe 30% faster overall.

You may think a cheap PS3.0 wouldn't sell well, but remember that it would perform much faster. That sells a lot more than claiming you have the most advanced design. Look at NV15 vs R100, or NV25 vs. R200. Performance sells way more than technology for the mass market.

Anyway, R580 improved the performance per transistor by a lot, so they won't be that far behind NVidia in this respect anymore. ATI's strides in memory efficiency may negate NVidia's advantage in shader/texture performance per transistor per clock. Still, I think ATI's low end will be hit the hardest once NVidia moves to 90nm, because I believe they'll be able to undercut ATI in price. AA is less important of a factor there, too.

We'll see what happens.


Yes, and almost 2/3 of it was used getting the chip up to date with SM3.0 and on the memory controller, which in itself was a long-term investment more than anything.

People scold others for using the term "pipeline" these days; transistor count should be no different. I don't really understand how you can look at a transistor number and say "well, the performance should be far better". You also have to understand that as technology progresses into the land of unified architectures and beyond, we're going to see more and more limitations imposed not only by the CPU but by the software itself. Developers will have to change things quite dramatically to make full theoretical performance come to light in some of these new cards as well as those coming. This is something I don't see happening for years, so I'm sure we'll see a whole host of new ways to improve current performance from ATI and NVidia alike, for example the programmable memory controller. I think this is something NVidia saw coming years ago as well, prompting them to keep saying they support a more balanced, highly programmable architecture rather than a robust unified one.
 
Mintmaster said:
Still, I think ATI's low end will be hit the hardest once NVidia moves to 90nm, because I believe they'll be able to undercut ATI in price. AA is less important of a factor there, too.

We'll see what happens.

I wouldn't bet on that if I were you; the $99 7300 GS underperformed against the $79 X1300 LE:
http://www.hkepc.com/hwdb/gf7300gs-1.htm

L'inq even describes the 7300 as being "big and lazy"

You might actually wonder how well nV is doing with the G7x, as the low-end part was released 7 months after the high-end part...
The only thing going for the 7300 is 50% more vertex shaders than its direct competitor.
 
dizietsma said:
The X3:Reunion Rolling Demo benchmark Anandtech talked about is a good example of comparing shader versions. There is a setting in there for shaders = low, medium and high and this equates to 1.1, 2.0 and 3.0. You should be able to see the difference quite well.
I was playing the game on an X800 XL (SM 2.0b) and now on an X1800 XT (SM 3.0). Now, I might not be that perceptive, but I would swear that in those effects there is no difference at all. (I don't agree with the thread starter on most things; I just wanted to point out that your example might not be the best one.)

PS. I wonder if SM 2.0b and SM 3.0 are not both included for the High setting. That would make SM 2.0a the Medium and SM 1.1 the Low one.
 
Hubert said:
I was playing the game on an X800 XL (SM 2.0b) and now on an X1800 XT (SM 3.0). Now, I might not be that perceptive, but I would swear that in those effects there is no difference at all. (I don't agree with the thread starter on most things; I just wanted to point out that your example might not be the best one.)
X2 was about the shadows; what about X3? X2 was heavily optimized for NV40's architecture, the shadowing etc.
 
Warning: don't stare directly at the ignorance or risk losing brain cells.

Mintmaster said:
R300 shone not only versus NV30, but was a huge improvement over the previous gen. It didn't even have twice the transistors of R200, but had like 3 times the practical shading power, in FP24 to boot, amazing AA, better AF, and the list goes on.

R520 had double the transistors of R420 and was maybe 30% faster overall.
Well, looking at Eric's quote re: dynamic branching, it's also directly responsible for R5x0's different scaling characteristics vis-à-vis R300. R420 was as efficient as R300 b/c it was basically R300 * 2. R580 or even RV530 would seem to be a fairer performance/transistor point of comparison to previous architectures b/c they're more fully representative of ATI's goal for the R5x0 architecture.

R580 shines over R300/R420 with three times the (theoretical?*) shading power, in FP32 to boot, and better AF. :) Pretty close, ignoring direct transistor comparisons. Considering RAM speed didn't scale equivalently (R200-R300 doubled raw bandwidth, but R420->R580 only gained 50%), you can see why they'd compensate by spending extra transistors on improving bandwidth efficiency. Could they have just gone with R300 + FP blending and a 512-bit external bus? Even I'm going to have to admit ignorance here. (Maybe NV's actions with a 32-pipe G71 is a clue?)

(Going back to ophirv's initial point: if, as you say, FP blending is the crux of "SM3" and its attendant IQ improvements, then what's the appeal of keeping R300 around at the low end with constant rehashes? Didn't we hate on GF2MX and especially GF4MX for the same reason, sacrificing a distinguishing next-gen feature (shaders, in that case) for price/performance?)

You may think a cheap PS3.0 wouldn't sell well, but remember that it would perform much faster. That sells a lot more than claiming you have the most advanced design. Look at NV15 vs R100, or NV25 vs. R200. Performance sells way more than technology for the mass market.
(Oh, right, this is the appeal. :oops:)

I can certainly see your and ophirv's point in the low end and maybe mainstream, especially if it nets us a stable platform (and therefore maximal dev familiarity -> efficiency -> contempt) for what amounts to a console lifetime (just lower the resolution for the older/cheaper parts). But that's essentially what we got from R300 to R480, no? If performance is key, then would mainstream SM2 + FP blending = HDR parts be fast enough in, say, Far Cry to warrant excitement? Is HDR more attractive than AA+AF as an IQ enhancer or as a marketing bullet?

From a (business) perspective wider than manufacturing or man-hours, would merely adding FP blending (and, incidentally, FP32) be worth the effort? How marketable is that, considering ATI's rep of succeeding more by following than innovating? If ATI's going to jump, why not jump further? It seems they timed their jump (much) further out than NV b/c R300 bought them that time. It seems that NV jumped sooner w/NV40 b/c NV30 didn't afford them the luxury of time. Serendipitously, NV40 landed on a featureset basically as ideal and timely as R300--heck, basically R300 + FP blending a year later, no? But does ATI gain from quickly following with an NV40 clone of their own? Or does it risk time to gain potentially more (mind share, marketing impact, etc., and therefore loyalty & sales) by jumping later but further, with more innovations of their own? And does this help them better time their next jump to DXNext?

You can tell I'm a Nintendo baby from all this talk of jumping. You can tell I'm a GPU baby from all this talk. Don't be too harsh, please. :)


* Compare X850 to X1900 in those two graphs.
 