AMD: R7xx Speculation

Multi-die doesn't need to give better performance than a single die; it just needs to offer competitive performance at a competitive price.

Of course a company can always reduce the price of a multi-die product to make it competitive in price/performance ratio if it is not the top performing card on the market, but that eats into profit margins.

Really? So NVIDIA can choose to throw away all the work they did on a large die in favor of two smaller dies? All the R&D work for the large die is irrelevant? No.

-FUDie

Obviously they don't throw it away. If their multi-die approach ends up performing better than the single-die approach, then they can position the multi-die product as their ultra-high-end card and the single-die product as a lower-end card. If the single-die approach ends up performing better than the multi-die approach, then the single-die product is the ultra-high-end card, while the multi-die product doesn't make it to market.
 
Yet another reason why no one should believe any technical rumors Fuad posts unless you've seen them elsewhere first, or they're just plain common sense. He may have a source or two which gives him legitimate AMD business info, but his tech info is b.s. more often than not.

I mean, I knew HD 4870 wasn't going to feature an external 512-bit memory interface.

1) HD 4850 and 4870 use the same GPU - why would any GPU manufacturer disable half of the memory controller/channels on a native 512-bit part?

2) How the #^% did ATi manage to fit a 512-bit MC in a 256mm^2 part on 55nm?

Common. $%&*ing. Sense.

I'm pretty skeptical of the 256mm² die size. Of course, it was one of those unsubstantiated rumors, but RV770 is supposedly "800M+" trans, while GT200 is ~1B.

So a 576mm² die size versus 256mm² doesn't stack up at all, even allowing for the process discrepancy. You're talking about over twice as big for 20% more trans.

The RV770 die is likely a lot bigger than known, and if it isn't, then if RV770 gets anywhere close to GT200, NVIDIA's engineers should be ashamed.
 
Of course a company can always reduce the price of a multi-die product to make it competitive in price/performance ratio if it is not the top performing card on the market, but that eats into profit margins.
It doesn't have to. Smaller parts are cheaper to manufacture and, generally, have higher yields.

My point was that your claim that the multi-chip board has to be faster than a single-chip board is wrong: the multi-chip board just needs to be competitive. Competitive does not mean slower, it means competitive: it could be faster, it could be slower, it could be exactly the same speed.
Obviously they don't throw it away. If their multi-die approach ends up performing better than the single-die approach, then they can position the multi-die product as their ultra-high-end card and the single-die product as a lower-end card. If the single-die approach ends up performing better than the multi-die approach, then the single-die product is the ultra-high-end card, while the multi-die product doesn't make it to market.
Except that you may orphan the single, large-die product by having a multi-chip solution of a smaller part with higher performance. And that would hurt sales/margins on the large-die product.

-FUDie
 
This came from AMD's Rick Bergman. It really got me wondering if AMD was just spreading FUD because they can't get to NVIDIA (could this be similar to Intel's raytracing hype?), or if they really see the future so differently from NVIDIA.

If memory serves, though, it was nVidia, not ATi, which resurrected the concept of SLI in the first place. As well, nVidia has not shied away from producing and marketing its own "2 gpus on a stick" products, which themselves may SLI. nVidia has also hopped aboard the "hybrid" graphics train and the 3x SLI train, etc. So, I mean, the idea that nVidia sees the future much differently from ATi doesn't seem terribly plausible to me, for a number of reasons.

Obviously, SLI'ing purported "monster gpus" isn't going to be practical or affordable. I'm not aware at all that nVidia is forswearing and abandoning SLI in all its current concepts--rather, the opposite seems true.

My own thinking got me this far:

The most often heard argument for smaller chips (or against large monolithic chips) would be the higher yields they would give, thus saving cost.

Very possibly *dramatically* improved yields, no doubt. That would seem to be the compelling argument for it, and also explain why both companies are pursuing it so avidly.

There are a lot of arguments against that, though, the most important being all the extra costs associated with a multi-chip solution: the doubled memory, extra packaging, more complex PCB, and of course the die area spent on duplicated logic (e.g. both chips have PCI-E interfaces). I can hardly imagine the gain in yields from a smaller chip outweighing all these costs.

Really? Then why do you think *both companies* are pursuing that strategy? If you look at gpu development from the beginning, it's easy to see that having ram onboard is more costly than not having it, or much of it (e.g. comparing Intel's agp-dependent i7xx series to the dominant local-bus cards by nVidia and 3dfx at the time, which ran rings around Intel's cheaper approach. In the end the market overwhelmingly opted to go the more expensive route because the performance advantages were tangible and obvious.)

Simply put, the gpu makers have ample expertise in local-bus configuration for their products, and over the years the components needed for it have dramatically dropped in price while dramatically ramping upwards in efficiency and performance. Indeed, the ram market alone is a prime example of the efficiencies that have developed over the last decade because of greatly increased demand. There was a time not too long ago when few of us could imagine a 3d-card boasting 512MB of ram that didn't cost thousands of dollars, etc...;) There were actually a few of those produced, as I recall--though none of them sold very well.

The point is that you can trust the markets to move into the most efficient planes of operation they can manage. If there's something nVidia has done or said to date that would make you think its strategy is fundamentally different from ATi's in this regard I'd like to hear about it...;) I think that you are making assumptions which have more to do with nVidia's trailing ATi on the process front than they do with fundamental strategies.

I also read a post from Arun a few days ago saying die size wasn't any problem "unless your design team is composed primarily of drunken monkeys". From what I've read about yields on newer, smaller processes, yields depend more and more on design faults. Still, isn't that an argument against large chips, which are inherently more complex?

Exactly, which is why both companies are moving towards smaller chips used in tandem than towards monster chips used in the singular.

The second most heard argument for AMD's new multi-chip approach might well be that AMD will somehow connect the two chips together, so the two GPUs could essentially work as one, also saving the costs and problems of the doubled memory. I wonder how this could work, though. Connecting the ring bus from both chips is what some people say, but I doubt they could make any chip big enough to fit two 512-bit connections (R700 still has a 512-bit internal ring bus, right?)

I think the important part is to understand that nVidia is in no way eschewing a similar strategy. IMO, again, nVidia is a bit behind on the process curve, and also nVidia's imperatives are not a 1:1 match for ATi's, either. But this in no way suggests to me that the two companies are fundamentally different in their approach. Different in execution, yes, but not in basic direction.

Also, if AMD wants to make the two chips work with each other like that, why not just put two chips in the same package?

I'm wondering if nVidia isn't already doing that...;) Heh...;) AMD's strategy though, as far as that goes, is to have two "real" cores, and so on. It's more difficult to do in the short run, but in the long run stands to provide more benefits. That explains *why* AMD is taking that route.

And to add to that, why is chip size becoming such a problem all of a sudden? Yes, GPUs are getting bigger and bigger in the race for top performance, but it's only a gradual increase. I would think power would become a concern long before chip size would be the limiting factor.

In this regard, current leakage, thermal characteristics, and *yields* become inseparable and indistinguishable. This is the entire concept behind multicore cpus vs "monster" single-core cpus, etc.

If so, yields and design complexity would seem to be the only advantages to the whole multi-chip story, which leads me to the question why, if multi-gpu is the future, it is not also the past?

That's easy...;) It wasn't possible to manufacture...;) Engineering ideas and concepts have always been held hostage to manufacturing limitations.


Or is multi-chip just an excuse for AMD to get at least a bit closer to NVIDIA in its post-G80 rampage?

Again, I see no sign at all that nVidia is planning to dump SLI as a viable commercial concept, and so I don't see that nVidia is any more "ahead" of ATi such that ATi needs to "draw closer." I mean, ATi's goal is not to close a gap but to create a gap between itself and its competition.
 
WaltC said:
Very possibly *dramatically* improved yields, no doubt. That would seem to be the compelling argument for it, and also explain why both companies are pursuing it so avidly.
Is there any reason to believe the % yield would increase?


The problem I see with multi-chip solutions going forward is one of efficiency/granularity. The optimal ALU:TEX:ROP:bandwidth ratio does not really stay constant across all resolutions/consumer levels. So you are faced with designing your base chip to be a model of efficiency (transistor-budget wise) at the low end and have it scale inefficiently to the high end, or vice versa. With a dual-chip solution the effect probably won't be too pronounced, but any more than that and you would be faced with having a suboptimal offering somewhere in the product line.
 
The big thing with 4870X2 is whether AMD truly has some major improvement in getting two chips to work together up its sleeve, as speculated (I lean towards doubting that they do).

If not, the multi-chip solution is probably a loser.
 
Very possibly *dramatically* improved yields, no doubt.
This has been discussed before: not everybody buys the improved yields argument.

If you have a single die with, say, 8 clusters and 6 memory controllers for a final product with 6 shader clusters and 5 memory controllers, your yield is determined by y(6 out of 8)*y(5 out of 6)*y(all the rest). The first two factors can easily go into the nineties. The final factor depends on the remaining area.

Let's now assume the case with, idealistically, the same performance: 2 dies with 3 out of 4 shader clusters and 2.5 out of 3 MC's.
Obviously, that's impossible, which illustrates the first problem: less flexibility in terms of configurations.
Second, the yield of 3 out of 4 clusters will be quite a bit worse than that of 6 out of 8, and y(2 out of 3)**2 will most likely (and y(3 out of 3)**2 definitely) be worse than y(5 out of 6).
Third, there's the 'all the rest' part which you now pay double, whether you use it or not.

So, no, I don't buy the yield argument at all.

Edit: even if the smaller die of the dual-die solution has a lower yield, that doesn't necessarily mean the overall solution is more expensive. After all, in case a die is a total loss, the amount of area you have to pay for is also less. The crossover point is something that could easily be modeled in a spreadsheet...
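
To put some (made-up) numbers on that comparison, here's a minimal sketch assuming a Poisson defect model and binomial "at least k good out of n" redundancy. The defect density and per-block areas are illustrative assumptions, not real RV770/GT200 figures, so only the shape of the trade-off matters, not the exact percentages.

```python
# Rough sketch of the yield comparison above, using a Poisson defect model
# (block yield = exp(-defect_density * block_area)) and binomial
# "at least k good out of n" redundancy. All areas (in cm^2) and the defect
# density are assumed, illustrative values.
from math import comb, exp

D = 0.4  # defects per cm^2 (assumed)

def block_yield(area_cm2):
    return exp(-D * area_cm2)

def k_of_n(k, n, y):
    """Probability that at least k of n identical blocks are defect-free."""
    return sum(comb(n, j) * y**j * (1 - y)**(n - j) for j in range(k, n + 1))

# Big die: 8 shader clusters (need 6), 6 MCs (need 5), plus non-redundant "rest".
big = (k_of_n(6, 8, block_yield(0.25))
       * k_of_n(5, 6, block_yield(0.10))
       * block_yield(0.80))

# Two half-size dies: 4 clusters (need 3), 3 MCs (need all 3),
# and half of the "rest" each -- and you need two good dies per board.
small = (k_of_n(3, 4, block_yield(0.25))
         * k_of_n(3, 3, block_yield(0.10))
         * block_yield(0.40))

print(f"big die yield            ~ {big:.1%}")
print(f"one small die            ~ {small:.1%}")
print(f"pair of small dies/board ~ {small**2:.1%}")
```

With these assumed numbers the individual small die yields better than the big die, but the pair needed for one board yields worse, which is exactly the spreadsheet crossover question.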
 
I'm pretty skeptical of the 256mm² die size. Of course, it was one of those unsubstantiated rumors, but RV770 is supposedly "800M+" trans, while GT200 is ~1B.

So a 576mm² die size versus 256mm² doesn't stack up at all, even allowing for the process discrepancy. You're talking about over twice as big for 20% more trans.

The RV770 die is likely a lot bigger than known, and if it isn't, then if RV770 gets anywhere close to GT200, NVIDIA's engineers should be ashamed.

RV770 should be south of 900M for the 256mm² die.
G200 should be between 1.3B and 1.4B for the 576mm² die.
Yes, those numbers do add up.

Remember, G200 has roughly twice the specs of RV770: 32 ROPs, 40/80 TMUs/TFUs, a 512-bit bus; obviously not all the specs, but enough to make a difference.
 
RV770 should be south of 900M for the 256mm² die.
G200 should be between 1.3B and 1.4B for the 576mm² die.
Yes, those numbers do add up.

Remember, G200 has roughly twice the specs of RV770: 32 ROPs, 40/80 TMUs/TFUs, a 512-bit bus; obviously not all the specs, but enough to make a difference.

All the rumors say GT200 is 1B, not 1.4B.

If we're going to be consistent about rumors, we should be consistent.

I mean maybe it is 1.4, and that would explain most of the discrepancy, but all the rumors have said 1. Only a few B3ders have decided 1.4. Which it may be, but that's not really a rumor..

The other side is the presumed 256 for RV770 has no basis either. So it could be much larger too, and that's what I originally said.
 
All the rumors say GT200 is 1B, not 1.4B.

If we're going to be consistent about rumors, we should be consistent.

I mean maybe it is 1.4, and that would explain most of the discrepancy, but all the rumors have said 1. Only a few B3ders have decided 1.4. Which it may be, but that's not really a rumor..

The other side is the presumed 256 for RV770 has no basis either. So it could be much larger too, and that's what I originally said.

There ARE rumors stating more than 1.1B, some old and some new.
They are out there.

I am simply stating rumors that make the most sense and add up...
 
And to add to that, why is chip size becoming such a problem all of a sudden? Yes, GPUs are getting bigger and bigger in the race for top performance, but it's only a gradual increase. I would think power would become a concern long before chip size would be the limiting factor.

Google for "reticle limit". It doesn't matter how quickly or how slowly you approach a brick wall, it's still going to stop you in your tracks at the same point.

Power and cooling limits are certainly also a big factor, and are also related to die size (well, transistor count). Though multiple chips don't necessarily help you here: the current X2 and GX2 products are both 2-slot solutions, so they have the same potential for cooling (which mostly depends on the volume available and the system case) as single-GPU 2-slot products. And power is limited by what you can get from the wall/PSU before it's limited by what you can deliver to a single GPU.

Just out of curiosity, how big would a die need to be for 512bit? Does that depend on the process used, or is that always the same? And is there some rough (linear) equation to check how big the bus can be for a given size? Or am I asking too many questions? :smile:

Wider busses need more pins coming off the chip. You can only put those pins so close together. So for a given bus width, the pins alone are going to require some minimum chip surface area. The pin density isn't related to process.

Very roughly, the equation would be something like:

(((bus width in bits) * (PINS_PER_BUS_BIT)) + (PINS_FOR_OTHER_THINGS)) / (PINS_PER_SQ_MM)

(CAPS means constant). PINS_FOR_OTHER_THINGS is power, ground, PCI-Express bus, wires to the display output circuitry, etc. You can treat these as constant, though power and ground are going to be related to the die size also.
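
For illustration only, here is that rough formula in code; the constants are placeholder guesses chosen just to show the shape of the calculation, not real figures for any particular process or GPU.

```python
# Numeric sketch of the rough pad-limit formula above. All constants are
# placeholder guesses meant only to show the shape of the calculation.
PINS_PER_BUS_BIT = 2          # data pin plus associated I/O power/ground (assumed)
PINS_FOR_OTHER_THINGS = 300   # PCI-Express, display, clocks, core power/ground (assumed)
PINS_PER_SQ_MM = 4            # achievable pad density (assumed)

def pad_limited_area_mm2(bus_width_bits):
    """Minimum die area needed just to fit the pads, per the formula above."""
    pins = bus_width_bits * PINS_PER_BUS_BIT + PINS_FOR_OTHER_THINGS
    return pins / PINS_PER_SQ_MM

for width in (256, 384, 512):
    print(f"{width}-bit bus -> at least ~{pad_limited_area_mm2(width):.0f} mm^2 of die")
```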
 
PINS_PER_BUS_BIT is a constant? I heard that ring-bus memory controllers require fewer pins than a traditional crossbar. Dunno why that is, though.
 
And to add to that, why is chip size becoming such a problem all of a sudden? Yes, GPUs are getting bigger and bigger in the race for top performance, but it's only a gradual increase. I would think power would become a concern long before chip size would be the limiting factor. If so, yields and design complexity would seem to be the only advantages to the whole multi-chip story, which leads me to the question why, if multi-gpu is the future, it is not also the past? Why wasn't this used earlier? Yields and complexity would have been problems at all times, right? Why don't we have a G80 made of four chips with four SIMDs each and two extra chips with the memory controllers on them?
In a way G80 is a multi-chip solution: some of its functions were split off into the NVIO chip. There have actually been a number of multi-chip solutions in the past. Voodoo 1 was a 2-chip solution, so were the first two PowerVR products. Voodoo 2 was 3-chip (and some of 3dfx's high-end Obsidian boards had as many as 6 chips on board). 3dfx went single-chip with Banshee and Voodoo 3, but they went back to multi-chip with Voodoo 5 (which was available in 1-, 2- and 4-chip versions).
 
All the rumors say GT200 is 1B, not 1.4B.

If we're going to be consistent about rumors, we should be consistent.

I mean maybe it is 1.4, and that would explain most of the discrepancy, but all the rumors have said 1. Only a few B3ders have decided 1.4. Which it may be, but that's not really a rumor..

You can also use speculative math instead. G92 has 754M transistors on 323mm²@65nm. That gives an approximate transistor density of 2.33M per mm².

A hypothetical 576mm² die with exactly the same transistor density gives ~1342M transistors.

Of course, neither the die size nor the transistor density is a given value; even if the die size is correct, the latter could be a tad lower. In any case, a density of ~1.73M per mm², which 1B transistors would imply, sounds unlikely, at least to me.
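
The same extrapolation, written out; the inputs are the rumored/approximate figures quoted above, not confirmed numbers.

```python
# Density extrapolation from the figures quoted in this thread (all rumored
# or approximate): G92 = 754M transistors in 323 mm^2, hypothetical 576 mm^2 die.
g92_transistors = 754e6
g92_area_mm2 = 323.0
big_die_area_mm2 = 576.0          # rumored GT200 die size

density = g92_transistors / g92_area_mm2              # ~2.33M transistors per mm^2
print(f"G92 density:                {density / 1e6:.2f}M per mm^2")
print(f"576 mm^2 at that density:   {density * big_die_area_mm2 / 1e6:.0f}M transistors")
print(f"Density implied by 1B/576:  {1e9 / big_die_area_mm2 / 1e6:.2f}M per mm^2")
```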

The other side is the presumed 256 for RV770 has no basis either. So it could be much larger too, and that's what I originally said.

256 square millimeters with the speculated unit-count increases for RV770 would be close to ideal for AMD, IMHO. Times two gives 512mm², which sounds like a reasonable X2 competitor for the highest-end GT200.
 
This has been discussed before: not everybody buys the improved yields argument.

If you have a single die with, say, 8 clusters and 6 memory controllers for a final product with 6 shader clusters and 5 memory controllers, your yield is determined by y(6 out of 8)*y(5 out of 6)*y(all the rest). The first two factors can easily go into the nineties. The final factor depends on the remaining area.

Let's now assume the case with, idealistically, the same performance: 2 dies with 3 out of 4 shader clusters and 2.5 out of 3 MC's.
Obviously, that's impossible, which illustrates the first problem: less flexibility in terms of configurations.
Second, the yield of 3 out of 4 clusters will be quite a bit worse than that of 6 out of 8, and y(2 out of 3)**2 will most likely (and y(3 out of 3)**2 definitely) be worse than y(5 out of 6).
Third, there's the 'all the rest' part which you now pay double, whether you use it or not.

So, no, I don't buy the yield argument at all.

Edit: even if the smaller die of the dual-die solution has a lower yield, that doesn't necessarily mean the overall solution is more expensive. After all, in case a die is a total loss, the amount of area you have to pay for is also less. The crossover point is something that could easily be modeled in a spreadsheet...

With modern GPUs I don't think it's so much that smaller dies = greater yields as the simple fact that you use more of your silicon wafer.

Even if you were to only squeeze in 4-8 more GPUs with a smaller die, if you're using 100k wafers, that's another 400-800k GPUs you can sell. And with a low enough price that should increase the probability they make projected sales numbers without having a lot of inventory gathering dust and losing them money. The higher the price, the bigger the gamble.
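
A back-of-envelope version of the "more chips per wafer" point, using a common dies-per-wafer approximation. The 300 mm wafer and the two die areas (the rumored figures from this thread) are assumptions, and edge loss is only roughly accounted for.

```python
# Back-of-envelope dies-per-wafer estimate using a common approximation:
# DPW ~ pi*r^2/A - pi*d/sqrt(2*A). Wafer size and die areas are assumptions.
from math import pi, sqrt, floor

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    r = wafer_diameter_mm / 2
    return floor(pi * r**2 / die_area_mm2
                 - pi * wafer_diameter_mm / sqrt(2 * die_area_mm2))

for area in (256, 576):
    print(f"{area} mm^2 die: ~{dies_per_wafer(area)} candidate dies per 300 mm wafer")
```

With those assumed sizes you get on the order of 230+ candidate dies per wafer versus fewer than 100, before yield even enters the picture.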

Of course, a smaller die will also generally mean less performance, but AMD is targeting the mainstream/performance-mainstream segments and below, and then seeing what they can do for the high end after that.

I really don't see ATI/AMD competing for the enthusiast until they come up with a more elegant multi-gpu rendering solution than AFR.

Until then, it looks like they are going to "play it safe" with cost effective (for both ATI and the consumer) solutions that are less of a risk than extremely large monolithic chips.

So, here's to hoping ATI/AMD's multi-GPU R&D pays off at some point. It would be nice to have at least 2 companies vying for all segments of the market again.

Regards,
SB
 
Is there any reason to believe the % yield would increase?

With modern GPUs I don't think it's so much that smaller dies = greater yields as the simple fact that you use more of your silicon wafer.

Even if you were to only squeeze in 4-8 more GPUs with a smaller die, if you're using 100k wafers, that's another 400-800k GPUs you can sell. And with a low enough price that should increase the probability they make projected sales numbers without having a lot of inventory gathering dust and losing them money. The higher the price, the bigger the gamble.

Of course, a smaller die will also generally mean less performance, but AMD is targeting the mainstream/performance-mainstream segments and below, and then seeing what they can do for the high end after that.


With a larger area, you have 3 factors that can make the cost increase more than linearly when building a chip whose area is double that of another chip.

1) You can usually fit fewer than half as many chips per wafer. Bigger wafers help, but don't solve this issue.
2) Damage to the wafer usually leads to a bigger wasted wafer area. And I'm of course not speaking about reticle defects only here.
3) Not hitting the target frequencies with a chip also usually wastes more wafer area. Of course the part can sometimes be sold as a lower-spec product.

Of course redundancy helps, but this is true for all products, even the smaller chips.

I work every day with other wafer types, i.e. power components, for which the reasoning is a little different (when there is a defect on a chip, you must discard it with no chance to use redundancy tricks), but as an example I can take two widely used chips, one 84 mm² and the other 156 mm², made with the very same technology and wafer diameter. The first one costs 1.99€, the second 4.30€, which is an increase of 116% for an area increase of only 85%. And usually, the more the area increases, the more the scaling exceeds linear.
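
To illustrate why the scaling tends to be worse than linear, here is a minimal sketch combining the dies-per-wafer effect with a simple Poisson yield model and no redundancy (as in the power-component case). The wafer cost and defect density are assumed values, so the output shows the trend, not the actual 1.99€/4.30€ figures.

```python
# Minimal sketch of cost-per-good-die vs. area: fewer candidate dies per
# wafer AND lower yield per die. Wafer cost and defect density are assumed;
# no redundancy, as in the power-component example above.
from math import pi, sqrt, exp, floor

WAFER_COST = 2000.0   # currency units per 300 mm wafer (assumed)
D = 0.5               # defects per cm^2 (assumed)

def dies_per_wafer(area_mm2, diameter_mm=300):
    r = diameter_mm / 2
    return floor(pi * r**2 / area_mm2 - pi * diameter_mm / sqrt(2 * area_mm2))

def cost_per_good_die(area_mm2):
    yield_per_die = exp(-D * area_mm2 / 100)   # Poisson model, area converted to cm^2
    return WAFER_COST / (dies_per_wafer(area_mm2) * yield_per_die)

small, large = cost_per_good_die(84), cost_per_good_die(156)
print(f" 84 mm^2: {small:.2f} per good die")
print(f"156 mm^2: {large:.2f} per good die "
      f"(+{(large / small - 1) * 100:.0f}% cost for +{(156 / 84 - 1) * 100:.0f}% area)")
```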

Then of course there is the volume argument, which is related to the number of manufactured wafers (i.e. sales of performance parts are usually much stronger than in the enthusiast range, so it's possible to ask for a discount on the wafer price due to the volume).
 
So, no, I don't buy the yield argument at all.
ATI's fine-grained redundancy coupled with the fact that the entire memory system on RV670 is less than 10% of the die is quite a different model than you're proposing.

In RV620 the redundancy is 1:5 for the ALUs (x5 VLIW), TUs and RBEs.

In RV635 the redundancy is 1:9 for the ALUs and TUs and 1:5 for the RBEs.

In RV670 it's 1:17 for the ALUs and TUs. I'm presuming the RBEs are not monolithic, so they remain at 1:5.

If the TUs are not monolithic (still not sure about this) then they're 1:5 in all GPUs.

Jawed
 
ATI's fine-grained redundancy coupled with the fact that the entire memory system on RV670 is less than 10% of the die is quite a different model than you're proposing.
Not knowing anything else about internal redundancy, it really only makes sense to compare big chip vs multi chip within the same architecture.
 