AMD: R9xx Speculation

Might not make sense if 6800 is where the money is.
The 6900s clearly have higher margins, and if there are, let's say, 100,000 DrMOS chips, would you rather sell 100k 6800s, or share the DrMOS supply between 6800/6900, get better profits, and keep the launch date? It's a no-brainer really; the big question is how fast TI can solve this shortage issue (if there is one in the first place, too many fake rumors).
 
Come on, it has almost been a 'known' thing that the XT would be 1920 SPs with 30 SIMDs and the Pro 1536 SPs with 24 SIMDs. Rumours have been circulating for ages. Also, it's VLIW4.

My speculation:

XT specs:
900 MHz core
5 GHz memory (effective)
VLIW4
1920 SPs
30 SIMDs

Pro Specs:
850 MHz core
4.8 GHz memory (effective)
VLIW4
1536 SPs
24 SIMDs

Both work out to 64 ALUs per SIMD, which is what the 64-thread wavefront requires.
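A quick sanity check of that arithmetic (my own numbers, purely illustrative; the specs above are still rumours, and the 16-units-by-4-lanes layout is an assumption about how VLIW4 SIMDs would be organised):

```python
# Sanity check: both rumoured configs work out to 64 ALUs per SIMD,
# matching the 64-thread wavefront (assumed 16 VLIW4 units x 4 lanes per SIMD).
configs = {
    "Cayman XT (rumoured)":  (1920, 30),   # (ALUs, SIMDs)
    "Cayman Pro (rumoured)": (1536, 24),
}

for name, (alus, simds) in configs.items():
    per_simd = alus // simds
    print(f"{name}: {per_simd} ALUs per SIMD")   # 64 in both cases
```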

Sounds reasonable, yes, but that would put the two roughly 30% apart. Still, it would give AMD the option to price the XT about 30% higher as well, while actually having an excuse.

With these specs, Cayman XT could turn out around 50% faster than Barts XT (with 1920x1200 + 4xAA in mind). It would thus be around 5% faster than the 580.

I bet it will be quite a bit cheaper as well, so it could be just what the doctor ordered!

Pity I will have to do my best to avoid AMD from now on, thanks to their notoriously poor HDTV gaming support. Nvidia, on the other hand, is too damn expensive. Crap! :(
 
Sounds reasonable, yes, but that would put the two roughly 30% apart. Still, it would give AMD the option to price the XT about 30% higher as well, while actually having an excuse.
Just like previous designs, I doubt it will scale anywhere close to linearly with SIMD count. So with 20% fewer SIMDs but only a 6% lower clock, you could end up with a difference similar to that between the HD5870 and HD5850, where 10% fewer SIMDs and a 15% lower clock resulted in about 15% less performance (this is of course a bit simplified, but you get the idea). That said, I'm not sure why AMD would want to disable so many SIMDs; for power reasons at least, it would be preferable to disable fewer and make the clock difference larger instead.
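To put a rough number on that (my own back-of-the-envelope arithmetic, nothing official), here is the naive "SIMDs times clock" estimate for the rumoured pair; as the HD5850/HD5870 precedent suggests, the real-world gap usually lands well above such a naive figure:

```python
# Naive "SIMD count x clock" throughput ratio -- purely illustrative,
# since real performance scales sub-linearly with SIMD count.
def naive_ratio(simds_a, clock_a, simds_b, clock_b):
    return (simds_a * clock_a) / (simds_b * clock_b)

# Rumoured Cayman Pro (24 SIMDs @ 850 MHz) vs XT (30 SIMDs @ 900 MHz):
print(f"{naive_ratio(24, 850, 30, 900):.0%}")   # ~76% on paper

# Precedent: HD5850 (18 SIMDs @ 725 MHz) vs HD5870 (20 SIMDs @ 850 MHz)
# is ~77% on paper, yet lands around 85% in practice.
print(f"{naive_ratio(18, 725, 20, 850):.0%}")
```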
 
Where did 30 SIMDs come from? Admittedly I haven't been following Cayman all that closely, but I was under the assumption that Cayman XT had fewer vector lanes in total than Cypress XT.
 
Sounds reasonable, yes, but that would put the two roughly 30% apart. Still, it would give AMD the option to price the XT about 30% higher as well, while actually having an excuse.

Maybe they can deactivate SIMDs independently of TMUs / tessellators etc.? :rolleyes:

If you just cut one SIMD per module, but keep the rest of the module "intact", then the overall performance delta will stay within the "usual" ~20%, even if clock speeds are cut ...
 
Where did 30 SIMDs come from? Admittedly I haven't been following Cayman all that closely, but I was under the assumption that Cayman XT had fewer vector lanes in total than Cypress XT.
Sorry, terminology fail, I mean ALUs.
And where does your assumption come from?
With the VLIW4 architecture the units themselves are probably a bit smaller, so it may make sense to fit roughly the same number of ALUs (i.e. more VLIW units) into roughly the same (or only slightly larger) area.

If the general layout still closely resembles Cypress (just with two setup engines/tessellators for the two rasterizers), it doesn't make much sense to reduce the number of ALUs, as they are actually quite cheap. But if the overall organization is significantly overhauled (think of TMUs shared between SIMDs within a render engine, a more NVIDIA-like approach with render engines resembling GPCs, or something else along those lines), it wouldn't make much sense to base performance speculation on the unit counts at all, so it could very well be quite a bit faster with a reduced ALU count.
 
Just like previous designs, I doubt it will scale anywhere close to linearly with SIMD count. So with 20% fewer SIMDs but only a 6% lower clock, you could end up with a difference similar to that between the HD5870 and HD5850, where 10% fewer SIMDs and a 15% lower clock resulted in about 15% less performance (this is of course a bit simplified, but you get the idea). That said, I'm not sure why AMD would want to disable so many SIMDs; for power reasons at least, it would be preferable to disable fewer and make the clock difference larger instead.

Ah yes, I see what you mean.

Actually there's a more recent example you could use: the 6870 vs the 6850. The former has around 16% more TMUs, ALUs, and frequency, and still it's only around 16-18% faster. Hmmm...!
 
Maybe they can deactivate SIMDs independently of TMUs / tessellators etc.? :rolleyes:

If you just cut one SIMD per module, but keep the rest of the module "intact", then the overall performance delta will stay within the "usual" ~20%, even if clock speeds are cut ...

So far, in both the 5850 vs the 5870 and the 6850 vs the 6870, they cut TMUs, ALUs, and frequency. The only thing that remained the same is the ROP count, and still the difference was quite small. I thought the frequency deficit would stack on top of the unit deficit, but it seems it doesn't, as our friend above correctly showed. Maybe this says something about the importance of ROPs in AMD's architecture.

I got confused about frequency in relation to the specs because of that Nvidia slide we saw today (the one showing where the GTX 580's performance increase came from).
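For what it's worth, here is the same kind of arithmetic (rough public specs and the approximate performance figures quoted above, my own calculation) for both salvage pairs; in each case the measured gap sits much closer to the single largest deficit than to the stacked units-times-clock deficit:

```python
# Unit deficit, clock deficit, their product, and the rough measured result
# (approximate figures from the posts above) for the two salvage parts.
pairs = {
    #                  (ALUs, MHz) small  (ALUs, MHz) big   measured
    "HD5850 vs HD5870": ((1440, 725), (1600, 850), 0.85),
    "HD6850 vs HD6870": (( 960, 775), (1120, 900), 0.85),
}

for name, ((a_s, c_s), (a_b, c_b), measured) in pairs.items():
    units, clock = a_s / a_b, c_s / c_b
    print(f"{name}: units {units:.2f}, clock {clock:.2f}, "
          f"stacked {units * clock:.2f}, measured ~{measured:.2f}")
```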
 
Hmm... if it's 30 SIMDs for XT and 24 for PRO, how are they arranging it?

If they are doing 3 'lanes' as people have suggested... 3x10 for XT and 3x8 for PRO?

Pretty terrible rebuttal. Nothing you said had anything to do with his point.

He wasn't talking about yields or power consumption. He was talking about measured performance relative to available resources - i.e. efficiency. And yes, you can count aggregate bandwidth since the 5970 is working on multiple frames in parallel.

Any time you use dual cards you're going to suffer in efficiency, so it's pointless to argue about the raw number of available resources in that context. My point was that his use of those numbers is pretty flawed, both from a how-GPUs-actually-function perspective and from a product-management perspective: having 2x the dies doesn't automatically mean you're spending more money on your particular card. It's all marketing to say 2x2GB = 4GB when we all know that's not how it necessarily works. And besides, I didn't even have to point out that more total memory bandwidth doesn't automatically mean faster performance; there are plenty of cards out there that are nice examples.
 
Any time you use dual cards you're going to suffer in efficiency, so it's pointless to argue about the raw number of available resources in that context. My point was that his use of those numbers is pretty flawed, both from a how-GPUs-actually-function perspective and from a product-management perspective: having 2x the dies doesn't automatically mean you're spending more money on your particular card. It's all marketing to say 2x2GB = 4GB when we all know that's not how it necessarily works. And besides, I didn't even have to point out that more total memory bandwidth doesn't automatically mean faster performance; there are plenty of cards out there that are nice examples.

No, the numbers I quoted are crucial to the HD5970's performance and how it actually functions. It really does take 670 mm^2 of silicon from TSMC, and with all the hubbub about AMD's GPU business being wafer-throughput limited, this fact is not to be overlooked. If AMD were to manufacture the 5970 in quantities sufficient to compete with the GTX580, they would be severely limiting the overall number of cards they can produce. The 5970 really does consume 256 GB/s of bandwidth - if you gave each GPU 80 GB/s so that aggregate bandwidth was equal to a 5870's, I guarantee performance would suffer drastically. The 5970 really does have 2.63x the raw floating-point throughput of a GTX580. With the amount of resources AMD has thrown at the 5970, to make it work business-wise it should completely dominate the GTX580. And yet it doesn't, especially in DX11 games.

Additionally, the HD5970 is very expensive to produce - it probably costs AMD more than 2x the cost of a 5870, since they had to sandwich all those components into a more sophisticated enclosure, with a better cooler, etc. I would guess they also have to use the very best Cypress dies in terms of power characteristics in order to fit their power envelope. "Addressing" the GTX580 with the 5970 would be an economic disaster for AMD.

For all the ink spilled about Nvidia's gargantuan die size, leading to economic doom and gloom for the whole company, I find it remarkable that people here champion the 5970 as a worthy answer to GTX580. As I said in my post, AMD itself doesn't want to "address" the GTX580 with the 5970 - instead Cayman will perform that job. I expect Cayman to perform rather well and give the GTX580 stiff competition - with a tremendously better business case than the 5970 ever had.
 
Let's just say that you are probably wrong about the BOM for the 5970.
As for the pricing of the GTX580 and HD5970, both are making quite a bit of margin.
 
No, the numbers I quoted are crucial to the HD5970's performance and how it actually functions. It really does take 670 mm^2 of silicon from TSMC, and with all the hubbub about AMD's GPU business being wafer-throughput limited, this fact is not to be overlooked. If AMD were to manufacture the 5970 in quantities sufficient to compete with the GTX580, they would be severely limiting the overall number of cards they can produce. The 5970 really does consume 256 GB/s of bandwidth - if you gave each GPU 80 GB/s so that aggregate bandwidth was equal to a 5870's, I guarantee performance would suffer drastically. The 5970 really does have 2.63x the raw floating-point throughput of a GTX580. With the amount of resources AMD has thrown at the 5970, to make it work business-wise it should completely dominate the GTX580. And yet it doesn't, especially in DX11 games.

You fell into, or deliberately walked into, the 'it's one card so the numbers are cumulative' trap of multi-GPU-on-a-stick. Your conclusion is flawed because your premises are invalid.

My hair is a bird.
 
No, the numbers I quoted are crucial to the HD5970's performance and how it actually functions. It really does take 670 mm^2 of silicon from TSMC, and with all the hubbub about AMD's GPU business being wafer-throughput limited, this fact is not to be overlooked. If AMD were to manufacture the 5970 in quantities sufficient to compete with the GTX580, they would be severely limiting the overall number of cards they can produce. The 5970 really does consume 256 GB/s of bandwidth - if you gave each GPU 80 GB/s so that aggregate bandwidth was equal to a 5870's, I guarantee performance would suffer drastically. The 5970 really does have 2.63x the raw floating-point throughput of a GTX580. With the amount of resources AMD has thrown at the 5970, to make it work business-wise it should completely dominate the GTX580. And yet it doesn't, especially in DX11 games.

AMD didn't need to produce more 5970s, because it's more profitable to sell a single GPU - consider the price difference between a 5870 and a 5970 at launch. If the 5870 is getting a margin of $200, adding another GPU, building a more complicated PCB, and selling the card for only $200 more would hurt margins.

They were throughput-limited early on, but even as supply issues eased, they still didn't make more 5970s - in all likelihood because they were making more money selling fewer Cypress dies per card.

Had Fermi been more of a competitor, it's likely AMD would've released more 5970s - but they are still a business, so they're going to do what's best for their bottom line.

And again, it's pointless to argue FLOP counts - we all know they can't be compared across different companies.


Additionally, the HD5970 is very expensive to produce - it probably costs AMD more than 2x the cost of a 5870, since they had to sandwich all those components into a more sophisticated enclosure, with a better cooler, etc. I would guess they also have to use the very best Cypress dies in terms of power characteristics in order to fit their power envelope. "Addressing" the GTX580 with the 5970 would be an economic disaster for AMD.

Says who? Silicon is going to be your most expensive part, and given TSMC's issues, it likely was the most important part. But you're neglecting the fact that it's one PCB, only one card to test, and only one package to use, vs. two of each.

And again, you're neglecting the important factor - profit margin. When Cypress XT prices went up, they were making more profit per Cypress XT than expected. So why not also sell a card that has more volume and possibly even higher profit margins while the competition has no answer?

For all the ink spilled about Nvidia's gargantuan die size, leading to economic doom and gloom for the whole company, I find it remarkable that people here champion the 5970 as a worthy answer to GTX580. As I said in my post, AMD itself doesn't want to "address" the GTX580 with the 5970 - instead Cayman will perform that job. I expect Cayman to perform rather well and give the GTX580 stiff competition - with a tremendously better business case than the 5970 ever had.

Because yields aren't linear - producing a die 50% larger doesn't mean you end up with just 50% more defective dies; you may well end up with 4x+ more, because the effect compounds exponentially.
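To illustrate the general idea, here's a textbook Poisson yield model with an assumed defect density; the numbers are purely illustrative, not TSMC figures, and the wafer math ignores edge loss:

```python
import math

# Textbook Poisson yield model, purely illustrative: yield = exp(-D0 * area).
D0 = 0.003            # defects per mm^2 (assumption, not a real TSMC figure)
WAFER_AREA = 70_000   # roughly usable mm^2 on a 300 mm wafer, edge loss ignored

def good_dies_per_wafer(die_area_mm2):
    gross = WAFER_AREA // die_area_mm2           # candidate dies per wafer
    return gross * math.exp(-D0 * die_area_mm2)  # survivors after defects

for area in (334, 334 * 1.5):   # a Cypress-sized die vs one 50% larger
    print(f"{area:.0f} mm^2: ~{good_dies_per_wafer(area):.0f} good dies per wafer")
# The 50% larger die costs far more than 50% in good dies per wafer,
# because yield falls off exponentially with area.
```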

2x Cypress may well be easier to produce than 1x GF110. Of course, TSMC's 40nm woes hurt the case for two GPUs, but that's nothing AMD could have anticipated, and frankly, it hurt Fermi and its release quite a bit.

I can't believe you're rehashing an argument people have brought up and debunked months and months ago


You fell into, or deliberately walked into, the 'it's one card so the numbers are cumulative' trap of multi-GPU-on-a-stick. Your conclusion is flawed because your premises are invalid.

My hair is a bird.

lol

shhh, don't tell that to trinibwoy ;)
 
You fell into, or deliberately walked into, the 'it's one card so the numbers are cumulative' trap of multi-GPU-on-a-stick. Your conclusion is flawed because your premises are invalid.

My hair is a bird.

I disagree. Rather than flatly stating that I'm wrong, perhaps you could try persuading me to see things differently.

I have never said that multi-GPU scaling should be linear. I'm just pointing out that physically, multi-GPU setups are very demanding. It's an indisputable fact that HD5970 consumes 256 GB/s of bandwidth.
 
I disagree. Rather than flatly stating that I'm wrong, perhaps you could try persuading me to see things differently.

I have never said that multi-GPU scaling should be linear. I'm just pointing out that physically, multi-GPU setups are very demanding. It's an indisputable fact that HD5970 consumes 256 GB/s of bandwidth.


Dual-GPU cards don't have 2x the BOM; not all the components are doubled.

The HD5970 doesn't really consume 256 GB/s of bandwidth; each GPU needs a certain amount of bandwidth to feed it, and dual-GPU cards effectively have double the bandwidth, fill rate, shader power, etc. The only thing that doesn't effectively double is the memory: a 2 GB HD5970 has only 1 GB of effective memory.
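Put another way (a minimal sketch, assuming plain AFR where each GPU keeps a full copy of the frame's working set; the 1 GB / 128 GB/s per-GPU figures are the standard HD5970 specs):

```python
# Under AFR each GPU mirrors the working set, so the usable framebuffer is
# the per-GPU amount, while bandwidth/fill rate/shader rate add up across GPUs.
def effective_resources(per_gpu_mem_gb, per_gpu_bw_gbs, num_gpus):
    return {
        "effective memory (GB)": per_gpu_mem_gb,                  # not summed
        "aggregate bandwidth (GB/s)": per_gpu_bw_gbs * num_gpus,  # summed
    }

# HD5970: 2 GPUs, 1 GB and 128 GB/s each -> 1 GB effective, 256 GB/s aggregate.
print(effective_resources(1, 128, 2))
```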
 