AMD: R9xx Speculation

RecessionCone · Nov 10, 2010

LordEC911 said:
Let's just say that you are probably wrong about the BOM for the 5970.
As far as the pricing of GTX580 and HD5970, both are making quite a bit of margin.

I've been wrong before and I'll certainly be wrong again, thanks for weighing in.

Just so I understand your perspective: lots of people on these fora, not to mention Charlie at semiaccurate, are constantly prophesying doom and gloom for Nvidia because of their large die sizes. If one believes AMD and Nvidia can make good margins on GTX580 and HD5970, it would suggest that fixating on die size is actually not very useful - that in other words the discussion about die size has been much ado about nothing. Am I understanding your perspective correctly?

Jawed · Nov 10, 2010

RecessionCone said:
because of their large die sizes

Which is the case for the entire range, not just for the halo chip.

Psycho · Nov 10, 2010

hatter said:
Are these specs possible... 6Ghz GDDR5! http://img337.imageshack.us/i/92094335.jpg

That's more or less official. http://www.nordichardware.com/news/71-graphics/41433-amd-radeon-hd-6000-roadmap-leaked.html

But as Gipsel mentions, it's the rated memory frequency: Barts is specified at 5 Gbps in the same deck. On the other hand, I think the 5770 and 5870 (both 1200 MHz) use 5 Gbps chips too.

keritto · Nov 11, 2010

Mianca said:
If the 32ROPs in Barts are just for marketing and it could do almost as well with half of those - that's pretty nice and bodes well for Cayman (especially since Cayman's ROPs might even be improved over their Evergreen counterparts, as Jawed suggested).

Well, the 16 additional ROPs in Barts aren't there purely for marketing reasons. As was mentioned, the decision was between putting 2 additional SIMDs or one additional ROP cluster in Barts, and it was noted that the 2 SIMDs would have come out about 2% ahead of the 16-ROP cluster.
So the 32-ROP choice looks more like a time-saving design decision: it avoided a time-consuming redesign to adapt a Cypress-style GPU into a properly working 16-ROP design. It even sounds contradictory to me that they compared a 1280/16 design against a 1120/32 one, but that comparison was made up front, before actual design work started, and the decision was justified by the time saved and the amount of additional work avoided.

Why is the theory that Cayman could have 48 ROPs rejected as inconceivable? Is it so hard to fit them into a chip that is being redesigned anyway?

Jawed · Nov 11, 2010

Gipsel said:
Of course there is no real need. It just makes the balancing better as you distribute the load to a wider set of units.

What's wrong with:

Code:

A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4
A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4

Each rasteriser tile is 8x8 pixels in this example.

PSU-failure · Nov 11, 2010

keritto said:
Well, the 16 additional ROPs in Barts aren't there purely for marketing reasons. As was mentioned, the decision was between putting 2 additional SIMDs or one additional ROP cluster in Barts, and it was noted that the 2 SIMDs would have come out about 2% ahead of the 16-ROP cluster.

The exact meaning was that performance was even, +/- 2%.

In fact, Barts is probably a 16-SIMD part with 2 disabled even on the XT variant, as enabling them wouldn't make it that much better...

1- probably no more than 10% more performance clock/clock
2- higher power draw (more active transistors)
3- hey, boss! Why not disable some SIMDs entirely, so that yields end up closer to 100% than ever before? (nVidia's redundancy approach, more or less, and perhaps disabled SIMDs could draw almost no current)

mczak · Nov 11, 2010

Psycho said:
That's more or less official.
But as Gipsel mentions, it's the rated memory frequency: Barts is specified at 5 Gbps in the same deck. On the other hand, I think the 5770 and 5870 (both 1200 MHz) use 5 Gbps chips too.

Yep. I really wonder though what frequency Cayman XT will actually achieve. It seems plausible that it won't quite be 1.5 GHz (which incidentally would be needed to have the same bandwidth as GTX580), but something like 1.4 GHz would be less than a 20% increase over HD5870. At least this time they probably won't use the Redwood PHY.

keritto said:
Well, the 16 additional ROPs in Barts aren't there purely for marketing reasons. As was mentioned, the decision was between putting 2 additional SIMDs or one additional ROP cluster in Barts, and it was noted that the 2 SIMDs would have come out about 2% ahead of the 16-ROP cluster.

That's not how I read it. Adding 2 additional SIMDs was just about 2% slower than adding the 16 additional ROPs. So, using some educated guessing (taking SIMD scaling into account etc.), it looks to me like the 16 additional ROPs are good for a bit more than a 10% increase in performance. That's not a whole lot, but the ROP/bandwidth ratio is a bit higher than on HD5870 (higher core clock, lower memory clock), and it should only add about 12mm² or so to the die size, so this was definitely worth it.

Why is the theory that Cayman could have 48 ROPs rejected as inconceivable? Is it so hard to fit them into a chip that is being redesigned anyway?

You mean just 3 rop blocks per memory partition instead of 2? Well I guess that would be possible in theory (but none of the rumors suggested that). But you also have to keep in mind that memory bandwidth hasn't gone up a whole lot (max possible is 25% against HD5870 if they really would achieve 1.5Ghz, more realistic is probably something like 20%), so the bandwidth might just not be there for any additional rops to really help much (though I guess more z fillrate could help a bit - I don't think more color fillrate would help in any case).
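To put rough numbers on the bandwidth argument, here is a minimal sketch. It assumes a 256-bit bus for both HD5870 and Cayman (the thread treats a 384-bit bus as unlikely) and uses the GDDR5 rule that the per-pin data rate is four times the quoted memory clock. The function name is made up for illustration.

```python
def gddr5_bandwidth_gbs(bus_width_bits: int, memory_clock_ghz: float) -> float:
    """Peak bandwidth in GB/s for a GDDR5 interface.

    GDDR5 moves 4 bits per pin per quoted memory clock,
    so 1.2 GHz corresponds to 4.8 Gbps per pin.
    """
    data_rate_gbps = memory_clock_ghz * 4
    return bus_width_bits * data_rate_gbps / 8  # bits -> bytes


# Assumption: 256-bit bus in all three cases.
hd5870 = gddr5_bandwidth_gbs(256, 1.2)
cayman_optimistic = gddr5_bandwidth_gbs(256, 1.5)
cayman_realistic = gddr5_bandwidth_gbs(256, 1.4)

gain_optimistic = cayman_optimistic / hd5870 - 1  # about 25%
gain_realistic = cayman_realistic / hd5870 - 1    # about 17%

print(f"HD5870:          {hd5870:.1f} GB/s")
print(f"Cayman @1.5 GHz: {cayman_optimistic:.1f} GB/s (+{gain_optimistic:.0%})")
print(f"Cayman @1.4 GHz: {cayman_realistic:.1f} GB/s (+{gain_realistic:.0%})")
```

With ROP count going up 50% (32 to 48) against a bandwidth gain of 25% at best, the extra ROPs would have noticeably less bandwidth each, which is the concern raised above.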

Squilliam · Nov 11, 2010

Hey guys, I'm having a dumb moment here: what does it mean when it says 'radial' and 'blower'? What are the practical differences between the two? Is one better? Cheaper?

Anyway, where's the 2GB SKU? How come some official-looking slides say 1GB and others say 2GB?

Gipsel · Nov 11, 2010

Jawed said:
What's wrong with:

Code:

A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4
A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4

Each rasteriser tile is 8x8 pixels in this example.

Small triangles (which are problematic for both the rasterizers and the ROPs) tend to cluster on the screen. It's therefore preferable to use smaller checkerboard patterns for a more even distribution of the load.
Your example would group 4x4 tiles of 8x8 pixels (at least, maybe you were thinking of stripes) for one rasterizer. The probability that one rasterizer is completely busy and the second one has nothing to do is much higher in that case (or you would need much larger buffers). Neighbouring tiles definitely should be assigned to different rasterizers. But one may make the ROP tiles larger than that. It depends a bit on the buffers/color caches there. But generally the same rule should apply.
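The load-balancing point can be illustrated with a toy model. This is only a sketch under stated assumptions: two rasterizers, 8x8-pixel tiles addressed by tile coordinates (tx, ty), and a cluster of small triangles that touches every tile in a 4x4 tile block. The mapping functions are hypothetical and not taken from any actual hardware.

```python
from collections import Counter


def coarse_stripes(tx: int, ty: int, group: int = 4) -> int:
    """Rasterizer index when `group` adjacent tile columns share one rasterizer."""
    return (tx // group) % 2


def checkerboard(tx: int, ty: int) -> int:
    """Rasterizer index when neighbouring tiles alternate between rasterizers."""
    return (tx + ty) % 2


def load(assign, tiles):
    """Number of touched tiles that land on rasterizer 0 and rasterizer 1."""
    counts = Counter(assign(tx, ty) for tx, ty in tiles)
    return counts[0], counts[1]


# Cluster of small triangles covering a 4x4 block of tiles (32x32 pixels),
# aligned with the coarse grouping (the worst case for that scheme).
cluster = [(tx, ty) for tx in range(4) for ty in range(4)]

coarse_load = load(coarse_stripes, cluster)
fine_load = load(checkerboard, cluster)

print("coarse grouping:", coarse_load)
print("checkerboard:   ", fine_load)
```

In this aligned case the coarse grouping sends all 16 tiles to one rasterizer while the checkerboard splits them 8/8. A cluster that straddles a group boundary would fare better under the coarse scheme, so the sketch shows the worst case rather than the average.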

eastmen · Nov 11, 2010

RecessionCone said:
I've been wrong before and I'll certainly be wrong again, thanks for weighing in.

Just so I understand your perspective: lots of people on these fora, not to mention Charlie at semiaccurate, are constantly prophesying doom and gloom for Nvidia because of their large die sizes. If one believes AMD and Nvidia can make good margins on GTX580 and HD5970, it would suggest that fixating on die size is actually not very useful - that in other words the discussion about die size has been much ado about nothing. Am I understanding your perspective correctly?

This is my understanding. You have X amount of space on a wafer. With a 550mm² chip you can make Y chips. With a 330mm² chip you can make Z chips. So you can already make more of Z than of Y. To compound this, Y will have a lot more problem chips than Z.

So Z will cost less than Y.

The benefit of AMD's strategy is this.

You have competitive cards made with a single Z. So in this example you get the 5870 with perfect dies. Cores with few problems become the 5850 and those with a lot of problems become the 5830. You then get the benefit of using two of the perfect cores to create your halo product, a product that, if made from a single huge core, wouldn't be priced well or be made in enough quantity.

You then have what happened to Nvidia. They made a single huge chip that had a lot of problems, so they had to start with problem chips. You had the GTX 480, which was a problem chip. You had the GTX 470, which was even more crippled, and then the GTX 465 (I think it was), which had even more problems. Yes, the GTX 480 was faster than the 5870. However, it cost more to make, as Nvidia got fewer of those than AMD got of the 5870, and it wasn't fast enough to compete with the 5970.

And don't forget what the wafer costs. So if a wafer costs $5,000 and AMD gets 500 chips, each chip costs $10. If Nvidia only gets 100 chips, then it's $50 a chip. So Nvidia's product could end up costing $40 more right off the bat. Throw in the higher power usage that the GF100 had, and you need better cooling and better components to supply the power, and you also need more layers on your PCB, which all continues to add to the cost.

Now, I have no idea how much a wafer costs or what the yields are for each chip. I was just using a basic example.
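The arithmetic in that example can be written out as a small sketch. The $5,000 wafer price and the 500/100 chip counts are the made-up figures from the post, not real data. The gross-die estimate uses a common textbook approximation and assumes a 300mm wafer; it ignores defects, scribe lines and edge exclusion.

```python
import math


def gross_dies(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate die candidates per wafer, before any yield loss.

    First term: wafer area divided by die area.
    Second term: correction for partial dies lost around the edge.
    """
    radius = wafer_diameter_mm / 2
    return (math.pi * radius ** 2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))


def cost_per_die(wafer_cost: float, good_dies: float) -> float:
    """Wafer cost spread evenly over the sellable dies."""
    return wafer_cost / good_dies


# Figures from the post (illustrative only).
amd_cost = cost_per_die(5000, 500)
nvidia_cost = cost_per_die(5000, 100)
print(f"AMD: ${amd_cost:.0f}, Nvidia: ${nvidia_cost:.0f}, "
      f"difference: ${nvidia_cost - amd_cost:.0f}")

# Assumption: 300mm wafer, die areas as quoted in the post.
big = gross_dies(300, 550)
small = gross_dies(300, 330)
print(f"550mm² die: about {big:.0f} candidates per wafer")
print(f"330mm² die: about {small:.0f} candidates per wafer")
```

Under these assumptions the smaller die gets roughly 1.8x as many candidates per wafer, more than the 1.67x area ratio alone would suggest, and that is before defects are taken into account.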

Look at what's happening now. We are finally getting the GTX 580, which is what the 480 should have been, while AMD has been able to introduce Barts, which offers what, 90% of the performance with less than 80% of the die space? This is a great deal for AMD, as they are able to enter lower markets with a smaller chip. The chip is smaller than Cypress, which is smaller than the GF104 (GTX 460). Cayman is coming, which is going to be bigger than Cypress but should still be smaller than the GTX 580 and will offer perhaps similar performance. But since it's smaller, it seems AMD can slap two on a board and be even faster than the GTX 580.

I really don't care. I won't spend more than $200-250 on a video card ever again. I'd much rather just wait a year and get the new performance card.

LordEC911 · Nov 11, 2010

Entropy said:
Yield, or put another way, return rate.
Thus quoth Dave Baumann:

Good point. I definitely know that is a part of the reason they don't go right up to the limit but I wasn't taking that into consideration at the time.

RecessionCone said:
I've been wrong before and I'll certainly be wrong again, thanks for weighing in.

Just so I understand your perspective: lots of people on these fora, not to mention Charlie at semiaccurate, are constantly prophesying doom and gloom for Nvidia because of their large die sizes. If one believes AMD and Nvidia can make good margins on GTX580 and HD5970, it would suggest that fixating on die size is actually not very useful - that in other words the discussion about die size has been much ado about nothing. Am I understanding your perspective correctly?

I don't think I, or many people here, think that way, though there are definitely some who do. I am more of the view that if they keep pushing up towards the limit they have some tough times ahead, as seen with G200 and GF100. Obviously, with Nvidia making most of their money in the workstation/professional market and the GPGPU business slowly growing, they are able to live off those large margins. But as far as the desktop market goes, performance and enthusiast is not where the big money is, and you need those OEM contracts.

keritto said:
Why is the theory that Cayman could have 48 ROPs rejected as inconceivable? Is it so hard to fit them into a chip that is being redesigned anyway?

Because the ROPs are tied to the MC, and most of us don't seem to believe they are going with a 384-bit bus with 240+ GB/s of bandwidth.

Squilliam said:
Hey guys, I'm having a dumb moment here: what does it mean when it says 'radial' and 'blower'? What are the practical differences between the two? Is one better? Cheaper?

Anyway, where's the 2GB SKU? How come some official-looking slides say 1GB and others say 2GB?

Well radial = blower, i.e. the reference fan/cooler design on most of their highend series, HD58x0, HD48x0, etc.

I would assume the axial fans are the ones commonly found on the non-reference models. They are most likely cheaper and seem to offer better cooling performance at lower sound levels, but are not commonly designed to exhaust air out of the back of the case.

Silent_Buddha · Nov 11, 2010

RecessionCone said:
I've been wrong before and I'll certainly be wrong again, thanks for weighing in.

Just so I understand your perspective: lots of people on these fora, not to mention Charlie at semiaccurate, are constantly prophesying doom and gloom for Nvidia because of their large die sizes. If one believes AMD and Nvidia can make good margins on GTX580 and HD5970, it would suggest that fixating on die size is actually not very useful - that in other words the discussion about die size has been much ado about nothing. Am I understanding your perspective correctly?

Die size does matter, but you are making some seriously flawed assumptions with your use of die size in comparing the impact on something the size of Cypress versus GF110 (5970 versus GTX 580).

The example I'm using obviously doesn't reflect the actual die sizes or die size ratios between Cypress and GF110, but...

Let's say you have one die that is 2x the size of another die. And assume that there's no redundancy mechanism for dealing with potential defects.

So only 1 defect is enough to ruin a die. So let's say there's just one defect in the area taken up by the large die. And let's say 2 of the smaller dies take up the same wafer area. Now only 1 of those smaller dies is defective and the other one is perfectly fine. So 1x failed big die = 1x failed small die and NOT 2x failed dies as you are assuming. Already you're getting better yields with the smaller die. This is a very simplistic case, obviously, but it's only to prove a point.

As such, potential yields for the larger die are going to be far worse than the yields for a smaller die. Hence you'll get more product as a percentage of total wafer area the smaller your dies are. Which is why the percentage of good dies per wafer falls off so quickly as die size grows.

Thus on the same wafer, with the same number of defects scattered across it, you will almost always get more than 2x the number of good smaller dies (that are 1/2 the size of the larger die) versus larger dies.

So in this case, with regards to margins, yields, cost of manufacture, etc., 1x GF100 doesn't equal 2x Cypress, since on the same wafer in theory you'll get more than 2x good Cypress cores for every 1 GF100 core. The ratio can be adjusted up or down depending on how well you've designed the chip to be tolerant of potential defects.
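The "more than 2x" claim can be checked with the simple Poisson yield model, where the chance of a die having zero defects is exp(-area x defect density). This is a sketch only: the defect density and die areas below are hypothetical values picked for illustration, and the model assumes no redundancy, matching the simplification above.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)


# Hypothetical values, not real process data.
D0 = 0.002          # defects per mm²
small_area = 250.0  # mm²
large_area = 500.0  # mm², exactly 2x the small die

y_small = poisson_yield(small_area, D0)
y_large = poisson_yield(large_area, D0)

# Two small dies occupy the same wafer area as one large die,
# so compare good dies per unit of wafer area.
good_small_per_large_slot = 2 * y_small
ratio = good_small_per_large_slot / y_large

print(f"small die yield: {y_small:.1%}")
print(f"large die yield: {y_large:.1%}")
print(f"good small dies per good large die: {ratio:.2f}")
```

In this model the large die's yield is the square of the small die's yield, so the ratio is 2 divided by the small die's yield and is always above 2 whenever any defects exist. Redundancy or salvage parts would pull the ratio back down, as noted above.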

Regards,
SB

Squilliam · Nov 11, 2010

^ Thanks.

Squilliam · Nov 11, 2010

Silent_Buddha said:
Die size does matter, but you are making some seriously flawed assumptions with your use of die size in comparing the impact on something the size of Cypress versus GF110 (5970 versus GTX 580).

The example I'm using obviously doesn't reflect the actual die sizes or die size ratios between Cypress and GF110, but...

Let's say you have one die that is 2x the size of another die. And assume that there's no redundancy mechanism for dealing with potential defects.

So only 1 defect is enough to ruin a die. So let's say there's just one defect in the area taken up by the large die. And let's say 2 of the smaller dies take up the same wafer area. Now only 1 of those smaller dies is defective and the other one is perfectly fine. So 1x failed big die = 1x failed small die and NOT 2x failed dies as you are assuming. Already you're getting better yields with the smaller die. This is a very simplistic case, obviously, but it's only to prove a point.

As such, potential yields for the larger die are going to be far worse than the yields for a smaller die. Hence you'll get more product as a percentage of total wafer area the smaller your dies are. Which is why the percentage of good dies per wafer falls off so quickly as die size grows.

Thus on the same wafer, with the same number of defects scattered across it, you will almost always get more than 2x the number of good smaller dies (that are 1/2 the size of the larger die) versus larger dies.

So in this case, with regards to margins, yields, cost of manufacture, etc., 1x GF100 doesn't equal 2x Cypress, since on the same wafer in theory you'll get more than 2x good Cypress cores for every 1 GF100 core. The ratio can be adjusted up or down depending on how well you've designed the chip to be tolerant of potential defects.

Regards,
SB

I just want to add to this that, in addition to the physical costs of making each die/board, you also have the NRE (non-recurring engineering) cost, i.e. the actual research and development costs of making the chip. Now with a sweet spot strategy, instead of having one large SKU which might sell 5M chips and one smaller SKU which might sell 15M chips (say GF100 and GF104), you have one chip which allows you to amortize the NRE over, say, 20M chips for both tiers of product.
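As a rough illustration of the amortization argument, the sketch below uses the unit volumes from the post and a purely hypothetical NRE of $100M per chip design. It also assumes, for simplicity, that every design costs the same to develop, which will not hold exactly in practice.

```python
def nre_per_chip(nre_cost: float, units: float) -> float:
    """Development cost carried by each chip sold."""
    return nre_cost / units


NRE_PER_DESIGN = 100e6  # hypothetical: $100M per chip design

# Two-chip lineup: each design carries its own NRE.
large_sku = nre_per_chip(NRE_PER_DESIGN, 5e6)
small_sku = nre_per_chip(NRE_PER_DESIGN, 15e6)

# Sweet spot lineup: one design covers both tiers.
single_design = nre_per_chip(NRE_PER_DESIGN, 20e6)

print(f"large SKU (5M units):      ${large_sku:.2f} per chip")
print(f"small SKU (15M units):     ${small_sku:.2f} per chip")
print(f"single design (20M units): ${single_design:.2f} per chip")
```

Under these assumptions the low-volume halo chip carries four times the per-chip development cost of the single shared design, and the total NRE bill is halved because only one chip is designed.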

mczak · Nov 11, 2010

LordEC911 said:
Because the ROPs are tied to the MC, and most of us don't seem to believe they are going with a 384-bit bus with 240+ GB/s of bandwidth.

There are currently 2 rops per memory partition. I think 3 rops instead should be possible, but I'm not sure it would help a lot.

Silent_Buddha said:
Die size does matter, but you are making some seriously flawed assumptions with your use of die size in comparing the impact on something the size of Cypress versus GF110 (5970 versus GTX 580).

That's all true, though HD5970 indeed isn't really that die area efficient. If you look at HD6870 CF results, you'll notice it's quite a bit faster in almost everything with only very slightly higher power draw. AMD could IMHO quite easily do a 2xHD6870 solution instead (it might require chip binning or VERY slightly lower clocks along with slightly reduced voltage), which would cut the "combined die size" quite a bit (to GF110 levels actually) and still have the same (or better) performance with the same power draw as the HD5970. The only drawback I'd see with such a solution would be that it wouldn't overclock that well... But of course such a product would be pointless with the imminent Cayman release (even if Antilles is released a bit later). FWIW I don't think Antilles will be very efficient by that "perf/combined die area" measure either. It seems a safe bet AMD will need to reduce clocks (and maybe even disable some parts), possibly to a larger degree than what was necessary for HD5970. But of course the point of this solution won't be that it's efficient. The one and only goal of this product is to be the fastest graphics card.

ECH · Nov 11, 2010

New rumor has it that the 6950 and 6970 will be released on November 29th
source

RobertR1 · Nov 11, 2010

I thought they'd try to get them out before black friday, esp if the date is that close to BF.

gkar1 · Nov 11, 2010

RobertR1 said:
I thought they'd try to get them out before black friday, esp if the date is that close to BF.

The 29th is Cyber Monday

keritto · Nov 11, 2010

PSU-failure said:
In fact, Barts is probably a 16-SIMD part with 2 disabled even on the XT variant, as enabling them wouldn't make it that much better...
(...)
3- hey, boss! Why not disable some SIMDs entirely, so that yields end up closer to 100% than ever before? (nVidia's redundancy approach, more or less, and perhaps disabled SIMDs could draw almost no current)

I used to embrace that 1280SP theory too, before Barts was released.
But AMD doesn't need to design redundancy into their rather small chips (255mm²), whereas NV does need it with the close to 400mm² GF104 and its older and bigger brothers. And the G80-GF100 design approach is fairly different from the RV670->Barts ("RV940") design approach. So now I could settle on the idea that AMD simulated all the versions, 1280/16, 1120/32, and even 1280/32, which was rejected with the ultimate goal of a smaller chip. The 32-ROP option needed the fewest design optimizations compared with the 1280/16 part, and the design outperformed its main competitor, the GTX460, even with 1120SPs. (Of course, we're leaving out here the insanely good overclocking ability of the GF104 chip (25%+), which then ends up as a 240W TDP part instead of 160W, while Barts couldn't be clocked that well on air even if it could theoretically sustain that high a TDP.)
I'd tie that back to Jawed's notes about the ROPs' strong ties to the setup engines in the HD5000 series (and all designs from R600 onwards), and the assumption that we'd see the same thing even in the "redesigned 4-VLIW engine" Cayman.

Unknown Soldier said:
That or TSMC screwed them over again.

In fact, that might not be so hard to believe, considering how lavishly nVidia pays TSMC to push for a faster transition to the 28nm node, which they desperately need for their GPUs.

And consider that AMD should and would go to GloFo's 28nm node for most of their value products (next-gen Bobcat APUs, Krishna/Wichita, and most of their mobile products following these mobile "2nd gen DX11 GPUs"), and that's the most profitable product segment.

Harison said:
Dont think so, even if by some miracle TSMC will have live 28nm production line in Q1 2011, its capacity will be low, hence unusable for the mass production chips.

Well, it ain't totally false, if we can judge by this:
[Apr2010] Virage Logic launches 28nm IP suite at TSMC event
And capacity should certainly be good enough for a GF119 part (96SPs, GF104-alike, 64-bit), with a probable die size of around 70mm², considering its 40nm GF108 counterpart is 116mm².

ZerazaX · Nov 11, 2010

ECH said:
New rumor has it that the 6950 and 6970 will be released on November 29th
source

They said their sources were from the internet? Let me guess... Fudzilla?
