AMD: R9xx Speculation

A hardware thread size that isn't a multiple of a power of 2 doesn't play ball with a power-of-2 LDS bank count.

EDIT: Well, that's not strictly true, e.g. 80 is a multiple of 16. But you also have a problem with the D3D11 requirement of 1024 work items per work group.
 
Are you talking about the maximum number of threads per thread group? This is just a maximum; you could easily use "odd" sizes (of course, performance might suffer depending on the chip). I can't see why that should be a problem, since the driver can split this as it sees fit. At worst, the driver could use a wavefront size of 64 and just not use the rest of the SIMD. Of course, that's wasting resources, but considering we heard claims of the 4D shaders being 98.5% as fast as the 5D shaders, it would still be nearly as fast per SIMD as on Cypress.
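To put rough numbers on that, here is a minimal Python sketch of how a 1024-item work group packs into wavefronts. The 80-wide wavefront is only the hypothetical size floated above, and `pack_work_group` is an illustrative helper, not anything from a real driver.

```python
import math

def pack_work_group(work_items, wavefront_size):
    """Return (wavefronts needed, idle lanes in the last wavefront)."""
    wavefronts = math.ceil(work_items / wavefront_size)
    idle_lanes = wavefronts * wavefront_size - work_items
    return wavefronts, idle_lanes

MAX_GROUP = 1024  # D3D11 maximum work items per work group, as cited in the thread

# Cypress-style 64-wide wavefronts divide the group evenly.
print(pack_work_group(MAX_GROUP, 64))

# Hypothetical 80-wide wavefronts leave the last one partly empty.
print(pack_work_group(MAX_GROUP, 80))

# Fallback described above: run 64-wide wavefronts on an 80-lane SIMD.
lane_utilization = 64 / 80
print(f"lanes in use: {lane_utilization:.0%}")
```

So the odd size costs a partly empty last wavefront at the maximum group size, while the fallback leaves a fifth of the lanes idle all the time.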

Mind you, I'm not saying I'd like that design, but nothing in the leaks so far really suggests a radical departure from the current SIMD organization. And if you want to share L1 cache between TMUs you probably still could do that (e.g. by linking together 2 quad TMUs from 2 adjacent SIMDs or something like that).
 
It's possible to guesstimate the die size of both configurations from Cypress' die size and the rumors.
Cypress has a die of 324 mm^2. About 1/3 of that is taken by the SIMDs (a figure carried over from RV770), so in Cypress 108 mm^2 are taken by the 1600sp. If the rumored 25% gain in area efficiency is true, then in NI 1280sp (320x4) fit in 81 mm^2 and perform like the 1600sp of Cypress. Doubling that gives 162 mm^2 of SIMDs for 2560sp. Adding a 20% increase in complexity for the uncore/fixed-function units (TMUs/ROPs), i.e. 216 mm^2 growing to about 259 mm^2, gives a die size of about 420-430 mm^2 for 2560sp/96TMU/48rops and 380-390 mm^2 for 1920sp/96TMU/48rops.
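The arithmetic behind that guesstimate can be written out as a short Python sketch. Every constant below is one of the post's own rumored or assumed figures (the 1/3 SIMD share, the 25% area saving, the 20% uncore growth), and `estimate_die_mm2` is just an illustrative name.

```python
CYPRESS_DIE_MM2 = 324.0   # die size used in the post
SIMD_SHARE_DIVISOR = 3    # post's assumption: SIMDs take 1/3 of the die
AREA_SAVING = 0.25        # rumored area-efficiency gain of the 4D design
BASE_4D_SP = 1280         # 320x4 sp, said to match Cypress' 1600sp
UNCORE_GROWTH = 0.20      # post's assumed growth of TMU/ROP/uncore area

def estimate_die_mm2(stream_processors):
    """Estimate die area in mm^2 under the post's assumptions."""
    simd_cypress = CYPRESS_DIE_MM2 / SIMD_SHARE_DIVISOR   # 108 mm^2
    uncore_cypress = CYPRESS_DIE_MM2 - simd_cypress       # 216 mm^2
    area_per_base = simd_cypress * (1 - AREA_SAVING)      # 81 mm^2 per 1280sp
    simd = area_per_base * stream_processors / BASE_4D_SP
    uncore = uncore_cypress * (1 + UNCORE_GROWTH)
    return simd + uncore

for sp in (2560, 1920):
    print(f"{sp}sp: {estimate_die_mm2(sp):.1f} mm^2")
```

Both results land at the low end of the quoted ranges, so the 420-430 and 380-390 mm^2 figures already include a little rounding up.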
By the way, Cayman can't have 32 ROPs, because 32/3 gives about 10.67 ROPs per RPE :D
And the same goes for the SIMD number: 640 can't be split across 3 RPEs.
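The divisibility argument is easy to check with a few lines of Python. This assumes the rumored count of 3 RPEs and takes the 32, 48 and 640 figures as given in the posts; `per_engine` is a made-up helper for illustration.

```python
def per_engine(total_units, engines):
    """Return units per engine if the split is even, otherwise None."""
    quotient, remainder = divmod(total_units, engines)
    return quotient if remainder == 0 else None

RPE_COUNT = 3  # rumored number of RPEs, not a confirmed spec

for units in (32, 48, 640):
    print(units, "->", per_engine(units, RPE_COUNT))
```

Only 48 splits evenly across three engines, which is why the 48-ROP configurations are the ones used in the estimate above.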

If you conclude 420-430 mm^2 for 2560sp/96TMU/48rops, how much power will it draw on 40nm?

ATI's strategy is to make (efficient) power-saving GPUs!

And to stay just under 200W TDP on 40nm at high clocks, "as ATI has always done in the past", the GPU has to stay small, and 2560sp will not work since the GPU will be too big.
 
If ~330mm^2 was the sweet spot for maximum practical size when 40nm was young, then it's quite possible that 430mm^2 today is about the same in terms of production difficulty and expense as ~330mm^2 was when Cypress was released.

There's nothing which says that after 12 months on 40nm a larger GPU isn't still within the sweet spot of power, performance and cost. The same can be said for the 6-core Phenom processors. AMD released that revision on the same process as the original Phenom II, and yet the die size is much larger. They still remain within the same TDP as the original in spite of the obvious increase in both clocks and core counts.
 
Yeah, but AMD/GloFo invest heavily in improving the current node, while TSMC tends to use half nodes. That's why GloFo announced 28HPP; it didn't need to do that when AMD was the sole customer.

The yield may improve on TSMC-40G and you might be rid of double vias, etc. But over 400sqmm? I'm not too optimistic about that.
Yes, it might work; it might be that AMD is still using double vias, which contributes to the larger die size. But it would be one more step away from the "sweet spot" for sure.
 
I think the clue would be Ontario in this instance. They are getting excellent density on that process with Ontario so it is quite possible TSMC has done significant work to improve the process.

So perhaps the chips are a little larger, especially Barts. However if Ontario is an indication of the density they can get on a mature 40nm, looking at this from another angle I think Cayman might even be as 'little' as 350mm^2.
 
AMD to debut Radeon HD 6000-series in October
Facing the upcoming Nvidia's Fermi-series GPUs and its entry-level GeForce GT 430, aiming for an October launch, AMD has decided to announce its latest generation Radeon HD 6000-series GPU globally on October 19. As the company has successfully spun off its foundry division and received the US$1.25 billion settlement penalty from Intel, AMD decided to increase its promotion budget and will host its AMD Technical Forum and Exhibition 2010 show along with its Radeon HD 6000-series debut conference in Taiwan. For the event, AMD will send several top executives to visit Taiwan and meet its local partners, as well as explain the company's fourth-quarter product roadmap to the Asia Pacific media.
 
I think the clue would be Ontario in this instance. They are getting excellent density on that process with Ontario so it is quite possible TSMC has done significant work to improve the process.

So perhaps the chips are a little larger, especially Barts. However if Ontario is an indication of the density they can get on a mature 40nm, looking at this from another angle I think Cayman might even be as 'little' as 350mm^2.

First of all, Zacate is 74 sqmm at ~400M transistors (est.), which is a lower density than any AMD GPU on the 40nm node (mind you, the Bobcat core isn't meant to be a high perf/freq one, so higher density would be expected; also, Zacate has fewer IO pins).
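For reference, the density comparison can be sketched in Python. The Zacate numbers are the estimate from this post; the Cypress numbers (about 2150M transistors on 334 mm^2) are commonly cited public figures that I am assuming here, not values from this post, and the thread's own 324 mm^2 would shift the result only slightly.

```python
def density_mtrans_per_mm2(mtransistors, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return mtransistors / area_mm2

# Zacate: estimate quoted in the post.
zacate = density_mtrans_per_mm2(400, 74)

# Cypress: assumed figures, not taken from the post.
cypress = density_mtrans_per_mm2(2150, 334)

print(f"Zacate:  {zacate:.2f} Mtrans/mm^2")
print(f"Cypress: {cypress:.2f} Mtrans/mm^2")
```

Under those assumptions Zacate comes out less dense than Cypress, which is the point being made.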

That aside, a ~70sqmm part is by no means an indication of what to expect from a ~400sqmm part. Remember there was a GT218 which worked fine; even GT216/215 are good. Now take a look at GF100/104: can you say the same of those?
 
Well... if the GT21x are any good, GF104 is good too, as what's wrong with it is just what was wrong with its older cousins: performance.

GF104 has about the same power characteristics as Cypress with a similarly sized die, but less performance, something already seen at the time GT215 launched, and by extrapolation for all the GT21x (40nm vs 55nm, but similar perf/mm² and perf/watt).

In fact, GF104 is better than both GF106 and any GT21x while being bigger. GF100 is worse, but that's another story, as the arch has been considerably tweaked for the derivatives, so it could be totally independent of process issues.

As for Cayman, it's hard to estimate, as it depends on the improvements over the Evergreen arch: Cypress has the worst efficiency of the family (adding to the disappointment of GF100), but if NI shows better scalability, the die could be even bigger and "gaming" perf/watt comparable or better.

We don't know the driver internals either, so it's possible an architectural change could lead to a lower CPU dependency (lighter SM5/hardware runtime translation), as it's one of the areas where R600+ GPUs have always lagged behind.
 
Is anyone else here NOT excited about this launch? :(

Please all remember that this launch represents the death of the ATi brand, and show a little respect and reverence.
tombstone.gif
 
@NoX

Not 100%, but that's what I'm going with at the moment. Makes more sense to me than:

6850 Barts Pro
6870 Barts XT
6930 Cayman Pro
6950 Cayman XT
6970 Antilles Pro (Cayman Pro X2)
6990 Antilles XT (Cayman XT X2 )

If you do that, you've just minimized your halo high-performance products and marginalized the performance difference between the single-GPU and dual-GPU solutions. You might do that if you had performance deltas much lower than +50% going from a single card to crossfire-on-a-stick configurations. If Antilles XT/Pro features Cayman XT/Pro ASICs downclocked to hit TDP targets, then they might not be 'X2' (just as Hemlock isn't 2x Cypress XT at stock 5970 clocks).

While I do expect the Antilles products to be 'Black Editions' with 'unlocked' overclocking and lower core voltages & clocks, I don't think it's a good idea to do that for products in a single numbering scheme.

Plus, the places reporting Barts as 6800 keep getting other things wrong, which makes me doubt all of it.

@Digi

If AMD had systematically canned ATI engineers, designers etc. over the last four years then I might agree. The ATI brand is fading away to give more prominence to Radeon and FirePro, which is a good thing, I think.
 
Is anyone else here NOT excited about this launch? :(

Please all remember that this launch represents the death of the ATi brand, and show a little respect and reverence.
tombstone.gif

I'm a bit meh on it myself. Dumping the rising ATI brand on graphics cards in favor of a currently rather down/mediocre AMD brand on graphics all to try to boost the perception of the AMD brand seems horribly misguided and shortsighted, IMO.

So rather than mourning the loss of ATI, I'm more left shaking my head at inept PR.

Regards,
SB
 
The point is to make it clear that the GPU in Llano and other Fusion products is the same as the very successful ones found in Radeons. Radeon, by the way, is the brand that most people know, and AMD is keeping it.

Seems like a pretty smart move to me.
 
Is anyone else here NOT excited about this launch? :(

Please all remember that this launch represents the death of the ATi brand, and show a little respect and reverence.
tombstone.gif

Asking such a question in this thread is like going into church on Sunday morning and asking everyone if they aren't excited about the idea/concept of God. :LOL:

Actually, you will find more believers here. People go to church for various reasons!
 
Actually in my case it's more like the high priest asking the congregation that, but I get your analogy. ;)

Still, it worries and burdens me...so I bring it up again. :oops:
 
I'm fairly excited (I may upgrade this generation), but think about it. There's some actual competition from Nvidia. 40nm should have got better yields and so there will be better supply and pricing. Although the b0rking of 32nm means this gen is still on last year's process, it's still been a year since Evergreen, and this is a full refresh to the next generation product.

I'm expecting more than the people who seem to think that, because it's still 40nm, we're just going to get the same performance, die size and prices as last year. I think it will be better than that, or else AMD would simply have kept churning out the 5xxx series for another 6-12 months until 32nm is here.

AMD have tasted a year of advantage over Nvidia who are really only just getting the Fermi family up and running - I'm sure they'd like to pull the same trick again.
 