AMD: R9xx Speculation

Wow! AMD really fought hard for every mm² in Barts.
He actually means PHY. Both Redwood and Juniper are (more or less) pad-limited 128-bit designs; given the smaller engine on Barts, the higher-speed PHY is not required anyway.
 
Saving die space is a bonus; normally the only saving is on cheaper RAM. This is the right move for a part in this market, because you don't want to pay for 4.8GHz memory anyway.
Well, they are indeed using parts rated for 5GHz anyway, it seems...
Whatever AMD did, the total savings are substantial. Removing 30% of the SIMDs would only save around 10% of Cypress.
Good point. Some key figures, Cypress vs. Barts:
Die size: 334mm² vs. 255mm²: -24%
Transistors: 2.15B vs. 1.7B: -21%
SIMDs: 20 vs. 14: -30%

The trouble is we don't know the size of a SIMD (though Dave essentially said two SIMDs are as big as 16 ROPs...). Based on the numbers quoted for RV770 by Jawed, taking into account scaling to 40nm and that they are possibly slightly more complex, I'd reckon around 6-7mm²? So skipping six of them would indeed only save around 35-40mm², i.e. only half of the effective difference.
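As a sanity check on that estimate, here's a quick back-of-the-envelope sketch in Python; the ~6.5mm² per-SIMD figure is just the guess from above, not a confirmed number:

```python
# Rough die-area budget: how much of the Cypress -> Barts shrink
# can removing 6 SIMDs explain? (The per-SIMD size is a guess.)
cypress_die = 334.0      # mm^2, from the figures above
barts_die = 255.0        # mm^2
simd_area = 6.5          # mm^2, guessed per-SIMD size on 40nm
simds_removed = 20 - 14

total_saving = cypress_die - barts_die     # 79 mm^2
simd_saving = simds_removed * simd_area    # ~39 mm^2

print(f"total shrink: {total_saving:.0f} mm^2")
print(f"SIMD removal: {simd_saving:.0f} mm^2 "
      f"({simd_saving / total_saving:.0%} of the shrink, "
      f"{simd_saving / cypress_die:.0%} of Cypress)")
```

With those assumptions the SIMDs account for roughly half the shrink, which is exactly the gap the rest of this post tries to explain.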
There are three other identified sources of savings:
- MC designed for a lower clock
- removal of DP
- removal of the second Xfire port
How much those three items save is anyone's guess (the tessellation/thread-management improvements OTOH probably also have some hardware cost, though I'd guess it's small). It is also worth noting imho that transistor density went up a bit.
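For what it's worth, the density remark checks out from the figures already quoted; a minimal sketch:

```python
# Transistor density from the quoted figures, in millions per mm^2.
cypress_density = 2150 / 334.0  # ~6.44 M transistors/mm^2
barts_density = 1700 / 255.0    # ~6.67 M transistors/mm^2

print(f"Cypress: {cypress_density:.2f} M/mm^2")
print(f"Barts:   {barts_density:.2f} M/mm^2")
print(f"gain:    {barts_density / cypress_density - 1:.1%}")  # ~3.6%
```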
 
Well, they are indeed using parts rated for 5GHz anyway, it seems...
I'm pretty sure IHVs will find a way to save cost by clocking at 4.2GHz.

The trouble is we don't know the size of a SIMD (though Dave essentially said two SIMDs are as big as 16 ROPs...). Based on the numbers quoted for RV770 by Jawed, taking into account scaling to 40nm and that they are possibly slightly more complex, I'd reckon around 6-7mm²? So skipping six of them would indeed only save around 35-40mm², i.e. only half of the effective difference.
Yeah, that's basically what I did in saying 10%, but maybe we underestimated it a bit. Could be 50mm², or 15% of the die.

Still, there are things that can cost die space, too: faster tessellation, higher clock speed, better CrossFire scaling. All in all, Barts is pretty impressive.
 
Oh, I looked it up, and it's actually not that much. It seems the HD5850 lowered the memory clock from 1000MHz to 900MHz, so it's hardly worth it. Maybe that's the reason the HD68xx don't bother at all.
Maybe they simply do not think that bothering with power saving during UVD operation (and validating it) is worth the cost?

I had two different HD4670 cards; one had DDR3-800, the other GDDR3-1000. On both, the UVD memory clocks were identical to the load clocks, even though it's apparent that they could have underclocked the memory on the faster card.
Also, there does not seem to be a difference between the HD4650 and HD4670 in UVD capabilities, so it should at least be possible to clock the HD4670 in UVD mode down to the HD4650's clocks.

The same argument works with the HD5770: if the HD5570's memory bandwidth (900MHz x 2 x 128-bit) is enough for perfect UVD/HTPC operation, then it should be possible to downclock the HD5870's memory to 225MHz x 4 x 256-bit.
So it does seem that AMD simply do not consider it worthwhile to conserve memory power during video decoding; but this is a complete guess and they probably have excellent reasons for this.
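To make that bandwidth equivalence explicit, a quick check in Python (the x2/x4 data-rate multipliers and bus widths are as quoted above; the 225MHz downclock is the hypothetical one):

```python
# Peak memory bandwidth: is a downclocked HD5870 equal to an HD5570?
def bandwidth_gb_s(clock_mhz, data_rate, bus_bits):
    """Peak bandwidth in GB/s from clock, data-rate multiplier, bus width."""
    return clock_mhz * 1e6 * data_rate * (bus_bits / 8) / 1e9

hd5570 = bandwidth_gb_s(900, 2, 128)      # as quoted: 900MHz x 2 x 128-bit
hd5870_uvd = bandwidth_gb_s(225, 4, 256)  # hypothetical UVD downclock

print(f"HD5570:          {hd5570:.1f} GB/s")      # 28.8 GB/s
print(f"HD5870 @ 225MHz: {hd5870_uvd:.1f} GB/s")  # 28.8 GB/s
```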

One cannot assume that they could downclock the cores that much, even if they have four times as many shaders, because the UVD unit may require a minimum frequency to work.
 
It is also worth noting imho that transistor density went up a bit.
For CPUs, various parts (caches, cores, memory controllers) have different transistor densities.
So did the density go up because they removed the less dense units, or because they removed some extra redundancy due to a more mature 40nm process?
 
I measure 36.8mm² of GDDR5 interface (just the area for the interface pads, the "PHY") on RV770. Say it's 40% bigger than that in Cypress? That's around 15mm² of difference.
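Reading that as "Cypress's PHY is ~40% larger than RV770's", the ~15mm² falls out directly; a sketch of that interpretation:

```python
# PHY area: where the ~15mm^2 figure comes from, assuming Cypress's
# GDDR5 interface is ~40% larger than the 36.8mm^2 measured on RV770.
rv770_phy = 36.8               # mm^2, measured
cypress_phy = rv770_phy * 1.4  # ~51.5 mm^2, assumed

print(f"extra PHY area in Cypress: {cypress_phy - rv770_phy:.1f} mm^2")  # ~14.7
```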

Hints point to the 17th, redundant, VLIW-5 block being dropped in Cypress. Texturing is also a bit more complex. The LDS is double the capacity, has atomics logic and a more complex read/write infrastructure.

The cores in RV770 are ~41% of the die. The dual-engine nature of Barts must cost quite a bit of extra die space in comparison with RV770's layout, but there's no real way to work that out. No idea how much that would scale with core count.
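Those numbers can also be used to cross-check the earlier per-SIMD guess; a sketch, assuming (neither is stated in the thread) an RV770 die size of ~256mm² at 55nm, that "cores" means the 10 SIMD blocks, and naive optical scaling:

```python
# Cross-check of the ~6-7mm^2 per-SIMD guess against Jawed's figures.
rv770_die = 256.0               # mm^2, assumed (55nm)
core_area = 0.41 * rv770_die    # "cores are ~41% of the die" -> ~105mm^2
per_simd_55nm = core_area / 10  # RV770 has 10 SIMDs -> ~10.5mm^2

# Naive optical scaling from 55nm to 40nm: area scales by (40/55)^2 ~ 0.53
per_simd_40nm = per_simd_55nm * (40 / 55) ** 2

print(f"per SIMD @ 55nm: {per_simd_55nm:.1f} mm^2")
print(f"per SIMD @ 40nm: {per_simd_40nm:.1f} mm^2 (before added complexity)")
```

That lands around 5.5mm² before Evergreen's extra complexity, which is at least in the same ballpark as the 6-7mm² reckoned earlier.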
 
So did the density go up because they removed the less dense units, or because they removed some extra redundancy due to a more mature 40nm process?
:LOL: We all forgot: Cypress gained girth due to doubled vias, supposedly, so that's another thing we can't account for in Barts's diet.
 
The struggle over HAWX 2 seems to originate from the argument over whether to use Adaptive Tessellation or not:

1-Nvidia & Ubisoft say "No"
2-AMD says "Yes"

http://translate.google.com/translate?js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&sl=fr&tl=en&u=http%3A%2F%2Fwww.hardware.fr%2Farticles%2F804-17%2Fdossier-amd-radeon-hd-6870-6850.html

Personally, I think Adaptive Tessellation could be the perfect replacement for LOD; at least it should solve the annoying problem of geometry and detail pop-ups that plague all games nowadays.
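As a rough illustration of how that could work, here is a minimal Python sketch of a distance-adaptive tessellation factor; in a real engine this logic would live in the hull/tessellation shader, and all the constants here are invented:

```python
# Distance-adaptive tessellation: shrink the tessellation factor
# continuously with camera distance, so detail fades in smoothly
# instead of popping between discrete LOD meshes.
def tess_factor(distance, near=10.0, max_factor=16.0, min_factor=1.0):
    """Continuous tessellation factor from camera distance (made-up constants)."""
    raw = max_factor * (near / max(distance, near))
    return max(min_factor, min(max_factor, raw))

for d in (5, 10, 20, 40, 80, 160):
    print(f"distance {d:>3}: factor {tess_factor(d):4.1f}")
```

The factor halves every time the distance doubles, so there is no single switch point at which a whole new mesh pops in.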
 
The struggle over HAWX 2 seems to originate from the argument over whether to use Adaptive Tessellation or not:

1-Nvidia & Ubisoft say "No"
2-AMD says "Yes"

http://translate.google.com/translate?js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&sl=fr&tl=en&u=http%3A%2F%2Fwww.hardware.fr%2Farticles%2F804-17%2Fdossier-amd-radeon-hd-6870-6850.html

Personally, I think Adaptive Tessellation could be the perfect replacement for LOD; at least it should solve the annoying problem of geometry and detail pop-ups that plague all games nowadays.

I ran the benchmark and I think they are using adaptive tessellation. It would make no sense for Nvidia to oppose it, because Nvidia is promoting it with their tessellation demos; oh, and Stone Giant is using it, too.

Maybe AMD thinks it could be more aggressive.
 
The struggle over HAWX 2 seems to originate from the argument over whether to use Adaptive Tessellation or not:

1-Nvidia & Ubisoft say "No"
2-AMD says "Yes"

http://translate.google.com/translate?js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&sl=fr&tl=en&u=http%3A%2F%2Fwww.hardware.fr%2Farticles%2F804-17%2Fdossier-amd-radeon-hd-6870-6850.html

Personally, I think Adaptive Tessellation could be the perfect replacement for LOD; at least it should solve the annoying problem of geometry and detail pop-ups that plague all games nowadays.
Wasn't the "Adaptive" part the initial goal of doing tessellation to start with?
 
So what's with the new FSAA option? Will we get it on the 5x00 series? I don't see either of these cards as a replacement for my 5850 at 5870 speeds; if anything it's a side step, and currently I don't need anything more powerful for my gaming. It would be nice if the new FSAA came to the 58x0 series; it seems like a really nice thing for Eyefinity resolutions.
 
Nvidia and their fans are really something :oops: Both AMD and Nvidia LOVE tessellation; hell, AMD was the very first to market (and like 10 years before anyone else if we count the first incarnation). The difference comes in benefits vs. going overboard (which doesn't really benefit users). Do you see any difference between very good and extreme tessellation in games? My bet is you don't in most cases, if at all.

I find it amusing that, as an AMD fan, you aren't appalled that in this very thread AMD is telling you it made more sense to reduce the default filtering quality on 5800 cards, because raising the other cards (and the 6800 series) up to the 5800's level would affect more users... I'm still trying to figure out what Dave was thinking by even making that statement in public. :) I guess the 5800's texture quality was also overboard and unnecessary.

Anyway, you are proving my point. AMD is complaining/talking/whining about overblown tessellation workloads. Besides the irony that they found themselves in this position in the first place, I'm saying that instead of talking they should be out leveraging their seven generations of tessellation hardware experience to influence devs to do it the right way. You know, doing more than just talking.
 