AMD: R9xx Speculation

True, but they are not entirely innocent either. With the Cayman launch, we can CLEARLY see that Cypress was designed for time to market, with a DX11 and especially a tessellation implementation that was not up to par. Yes, Cypress had a smaller die, but at what cost? It's kind of hypocritical to promote DX11 and tessellation (tessellation especially was heavily promoted by AMD like the second coming, only to be hushed up months later when Fermi launched) with a solution that seemed less than ideal for it. In short, IMO, AMD is reaping the tempest they sowed, no more, no less. Kind of ironic also how we all made fun of nVIDIA's supposed lack of a "Plan B".
Cypress was ideal for a tessellation workload that AMD predicted would be in use.
Fermi, and the preponderance of tessellation benchmarks and the one or two games that for various reasons go beyond that, did not turn out to match AMD's vision.
It happens.

Ultimately, going with the small die strategy might not have really been the advantage some people like to tout.
It has manufacturing advantages. Most everything else is marketing.
What is possibly more important is that AMD does not have the same ability to leverage the professional market for high ASP products as Nvidia.
I would be curious what money AMD would make if it made a chip the size of Fermi, since it would most likely not have a revenue profile significantly different from what it has right now.
 
ATI's shader architecture is as efficient as ever at shading pixels from cars, the environment, and post-processing. Unfortunately, it's much slower at resolution-independent stuff, like drawing reflection/shadow maps and cranking out triangles for the main scene, which points to far slower geometry processing.
Nice analysis, good of you to actually lay it out clearly.

I'm utterly bemused by Cayman.

Not only is ms per frame worse in HD6970 than HD6870, but ns per pixel is only 36% faster in HD6970 while theoretically it is 68% faster (assuming VLIW-5 and VLIW-4 are equivalent, knock off 10% if you like).

It just seems utterly broken, struggling to be 18% faster than HD6870 at 2560x1600 in this game. It's 53% bigger; what the hell is going on in there?
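Back-of-the-envelope on that 68% figure, for anyone who wants to check my working. This is just a sketch assuming stock clocks and the VLIW-4 ≈ VLIW-5 per-SIMD equivalence stated above:

```python
# Theoretical SIMD throughput ratio, HD6970 vs HD6870, assuming a
# VLIW-4 SIMD (Cayman) does the same work per clock as a VLIW-5
# SIMD (Barts) -- the equivalence assumed in the post.
hd6970_simds, hd6970_mhz = 24, 880   # Cayman XT: 24 SIMDs at 880 MHz
hd6870_simds, hd6870_mhz = 14, 900   # Barts XT: 14 SIMDs at 900 MHz

ratio = (hd6970_simds * hd6970_mhz) / (hd6870_simds * hd6870_mhz)
print(f"{ratio:.2f}x")   # ~1.68x, i.e. the "68% faster" theoretical figure
# knock off ~10% if you weight VLIW-4 below VLIW-5, as suggested
```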
 
However, the fact that Cayman and Barts have the same per-frame time tells you that it could easily be a driver issue. Your graph from the DX SDK sample shows Cayman performing slower than Cypress at high tessellation levels. My numbers show parity between Cayman and Barts for Dirt 2.

It could be drivers or it could be a hardware bug. Either way, I don't think ATI is hitting architectural limitations.

The performance in AMD's SubD11 SDK sample has been reported to AMD beforehand, and they said they're working on a new driver in order to improve certain tessellation workloads. In some, however, Cayman's tessellation hardware already works as advertised, showing a healthy increase over Cypress'. So I agree that with regard to tessellation, coming drivers might drastically improve certain workloads. But I am not so certain that a substantial increase in overall performance through improved drivers is very likely.
 
Just to be clear, the tessellation in the Dirt2 benchmark covers just a few seconds of the whole run, so the majority of those frames run without too much geometry. Certainly an easy task for 800 Mtriangles/s and 1600 Mtriangles/s.
How do you know? Can you run it in wireframe?

The per-frame time is 5.0 ms for the 580, 7.6 ms for the 460, and 13.7 ms for the 450. It's ~8 ms for the 5870, 6870, and 6970. In a different test of the same game, the per-frame time increases from 9 ms to 13 ms with DX11 enabled. All evidence is pointing to geometry.
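For anyone wanting to reproduce the split, here's a minimal sketch of the decomposition being used in this thread: model frame time as a resolution-independent part plus a per-pixel part, and solve from two resolutions. The fps inputs below are placeholders, not measurements:

```python
# frame_time = per_frame_ms + ns_per_pixel * pixels, solved as a
# two-point linear system from results at two resolutions.

def decompose(fps_lo, px_lo, fps_hi, px_hi):
    """Split frame time into a resolution-independent part (geometry,
    command processing) and a per-pixel part (shading, ROPs, bandwidth)."""
    t_lo, t_hi = 1000.0 / fps_lo, 1000.0 / fps_hi       # ms per frame
    ns_per_pixel = (t_hi - t_lo) * 1e6 / (px_hi - px_lo)
    per_frame_ms = t_lo - ns_per_pixel * px_lo / 1e6
    return per_frame_ms, ns_per_pixel

# e.g. a hypothetical card doing 80 fps at 1680x1050 and 55 fps at 2560x1600:
print(decompose(80, 1680 * 1050, 55, 2560 * 1600))  # ~(8.2 ms, ~2.4 ns/pix)
```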
 
So I agree that with regard to tessellation, coming drivers might drastically improve certain workloads. But I am not so certain that a substantial increase in overall performance through improved drivers is very likely.
I think improvement in some tessellation workloads is a large part of what ATI needs (HAWX, Dirt2, LP2 among others). The other games dragging down averages are Battleforge and StarCraft2, where ATI takes a huge AA hit.
 
How do you know? Can you run it in wireframe?
Dirt2 uses tessellation only on water, crowds and cloth. If none of them are in view, you can rightfully assume there's no tessellation going on. The amount of tessellation obviously depends on your chosen benchmark run.

The test you linked was our first one, where I think more crowd and cloth were visible, before we switched to the Malaysia track, where less tessellation is used, percentage-wise. Here's a YouTube video of our benchmark run: http://www.youtube.com/watch?v=OZ86KHKkA58

I think improvement in some tessellation workloads is a large part of what ATI needs (HAWX, Dirt2, LP2 among others). The other games dragging down averages are Battleforge and StarCraft2, where ATI takes a huge AA hit.
Taking into account the Unigine Heaven 2.1 results we have published, it seems there's no simple answer as to what kinds of workloads will profit from driver improvements. I've also benchmarked (but not published) TessMark, which derives its scores purely from tessellation prowess. There, Cayman did not show improvements over Cypress at any of the tessellation levels (amplification factors of 8, 16, 32 and 64).
 
I've said this already in another context: I do not think 2Gbit vs 1Gbit RAM chips make more than a tiny difference in power consumption. In fact, I would expect a 320-bit interface using ten 1Gbit chips to draw more power than a 256-bit interface using eight 2Gbit chips, if it were running at the same clock/voltage (which it is not if you compare the GF110 cards to Cayman cards, but you get the idea).
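To put rough numbers on the chip-count argument in that quote: the per-chip and per-PHY watts below are made-up placeholders, not real GDDR5 figures; only the scaling is the point.

```python
W_PER_CHIP = 2.0         # assumed watts per GDDR5 device at fixed clock/voltage
W_PER_32BIT_SLICE = 0.5  # assumed watts per 32-bit controller/PHY slice

def mem_power(chips, bus_bits):
    # total = per-device power plus per-slice interface power
    return chips * W_PER_CHIP + (bus_bits // 32) * W_PER_32BIT_SLICE

print(mem_power(10, 320))  # 10x 1Gbit on a 320-bit bus -> 25.0 W
print(mem_power(8, 256))   # 8x 2Gbit on a 256-bit bus  -> 20.0 W
```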

Yup, Tech Report shows an idle power difference of 12 watts between the 5870 1 GB and the 5870 2 GB, and 39 watts at load (they use L4D2). Definitely not a tiny difference.

Speaks well of the idle power improvements they've made: the 6970 idles 11 watts lower than the 5870 2 GB. But at load it's also only 11 watts less.

I wonder about one thing: why on Earth was the HD 4890 able to hit the 200 USD price tag back when it was the highest-performing single-GPU card from AMD, but neither the 5870 nor the 6970 can? :oops: And they are not even approaching such price levels. :oops:

AMD and Nvidia being in a ruinous price war at the time triggered historically low prices for their respective top end chips. Both companies were hurt a lot by that price war, and it's extremely unlikely that either one will go to those levels again.

It's not a question of whether or not the 5870 or 6970 can go to those price levels; they can. It's more a matter of AMD not wanting to go down to those levels. Nvidia could obviously force them to, but it's obvious they don't want to go down there either.

There may be some price jockeying in the coming months as demand starts to slow, but I doubt the 6970 will be hitting the $200-or-lower price point unless Nvidia wants to take a large hit to their margins, and they've already indicated they aren't willing to do that this round, or at least this quarter.

Regards,
SB
 
Taking into account the Unigine Heaven 2.1 results we have published, it seems there's no simple answer as to what kinds of workloads will profit from driver improvements. I've also benchmarked (but not published) TessMark, which derives its scores purely from tessellation prowess. There, Cayman did not show improvements over Cypress at any of the tessellation levels (amplification factors of 8, 16, 32 and 64).

This is really strange, as there are 2x the tessellation units, plus possibly also the Barts tessellation improvements, and I would expect synthetic tests to show this better, not the other way around. Is the load balancing controlled by the driver and broken in some cases, or what?
 
Consider me as puzzled as the next guy about this question. At first I thought my system might be borked somehow, but AMD did not point me in that direction, so I guess they could at least reproduce the general trend.
 
Nice analysis, good of you to actually lay it out clearly.

I'm utterly bemused by Cayman.

Not only is ms per frame worse in HD6970 than HD6870, but ns per pixel is only 36% faster in HD6970 while theoretically it is 68% faster (assuming VLIW-5 and VLIW-4 are equivalent, knock off 10% if you like).

It just seems utterly broken, struggling to be 18% faster than HD6870 at 2560x1600 in this game. It's 53% bigger; what the hell is going on in there?
Thanks.

In the per-pixel load, BW and ROPs will play a role. I've added 5770 OC results to the spreadsheet and get 4.35 ns/pix, so the 6870 is 56% faster per pixel despite only 40% more SIMDs. If you compare the 6970 with the 5770 OC, it has 1.96x the ROPs, 2.3x the SIMD rate and BW, and 1.88x the FLOPs, and uses all this to render at 2.13x the speed.

Everything seems in order on the per-pixel side, and beating the GTX 580 there is no mean feat. The per-frame time really is the killer here. We're waiting on Dave to see if the command processor is what he meant by front end, but that would be a real shame. To be clear, I expect that some of the per-frame time is raster related, as shadow/reflection maps aren't 100% geometry limited, but if much of the rest is command-processor limited then that would be quite baffling.
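As a quick sanity check on those ratios, just recomputing from the numbers above (4.35 ns/pix for the 5770 OC as the baseline):

```python
base = 4.35          # ns/pixel, 5770 OC, from the spreadsheet
print(base / 1.56)   # implied HD6870 ns/pixel, ~2.79
print(base / 2.13)   # implied HD6970 ns/pixel, ~2.04
# Against 1.96x ROPs, 2.3x SIMD rate/BW and 1.88x FLOPs, a 2.13x
# per-pixel speedup sits right in the band you'd expect.
```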
 
The domain shader is basically a vertex shader with a compact input stream. If you can buffer enough vertices to run a VS, then you can buffer more than enough barycentric pairs to run a DS.
I agree with that.

I think Nvidia surprised ATi with GF110, and because Cayman was already so far into the design and even manufacturing stage when performance indications of GF110 started to leak, they had to pump the clocks up really high to stay competitive, and they still fell short of the GTX580.
Cayman was designed before the GTX480 launch, let alone the GTX580.

I wonder about one thing: why on Earth was the HD 4890 able to hit the 200 USD price tag back when it was the highest-performing single-GPU card from AMD, but neither the 5870 nor the 6970 can? :oops: And they are not even approaching such price levels. :oops:
40nm is expensive. 55nm was cheap.
 
So the question is: can the odd performance issues seen between the 6970 and the 5870 be improved upon with better drivers?
 
Just plugged my card in, and in Endless City there is a huge improvement compared to Cypress for sure.
[1920x1200 - Fraps average of 20 FPS for a stock HD6970 on a Phenom X6 3.45/4.0GHz, compared to 13 FPS for a 1GHz HD5870 on an i7 4.2GHz]

Double-checked TessMark, but as mentioned before, its scores are the same as on Cypress for some reason. Maybe OpenGL 4 is not yet aware of the new Cayman capabilities?

If any of you have quick bench requests, feel free to PM me. I might not have time to do a lot today, but the weekend is already reserved for that :p.
 
Just like both the GTX580 and HD6970 are irrelevant for 95% of gamers, because the majority of them won't buy anything better/more expensive than a GTX460 or HD6800.

But anybody who buys a >$500 GPU every year (including CF/SLI users) is unlikely to use it with a $150-200 LCD panel. Not to mention that this advantage scales quite well with CrossFire, where the HD6970 matches or slightly beats the SLI'd GTX580 while being $300 cheaper.

Not true ;)
 
Double-checked TessMark, but as mentioned before, its scores are the same as on Cypress for some reason. Maybe OpenGL 4 is not yet aware of the new Cayman capabilities?
OpenGL is behind DX for Cayman's improvements; there are only benchmarks for OGL tessellation at the moment. There will be improvements for OpenGL in the new year.
 
Does that mean the dual front end can only be capitalized upon if the driver explicitly controls the command streams accordingly, instead of the hardware taking care of itself in this regard?
 
Does that mean the dual front end can only be capitalized upon if the driver explicitly controls the command streams accordingly, instead of the hardware taking care of itself in this regard?
I believe the fixed-function hardware will take care of itself, but I don't know that TessMark is setup bound in the first place. That could be tested with an OpenGL vertex or triangle test (there should be one somewhere, I'm sure).

BTW - the driver has been updated and should bring improvements in a few of the DX SDK tests. Download it again and give it a go.
 
....

BTW - the driver has been updated and should bring improvements in a few of the DX SDK tests. Download it again and give it a go.

You mean the HotFix driver available for everyone on AMD's website, or a press driver only? And to think I downloaded the HotFix driver an hour ago :devilish:.

PS. Love PowerTune already; whoever had that idea at AMD deserves a cake!
 
I believe the fixed-function hardware will take care of itself, but I don't know that TessMark is setup bound in the first place. That could be tested with an OpenGL vertex or triangle test (there should be one somewhere, I'm sure).

BTW - the driver has been updated and should bring improvements in a few of the DX SDK tests. Download it again and give it a go.

Thanks! Will try first thing tomorrow morning. :)

edit: I think it's the official hotfix that has been updated. The EMEA press FTP still shows the same files as a few days ago.
 
Could we assume that PowerTune on Antilles will enable one GPU to use more than 250W in scenarios with high inter-frame dependencies?
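Pure speculation on my part about how Antilles might arbitrate it, but the budget-sharing idea I have in mind looks something like this toy model (the 300 W board cap and all wattages are hypothetical):

```python
BOARD_CAP_W = 300.0  # hypothetical board-level budget

def share_budget(gpu0_w, gpu1_w):
    """If one GPU is stalled on inter-frame dependencies and draws
    little, the other could be allowed past a 250 W per-GPU cap as
    long as the board total stays under budget."""
    total = gpu0_w + gpu1_w
    if total <= BOARD_CAP_W:
        return gpu0_w, gpu1_w          # under budget, no throttling needed
    scale = BOARD_CAP_W / total        # otherwise throttle proportionally
    return gpu0_w * scale, gpu1_w * scale

print(share_budget(280.0, 15.0))   # GPU0 exceeds 250 W, board stays under cap
```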
 