AMD: R9xx Speculation

Cayman still seems to have a lot of room for improvement. Although AMD improved the front-end, it still falls a little behind GF110 despite offering 70% higher INT8 fill-rate and MADD ALU peak.
 
Now that's actually interesting.

HD6970 v HD5870:
  • Fluid 3D texture - 196 v 42 = 467%
  • Fluid 2D texture array - 116 v 44 = 264%
  • Mandelbrot Vector - 58 v 61 = 95%
  • Mandelbrot Scalar - 29 v 26 = 112%
  • QJulia Ray Tracing - 56 v 65 = 86%
The first two tests might be texture cache bound - in which case there's something radical going on.
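For anyone who wants to check the percentages, here is a minimal Python sketch that recomputes them from the scores listed above. The dictionary layout and the helper name `relative_percent` are my own, purely for illustration; the numbers are the ones quoted in this post.

```python
# Scores quoted above as (HD6970, HD5870).
scores = {
    "Fluid 3D texture": (196, 42),
    "Fluid 2D texture array": (116, 44),
    "Mandelbrot Vector": (58, 61),
    "Mandelbrot Scalar": (29, 26),
    "QJulia Ray Tracing": (56, 65),
}


def relative_percent(new, old):
    """Return `new` as a percentage of `old`, rounded to a whole percent."""
    return round(100 * new / old)


# HD6970 score expressed as a percentage of the HD5870 score, per test.
ratios = {name: relative_percent(*pair) for name, pair in scores.items()}

for name, pct in ratios.items():
    print(f"{name}: {pct}%")
```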

The other three tests are pure math but with a lot of branching. I guess we're seeing a combination of short clauses of instructions and a high proportion of transcendental instructions. Julia set should have more intense branching.

I haven't looked at the compilation for this application, but I suspect these fractals result in short clauses, which would tend to increase the demerit of VLIW-4 for transcendentals, e.g. a 3-instruction clause on Cypress versus a 4-instruction clause on Cayman is a 33% slowdown per work item.
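The 33% figure is just the ratio of clause lengths. A rough sketch, assuming every instruction in a clause costs the same time and nothing else (latency hiding, clause switching) changes between the two chips; `clause_slowdown` is a hypothetical helper, not something taken from a compiler or profiler:

```python
def clause_slowdown(cypress_len, cayman_len):
    """Fractional extra time per work item when a clause grows from
    cypress_len to cayman_len instructions, assuming equal cost per
    instruction (a simplifying assumption)."""
    return (cayman_len - cypress_len) / cypress_len


# The example from the post: 3 instructions on Cypress, 4 on Cayman.
slowdown = clause_slowdown(3, 4)
print(f"{slowdown:.0%}")
```

The shorter the clause, the larger the relative penalty of one extra instruction, which is why short clauses would hurt VLIW-4 most.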

Rightmark Mineral and Fire shaders would be another good test for transcendentals, but without branching.

Can anyone get the extreme tests to run on NVidia? That Geeks3D page shows NVidia hardware failing for some reason.
 
Strange. It's massively faster in the fluid test, while it's the same speed in the branching-dominated Mandelbrot and Julia raymarch, where it should perform better with the additional SIMDs. IIRC those codes don't even use that many transcendentals.
 
130253tz0ffiyt7tfec8ee.jpg
Interesting that Cayman remains at least 2x better than Cypress at all levels, whereas Barts drops off to Cypress levels after a factor of 10.
 
I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were implemented already in Barts)
 
For better understanding of ComputeMark scores here is my HD5770 @880/1300 (can't get memory higher to match Cayman), same CPU but clocked a tad slower (CPU doesn't influence this score anyway).



computemark8801300.png
 
So, compared to the scores over at geeks3d.com, this would imply that ComputeMark is basically NOT compute bound? I mean, your 5770 gets basically the same scores as a 5870 with twice as much compute horsepower.
 
I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were implemented already in Barts)
And the fact that it's a bit more than 2x as fast from factor 16 upwards is probably mostly due to the slightly higher core clock.
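As a back-of-the-envelope check, assuming tessellation throughput scales linearly with both geometry engine count and core clock (an assumption, not something measured here), and assuming reference clocks of 880 MHz for the HD6970 and 850 MHz for the HD5870 (neither figure is stated in the post):

```python
cayman_clock_mhz = 880     # assumed HD6970 reference core clock
cypress_clock_mhz = 850    # assumed HD5870 reference core clock
geometry_engine_ratio = 2  # two geometry engines versus one, as suggested above

# Naive linear model: speedup = engine ratio * clock ratio.
expected_speedup = geometry_engine_ratio * cayman_clock_mhz / cypress_clock_mhz
print(f"{expected_speedup:.2f}x")
```

Under those assumptions the clock difference accounts for a few percent on top of the 2x.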
 
So, compared to the scores over at geeks3d.com, this would imply that ComputeMark is basically NOT compute bound? I mean, your 5770 gets basically the same scores as a 5870 with twice as much compute horsepower.

Comparing the 5770 and 5870

computemark_hd5870_1920x1080_extreme.jpg


It seems the 5770 has a higher Fluid3D:Tex - 66 compared to the 5870's 42

Hence the scores look similar.
 
Now that's actually interesting.

HD6970 v HD5870:
  • Fluid 3D texture - 196 v 42 = 467%
  • Fluid 2D texture array - 116 v 44 = 264%
  • Mandelbrot Vector - 58 v 61 = 95%
  • Mandelbrot Scalar - 29 v 26 = 112%
  • QJulia Ray Tracing - 56 v 65 = 86%
The first two tests might be texture cache bound - in which case there's something radical going on.

The other three tests are pure math but with a lot of branching. I guess we're seeing a combination of short clauses of instructions and a high proportion of transcendental instructions. Julia set should have more intense branching.

I haven't looked at the compilation for this application, but I suspect these fractals result in short clauses, which would tend to increase the demerit of VLIW-4 for transcendentals, e.g. a 3-instruction clause on Cypress versus a 4-instruction clause on Cayman is a 33% slowdown per work item.

Rightmark Mineral and Fire shaders would be another good test for transcendentals, but without branching.

Can anyone get the extreme tests to run on NVidia? That Geeks3D page shows NVidia hardware failing for some reason.

Very interesting
 
Comparing the 5770 and 5870

http://www.ozone3d.net/public/jegx/201006/computemark_hd5870_1920x1080_extreme.jpg

It seems the 5770 has a higher Fluid3D:Tex - 66 compared to the 5870's 42

Hence the scores look similar.

So, that means either the score provided for the HD 5870 is off, or the benchmark itself is even less bound by pure compute. Or am I missing the obvious explanation as to why an HD 5770 should outperform an HD 5870 by more than 50%?
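To put numbers on that, a small sketch comparing the observed Fluid3D:Tex ratio with what a purely compute-bound test would predict. The 880 MHz HD 5770 clock is the overclock mentioned earlier in the thread; the 850 MHz HD 5870 clock is my assumption (reference clock), and "half the compute" is taken from the "twice as much compute horsepower" remark above.

```python
# Fluid3D:Tex scores quoted in the thread.
hd5770_score = 66
hd5870_score = 42

hd5770_clock_mhz = 880  # overclock stated for the HD 5770 earlier in the thread
hd5870_clock_mhz = 850  # assumed HD 5870 reference clock, not from the thread

observed_ratio = hd5770_score / hd5870_score

# If the test were purely compute bound, the HD 5770 (half the compute units)
# should land at roughly half the HD 5870 score, scaled by the clock ratio.
expected_ratio_if_compute_bound = 0.5 * hd5770_clock_mhz / hd5870_clock_mhz

print(f"observed: {observed_ratio:.2f}x")
print(f"expected if compute bound: {expected_ratio_if_compute_bound:.2f}x")
```

The observed ratio is about three times the compute-bound expectation, so under these assumptions something other than ALU throughput decides this score.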

Edit: I am just running ComputeMark 2 on a GTX 580. Seems like the drivers have been holding Fermi back quite a bit. Plus, it doesn't crash on Extreme any more.

 
I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were implemented already in Barts)
For me the most interesting thing about this graph is actually the lower-than-2x improvement for tessellation levels 3-5. Must be some overhead somewhere?
 
Cayman still seems to have a lot of room for improvement. Although AMD improved the front-end, it still falls a little behind GF110 despite offering 70% higher INT8 fill-rate and MADD ALU peak.
The front-end is still quite far behind GF110 as far as I can tell. More like GF104, at least as far as tessellation goes. That's not necessarily a bad thing, though.
 