AMD: R9xx Speculation

eastmen · Dec 12, 2010

lol, if that slide is right it's making me want a 6970... But I can wait.

AnarchX · Dec 12, 2010

Cayman still seems to have plenty of room for improvement. Although AMD improved the front-end, it still falls a little behind GF110 while offering 70% higher INT8 fill-rate and MADD ALU peak.

Kef · Dec 12, 2010

fellix said:
And some DC5.0 benching - ComputeMark. Quite slow...

Hmm, it's 91% _faster_ than HD5870. I wouldn't call it slow.

http://www.geeks3d.com/20100606/gpu-computing-directcompute-computemark-2-1-gtx-480-vs-hd-5870/

fellix · Dec 12, 2010

Kef said:
Hmm, it's 91% _faster_ than HD5870. I wouldn't call it slow.

http://www.geeks3d.com/20100606/gpu-computing-directcompute-computemark-2-1-gtx-480-vs-hd-5870/

Correct, I hadn't noticed the resolution setting.

Jawed · Dec 12, 2010

Now that's actually interesting.

HD6970 v HD5870:

Fluid 3D texture - 196 v 42 = 467%
Fluid 2D texture array - 116 v 44 = 264%
Mandelbrot Vector - 58 v 61 = 95%
Mandelbrot Scalar - 29 v 26 = 112%
QJulia Ray Tracing - 56 v 65 = 86%
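
As a quick check of the percentages above, here is a minimal sketch that recomputes them from the quoted scores. The dictionary and function names are only for illustration; the numbers are the HD6970 and HD5870 scores listed in this post.

```python
# (HD6970 score, HD5870 score) pairs as quoted in the list above.
scores = {
    "Fluid 3D texture": (196, 42),
    "Fluid 2D texture array": (116, 44),
    "Mandelbrot Vector": (58, 61),
    "Mandelbrot Scalar": (29, 26),
    "QJulia Ray Tracing": (56, 65),
}


def relative_percent(new_score, old_score):
    """Return new_score as a whole-number percentage of old_score."""
    return round(100 * new_score / old_score)


# HD6970 result expressed as a percentage of the HD5870 result.
ratios = {name: relative_percent(new, old) for name, (new, old) in scores.items()}

for name, pct in ratios.items():
    print(f"{name}: {pct}%")
```

Anything above 100% means the HD6970 is ahead in that test; anything below means the HD5870 is.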

The first two tests might be texture cache bound - in which case there's something radical going on.

The other three tests are pure math but with a lot of branching. I guess we're seeing a combination of short clauses of instructions and a high proportion of transcendental instructions. Julia set should have more intense branching.

I haven't looked at the compilation for this application but I suspect these fractals result in clauses that are short, which would tend to increase the demerit of VLIW-4 for transcendentals, e.g. 3 instruction clause on Cypress versus a 4 instruction clause on Cayman is a 33% slowdown per work item.
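
To put a number on the clause-length argument, here is a rough sketch. It assumes, purely for illustration, that Cayman needs exactly one more instruction per clause than Cypress regardless of clause length; the function name and the longer example clause are hypothetical and not taken from any compiler output.

```python
def slowdown_percent(cypress_clause_len, cayman_clause_len):
    """Extra instructions per work item on Cayman, as a percentage of Cypress."""
    extra = cayman_clause_len - cypress_clause_len
    return 100 * extra / cypress_clause_len


# The case from the post: 3 instructions on Cypress, 4 on Cayman.
short_clause = slowdown_percent(3, 4)

# Hypothetical longer clause with the same one-instruction penalty.
long_clause = slowdown_percent(10, 11)

print(f"short clause: {short_clause:.0f}% slower")
print(f"long clause:  {long_clause:.0f}% slower")
```

Under that assumption the same one-instruction penalty costs about 33% on a 3-instruction clause but only 10% on a 10-instruction clause, which is why short clauses would hurt VLIW-4 the most.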

Rightmark Mineral and Fire shaders would be another good test for transcendentals, but without branching.

Can anyone get the extreme tests to run on NVidia? That Geeks3D page shows NVidia hardware failing for some reason.

Psycho · Dec 12, 2010

Strange. It's massively faster in the fluid test, while it's the same speed in the branching-dominated Mandelbrot and Julia raymarch tests, where it should perform better with the additional SIMDs. IIRC that code doesn't even use that many transcendentals.

Arty · Dec 12, 2010

ZerazaX said:

Interesting that Cayman stays at 2x Cypress or better at all levels, whereas Barts drops off to Cypress levels after a factor of 10.

no-X · Dec 12, 2010

I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were already implemented in Barts)

Lightman · Dec 12, 2010

For better understanding of ComputeMark scores here is my HD5770 @880/1300 (can't get memory higher to match Cayman), same CPU but clocked a tad slower (CPU doesn't influence this score anyway).

fellix · Dec 12, 2010

Jawed said:
Can anyone get the extreme tests to run on NVidia? That Geeks3D page shows NVidia hardware failing for some reason.

GF100 has some performance woes with 3D textures under DC.

CarstenS · Dec 12, 2010

Lightman said:
For better understanding of ComputeMark scores here is my HD5770 @880/1300 (can't get memory higher to match Cayman), same CPU but clocked a tad slower (CPU doesn't influence this score anyway).

So, compared to the scores over at geeks3d.com, this would imply that ComputeMark is basically NOT compute bound? I mean, your 5770 gets basically the same scores as a 5870 with twice as much compute horsepower.
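
As a rough sanity check on the "twice as much compute horsepower" figure, here is a sketch of the peak MADD throughput of the two cards. The stream processor counts (800 for the HD 5770, 1600 for the HD 5870) and the 850 MHz stock clock of the HD 5870 are not stated in this thread and are filled in from the published specs; the 880 MHz figure is Lightman's overclock.

```python
def peak_madd_gflops(stream_processors, core_clock_mhz):
    """Peak single-precision GFLOPS, counting one MADD as 2 flops per SP per clock."""
    return stream_processors * 2 * core_clock_mhz / 1000


hd5770 = peak_madd_gflops(800, 880)   # Lightman's card at 880 MHz core
hd5870 = peak_madd_gflops(1600, 850)  # assumed stock clock

ratio = hd5870 / hd5770
print(f"HD 5770 @ 880 MHz: {hd5770:.0f} GFLOPS")
print(f"HD 5870 @ 850 MHz: {hd5870:.0f} GFLOPS")
print(f"ratio: {ratio:.2f}x")
```

With the overclock taken into account the HD 5870 still has a bit over 1.9x the peak ALU rate, so near-identical scores would point to a bottleneck outside the ALUs.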

TKK · Dec 12, 2010

no-X said:
I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were already implemented in Barts)

And the fact that it's a bit more than 2x as fast from factor 16 upwards is probably mostly due to the slightly higher core clock.

Unknown Soldier · Dec 12, 2010

CarstenS said:
So, compared to the scores over at geeks3d.com, this would imply that ComputeMark is basically NOT compute bound? I mean, your 5770 gets basically the same scores as a 5870 with twice as much compute horsepower.

Comparing the 5770 and 5870

It seems the 5770 has a higher Fluid3D:Tex - 66 compared to the 5870's 42

Hence the scores look similar.

Unknown Soldier · Dec 12, 2010

Jawed said:
Now that's actually interesting.

HD6970 v HD5870:

Fluid 3D texture - 196 v 42 = 467%
Fluid 2D texture array - 116 v 44 = 264%
Mandelbrot Vector - 58 v 61 = 95%
Mandelbrot Scalar - 29 v 26 = 112%
QJulia Ray Tracing - 56 v 65 = 86%

The first two tests might be texture cache bound - in which case there's something radical going on.

The other three tests are pure math but with a lot of branching. I guess we're seeing a combination of short clauses of instructions and a high proportion of transcendental instructions. Julia set should have more intense branching.

I haven't looked at the compilation for this application but I suspect these fractals result in clauses that are short, which would tend to increase the demerit of VLIW-4 for transcendentals, e.g. 3 instruction clause on Cypress versus a 4 instruction clause on Cayman is a 33% slowdown per work item.

Rightmark Mineral and Fire shaders would be another good test for transcendentals, but without branching.

Can anyone get the extreme tests to run on NVidia? That Geeks3D page shows NVidia hardware failing for some reason.

Very interesting

Broken Hope · Dec 12, 2010

There was an updated build of GPU-Z

CarstenS · Dec 12, 2010

Unknown Soldier said:
Comparing the 5770 and 5870

http://www.ozone3d.net/public/jegx/201006/computemark_hd5870_1920x1080_extreme.jpg

It seems the 5770 has a higher Fluid3D:Tex - 66 compared to the 5870's 42

Hence the scores look similar.

So that makes either the score provided for the HD 5870 or the benchmark itself look even less bound by pure compute. Or am I missing the obvious explanation as to why an HD 5770 should outperform an HD 5870 by more than 50%?

Edit: I am just running this ComputeMark 2 on a GTX 580. Seems like the drivers have been holding Fermi back quite a bit. Plus, it doesn't crash on Extreme any more.

mczak · Dec 12, 2010

no-X said:
I think this is the result of the 2 geometry engines... (= 2x performance over Cypress + some additional improvements here and there, which were already implemented in Barts)

For me the most interesting thing about this graph is actually the lower-than-2x improvement for tessellation levels 3-5. Must be some overhead somewhere?

mczak · Dec 12, 2010

AnarchX said:
Cayman still seems to have plenty of room for improvement. Although AMD improved the front-end, it still falls a little behind GF110 while offering 70% higher INT8 fill-rate and MADD ALU peak.

The front-end is still quite far behind GF110 as far as I can tell. More like GF104, at least as far as tessellation goes. That's not necessarily a bad thing, though.

DavidGraham · Dec 12, 2010

Broken Hope said:
There was an updated build of GPU-Z

Wizzard posted a new build of GPU-Z (0.4.9) yesterday, which supports 6900.

Sadly that just confirms the number of SPs (1536).

simbus82 · Dec 12, 2010

DavidGraham said:
Wizzard posted a new build of GPU-Z (0.4.9) yesterday, which supports 6900.

Sadly that just confirms the number of SPs (1536).

GPU-Z is an ARCHIVE!

It can't read anything from the GPU.
