AMD Radeon R9 Fury X Reviews

@Jawed said:

Utilization figures in some of the tests like the Techreport's show a sizeable gap between theoretical and realized bandwidth, and AMD's color compression appears to be about as helpful as last-gen Nvidia compression.
There's no evidence for that conclusion.

Maybe in an ideal world it wouldn't be limited in certain scenarios, but at least some measurements show that utilization is far enough from peak to be a factor.
A mem copy type test that's correctly written could provide some evidence.
 
@fellix said:

It's hard to judge frame-buffer compression efficiency if the GPU isn't bottlenecked by the available bandwidth, at least in fill-rate tests.
 
@3dilettante said:

There's no evidence for that conclusion.
The Techreport figures for the 780 Ti compressed and uncompressed show a gain of ~13%. FuryX shows a gain of ~16%.
Fury X switches from a ~34% to ~24% shortfall.
The 780 Ti switches from a ~41% to 33% shortfall.
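As a sanity check (illustrative only, peak normalised to 1.0), the quoted gains follow directly from those shortfall figures:

```python
# Illustrative check: derive the compression gain implied by the shortfall
# figures above (achieved bandwidth expressed as a fraction of peak).
def gain(shortfall_uncompressed, shortfall_compressed):
    achieved_u = 1.0 - shortfall_uncompressed
    achieved_c = 1.0 - shortfall_compressed
    return achieved_c / achieved_u - 1.0

print(f"780 Ti: {gain(0.41, 0.33):.1%}")  # ~13.6%, matching the ~13% figure
print(f"Fury X: {gain(0.34, 0.24):.1%}")  # ~15.2%, close to the ~16% figure
```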

Maxwell shows 50-60% gains with a compressed figure that exceeds theoretical peak.
I guess I could amend my statement to say that there is limited evidence that AMD's compression appears to be incrementally better than Nvidia's first-gen compression.

A mem copy type test that's correctly written could provide some evidence.
A mem copy using the ROPs and maybe with a way to turn off their compression hardware on demand would be illuminating.
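The bookkeeping for such a copy test is simple; as a rough sketch (CPU-side Python standing in for a GPU kernel dispatch, function name hypothetical), the bandwidth figure is just bytes moved over elapsed time, counting both the read and the write:

```python
import time

def copy_bandwidth_gbps(nbytes=64 * 1024 * 1024):
    # Hypothetical sketch: time one large copy and report GB/s.
    # A GPU version would time a copy kernel (or a ROP blit) instead,
    # ideally with compression toggled on and off.
    src = bytearray(nbytes)
    start = time.perf_counter()
    dst = bytes(src)                      # the timed copy
    elapsed = time.perf_counter() - start
    return (2 * nbytes) / elapsed / 1e9   # read + write traffic

print(f"{copy_bandwidth_gbps():.1f} GB/s")
```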
 
@RobertR1 said:

Ended up getting an EVGA 980ti SC+ ACX2.0+

For me to have gone with the Fury X, it needed to beat the 980 Ti handily to offset the driver advantage and developer relations that Nvidia has established. It didn't, so it's virtually impossible to justify the price for it. Too bad. I haven't had an ATI card since the X1900 XT.
 
@Jawed said:

What don't you like about the texture bandwidth test exactly?
Can't tell if it's working as intended. It's not even clear what's intended.

The one site that's using it thought it was testing one thing. Now with the Fury X results they're left scratching their heads:

Also, our understanding from past reviews was that the R9 290X was limited by ROP throughput in this test. Somehow, the Fury X speeds past the 290X despite having the same ROP count on paper. Hmm. Perhaps we were wrong about what limited the 290X. If so, then 290X may have been bandwidth limited, after all—and Hawaii apparently has no texture compression of note. The question then becomes whether the Fury X is also bandwidth limited in this test, or if its performance is limited by the render back-end. Whatever the case, the Fury X only achieves 387 GB/s of throughput here, well below the 512 GB/s theoretical max of its HBM-infused memory subsystem.
I'm puzzled why Scott's been left to flounder, to be honest.

In the back of my mind is a test that was written years ago as a bandwidth test in OpenCL. The test was faulty. But the blame was put on AMD, for years as it happens.

The results on Tonga for this test aren't available as far as I'm aware. They might help us understand.
 
@Rys said:

Can't tell if it's working as intended. It's not even clear what's intended.
I explained to you earlier in the thread what it's doing, and that it's working as intended as a bandwidth test. By that I mean it unequivocally counts the bandwidth ingress and egress correctly. Whether it achieves peak is something else, but it's working as intended for the rendering it does.
 
@Jawed said:

Whether it achieves peak is something else
Precisely.

No-one knows, but most people reading the chart assume that peak is being measured. The test is so ill-defined that Scott doesn't know what it's doing, and several people round here continue to interpret it as doing something quite specific, for which there is no evidence.
 
@Rys said:

It's highly likely that peak, or close to it, is being measured, for what it's worth. The test reliably measures well on a (small, but important) set of platforms where I can actually verify it's doing so. I'm sorry you feel the test is ill defined even after I've defined it for you.
 
@3dilettante said:

Can't tell if it's working as intended. It's not even clear what's intended.

The one site that's using it thought it was testing one thing. Now with the Fury X results they're left scratching their heads:
There are at least two things that come to mind. One is that they seemed to think Hawaii had framebuffer compression (three things, if they didn't mean framebuffer compression).
The other is that we'd have to page Andrew Lauritzen to this thread, since the earlier review of the Titan X credited him in its comments with the ROP limitation diagnosis.

Whether that was some kind of Hawaii-specific ROP bottleneck or a different issue is still hanging in the air.
 
@Jawed said:

It's highly likely that peak, or close to it, is being measured, for what it's worth. The test reliably measures well on a (small, but important) set of platforms where I can actually verify it's doing so. I'm sorry you feel the test is ill defined even after I've defined it for you.
Well I suppose when someone verifies it on AMD hardware we'll have progress.
 
@Ethatron said:

It's highly likely that peak, or close to it, is being measured, for what it's worth. The test reliably measures well on a (small, but important) set of platforms where I can actually verify it's doing so. I'm sorry you feel the test is ill defined even after I've defined it for you.

The precise thing it seems to be measuring is the all-zeros delta-block mode, which is equivalent to testing zip on a file of all zeros. Zip is really bad at that, and one might conclude it sucks badly, definitely, absolutely; especially if all-zeros and all-random are the only two cases investigated, as I understand it from the B3D suite. But zip is definitely not bad; it takes a while to compete with it in the not-very-complex-algorithm league.
It would be very helpful to test with various all-random variances [variance being 0, 1, 2, 3, 7, 15, 31, 63, etc.]. That way you can actually catch which block modes the thing supports, and you could detect whether Maxwell simply has a better best case than Fury and nothing else, or when it is better.
ASTC can anchor a block to its neighbour; it would technically be possible (with a custom block compression definition, not ASTC) to store an entire black frame-buffer in one texel plus the buffer's mode field.
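That variance sweep could be sketched like this (hypothetical pattern generator; the 8x8 block size and the power-of-two variance steps are assumptions):

```python
import random

def variance_block(size=8, variance=0, base=128, seed=1):
    # Hypothetical sketch: fill one size x size block with 8-bit values
    # whose spread is bounded by `variance` (0 reproduces the all-zeros
    # delta case). Sweeping variance over 0, 1, 3, 7, ... should reveal
    # which delta block modes the compressor actually supports.
    rng = random.Random(seed)
    return [[(base + rng.randint(0, variance)) & 0xFF
             for _ in range(size)] for _ in range(size)]

for v in (0, 1, 3, 7, 15, 31, 63):
    block = variance_block(variance=v)
    # ...upload as a render target and measure effective bandwidth...
```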
 
@Jawed said:

Courtesy of:

http://stackoverflow.com/questions/5149544/can-i-generate-a-random-number-inside-a-pixel-shader

and:

https://www.shadertoy.com/new

I wrote a randomcolour pixel shader:

Code:
// http://stackoverflow.com/questions/5149544/can-i-generate-a-random-number-inside-a-pixel-shader

const vec2 k = vec2(23.1406926327792690,2.6651441426902251);

float rnd9( vec2 uv ) { return fract( cos( mod( 123456780., 1024. * dot(uv,k) ) ) ); }

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 uv = fragCoord.xy / iResolution.xy;
    float i = rnd9(uv);
    float j = rnd9(uv * i);
    float k = rnd9(uv * j);
    fragColor = vec4(i,j,k,1.0);
}

Why can't we use something like that to test fillrate and compression?

Saving that as a PNG for a 1920x1080 image produces a file that's 5954769 bytes. That's 2.87 bytes per pixel, which includes PNG file format overhead (a few thousand bytes as far as I can tell). So, there is some compressibility in this image. Enough to defeat GPU delta colour compression?
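The bytes-per-pixel arithmetic checks out (using the figures above; the PNG-overhead estimate is Jawed's):

```python
# Quick check of the quoted numbers: PNG file size over pixel count.
png_bytes = 5954769
pixels = 1920 * 1080
print(f"{png_bytes / pixels:.2f} bytes per pixel")  # 2.87
```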
 
@AlexV said:

I wrote a randomcolour pixel shader: [...]

Why can't we use something like that to test fillrate and compression?
As far as PRNGs go, one can do much, much better for the purpose of this test (i.e. come up with something that isn't so expensive, which the Stack Overflow snippet is). For example, Park-Miller should be relatively fine, and uber cheap. Beyond that, I agree with the idea.
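For reference, the Park-Miller "minimal standard" generator is just one multiply and one modulus per step (this Python sketch uses the classic constants; a shader version would need care with integer/float precision):

```python
M = 2**31 - 1   # 2147483647, a Mersenne prime
A = 16807       # Park & Miller's minimal-standard multiplier

def park_miller(seed):
    # One step of the Lehmer generator: next = seed * 16807 mod (2^31 - 1).
    return (seed * A) % M

s = 1
for _ in range(3):
    s = park_miller(s)
print(s)  # 1622650073, the third value in the sequence from seed 1
```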
 
@Ethatron said:

I don't think you need any generator at all; you just need 64 values with maximum variance between neighbours. Or some other count, if the assumption of 8x8 blocks is wrong.
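A worst-case block along those lines needs no PRNG; assuming 8x8 blocks, an alternating checkerboard maximises neighbour deltas (though a compressor could in principle still exploit the regularity):

```python
# Hypothetical worst case for delta-based compression: every pixel
# differs maximally from its horizontal and vertical neighbours.
def checkerboard(size=8, lo=0, hi=255):
    return [[hi if (x + y) & 1 else lo for x in range(size)]
            for y in range(size)]

for row in checkerboard():
    print(row)
```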
 