XGPU real shader ops

nAo · Aug 27, 2005

Jawed said:
Bottom of left column on page 6 for discussion of FP texture fetch overheads. That's for FP32. I don't have the data for the texture fetch overheads of other texture formats.

It seems they're basing their figures on GPUBench, and GPUBench is, imho, somewhat flawed when it tries to test how many instructions are needed to hide a texture fetch. AFAIK it uses a test in which every ALU instruction depends on the results of one or more texture fetches, with no non-dependent ALU instructions at all.
So it's true that NV40/G70 can't hide that kind of latency, but it's also true that in most real-world shaders the mix of ALU instructions is quite different.

Jawed · Aug 27, 2005

I was under the impression that GPUBench uses independent MADs - the texture fetches are simply there to consume bandwidth and the benchmark is used to determine at what point the GPU switches from bandwidth-limited to compute-limited.

http://graphics.stanford.edu/projects/gpubench/test_fetchcosts.html

Jawed

nAo · Aug 27, 2005

Jawed said:
I was under the impression that GPUBench uses independent MADs - the texture fetches are simply there to consume bandwidth and the benchmark is used to determine at what point the GPU switches from bandwidth-limited to compute-limited.

http://graphics.stanford.edu/projects/gpubench/test_fetchcosts.html

Jawed

I think such a test would not make any sense, since it's not interesting to know how much bandwidth I can consume if I'm not going to use that bandwidth. Moreover, a decent compiler would strip out every texture fetch instruction that generates an unused return value.
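To illustrate the point about unused fetches, here is a minimal sketch of backward dead-code elimination over a toy instruction list. The tuple format, the register names and the `eliminate_dead_code` helper are all hypothetical; this is not how any particular driver compiler is implemented, only the general technique.

```python
def eliminate_dead_code(program, live_outputs):
    """Drop instructions whose result is never used.

    program: list of (opcode, dest, sources) tuples in issue order
             (a made-up format for illustration only).
    live_outputs: registers that must survive, e.g. the colour output.
    """
    live = set(live_outputs)
    kept = []
    # Walk backwards: an instruction matters only if its destination is live.
    for opcode, dest, sources in reversed(program):
        if dest in live:
            kept.append((opcode, dest, sources))
            live.discard(dest)      # this instruction defines dest...
            live.update(sources)    # ...and needs its sources to be live
    kept.reverse()
    return kept


# Hypothetical shader: the fetch into r1 is never read by anything.
program = [
    ("TEX", "r0", ["texcoord0"]),
    ("TEX", "r1", ["texcoord0"]),
    ("MAD", "r2", ["c0", "c0", "c0"]),
    ("MAD", "result.color", ["r2", "r0", "r2"]),
]

optimised = eliminate_dead_code(program, {"result.color"})
for instruction in optimised:
    print(instruction)
```

In this toy case the fetch into `r1` disappears, so a benchmark that issued fetches without consuming them would end up measuring nothing.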

Jawed · Aug 27, 2005

Yes, you're right, the compiler would optimise out junk texture fetches. Should have thought of that.

Reading the source for fetchcosts it seems that it issues MADs to use the texture-fetch results (but only to use up spare texture-fetches if dependent texturing is also being used). Successive MADs are r0,r0,r0,r0.

I think these are the instructions issued when multi-texturing is off and no dependent-texturing is being used, for the 3-texture fetch case with no extra instructions.

TEX r0, fragment.texcoord[0], texture[0], RECT
TEX r1, fragment.texcoord[0], texture[0], RECT
TEX r2, fragment.texcoord[0], texture[0], RECT
MAD r0,r0,r1,r2
MAD r0,r0,r2,r0
MAD result.color,r0,r0,r0

But that's from a dry-run in my head, as I don't have a C compiler.

That code seems to prevent dual-issue on NVidia hardware, as each MAD depends on the previous one's result.

Jawed

Jawed · Aug 28, 2005

Ha! Of course I can just run the benchmark (well, my Radeon SDR 32MB DX7 card can't run it) to produce the code:

Code:

fetchcosts -v -n -m 0 -x 30 -f 3 -a single -i 2 -t -d 0
 
PARAM C0=program.env[0];
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0, fragment.texcoord[0], texture[0], RECT;
TEX R1, fragment.texcoord[0], texture[0], RECT;
TEX R2, fragment.texcoord[0], texture[0], RECT;
MAD R0, R0, R1, R2;
MAD result.color, R0, R0, R0;
END

Jawed
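As a rough check of the dependency question raised earlier in the thread, the listing above can be fed through a small script that marks which ALU instructions read, directly or indirectly, a texture fetch result. This is only a sketch: the `parse` and `split_alu_by_fetch_dependence` helpers are hypothetical, and the parsing ignores swizzles, write masks and every other part of the ARB fragment program grammar.

```python
LISTING = """
TEX R0, fragment.texcoord[0], texture[0], RECT;
TEX R1, fragment.texcoord[0], texture[0], RECT;
TEX R2, fragment.texcoord[0], texture[0], RECT;
MAD R0, R0, R1, R2;
MAD result.color, R0, R0, R0;
"""


def parse(listing):
    """Turn 'OP dest, src, ...;' lines into (opcode, dest, sources) tuples."""
    instructions = []
    for line in listing.strip().splitlines():
        opcode, rest = line.strip().rstrip(";").split(None, 1)
        operands = [operand.strip() for operand in rest.split(",")]
        instructions.append((opcode, operands[0], operands[1:]))
    return instructions


def split_alu_by_fetch_dependence(instructions):
    """Return (dependent, independent) lists of non-TEX instructions."""
    fetched = set()  # registers holding data derived from a texture fetch
    dependent, independent = [], []
    for opcode, dest, sources in instructions:
        if opcode == "TEX":
            fetched.add(dest)
        elif any(source in fetched for source in sources):
            fetched.add(dest)  # the dependence propagates through the result
            dependent.append((opcode, dest, sources))
        else:
            fetched.discard(dest)  # overwritten with fetch-free data
            independent.append((opcode, dest, sources))
    return dependent, independent


dependent, independent = split_alu_by_fetch_dependence(parse(LISTING))
print(len(dependent), "dependent MADs,", len(independent), "independent MADs")
```

For this listing both MADs end up in the dependent group and none in the independent one, which matches nAo's description of the test.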
