XGPU real shader ops

Jawed said:
Bottom of left column on page 6 for discussion of FP texture fetch overheads. That's for FP32. I don't have the data for the texture fetch overheads of other texture formats.
It seems they're basing their figures on GPUBench, and GPUBench is, imho, somewhat flawed when it tries to test how many instructions are needed to hide a texture fetch, since AFAIK it uses a test that employs only ALU instructions that depend on the results of one or more texture fetches, with no non-dependent ALU instructions at all.
So it's true that NV40/G70 can't hide that kind of latency, but it's also true that in most real-world shaders the mix of ALU instructions is quite different ;)
 
Jawed said:
I was under the impression that GPUBench uses independent MADs - the texture fetches are simply there to consume bandwidth and the benchmark is used to determine at what point the GPU switches from bandwidth-limited to compute-limited.

http://graphics.stanford.edu/projects/gpubench/test_fetchcosts.html

Jawed
I think such a test would not make any sense, since it's not interesting to know how much bandwidth I can consume if I'm not going to use that bandwidth. Moreover, a decent compiler would strip out every texture fetch instruction whose return value is unused.
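
The point about unused fetches can be illustrated with a toy dead-code elimination pass. This is only a minimal Python sketch, not how any real driver compiler works: the `(opcode, destination, sources)` tuple format, the function name `eliminate_dead_code`, and the sample program are all assumptions made for illustration, and write masks and swizzles are ignored.

```python
# Toy dead-code elimination over ARB-style straight-line code (illustrative only).
# Each instruction is a hypothetical (opcode, destination, sources) tuple.

def eliminate_dead_code(program, outputs=("result.color",)):
    """Drop instructions whose results never reach a shader output."""
    live = set(outputs)          # registers still needed by later instructions
    kept = []
    # Walk backwards: an instruction matters only if its destination is live.
    for op, dst, srcs in reversed(program):
        if dst in live:
            kept.append((op, dst, srcs))
            live.discard(dst)    # this instruction defines dst...
            live.update(srcs)    # ...and needs its sources (re-adds dst if read)
    kept.reverse()
    return kept

# Hypothetical shader: the TEX result in R0 is never used.
program = [
    ("TEX", "R0", ["fragment.texcoord[0]"]),
    ("MAD", "R1", ["fragment.texcoord[1]"] * 3),
    ("MOV", "result.color", ["R1"]),
]

optimized = eliminate_dead_code(program)
for op, dst, srcs in optimized:
    print(op, dst, ", ".join(srcs))
# The TEX line is gone; only the MAD and the MOV remain.
```

Under these assumptions a fetch survives only if something on the path to `result.color` reads it, which is why a bandwidth test has to consume its texture results somewhere.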
 
Yes, you're right the compiler would optimise-out junk texture-fetches. Should have thought of that.

Reading the source for fetchcosts it seems that it issues MADs to use the texture-fetch results (but only to use up spare texture-fetches if dependent texturing is also being used). Successive MADs are r0,r0,r0,r0.

I think these are the instructions issued when multi-texturing is off and no dependent-texturing is being used, for the 3-texture fetch case with no extra instructions.

TEX r0, fragment.texcoord[0], texture[0], RECT;
TEX r1, fragment.texcoord[0], texture[0], RECT;
TEX r2, fragment.texcoord[0], texture[0], RECT;
MAD r0,r0,r1,r2;
MAD r0,r0,r2,r0;
MAD result.color,r0,r0,r0;

But that's from a dry-run in my head, as I don't have a C compiler.

That code seems to prevent dual-issue on NVidia hardware.

Jawed
 
Ha! of course I can just run the benchmark (well, my Radeon SDR 32MB DX7 card can't) to produce the code:

Code:
fetchcosts -v -n -m 0 -x 30 -f 3 -a single -i 2 -t -d 0
 
PARAM C0=program.env[0];
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0, fragment.texcoord[0], texture[0], RECT;
TEX R1, fragment.texcoord[0], texture[0], RECT;
TEX R2, fragment.texcoord[0], texture[0], RECT;
MAD R0, R0, R1, R2;
MAD result.color, R0, R0, R0;
END
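
To check how the ALU instructions in this dump relate to the fetches, here is a small Python sketch that marks each non-TEX instruction as texture-dependent or not. The program list is hand-transcribed from the dump above, and the tuple format and the function name `texture_dependent` are hypothetical helpers, not part of GPUBench.

```python
# Classify ALU instructions by whether they read a texture-derived value,
# directly or through earlier instructions (illustrative sketch only).

def texture_dependent(program):
    """Return (opcode, destination, depends_on_texture) for each non-TEX op."""
    tainted = set()              # registers currently holding texture-derived data
    report = []
    for op, dst, srcs in program:
        if op == "TEX":
            tainted.add(dst)
            continue
        dep = any(s in tainted for s in srcs)
        if dep:
            tainted.add(dst)
        else:
            tainted.discard(dst)  # overwritten with a texture-free value
        report.append((op, dst, dep))
    return report

# Hand-transcribed from the fetchcosts dump above.
DUMPED_PROGRAM = [
    ("TEX", "R0", ["fragment.texcoord[0]"]),
    ("TEX", "R1", ["fragment.texcoord[0]"]),
    ("TEX", "R2", ["fragment.texcoord[0]"]),
    ("MAD", "R0", ["R0", "R1", "R2"]),
    ("MAD", "result.color", ["R0", "R0", "R0"]),
]

for op, dst, dep in texture_dependent(DUMPED_PROGRAM):
    print(op, dst, "texture-dependent" if dep else "independent")
# Both MADs are reported as texture-dependent.
```

For this one command line, at least, every ALU instruction sits on the dependency chain of the fetches; other fetchcosts settings may generate a different mix.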

Jawed
 