NVIDIA's claims on R300 are wrong?

Wait a minute...
VS_2_0 supports 256 static instructions, not 1024. This means you can write shaders up to 256 instructions long. You can execute (many) more instructions if you use loops.
Also, 256 constants is the minimum it supports, not the maximum. This means you are guaranteed 256 constants, but hardware is free to expose more.
You could check the D3D 9 API spec file that leaked a while ago ;).
 
Well, I think it is wrong... it should be 160, although some clarifications would be welcome!

Well, in fact, NVIDIA seems to be right here after all. The answer could be right there in Wavey's great review, on page 3:

For instance, as already mentioned DX9 supports up to 16 texture inputs per pass, 32 Texture Address Instructions can be used and 64 vector + 64 scalar colour instructions are available.

This is where the 160-instruction number comes from (32+64+64). But AFAIK, DirectX 9.0 does not let you distinguish between scalar and vector instructions, so the maximum instruction count you report in the caps must be independent of that split. Therefore, ATI can only expose 96 instructions (32+64) in DirectX 9.0.
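To make the arithmetic explicit, here is a quick Python sketch of the two ways of counting. The constant and function names are made up for illustration, and the "one arithmetic slot count" rule is my assumption about the caps as stated above, not something taken from the spec.

```python
# Per-pass figures quoted from the review; the names are illustrative only.
TEXTURE_ADDRESS_INSTRUCTIONS = 32
VECTOR_COLOUR_INSTRUCTIONS = 64
SCALAR_COLOUR_INSTRUCTIONS = 64


def paired_total():
    """Count every vector op and every scalar op as its own instruction."""
    return (TEXTURE_ADDRESS_INSTRUCTIONS
            + VECTOR_COLOUR_INSTRUCTIONS
            + SCALAR_COLOUR_INSTRUCTIONS)


def single_issue_total():
    """Count one arithmetic slot per vector/scalar pair.

    Assumption: the caps expose a single arithmetic instruction count,
    so a scalar op cannot be counted apart from the vector op it pairs with.
    """
    return TEXTURE_ADDRESS_INSTRUCTIONS + VECTOR_COLOUR_INSTRUCTIONS


if __name__ == "__main__":
    print("paired count:", paired_total())
    print("single-issue count:", single_issue_total())
```

The first figure is only reachable if every vector op has a scalar op to pair with; the second is what you can rely on without knowing anything about pairing.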

It might be different in OpenGL, as ATI can define its own extensions, so the 160-instruction claim might be legit for OpenGL, but not for D3D.

What do you think?
 
DirectX has always made the distinction between scalar and vector. It's just that they must be issued "in parallel" to get the doubling effect.

For example, DirectX8 allowed 8 color ops, however, each color op could be broken into two instructions on the scalar and vector pipeline, e.g.

dp3 r0.rgb, r1.rgb, r2.rgb
add r0.a, r1.a, r2.a

These two can execute in parallel, in the same cycle, so technically you can do 16 different color ops in DirectX if you can arrange them to pair up like this.


However, it's kind of tricky to claim that this means you have pixel shaders of length 16, or that ATI's pixel shaders have length "160", since there are restrictions on the types of operations you can use to reach the 160 figure. For example, if you were building a compiler to generate pixel shaders from arbitrary high-level code, you wouldn't be able to max out the instruction slots to 160 every time, since your translation might not have a lot of scalar ops at all. Perhaps the R300 is more flexible than this in the pixel shader.
 
You need to co-issue the instructions, like so:
dp3 r0.rgb, r1.rgb, r2.rgb
+add r0.a, r1.a, r2.a
But even this way... You can't co-issue anymore in PS_2_0...
 
Yes, the number indicated in the caps must be the non-parallel instruction count, as the coder can't know what the real instruction count limits (including parallel vec/scalar ops) of the hardware are. And in fact, co-issuing is gone with PS2.0, so this feature would be limited to PS 1.4. Does PS 1.4 allow you to use this number of instructions?

How about OpenGL? Is co-issuing possible there?
 
I think there will still be co-issue, you just don't have explicit control over it anymore.

Let's say a pixel pipe has 2 vector SIMD fmad units, 1 divider, and 1 scalar ALU. Theoretically, it could issue 2 dp4s, 1 rcp, and 1 scalar op in a single cycle.

It would then be up to the driver to take a given shader and rearrange the instructions so that no stalls occur (e.g. don't put 3 dp4s in a row, always interleave a scalar op, etc.).
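Here is a rough Python sketch of what that driver-side reordering could look like, just to illustrate the idea. Everything in it is hypothetical: the unit counts are the made-up pipe from above, every instruction is assumed to take one cycle, and the greedy packing is a generic list-scheduling approach, not anything a real driver is known to do.

```python
from collections import namedtuple

# op: mnemonic, unit: which execution unit it needs,
# dst: register written, srcs: registers read.
Instr = namedtuple("Instr", ["op", "unit", "dst", "srcs"])

# Hypothetical pipe: 2 vector fmad units, 1 divider, 1 scalar ALU.
UNITS_PER_CYCLE = {"vec": 2, "div": 1, "scalar": 1}


def schedule(instrs):
    """Greedily pack instructions into issue cycles.

    Each cycle, walk the pending instructions in program order and issue
    every one that has a free unit and no hazard against an earlier
    instruction that has not completed yet. Assumes 1-cycle latency and
    that registers are read at issue, before same-cycle writes land.
    """
    pending = list(instrs)
    cycles = []
    while pending:
        free = dict(UNITS_PER_CYCLE)
        issued, waiting = [], []
        unfinished_writes = set()  # dsts of earlier instrs seen this cycle
        waiting_reads = set()      # srcs of earlier instrs still waiting
        for ins in pending:
            hazard = (
                any(s in unfinished_writes for s in ins.srcs)  # read-after-write
                or ins.dst in unfinished_writes                # write-after-write
                or ins.dst in waiting_reads                    # write-after-read
            )
            if not hazard and free[ins.unit] > 0:
                free[ins.unit] -= 1
                issued.append(ins)
            else:
                waiting.append(ins)
                waiting_reads.update(ins.srcs)
            unfinished_writes.add(ins.dst)
        cycles.append(issued)
        pending = waiting
    return cycles


# Three dp4s in a row, followed by an rcp and a scalar add.
demo = [
    Instr("dp4", "vec", "r0", ("v0", "c0")),
    Instr("dp4", "vec", "r1", ("v0", "c1")),
    Instr("dp4", "vec", "r2", ("v0", "c2")),
    Instr("rcp", "div", "r3", ("v1",)),
    Instr("add", "scalar", "r4", ("v1", "c3")),
]

if __name__ == "__main__":
    for n, cycle in enumerate(schedule(demo), start=1):
        print(n, [ins.op for ins in cycle])
```

With the demo shader, the third dp4 finds both vector units busy and slips to the next cycle, while the rcp and the add are pulled forward to fill the divider and scalar slots. A dependent instruction (one that reads a register written earlier in the same cycle) would be held back in the same way.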
 