NV30 Integer / FP shader operations

Dave Baumann

So, we've established that NV30 has an integer and a floating point path at least. I'm hazy on this point: if a PS1.1/1.3 shader operates on NV30 will it use the (fast) integer path or the slower FP path?
 
DaveBaumann said:
So, we've established that NV30 has an integer and a floating point path at least. I'm hazy on this point: if a PS1.1/1.3 shader operates on NV30 will it use the (fast) integer path or the slower FP path?

As far as I can tell the integer path is more or less the old GF4 register combiners, so I would say PS 1.1/1.3 = fast integer. Once we get to PS 1.4, I'm sure that we are on the FP path.

BTW: Didn't 3DMark 2001 SE show this with a fast Pixel Shader test (PS 1.1) and a slower Advanced Pixel Shader test (PS 1.4)?
 
I agree with LeStoffer. There are 2 register combiners per pipe in the NV30. That should give int ops twice the performance of fp ops.
 
DaveBaumann said:
So, we've established that NV30 has an integer and a floating point path at least. I'm hazy on this point: if a PS1.1/1.3 shader operates on NV30 will it use the (fast) integer path or the slower FP path?

The integer format from NV30 (called fx12) is a 12-bit fixed point format (sign + 1-bit integer + 10-bit fraction). This is an extended version of the legacy mode supported by the register combiners (the register combiners used a 9-bit S8 representation), so I guess it will be used most of the time in current apps (fixed fragment pipeline, register combiner mode/fragment combiner mode, and up to PS 1.3).

Only in 'modern' cases will the floating point formats be used: NVIDIA's new OpenGL extensions, PS 2.0, ARB_fragment_program.

I'm dubious on whether fp16 is really twice the speed of fp32. I think that the Doom3 NV30 codepath uses this fx12 format, and the speed wins Carmack cites are not from using fp16 vs. fp32. Methinks that the fp16 format is there just for bandwidth saving purposes and to double the number of temporary registers.

For information on NV30 format & operation precision, see Section 3.11.4.1: Fragment Program Storage Precision and Section 3.11.4.2: Fragment Program Operation Precision from NVIDIA OpenGL Extension Specification for CineFX
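
To get a feel for the difference between the two fixed-point formats mentioned above, here is a minimal Python sketch that rounds a value to the nearest representable step. The helper names (`quantize`, `to_fx12`, `to_s8`) are made up for illustration, and the parameters are assumptions taken from the description in this post: 10 fraction bits clamped to [-2, 2] for fx12, 8 fraction bits clamped to [-1, 1] for the older combiner format. Whether the endpoints themselves are representable is ignored here, and none of this is meant to describe what the hardware really does.

```python
def quantize(value, frac_bits, lo, hi):
    """Clamp value to [lo, hi], then round to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    clamped = max(lo, min(hi, value))
    return round(clamped / step) * step


def to_fx12(value):
    # Assumed fx12 parameters: 10 fraction bits, range [-2, 2].
    return quantize(value, frac_bits=10, lo=-2.0, hi=2.0)


def to_s8(value):
    # Assumed legacy combiner parameters: 8 fraction bits, range [-1, 1].
    return quantize(value, frac_bits=8, lo=-1.0, hi=1.0)


if __name__ == "__main__":
    for v in (0.3, 1.5, -1.7, 3.0):
        print(v, to_fx12(v), to_s8(v))
```

Under these assumptions the fx12 step is 1/1024 against 1/256 for the older format, so values such as 0.3 land closer to the original, and values between 1 and 2 are no longer clamped to 1.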
 
Thanks for the info Tonyo...

(fixed fragment pipeline, register combiner mode/fragment combiner mode, and up to PS 1.3).

Any idea about PS 1.4? Do you think the integer formats are extended to support PS 1.4 calls, or do you think that the NV30 PS 2.0 paths (floating point) 'emulate' PS 1.4?

In short, do you think PS 1.4 utilizes integer or floating point formats?
 
Actually, pixel shaders up to PS1.3 contain two kinds of instructions: texture addressing instructions and color instructions.

The texture addressing instructions are executed on the new FP pipeline, while the color instructions use the integer register combiners.

Doom3 might execute some operations on the FP pipeline and others through the register combiners, since those effectively operate in parallel.

PS1.4 is likely using the FP pipeline only, because PS1.4 requires support for pixel shader register values up to 8.0 and the GFFX register combiners support up to 2.0 only. (GF1-4 supported 1.0.)
 
Joe DeFuria said:
Any idea about PS 1.4? Do you think the integer formats are extended to support PS 1.4 calls, or do you think that the NV30 PS 2.0 paths (floating point) 'emulate' PS 1.4?

In short, do you think PS 1.4 utilizes integer or floating point formats?

I'm not very familiar with D3D (and I want to keep it that way for religious reasons :devilish: ).

You can see a pixel shader as a program with a first segment of texture lookup instructions (generation of texture coordinates) and a second segment of texture application instructions (mixing texels with color & fog).

With this view in mind, AFAIK PS 1.4 is just a two-phase PS 1.3 (having two phases allows you to do dependent texture reads). Each phase is composed of a texture lookup and a texture application segment and the second phase can use any result from the first phase.

I believe that the texture lookup phase will be done in fp16 or fp32 (the same should happen to the texture lookup phase of PS < 1.4) and the texture application phase will be done in fx12.

The only reason that would force the use of fp in the texture application part is if they have to export a numerical 'inter-instruction' range bigger than [-2,2] (although they could do some juggling and use some fx12 fraction bits as integer bits).
AFAIK the numerical range of texture application operations in the pixel shader is a queryable D3D parameter, with [-1,1] being the minimum range. I don't think PS 1.4 demands a bigger range.

I don't think NVIDIA has preserved its previous "dial-and-knob-configurable" texture shader hardware (which was in charge of texture lookup in GF3 and GF4), that's why I think they will be doing the texture lookup as an integral part of the fragment program.

EDIT: Corrected NV30 numerical range to [-2,2] after reading Hyp-X post :oops:
 
Yes, PS1.4 is what I'm getting at. Seeing as the integer pipeline is most likely to represent GF3/4 then that isn't going to go up to PS1.4. PS2.0 obviously goes via the FP pipeline and PS1.4 will probably need to be emulated via the PS2.0/FP pipeline for NV30.
 
Looking at the DX8.1 / DX9 docs, I can't find the requirement that MaxPixelShaderValue has to be at least 8 for PS1.4, so it seems I remembered it wrong.
It only states that the texture registers are required to have that range.

So it seems that the GFFX could do the 1st phase in FP16 and the 2nd phase in integer.
Still it might cause compatibility problems with programs that expect the range to be in [-8, 8].
 
Hyp-X said:
PS1.4 is likely using the FP pipeline only, because PS1.4 requires support for pixel shader register values up to 8.0 and the GFFX register combiners support up to 2.0 only. (GF1-4 supported 1.0.)

I wasn't sure about the range issue, so I checked MSDN:
Range
The range is the maximum and minimum register data value. The ranges vary based on the type of register. The ranges for some of the registers can be queried from the device caps using GetDeviceCaps.
Code:
Name     Type                 Range                                          Versions
cn       Constant register    -1 to +1                                       All versions
rn       Temporary register   - MaxPixelShaderValue to + MaxPixelShaderValue All versions 
tn       Texture register     - MaxPixelShaderValue to + MaxPixelShaderValue 1_1 to 1_3  
tn       Texture register     - MaxTextureRepeat to + MaxTextureRepeat       1_4  
vn       Color register                                                      1_4

Early pixel shader hardware represents data in registers using a fixed-point number. This limits precision to a maximum of approximately eight bits for the fractional part of a number. Keep this in mind when designing a shader.

For pixel shader version 1_1 to 1_3, MaxTextureRepeat must be a minimum of one. For 1_4, MaxTextureRepeat must be a minimum of eight.
http://msdn.microsoft.com/library/d...r1_X/architecture/pixelshaderarchitecture.asp

So the [-8,8] range is only for texture registers, which means that you can still use fx12 for the color calculations (the compiler can derive the operation range from the destination register).
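
A minimal Python sketch of that idea, working from the MSDN table quoted above: given a register type and shader version, look up the magnitude it has to hold and check whether it would fit in fx12. The function names, the `FX12_MAX` constant, and the caps values used in the example are all hypothetical; in particular, treating fx12 as covering magnitudes up to 2.0 is an assumption carried over from this thread, and this is not how any real driver or compiler is known to work.

```python
FX12_MAX = 2.0  # assumption: fx12 covers roughly [-2, 2], as discussed in this thread


def required_magnitude(register, version, max_pixel_shader_value, max_texture_repeat):
    """Largest magnitude a register must hold, following the quoted MSDN table."""
    if register == "c":  # constant register
        return 1.0
    if register == "r":  # temporary register
        return max_pixel_shader_value
    if register == "t":  # texture register
        return max_texture_repeat if version == "1_4" else max_pixel_shader_value
    raise ValueError("no range listed for register type %r" % register)


def fits_fx12(register, version, max_pixel_shader_value, max_texture_repeat):
    """True if the register's required range fits the assumed fx12 range."""
    needed = required_magnitude(
        register, version, max_pixel_shader_value, max_texture_repeat
    )
    return needed <= FX12_MAX


if __name__ == "__main__":
    # Hypothetical caps: MaxPixelShaderValue = 2, MaxTextureRepeat = 8.
    for reg, ver in (("c", "1_4"), ("r", "1_4"), ("t", "1_1"), ("t", "1_4")):
        print(reg, ver, fits_fx12(reg, ver, 2.0, 8.0))
```

With those hypothetical caps, only the PS 1.4 texture registers fall outside the assumed fx12 range, which matches the reading above.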
 
So the [-8,8] range is only for texture registers, which means that you can still use fx12 for the color calculations (the compiler can derive the operation range from the destination register).

Cool.

So then it appears that PS 1.4 could be done partially (color calcs) via fx12. I guess the next questions are:

1) Whether the NV30 architecture is flexible enough to do it that way. (Or, if it's using floating point for texture access, must it use floating point for color calcs too, due to a hardware flexibility limitation?)

2) Irrespective if NV30 is that flexible, would doing the texture access via FP pipeline introduce a performance bottleneck? (Performance wise for NV30, does it matter if color calcs are integer or floating point, since the texture access is already floating point?)
 
In the DX8 and DX9 PS1.0-1.4 modes of the ShaderMark program, the results that Brent posted showed the FX doing much worse in the tests using PS1.4 (lower than an 8500/9000, in fact).

With the FX's current slowness in FP, shown by its PS2.0 results against the 9700, I would think that it's also doing 1.4 fully or partially in FP.
 
Joe DeFuria said:
1) Whether the NV30 architecture is flexible enough to do it that way. (Or, if it's using floating point for texture access, must it use floating point for color calcs too, due to a hardware flexibility limitation?)

Reading the OGL extension spec suggests that it can be done.
What the current D3D drivers do is another question.

2) Irrespective if NV30 is that flexible, would doing the texture access via FP pipeline introduce a performance bottleneck? (Performance wise for NV30, does it matter if color calcs are integer or floating point, since the texture access is already floating point?)

That might depend on the actual shader code.
For example, if the PS1.4 shader had more than 1/3 of its arithmetic instructions in the first phase, then it might be a bottleneck. (Assuming that FP is half as fast.)

If you don't want to do dependent reads you don't have to have any arithmetic instruction in the first phase.
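
The 1/3 figure above can be checked with a quick back-of-the-envelope model in Python. The assumptions are mine and unconfirmed: the FP and integer units run in parallel, an FP instruction costs twice as much as an integer one, first-phase arithmetic goes to the FP unit and second-phase arithmetic to the integer unit. The function names are made up for illustration.

```python
from fractions import Fraction


def unit_times(first_phase_fraction, fp_cost=2, int_cost=1):
    """Relative busy time of the FP and integer units for one shader.

    first_phase_fraction is the share of arithmetic instructions in phase 1.
    """
    f = Fraction(first_phase_fraction)
    return f * fp_cost, (1 - f) * int_cost


def fp_is_bottleneck(first_phase_fraction, fp_cost=2, int_cost=1):
    """True if the FP unit is busy longer than the integer unit."""
    fp_time, int_time = unit_times(first_phase_fraction, fp_cost, int_cost)
    return fp_time > int_time


if __name__ == "__main__":
    for f in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
        print(f, unit_times(f), fp_is_bottleneck(f))
```

At exactly 1/3 both units are busy for the same time (2/3 each in this model), so the FP side only becomes the limit once the first phase holds more than a third of the arithmetic.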
 
That diagram really doesn't list two usable register combiners per pipe. It just looks like it's saying the FP and legacy pipes (which do appear to be separate there) use separate register combiner hardware.
 
Chalnoth said:
That diagram really doesn't list two usable register combiners per pipe. It just looks like it's saying the FP and legacy pipes (which do appear to be separate there) use separate register combiner hardware.

FYI: I read it the same way.
 
Tonyo said:
DaveBaumann said:
So, we've established that NV30 has an integer and a floating point path at least. I'm hazy on this point: if a PS1.1/1.3 shader operates on NV30 will it use the (fast) integer path or the slower FP path?

The integer format from NV30 (called fx12) is a 12-bit fixed point format (sign + 1-bit integer + 10-bit fraction). This is an extended version of the legacy mode supported by the register combiners (the register combiners used a 9-bit S8 representation), so I guess it will be used most of the time in current apps (fixed fragment pipeline, register combiner mode/fragment combiner mode, and up to PS 1.3).

NV30 using 12 bit integers internally indicates, IMO, that data goes down the fp16 path but in denormalized form.

So integer calcs would be *exactly* the same speed as fp16.
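
One thing that can be checked without any hardware is whether every fx12 value would even fit into fp16 without rounding. Below is a small Python sketch using the standard `struct` module's half-precision format (`"e"`). The fx12 encoding used here (integer codes from -2048 to 2047, each scaled by 1/1024) is an assumption, and the helper name is made up. It says nothing about speed or about how NV30 routes the data; it only tests representability.

```python
import struct


def roundtrips_through_fp16(value):
    """True if value survives conversion to IEEE half precision unchanged."""
    return struct.unpack("<e", struct.pack("<e", value))[0] == value


# Assumed fx12 encoding: integer code k in [-2048, 2047], value = k / 1024.
all_exact = all(roundtrips_through_fp16(k / 1024.0) for k in range(-2048, 2048))

if __name__ == "__main__":
    print("every assumed fx12 value is exact in fp16:", all_exact)
```

fp16 carries 11 significant bits (10 stored plus the implicit one), which is why codes of up to 11 bits of magnitude come through untouched under this assumption.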

Cheers
Gubbi
 
Tonyo said:
The integer format from NV30 (called fx12) is a 12-bit fixed point format (sign + 1-bit integer + 10-bit fraction). This is an extended version of the legacy mode supported by the register combiners (the register combiners used a 9-bit S8 representation), so I guess it will be used most of the time in current apps (fixed fragment pipeline, register combiner mode/fragment combiner mode, and up to PS 1.3).

hmm, it just caught my attention: the format either does not have a designated integer bit or it does not span the [-2, 2] range. if it had a designated integer bit the format's range would be (-2, 2), or precisely, given the 10 fraction bits, [-1.9990234375, 1.9990234375].
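
the endpoints are easy to verify. a small Python sketch comparing two possible readings of a 12-bit format with 10 fraction bits: sign-magnitude (sign + 1 integer bit + 10 fraction bits, as described in the quote) and plain two's complement. both readings are assumptions for the sake of the arithmetic, and the function names are made up.

```python
def sign_magnitude_range(int_bits=1, frac_bits=10):
    """Range when one bit is the sign and the rest encode the magnitude."""
    largest = (2 ** (int_bits + frac_bits) - 1) * 2.0 ** -frac_bits
    return -largest, largest


def twos_complement_range(total_bits=12, frac_bits=10):
    """Range when all bits form one two's complement integer scaled by 2**-frac_bits."""
    lowest = -(2 ** (total_bits - 1)) * 2.0 ** -frac_bits
    highest = (2 ** (total_bits - 1) - 1) * 2.0 ** -frac_bits
    return lowest, highest


if __name__ == "__main__":
    print("sign-magnitude:  ", sign_magnitude_range())
    print("two's complement:", twos_complement_range())
```

in the sign-magnitude reading neither -2 nor +2 is reachable; in the two's complement reading -2 is reachable but +2 still is not, so neither gives a closed [-2, 2].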
 