What type of VLIW pixel processing format does the NV30 use?

Actually, Mintmaster, each vertex program processor in the NV30's vertex shader array is more flexible and powerful (at least in instruction set) than a corresponding pixel program processor. The reason is that each vertex unit in the array is its own scalar processor, with its own instruction set. Operations between the processors are not necessarily in SIMD format, which makes it easier to split a program's workload between vector and scalar ops. It probably also means that each vertex processor in the array can issue a few operations internally in parallel, while following branch, loop, and similar commands from the vertex array command processor. The individual pixel units in each of the pixel program processors (which are not really arrays, but more limited logical blocks) are probably either FMACs (more limited than a DSP) or simpler scalar processors with a smaller, more limited instruction set than a DSP (a sort of mini-DSP). The pixel pipelines are probably advertised more often because of the improvements made to them over the previous generation, and because the pixel is the main building block of real-time graphics today (from my limited knowledge).
 
In instruction set and program length, yes, the vertex shader is a bit more flexible. However, the main problem is getting information out of a vertex shader and back into it again. With a pixel shader, you have textures to do this data routing, and it works very well. I was probably wrong when I suggested in my previous post that performance was the reason to do the demo with the pixel shader.

This is why I'm eagerly awaiting the next generation of video cards, where we have textures in the vertex shader. We can then go full circle: vertex shaders feeding pixel shaders and back to affect the geometry in vertex shaders again. Suddenly geometry deformation can be offloaded from the CPU, and we can do some crazy stuff...
 
Interestingly, I've read more into the following contained in the Digit-Life article on the Geforce FX, in reference to its data issuing capabilities:

"But we have multiple "no's" exactly for the pixel shaders. First of all, the performance of commands drops down twice (at least) while processing floating-point data compared to integer data (this is a pure computational performance without accounting for losses caused by increased data volumes). A pixel processor of the GeForce FX can execute up to two integer and one floating-point command per clock, i.e. it acts as a superscalar processor in case of integer operations. This maintains an acceptable speed comparable with solutions using stages in case of execution of shaders 1.x. "

From this we can conclude (if the information is accurate) that the NV30 can issue 2 integer commands or 1 floating-point command per clock (I guess each FP command can be split into two smaller commands in FP16 mode?). How would the NV30 respond to a shader that contains both scalar and vector ops? Will it be less prepared for this than the R300, which can issue a vector (SIMD) FP op and a scalar FP op per clock in the pixel program processor?
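To make the issue-rate comparison concrete, here is a rough toy model in Python. It only encodes the readings discussed above (one FP command per clock, two integer commands per clock, or a vector plus a scalar FP op per clock) and assumes perfect pairing with no stalls. The function names are made up for illustration, and none of this is confirmed hardware behaviour.

```python
from math import ceil


def clocks_single_issue_fp(n_vector, n_scalar):
    # Assumed reading of the article: one FP command per clock, no co-issue,
    # so vector and scalar ops simply queue up one after another.
    return n_vector + n_scalar


def clocks_dual_issue_int(n_ops):
    # Assumed reading of the article: up to two integer commands per clock.
    return ceil(n_ops / 2)


def clocks_vector_scalar_coissue(n_vector, n_scalar):
    # Assumed best case: one vector op and one scalar op issue together
    # every clock, so the longer of the two streams sets the total.
    return max(n_vector, n_scalar)


if __name__ == "__main__":
    # Hypothetical shader: 6 vector ops and 4 scalar ops.
    print("single-issue FP :", clocks_single_issue_fp(6, 4))
    print("vector+scalar   :", clocks_vector_scalar_coissue(6, 4))
    print("dual-issue int  :", clocks_dual_issue_int(10))
```

Under these assumptions a mixed vector/scalar FP shader is where the co-issue design pulls ahead; a shader with only vector ops would come out the same in both FP models.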

Unless the NV30 executes a VLIW floating-point instruction that can break up into a SIMD vector op and a scalar op, it will remain uncompetitive, clock for clock, with the competing architecture. It seems the processor supports sin/cos, exp2, and other complex math functions, which would warrant not only an array of 4 FMACs but also a special function unit of some sort. Does anyone know whether something of the sort will be present in the NV30?
 
Thanks for reminding me about that article, Luminescent. That's where I read about the NV30 I12 format:

Contrary to a vertex processor which always works with the F32 data format, a pixel processor (like in R300 and in NV30) supports three formats- F32, F16 and integer I16 (R300) / I12 (NV30). The latter two formats are not just useful for compatibility with old shaders 1.x but also provide speed gains in calculations.

If you were going to do some Z-related stuff in the pixel shader (e.g. custom shadow map algorithms, image space outlining), wouldn't I12 be rather insufficient? Even I16 is more or less a lower limit for artifact free Z stuff. As I mentioned before, ATI's shading paper said the normals in their car app also needed high precision:
Thus, we can store x and y in two channels of a 16-16 texture map and derive z in the pixel shader from +sqrt(1 - x^2 - y^2). This gives us much higher precision than a traditional 8-8-8-8 normal map (even 10 or 11 bits per channel is not enough for this particular shader) for the same memory footprint.
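A quick way to see why the channel depth matters for the derived z is to quantize x and y and compare the reconstructed z with the exact one. The sketch below is plain Python, not shader code. The [-1, 1] to unsigned-channel mapping and the helper names are my own assumptions, not taken from ATI's paper.

```python
import math


def quantize(v, bits):
    """Round v in [-1, 1] to the nearest level of an unsigned `bits`-bit channel."""
    levels = (1 << bits) - 1
    code = round((v * 0.5 + 0.5) * levels)  # assumed bias/scale mapping
    return (code / levels) * 2.0 - 1.0


def derive_z(x, y):
    """z = +sqrt(1 - x^2 - y^2), clamped so rounding never gives a negative root."""
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))


def z_error(x, y, bits):
    """Absolute error in the derived z after storing x and y at `bits` per channel."""
    return abs(derive_z(quantize(x, bits), quantize(y, bits)) - derive_z(x, y))


if __name__ == "__main__":
    # A normal that lies almost in the surface plane, where z is most sensitive.
    x, y = 0.99, 0.1
    for bits in (8, 10, 12, 16):
        print(bits, "bits per channel: z error =", z_error(x, y, bits))
```

The error is worst for normals with small z, because a small change in x or y is divided by z when it propagates into the square root.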

I suppose Digit-Life could be wrong about the I12 stuff, but why would they even assume it was different from ATI if they didn't hear otherwise?

When NV30 was announced, I was quite sure that their pixel shader was more flexible and powerful (albeit in a rather limited, PS 1.4 vs PS 1.3 sort of way). However, taking this info along with Luminescent's argument into account, it seems like NV30 has a couple of drawbacks compared to R300. Well, I guess these are somewhat moot points, anyway.
 
I was just looking at the nv30specs.pdf file from nVidia, and there's a table that shows I16 support (I think...), under the NV_float_buffer extension description:

Sized Int. Format   Base Int. Format   Component Name/Type-Size
------------------- --------------- ---------------------------
ALPHA4 ALPHA A/U4
ALPHA8 ALPHA A/U8
ALPHA12 ALPHA A/U12
ALPHA16 ALPHA A/U16
LUMINANCE4 LUMINANCE L/U4
LUMINANCE8 LUMINANCE L/U8
LUMINANCE12 LUMINANCE L/U12
LUMINANCE16 LUMINANCE L/U16
LUMINANCE4_ALPHA4 LUMINANCE_ALPHA A/U4 L/U4
LUMINANCE6_ALPHA2 LUMINANCE_ALPHA A/U2 L/U6
LUMINANCE8_ALPHA8 LUMINANCE_ALPHA A/U8 L/U8
LUMINANCE12_ALPHA4 LUMINANCE_ALPHA A/U4 L/U12
LUMINANCE12_ALPHA12 LUMINANCE_ALPHA A/U12 L/U12
LUMINANCE16_ALPHA16 LUMINANCE_ALPHA A/U16 L/U16
INTENSITY4 INTENSITY I/U4
INTENSITY8 INTENSITY I/U8
INTENSITY12 INTENSITY I/U12
INTENSITY16 INTENSITY I/U16
R3_G3_B2 RGB R/U3 G/U3 B/U2
RGB4 RGB R/U4 G/U4 B/U4
RGB5 RGB R/U5 G/U5 B/U5
RGB8 RGB R/U8 G/U8 B/U8
RGB10 RGB R/U10 G/U10 B/U10
RGB12 RGB R/U12 G/U12 B/U12
RGB16 RGB R/U16 G/U16 B/U16
RGBA2 RGBA R/U2 G/U2 B/U2 A/U2
RGBA4 RGBA R/U4 G/U4 B/U4 A/U4
RGB5_A1 RGBA R/U5 G/U5 B/U5 A/U1
RGBA8 RGBA R/U8 G/U8 B/U8 A/U8
RGB10_A2 RGBA R/U10 G/U10 B/U10 A/U2
RGBA12 RGBA R/U12 G/U12 B/U12 A/U12
RGBA16 RGBA R/U16 G/U16 B/U16 A/U16
* COLOR_INDEX1_EXT COLOR_INDEX CI/U1
* COLOR_INDEX2_EXT COLOR_INDEX CI/U2
* COLOR_INDEX4_EXT COLOR_INDEX CI/U4
* COLOR_INDEX8_EXT COLOR_INDEX CI/U8
* COLOR_INDEX16_EXT COLOR_INDEX CI/U16
* DEPTH_COMPONENT16_SGIX DEPTH_COMPONENT Z/U16
* DEPTH_COMPONENT24_SGIX DEPTH_COMPONENT Z/U24
* DEPTH_COMPONENT32_SGIX DEPTH_COMPONENT Z/U32
* HILO16_NV HILO HI/U16 LO/U16
* SIGNED_HILO16_NV HILO HI/S16 LO/S16
* SIGNED_RGBA8_NV RGBA R/S8 G/S8 B/S8 A/S8
* SIGNED_RGB8_UNSIGNED_ALPHA8_NV RGBA R/S8 G/S8 B/S8 A/U8
* SIGNED_RGB8_NV RGB R/S8 G/S8 B/S8
* SIGNED_LUMINANCE8_NV LUMINANCE L/S8
* SIGNED_LUMINANCE8_ALPHA8_NV LUMINANCE_ALPHA L/S8 A/S8
* SIGNED_ALPHA8_NV ALPHA A/S8
* SIGNED_INTENSITY8_NV INTENSITY I/S8
* DSDT8_NV DSDT_NV DS/S8 DT/S8
* DSDT8_MAG8_NV DSDT_MAG_NV DS/S8 DT/S8 MAG/U8
* DSDT8_MAG8_INTENSITY8_NV DSDT_MAG_INTENSITY_NV DS/S8 DT/S8 MAG/U8 I/U8
FLOAT_R16_NV FLOAT_R_NV R/F16
FLOAT_R32_NV FLOAT_R_NV R/F32
FLOAT_RG16_NV FLOAT_RG_NV R/F16 G/F16
FLOAT_RG32_NV FLOAT_RG_NV R/F32 G/F32
FLOAT_RGB16_NV FLOAT_RGB_NV R/F16 G/F16 B/F16
FLOAT_RGB32_NV FLOAT_RGB_NV R/F32 G/F32 B/F32
FLOAT_RGBA16_NV FLOAT_RGBA_NV R/F16 G/F16 B/F16 A/F16
FLOAT_RGBA32_NV FLOAT_RGBA_NV R/F32 G/F32 B/F32 A/F32

But no, this is far from conclusive. Still, it does seem very strange that either vendor would support higher than 10-bit precision for an integer format, as 10 bits is the size of the mantissa for FP16 (which should allow for I8/I10 support with little to no extra transistors).

Has anybody yet found any official data from nVidia or ATI that describes higher-precision integer support?

Update:
After looking at the FP32 format, it now does seem likely that I12 could be supported with few extra transistors. After all, since FP32 has a 23-bit mantissa, the FP16 hardware must support a mantissa of at least 12 bits, meaning I12 may not require special hardware after all...
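For reference, here is a small Python check of how many integer bits the standard FP16 storage format holds exactly on its own. It uses the IEEE half-precision `'e'` format from the standard `struct` module. That is an assumption about the bit layout, and it says nothing about how wide the NV30's internal datapath actually is.

```python
import struct


def roundtrip_half(value):
    """Convert a number to IEEE half precision and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def first_inexact_integer(roundtrip, limit):
    """Smallest non-negative integer below `limit` that the format cannot hold exactly."""
    for i in range(limit):
        if roundtrip(i) != i:
            return i
    return None


if __name__ == "__main__":
    print("first integer FP16 cannot store:", first_inexact_integer(roundtrip_half, 5000))
    print("4095 (largest 12-bit value) stored as:", roundtrip_half(4095))
```

With a 10-bit stored mantissa plus the implicit leading bit, half precision covers every integer up to 2048, so a full unsigned 12-bit range needs a wider mantissa than the storage format alone provides.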
 
And one last thing. Why bother to use I12 on the NV30 when FP16 should be just as fast? I12 would probably also be a very bad idea for a storage format, due to memory alignment.
 
Hmm. It could be like ATI's support of FP32, though, which is really FP24, i.e. 16-bit mantissa, 8-bit exponent (I think), and stored in FP32 format. NVidia could be doing the same thing, calculating in I12 and storing in I16 format.

16-bit integer support is easy for R300 due to the 16-bit mantissa of its pixel pipes. For NV30, it's hard to say what's going on, but I think you're right in suggesting they added an extra bit or two of precision to the mantissa of the FP16 unit for use with both I12 and FP32. I would have hoped they would have I16 or I24 support at half performance, though, just like FP32.

Note: I'm including the sign bit as part of the mantissa.
 
But still, why support the formats at all? They definitely can't be for legacy support. Why not just use floating-point?
 
I16 seems like it would be a useful format. FP16 is a bit worse than even I12 when the exponent isn't used, like in normal maps or linear Z-buffers, so I16 is much better. FP32 could be used, but that requires twice the memory.
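Here is a rough Python comparison of step sizes to back that up. It assumes IEEE half precision for FP16 and unsigned normalized integers (value = code / (2^bits - 1)) for I12 and I16; the helper names are just for illustration.

```python
import struct


def half_bits(x):
    """Raw 16-bit pattern of x rounded to IEEE half precision."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits):
    """IEEE half-precision value for a raw 16-bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fp16_step_at(x):
    """Gap between the FP16 value nearest x (x > 0) and the next larger FP16 value."""
    b = half_bits(x)
    return half_from_bits(b + 1) - half_from_bits(b)


def unorm_step(bits):
    """Uniform step of an unsigned normalized integer channel covering [0, 1]."""
    return 1.0 / ((1 << bits) - 1)


if __name__ == "__main__":
    print("FP16 step in [0.5, 1):", fp16_step_at(0.5))
    print("I12 step             :", unorm_step(12))
    print("I16 step             :", unorm_step(16))
    print("FP16 step near 0.1   :", fp16_step_at(0.1))
```

FP16 only gets finer than I12 for small values, where the exponent starts to help. For data spread evenly over [0, 1], such as normal map components or a linear Z-buffer, the upper half of the range is the limiting case.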

Since there are definite advantages to using I16 on R300, I don't think NV30 has much of an option about whether to support it or not. If you're asking "why support I12", I don't know. Maybe for the marginal space savings, assuming alignment isn't a problem once you tile it. Maybe NV30 does actually slow down with I16. That would give I12 formats a purpose.

Oh yeah, remember that floating point textures can't be filtered on either R300 or NV30 (I'm pretty sure about this, anyway). A variety of integer formats are quite important for this reason, although I would think I16 is fine for all higher precision texturing needs.
 