Pixel Shader 2.0 Instruction Precision

Nick

Veteran
Hi,

Does anyone know where I can find accurate information about the precision of ps 2.0 operations?

I've heard that ATI uses a 24-bit floating-point format in the Radeon 9700 series and NVIDIA optionally uses a 16- or 32-bit format with the GeForce FX, but that's not what I'm interested in. I need to know the relative error of instructions such as pow. According to the DirectX 9.0 SDK it is "full precision", but what does that mean? Do they mean it's a very good approximation, say with a maximum relative error of 0.0001, or that the full mantissa is IEEE-correct with rounding? Or is it just implementation-dependent and changeable through the drivers? Full precision sounds a bit weird to me since that requires lots of extra silicon and clock cycles, and I can't imagine that precision is more important than performance for these graphics cards meant for gaming.
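
To pin down what I mean by relative error, here's a rough C sketch of how I'd measure it; powf is just a stand-in for the hardware instruction (on a real card you'd run the shader and read the results back), and the sample ranges are arbitrary:

```c
#include <math.h>
#include <stdio.h>

/* Sample a stand-in for the hardware op (powf) against a
   higher-precision reference (double pow) and track the worst
   relative error over a grid of inputs. */
int main(void)
{
    double worst = 0.0;
    for (float x = 0.5f; x < 2.0f; x += 0.001f) {
        for (float y = 0.5f; y < 8.0f; y += 0.25f) {
            double ref = pow((double)x, (double)y);
            double got = (double)powf(x, y);  /* stand-in for the GPU op */
            double rel = fabs(got - ref) / ref;
            if (rel > worst) worst = rel;
        }
    }
    printf("max relative error: %g\n", worst);
    return 0;
}
```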

Any information would be greatly appreciated.
 
Sorry, when I first saw the post I did a search, since I could have sworn I'd seen one of the ATI employees mention the final precision of some operations in a discussion of lookup tables.

Perhaps you could do some searches, maybe with "lut" or "precision", and you might have better results. Or maybe someone "in the know" will give you a definitive answer (Hmm... I hope my vowel-deficiency wisecrack didn't send andy running away in disgust...).
Since you only asked for a guess: I wouldn't be too sure that the results aren't accurate to the limits of the calculation precision.
 
Nick,
I can't help with pixel shaders, as I haven't read that far through the DX specs (life's too short!), but, IIRC, 'full precision' in the vertex shader spec usually seemed to mean that the resulting mantissa was accurate to around 21-22 bits. This was in contrast to VS 1.0, for which some results only provided a 'seed' accurate to about 10-12 bits, which could then be improved with, say, Newton iteration.


EDIT/UPDATE: The low-accuracy specs applied to the (partial) exponent and log instructions. I think everything else had to be full precision.
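
To illustrate the 'seed plus Newton iteration' pattern, here's a small C sketch of my own; the seed value and bit counts are illustrative, not from the spec:

```c
#include <stdio.h>

/* One Newton-Raphson step for the reciprocal: for f(x) = 1/x - a,
   the update is x' = x * (2 - a*x). Each step roughly doubles the
   number of accurate mantissa bits, so a ~12-bit seed gets close to
   full float precision in one iteration. */
static float refine_rcp(float a, float seed)
{
    return seed * (2.0f - a * seed);
}

int main(void)
{
    float a = 3.0f;
    float seed = 0.3330f;            /* pretend low-precision rcp result */
    float x1 = refine_rcp(a, seed);  /* one refinement step */
    printf("seed: %.9f  refined: %.9f  exact: %.9f\n", seed, x1, 1.0f / a);
    return 0;
}
```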
 
If I understand what you're saying correctly, then IIRC (don't have time to search, so you'll have to take my "if I recall correctly" at face value :) ):

"Full Precision" : at least 21-bits for almost all intructions
"Partial Precision" : at least 10bits for almost all instructions

[edit] Okay, now you've got me interested... I'll check and report back.
 
Thanks a lot guys!

So it's very likely that complex instructions like pow use a fast approximation which isn't accurate to the last bit. That's all I needed to know...
 
Nick said:
Thanks a lot guys!

So it's very likely that complex instructions like pow use a fast approximation which isn't accurate to the last bit. That's all I needed to know...
I stopped being lazy and went and looked at "pow".

In the VS, it's a 'macro' instruction (typically log2, mul, exp), and for the DST the spec says "Precision is not lower than 15 bits".

The PS documentation is a bit harder to understand, but it too looks as though it uses a macro, though I couldn't see any specified precision.
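
For concreteness, here's roughly what that macro expansion amounts to in C, with log2f/exp2f standing in for the shader's log/exp instructions (the test values are arbitrary):

```c
#include <math.h>
#include <stdio.h>

/* pow built from the macro's parts: log2, mul, exp2. Even with a
   correctly rounded log2/exp2, rounding y*log2(x) perturbs the
   exponent before exp2 amplifies it, so the result is generally
   not accurate to the last bit. */
static float pow_macro(float x, float y)
{
    return exp2f(y * log2f(x));
}

int main(void)
{
    float x = 2.7f, y = 10.0f;
    double exact  = pow((double)x, (double)y);
    double approx = (double)pow_macro(x, y);
    printf("macro: %.4f  exact: %.4f  rel err: %g\n",
           approx, exact, fabs(approx - exact) / exact);
    return 0;
}
```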
 
If you guys check out this post from sireric, you'll see precision is maintained in complex ops with the R3XX architecture (at least the ones executed in a single cycle).
 
Nick said:
Thanks a lot guys!

So it's very likely that complex instructions like pow use a fast approximation which isn't accurate to the last bit. That's all I needed to know...
Yes, for functions like pow (and sqrt), DX9 has a "definition" for the worst-case error tolerance. Maybe it's there to provide performance that's "good enough" for most types of use, or maybe it's a good starting point for a few Newton-Raphson iterations to compute a full-precision version of the function (if that's the kind of thing that excites you).

IEEE only defines precision requirements for the basic arithmetic operations (add, subtract, multiply, divide, and square root), defining the result as the correctly rounded version of the exact result. For positional calculations using matrices, you can structure things to use only these operations, so you can have an exact expectation of the result.

All CPU/GPU architectures tend to take liberties with the non-basic instructions. For example, with RISC CPUs that have a muladd instruction there is always debate about how and when the computation should be rounded.
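
A quick C illustration of that rounding question (my example, not from any spec): fma rounds once after the full-precision multiply-add, while a*b + c rounds twice, and the two can disagree in the last bits. Compile with -ffp-contract=off on GCC/Clang so the compiler doesn't fuse a*b + c on its own:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Chosen so that the product a*b is not exactly representable
       as a double, making the intermediate rounding visible. */
    double a = 1.0 + 0x1p-27;
    double b = 1.0 + 0x1p-27;
    double c = -1.0;

    double two_roundings = a * b + c;    /* multiply rounds, then add rounds */
    double one_rounding  = fma(a, b, c); /* single rounding at the end */

    printf("a*b + c    = %.20e\n", two_roundings);
    printf("fma(a,b,c) = %.20e\n", one_rounding);
    return 0;
}
```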

Looking a bit more at your original post:
Full precision sounds a bit weird to me since that requires lots of extra silicon and clock cycles, and I can't imagine that precision is more important than performance for these graphics cards meant for gaming.
I can't say I fully agree with that: I think the lack of repeatability in this generation of hardware could be a huge mistake. What we have here is essentially a computational device where the results of each computation are not well-defined... when we write code we just kinda hope that things work out right on most hardware... bummer.
 