_pp Going Forward

Dave Baumann · Feb 22, 2005

I'm saying that the silicon costs in the future may mean they are more likely to implement them in FP32.

Simon F · Feb 22, 2005

DaveBaumann said:
And the early talks of Shader Model 4 suggested an Integer instruction set alongside a Floating Point one. I suspect that the transistor savings from the few FP16-specific instructions there are at the moment will be fairly negligible in future processors.

I always got the feeling that the principal reason for FP16 was that it effectively doubled the number of temporary registers, and storage is, in a way, more expensive than floating point operations.

Remi · Feb 24, 2005

I can only speak for myself, and I don't think I'm very representative of developers in general, even less of game developers... That being said... There's no way I'm going to give up mixed precision - not until it gives 0% performance improvement. About 95% of my graphic ps code is fp16, and I'm very happy with it, thanks.

Geo · Feb 24, 2005

Remi said:
I can only speak for myself, and I don't think I'm very representative of developers in general, even less of game developers... That being said... There's no way I'm going to give up mixed precision - not until it gives 0% performance improvement. About 95% of my graphic ps code is fp16, and I'm very happy with it, thanks.

A second fact! (Contradictory, but oh well). Thanks for playing. . .

DegustatoR · Feb 25, 2005

In general, as long as it gives a performance boost, it's gonna be used by developers, or at least for shader replacement in drivers.

Zengar · Mar 1, 2005

Partial precision is more than sufficient for normal computation and I make heavy use of it because of free normalize. Never noticed any artifacts.

Ok, I'm using OpenGL. Optional precision switches are much easier there than in DirectX (you can simply specify different macros for type definition).

Geo · Mar 1, 2005

Zengar said:
Partial precision is more than sufficient for normal computation and I make heavy use of it because of free normalize. Never noticed any artifacts.

Ok, I'm using OpenGL. Optional precision switches are much easier there than in DirectX (you can simply specify different macros for type definition).

Have you done any testing to see how much of a difference it makes for the GF6 line? Just curious. . .

Zengar · Mar 1, 2005

Actually, I didn't see any difference :?
I don't think I was shader bound in my small demos however.

Maybe I'll do some testing in the future, but now I'm working on other projects. And waiting for EXT_framebuffer_object drivers to come out ;-)

Ostsol · Mar 1, 2005

Zengar said:
And waiting for EXT_framebuffer_object drivers to come out ;-)

Gah! I hadn't noticed that the spec was finally posted! Woot! 8)

Zengar · Mar 1, 2005

You should visit opengl.org more often

Ostsol · Mar 1, 2005

True. I used to visit that site and its forums every day. . . :?

Remi · Mar 1, 2005

geo said:
Have you done any testing to see how much of a difference it makes for the GF6 line? Just curious. . .

I know this question isn't addressed to me, but as I have my figures with me...

Code:

GF6800U(NV40) - Compiler v66.93

Precision    Cycles  R Regs used
- mixed        611     11
- all fp16     600     11
- all fp32     679     20

So, roughly halving the registers gives about a 10% perf increase. Not that much, sure. It could probably be said that it's not worth the trouble, and there are definitely some cases where that's true.
The problem for me is that it's so easy to use mixed precision when coding that it's practically free, and a 10% perf. increase for free is a pretty good deal in my book...
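
For reference, the percentages follow directly from the table above. A minimal Python sketch of the arithmetic; the variable and function names are made up for the example, and the cycle and register counts are simply the ones quoted:

```python
# Cycle and register counts copied from the NV40 table above.
results = {
    "mixed":    {"cycles": 611, "regs": 11},
    "all fp16": {"cycles": 600, "regs": 11},
    "all fp32": {"cycles": 679, "regs": 20},
}

def percent_saved(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

fp32 = results["all fp32"]
for name in ("mixed", "all fp16"):
    r = results[name]
    print(name,
          "cycles saved: %.1f%%" % percent_saved(fp32["cycles"], r["cycles"]),
          "registers saved: %.1f%%" % percent_saved(fp32["regs"], r["regs"]))
```

Relative to all-fp32, mixed precision comes out at about 10.0% fewer cycles and all-fp16 at about 11.6%, with 45% fewer registers in both cases.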

Pete · Mar 1, 2005

That's a decent improvement, both in speed and registers. I have to wonder if the transistors saved by ditching FP16 could be used to both increase the register space and implement FP32 versions of specialized FP16 functions.

Geo · Mar 2, 2005

Remi said:
geo said:

Have you done any testing to see how much of a difference it makes for the GF6 line? Just curious. . .

I know this question isn't addressed to me, but as I have my figures with me...

Code:

GF6800U(NV40) - Compiler v66.93

Precision    Cycles  R Regs used
- mixed        611     11
- all fp16     600     11
- all fp32     679     20

So, roughly halving the registers gives about a 10% perf increase. Not that much, sure. It could probably be said that it's not worth the trouble, and there are definitely some cases where that's true.
The problem for me is that it's so easy to use mixed precision when coding that it's practically free, and a 10% perf. increase for free is a pretty good deal in my book...

Thanks for sharing. . .

It's that "easy to use" bit that has been, to a degree, a bone of contention. Not so much that it isn't easy to code. . but that some have reported it takes a bit of time (particularly cumulatively) to make the analysis of whether any given particular case requires FP32 to avoid banding, artifacting, etc. Since you don't seem to have this problem, perhaps you can also share how you make that decision on a case-by-case basis? Do you have one or more "rules of thumb" you apply on the fly to avoid case-by-case test-and-analyze-results, and (if so) how reliable do you find those rules? Do you ever see the results and have to go back and un-partial?

KimB · Mar 2, 2005

Pete said:
That's a decent improvement, both in speed and registers. I have to wonder if the transistors saved by ditching FP16 could be used to both increase the register space and implement FP32 versions of specialized FP16 functions.

Since the hardware needs to be able to convert between FP16 and FP32 anyway (to read/write textures), I seriously doubt the added transistors to allow FP32 registers to store two FP16 values are terribly significant.
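
The storage side of this can be illustrated on the CPU. A minimal sketch, assuming Python's `struct` module and its IEEE 754 half-precision format code `e`; the function names are made up for the example, and it says nothing about how NV40 actually lays out its register file:

```python
import struct

def pack_two_halves(a, b):
    """Store two fp16 values in one 32-bit word (a in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]

def unpack_two_halves(word):
    """Recover the two fp16 values from a 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))

word = pack_two_halves(1.0, 0.5)
print(hex(word))                 # 0x38003c00
print(unpack_two_halves(word))   # (1.0, 0.5)
```

The same 32 bits hold either one fp32 value or two fp16 values, which is the sense in which fp16 "doubles" the temporary register space.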

Remi · Mar 2, 2005

Geo said:
It's that "easy to use" bit that has been, to a degree, a bone of contention. Not so much that it isn't easy to code. . but that some have reported it takes a bit of time (particularly cumulatively) to make the analysis of whether any given particular case requires FP32 to avoid banding, artifacting, etc.

That's the mistake IMHO. You analyse the problem which your program has to solve as preparation work for the coding phase. And - guess what? That's it. In the real world, you don't later re-analyse your code to prove that it effectively does what you intended it to do. Instead, you do unit tests. It may be less intellectually rewarding, but it's far more effective (budget-wise). Therefore I don't do code analysis for graphics code, I just do unit tests - including test cases to check the effective precision.

Geo said:
Since you don't seem to have this problem, perhaps you can also share how you make that decision on a case-by-case basis? Do you have one or more "rules of thumb" you apply on the fly to avoid case-by-case test-and-analyze-results, and (if so) how reliable do you find those rules? Do you ever see the results and have to go back and un-partial?

Hum, let's see... I'll try to explain it all as clearly as possible (which might require some length)...

The executive's summary:

As a general rule, fp16 is good for colors, directions and local distances. fp32 is required mainly for coordinates into big textures and for long distances (which are dealt with mainly in the VS anyway; they are not that common in a PS).
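
The reason behind this rule of thumb can be checked on the CPU. A minimal sketch, assuming Python's `struct` half-precision format code `e` behaves like the GPU's fp16 (IEEE 754 binary16, 10 explicit mantissa bits); the helper names and the 2048-texel texture size are assumptions for illustration:

```python
import math
import struct

def to_half(x):
    """Round x to the nearest fp16 value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_spacing(x):
    """Gap between adjacent fp16 values around x (normal range only)."""
    _, exponent = math.frexp(x)      # x = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 11)    # fp16 has 10 explicit mantissa bits

# Colors: every 8-bit channel value survives a round trip through fp16.
assert all(round(to_half(k / 255.0) * 255.0) == k for k in range(256))

# Big texture: in [0.5, 1) the step is 1/2048, i.e. one whole texel of a
# 2048-wide texture, so a sub-texel offset is rounded away.
print(half_spacing(0.75) * 2048)          # 1.0
print(to_half(0.5 + 1.0 / 8192) == 0.5)   # True

# Long distances: between 1024 and 2048 the step is a full unit.
print(half_spacing(1500.0))               # 1.0
```

Colors in [0, 1] and unit-length directions sit where fp16 steps are far finer than an 8-bit channel, while large texture coordinates and long distances run into steps of a texel or a whole unit.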

The coder's details:

I rely on the preprocessor to define my own types. I define different types for colors, distances, etc. Depending on what the type contains, I then define more precise types which include a hint on the relative precision they'll need, such as longDist for long distances. My relative precision hints are trivial/short/long for integer types, or real/full for floating-point types.

Code:

        Type / Precision:    Low        Normal     High
        ----------------     ---------  ---------  --------
        Integral             trivial    short      long
        Floating Point                  real       full

That's the general idea.

I use a type-mapping (or type-"patchboard") block, controlled by a... yes, you guessed it, a type-control block!

The type-control block, near the beginning of my source file, allows me to control which real precision is mapped to my relative precisions. By default trivial means fx12, short/real means fp16 and long/full means fp32. In this block, I can force them all to be fp16 or fp32 by just uncommenting one #define.

The "patchboard" block follows; it does the detailed mapping.

Here's what the control block looks like:

Code:

        //
        //        PERFORMANCE VS. IQ - (TYPE CONTROL)
        //

        // Define to use performance mixed precision (trade accuracy for perf)
        #define PERFORMANCE_MIXED_PRECISION

        // Define to force the precision everywhere. Default is mixed.
        //#define FULL_PRECISION
        //#define HALF_PRECISION
        //#define FIXED_PRECISION

Those #defines trigger blocks of #defines (in the patchboard) which define the precision for each one of my types, like this for instance:

Code:

        #define color                    half
        #define color2                   half2
        #define color3                   half3
        #define color4                   half4
        #define color3x3                 half3x3
        #define color4x4                 half4x4

These blocks are controlled by mere #ifs to match a specific type with a specific precision.

Here's the type-"patchboard" (the main excerpts):

Code:

        // Note: for clarity I have left only the basic type and removed the
        //       definitions of the subtypes (such as color2, color3, etc).

        //
        //        TYPE MAPPING (OR TYPE "PATCHBOARD")
        //

        #ifdef FULL_PRECISION

        // All FP32
        #define realreal                 float
        #define lowColor                 float
        #define color                    float
        #define absPos                   float
        #define trivVec                  float
        #define shrtVec                  float
        #define longVec                  float
        #define trivTexCoor              float
        #define shrtTexCoor              float
        #define longTexCoor              float
        #define triv                     float
        #define real                     float
        #define full                     float

        #elif defined(HALF_PRECISION)

        // All FP16
        // ... (same as above, replace "float" by "half")

        #elif defined(FIXED_PRECISION)

        // All FX12
        // ... (same as above, replace by "fixed" - I don't really use it these days but I kept it just in case)

        #else 

        // short = FP16, long = FP32, and triv(ial) = FX12.
        // colors are half.
        #define realreal                 half
        #define lowColor                 fixed
        #define color                    half
        #define absPos                   float
        #define trivVec                  fixed
        #define shrtVec                  half
        #define longVec                  float
        #define trivTexCoor              fixed
        #define shrtTexCoor              half
        #define longTexCoor              float
        #define triv                     fixed
        #define real                     half
        #define full                     float

        #endif

        // Note: I map "fixed" to "half" on systems without fx12 support

This defines various types with a hint on precision, such as lowColor which is a low-fidelity color.

In my code, I never use the language's types directly but instead the more specific ones which I have #defined. I control the precision on a case-by-case basis by choosing my specific type: for instance I'll use "lowColor" for a color which really doesn't need precision or "color" for one which requires normal precision.

If I #define nothing in the control block (all commented out), I get mixed precision. Defining FULL_PRECISION forces fp32 everywhere and defining HALF_PRECISION forces fp16 everywhere.
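
The selection logic of the patchboard can be modelled as a plain lookup table. This is only a sketch in Python of what the preprocessor does, not part of the shader source: the `patchboard` function and its `has_fx12` parameter are hypothetical names, and only a subset of the type names from the excerpt is listed.

```python
# Default ("mixed") mapping, copied from the #else branch of the patchboard.
MIXED = {
    "lowColor": "fixed", "color": "half", "absPos": "float",
    "trivVec": "fixed", "shrtVec": "half", "longVec": "float",
    "trivTexCoor": "fixed", "shrtTexCoor": "half", "longTexCoor": "float",
    "triv": "fixed", "real": "half", "full": "float",
}

# Flags that force a single precision everywhere, in #ifdef/#elif order.
FORCED = {"FULL_PRECISION": "float", "HALF_PRECISION": "half",
          "FIXED_PRECISION": "fixed"}

def patchboard(defines=(), has_fx12=True):
    """Return the type mapping selected by the given set of #defines."""
    mapping = dict(MIXED)
    for flag, base_type in FORCED.items():   # first matching flag wins
        if flag in defines:
            mapping = {name: base_type for name in MIXED}
            break
    if not has_fx12:                         # "fixed" falls back to "half"
        mapping = {n: ("half" if t == "fixed" else t)
                   for n, t in mapping.items()}
    return mapping

print(patchboard()["lowColor"])                     # fixed
print(patchboard({"FULL_PRECISION"})["lowColor"])   # float
```

With no flag set the mixed mapping applies; a single flag flips every type at once, which is what makes the periodic all-fp32 sanity check cheap.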

When I code, it's usually with mixed precision, I don't force fp32 or fp16 everywhere constantly.

From time to time, I define FULL_PRECISION to force fp32 everywhere in order to make sure I haven't introduced a significant difference.

When I code something that I feel is really aggressive, I can quickly code one safe path and quarantine my "aggressive" type-optimised path in an #ifdef PERFORMANCE_MIXED_PRECISION block. By just defining PERFORMANCE_MIXED_PRECISION or not in the control block, I can choose whether I run the risky path or the safe one, and see for myself if there's a difference in the rendering or not.

Hence I don't really have to make a lot of decisions when coding - I have already "factored" them with the patchboard. I am just more explicit in my type choices, choosing a color, a direction, a distance; and, where needed, specifying if it's a short distance or a long one for instance. There's an important side benefit: that actually makes the code a lot clearer to read, to the point where to me, that reason alone is good enough to keep those type redefinitions, even if there were no performance improvement.

Usage:

With that, working with "performance mixed precision" is a piece of cake.

I usually code with PERFORMANCE_MIXED_PRECISION, going back to the more cautious mixed precision when I want to be sure that one of the "aggressive" optimizations doesn't come into play. Again, from time to time, I define FULL_PRECISION to force fp32 everywhere in order to make sure I haven't introduced a significant difference. Finally, when unit-testing, I always compare the two modes.

I still do that manually (only the paranoid will survive) but I'm now rather confident that a mere automated compare would suffice, ringing a bell when there's too much difference between the two pics.
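
Such an automated compare could be as small as the following Python sketch. It assumes the two renders have already been read back as rows of (r, g, b) floats in [0, 1]; the function names and the default tolerance of one 8-bit step are assumptions, not something taken from an actual test harness:

```python
def max_abs_diff(image_a, image_b):
    """Largest per-channel difference between two same-sized images.

    Images are lists of rows, each row a list of (r, g, b) float tuples.
    """
    return max(
        abs(ca - cb)
        for row_a, row_b in zip(image_a, image_b)
        for px_a, px_b in zip(row_a, row_b)
        for ca, cb in zip(px_a, px_b)
    )

def precision_regressed(reference, candidate, tolerance=1.0 / 255.0):
    """True when the candidate strays from the reference by more than tolerance."""
    return max_abs_diff(reference, candidate) > tolerance

reference = [[(0.50, 0.25, 1.00), (0.00, 0.75, 0.10)]]   # e.g. FULL_PRECISION render
candidate = [[(0.50, 0.25, 1.00), (0.00, 0.74, 0.10)]]   # e.g. mixed-precision render
print(precision_regressed(reference, candidate))          # True
```

A maximum-difference test is the strictest choice; a mean or per-tile metric would be more forgiving of isolated pixels, at the risk of missing localized banding.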

Really nothing complex. A type mapping, controlled by a few #defines. More specific types, making the code more readable. That's all it takes!

The limits:

- if you're not hand-coding your shaders but generating them, depending on your generator, this might be more delicate. I think I would try to identify the potentially most frequent sources of problems and have the generator write out a few versions of the shader: one full fp32 for reference and type-variations at those spots, then run them all in your unit tests with triggers set on image comparison to the reference image. That is assuming the unit test runs are automated, which of course is the case in all good software houses, isn't it?

- that's not going to solve other problems. If one codes optimally for only one architecture and believes that using mixed precision will solve bad latency hiding troubles in the other one, he might find it appropriate to reconsider.
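
For the generated-shader case, one possible reading of "one full fp32 for reference and type-variations at those spots" is sketched below in Python. The `shader_variants` function, the variant labels, and the idea of demoting one type at a time are all assumptions for illustration; a real generator may work quite differently.

```python
def shader_variants(source, risky_types):
    """Yield (label, source) pairs: an fp32 reference plus one variant per risky type.

    Each variant keeps everything fp32 except a single type, which is
    demoted to half, so a failing comparison points at one spot.
    """
    header = "".join("#define %s float\n" % t for t in risky_types)
    yield "reference_fp32", header + source
    for demoted in risky_types:
        lines = ["#define %s %s\n" % (t, "half" if t == demoted else "float")
                 for t in risky_types]
        yield "half_" + demoted, "".join(lines) + source

body = "color shade(shrtVec n, longTexCoor uv) { ... }\n"   # placeholder shader text
for label, text in shader_variants(body, ["color", "shrtVec", "longTexCoor"]):
    print(label)
```

Each variant would then be rendered and compared against the reference image in the automated test run.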

Pfew! It's been some time since I've posted something that long! Need to recover... Quick! A beer or something!!!

Disclaimer: The code included in this post... ...no warranty... ... at your own risk... ...citing is allowed in the event of a successful implementation, somewhere in the long-and-never-read list of thanks... etc.

KimB · Mar 2, 2005

Remi said:
The limits:

- if you're not hand-coding your shaders but generating them, depending on your generator, this might be more delicate. I think I would try to identify the potentially most frequent sources of problems and have the generator write out a few versions of the shader: one full fp32 for reference and type-variations at those spots, then run them all in your unit tests with triggers set on image comparison to the reference image. That is assuming the unit test runs are automated, which of course is the case in all good software houses, isn't it?

Well, I think this is the main issue. I don't expect most shaders in games in the next few years will be hand-written in GLSL, HLSL, or Cg. Rather they will be using some sort of higher-level system that makes use of a shader library. The only problem with using automated test/reference images is that the artist may not have any idea a priori what situation would most exacerbate the problems of the shader.

Making good, optimized shader libraries that make use of partial precision without quality loss or large amounts of extra required artist time is a challenge, but not an insurmountable one.

Remi · Mar 2, 2005

Chalnoth said:
Making good, optimized shader libraries that make use of partial precision without quality loss or large amounts of extra required artist time is a challenge, but not an insurmountable one.

Indeed.

The more experienced someone is with mixed precision, the less of a challenge it'll be to him...

Geo · Mar 2, 2005

Remi--

Sweet! Speaking of beer, I owe you one next time you're in the neighborhood.

Remi · Mar 3, 2005

Count me in! Of course there is just one remaining insignificant little detail... neighborhood. If that's near St Louis, it may take some time as we're not on the same continent, but who knows... The world is such a small place...
