DaveBaumann said: And the early talks of Shader Model 4 suggested an Integer instruction set alongside a Floating Point one. I suspect that the transistor savings from the few FP16-specific instructions there are at the moment will be fairly negligible in future processors.

I'm saying that silicon costs in the future may mean they are more likely to implement them in FP32. I always got the feeling that the principal reason for FP16 was that it effectively doubled the number of temporary registers, and that storage is, in a way, more expensive than floating-point operations.
Remi said: I can only speak for myself, and I don't think I'm very representative of developers in general, even less of game developers... That being said... there's no way I'm going to give up mixed precision - not until it gives a 0% performance improvement. About 95% of my graphics PS code is fp16, and I'm very happy with it, thanks.
Zengar said: Partial precision is more than sufficient for normal computation, and I make heavy use of it because of the free normalize. I've never noticed any artifacts.
OK, I'm using OpenGL. Optional precision switches are much easier there than in DirectX (you can simply specify different macros for the type definitions).
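Zengar's "free normalize" point, in code terms: on NV40-class hardware an fp16 normalize comes essentially for free, so keeping the vectors in half costs nothing. A minimal Cg sketch (illustrative only - the entry point and inputs are made up, not Zengar's code):

Code:
// Bump-lighting fragment: both normalizes run at fp16, which NV40-class
// hardware can do essentially for free alongside other math.
half4 main(half2 uv         : TEXCOORD0,
           half3 lightDirT  : TEXCOORD1,
           uniform sampler2D normalMap) : COLOR
{
    half3 n = normalize((half3)(tex2D(normalMap, uv).xyz * 2.0 - 1.0));
    half3 l = normalize(lightDirT);   // free fp16 normalize
    half ndotl = saturate(dot(n, l));
    return half4(ndotl, ndotl, ndotl, 1.0);
}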
Zengar said: And waiting for EXT_framebuffer_object drivers to come out ;-)

Gah! I hadn't noticed that the spec was finally posted! Woot! 8)
geo said: Have you done any testing to see how much of a difference it makes for the GF6 line? Just curious...

I know this question isn't addressed to me, but as I have my figures with me...
Code:
GF6800U (NV40) - Compiler v66.93

Precision    Cycles    R regs used
---------    ------    -----------
mixed         611          11
all fp16      600          11
all fp32      679          20
So, about halving the registers gives about a 10% perf increase. Not that much, sure. It could probably be said that it's not worth the trouble, and there are definitely some cases where that's true.
The problem for me is that it's so easy to use mixed precision when coding that it's practically free, and a 10% perf increase for free is a pretty good deal in my book...
Pete said: That's a decent improvement, both in speed and registers. I have to wonder if the transistors saved by ditching FP16 could be used to both increase the register space and implement FP32 versions of specialized FP16 functions.

Since the hardware needs to be able to convert between FP16 and FP32 anyway (to read/write textures), I seriously doubt the added transistors to allow FP32 registers to store two FP16 values are terribly significant.
Geo said: It's that "easy to use" bit that has been, to a degree, a bone of contention. Not so much that it isn't easy to code... but that some have reported it takes a bit of time (particularly cumulatively) to analyse whether any given case requires FP32 to avoid banding, artifacting, etc.

That's the mistake, IMHO. You analyse the problem your program has to solve as preparation work for the coding phase. And - guess what? That's it. In the real world, you don't later re-analyse your code to prove that it effectively does what you intended it to do. Instead, you do unit tests. It may be less intellectually rewarding, but it's far more effective (budget-wise). Therefore I don't do code analysis for graphics code, I just do unit tests - including test cases to check the effective precision.
Geo said: Since you don't seem to have this problem, perhaps you can also share how you make that decision on a case-by-case basis? Do you have one or more "rules of thumb" you apply on the fly to avoid case-by-case test-and-analyse, and (if so) how reliable do you find those rules? Do you ever see the results and have to go back and un-partial?

Hum, let's see... I'll try to explain it all as clearly as possible (which might require some length)...
Type / Precision    Low        Normal    High
----------------    -------    ------    ----
Integral            trivial    short     long
Floating Point                 real      full
//
// PERFORMANCE VS. IQ - (TYPE CONTROL)
//
// Define to use performance mixed precision (trade accuracy for perf)
#define PERFORMANCE_MIXED_PRECISION
// Define to force the precision everywhere. Default is mixed.
//#define FULL_PRECISION
//#define HALF_PRECISION
//#define FIXED_PRECISION
#define color half
#define color2 half2
#define color3 half3
#define color4 half4
#define color3x3 half3x3
#define color4x4 half4x4
Note: for clarity I have left only the basic type and removed the
definitions of the subtypes (such as color2, color3, etc).
//
// TYPE MAPPING (OR TYPE "PATCHBOARD")
//
#ifdef FULL_PRECISION
// All FP32
#define realreal float
#define lowColor float
#define color float
#define absPos float
#define trivVec float
#define shrtVec float
#define longVec float
#define trivTexCoor float
#define shrtTexCoor float
#define longTexCoor float
#define triv float
#define real float
#define full float
#elif defined(HALF_PRECISION)
// All FP16
... (same as above, replace "float" by "half")
#elif defined(FIXED_PRECISION)
// All FX12
... (same as above, replace by "fixed" - I don't really use it those days but I kept it just in case)
#else
// short = FP16, long = FP32, and triv(ial) = FX12.
// colors are half.
#define realreal half
#define lowColor fixed
#define color half
#define absPos float
#define trivVec fixed
#define shrtVec half
#define longVec float
#define trivTexCoor fixed
#define shrtTexCoor half
#define longTexCoor float
#define triv fixed
#define real half
#define full float
#endif
(note: I map "fixed" to "half" on systems without FX12 support)
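To see how the patchboard plays out, here's a minimal usage sketch (my illustration, not Remi's actual code - it assumes the elided subtypes such as color3, color4, shrtVec3 and shrtTexCoor2 are defined alongside the scalars shown above). Compile it as-is for mixed precision, or with FULL_PRECISION defined to get the all-FP32 reference build from the same source:

Code:
// Illustrative only - assumes the type mapping above is in scope.
color4 main(shrtVec3 normalT     : TEXCOORD0,
            shrtVec3 lightDirT   : TEXCOORD1,
            shrtTexCoor2 uv      : TEXCOORD2,
            uniform sampler2D diffuseMap) : COLOR
{
    // half by default; everything below silently becomes float
    // under FULL_PRECISION, with no other source change.
    shrtVec3 n = normalize(normalT);
    shrtVec3 l = normalize(lightDirT);
    real ndotl = saturate(dot(n, l));
    color3 albedo = (color3)tex2D(diffuseMap, uv).rgb;
    return color4(albedo * ndotl, 1.0);
}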
Remi said: The limits: if you're not hand-coding your shaders but generating them then, depending on your generator, this might be more delicate. I think I would try to identify the potentially most frequent sources of problems and have the generator write out a few versions of the shader: one full FP32 for reference, plus type-variations at those spots; then run them all in your unit tests with triggers set on image comparison to the reference image. That is assuming the unit test runs are automated, which of course is the case in all good software houses, isn't it?

Well, I think this is the main issue. I don't expect most shaders in games in the next few years will be hand-written in GLSL, HLSL, or Cg. Rather, they will be written with some sort of higher-level system that makes use of a shader library. The only problem with using automated test/reference images is that the artist may not have any idea a priori what situation would most exacerbate the problems of the shader.
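A rough sketch of what one of those generated variants could look like (purely an illustration of the idea - the VARIANT_FP32_SPECULAR switch and all names are made up, not the output of any real generator). The harness compiles the shader once per variant and flags any rendered image that diverges too far from the FP32 reference:

Code:
// The generator marks a suspect spot; the test harness builds one
// variant per define and compares each rendered image to the
// all-fp32 reference.
#ifdef VARIANT_FP32_SPECULAR
    #define specReal float   // this run: keep the suspect math in fp32
#else
    #define specReal half    // variant under test
#endif

float4 main(float3 normalT  : TEXCOORD0,
            float3 halfVecT : TEXCOORD1,
            uniform float shininess,
            uniform float3 lightColor) : COLOR
{
    // The pow() chain is a classic spot where fp16 can band.
    specReal nh = saturate(dot(normalize(normalT), normalize(halfVecT)));
    specReal s  = pow(nh, (specReal)shininess);
    return float4(lightColor * s, 1.0);
}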
Chalnoth said: Making good, optimized shader libraries that make use of partial precision without quality loss or large amounts of extra required artist time is a challenge, but not an insurmountable one.

Indeed.