FP16? But it's the current year!

While it is relatively easy to just look at the number formats and proclaim that something is or isn't sufficient, the underlying application here is not numerical analysis, but gaming. And it seems to me that in order for a numerical error to actually matter, a pixel error has to:
1. Be large enough to be readily detectable
2. Be consistent over time, as errant pixels showing up in a single frame are extremely unlikely to be noticed
3. Correlate with similar errors on neighbouring pixels to create a larger area that is objectionably anomalous (and consistently so over time)

And to judge that, you need hands-on experience performing precision experiments with actual games. Just thinking about it, it would seem you could get away with a lot, but that's armchair expertise, no better than just performing the numerical analysis.
How does it play out in reality?
Yes, changing data types "blindly" to fp16 one shader at a time and validating the end result visually between each change is a valid strategy for porting game shaders to fp16. If the error can't be seen visually, there's no problem. Of course this assumes that your test images/sequences have good enough coverage of different corner cases. A daylight scene behaves completely differently from a night scene. Highly specular materials can also be problematic (GGX math at fp16 can output values larger than 65504 -> infinity -> problems).
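
As an illustration of that GGX overflow, here is a minimal sketch; the function names and the clamp-to-65504 approach are my own assumptions, not code from any shipping game:
Code:
// Hypothetical sketch: at low roughness the GGX distribution term can
// exceed the fp16 maximum (65504) and turn into +inf in half precision,
// so evaluate it at fp32 and clamp before returning to fp16 math.
float GGX_Distribution(float NdotH, float roughness)
{
    float a  = roughness * roughness;
    float a2 = a * a;
    float d  = NdotH * NdotH * (a2 - 1.0) + 1.0;
    return a2 / (3.14159265 * d * d);
}

min16float SpecularD(min16float NdotH, min16float roughness)
{
    // Clamp at the fp16 max so the cast can't produce infinity.
    return (min16float)min(GGX_Distribution(NdotH, roughness), 65504.0);
}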

However, understanding the basics of floating point helps when you encounter problems. Most problems are caused either by catastrophic cancellation or by adding/subtracting values of vastly different magnitudes. If you can isolate operations that are prone to these issues, you end up hitting a lot fewer problems. Fp16 math is generally good enough for calculating sRGB8 and HDR10 image output, as long as these problematic operations are separated and executed at fp32.
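
A hedged example of that separation (the names here are hypothetical): a distance falloff subtracts two values that can be nearly equal, so that one subtraction stays at fp32 while the small, well-scaled result continues at fp16:
Code:
// Hypothetical sketch: keep the cancellation-prone subtraction at fp32,
// then hand the well-scaled result over to fp16 math.
min16float Falloff(float distSq, float radiusSq)
{
    // Near the light radius distSq/radiusSq approaches 1, and fp16 only
    // has ~10 mantissa bits to represent the tiny remainder after 1 - x.
    float t = saturate(1.0 - distSq / radiusSq);  // fp32 here
    min16float h = (min16float)t;                 // safe: t is in [0, 1]
    return h * h;                                 // fp16 from here on
}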
 

Maybe that is another good fit for the ND-style use of different variations of an uber shader for every on-screen tile.
 
@sebbbi Have to admit I thought there would be automated tools to check for the level of variance.
A benchmark that checks by XORing images to give a level of variance or something; then someone can eyeball it afterwards if it's too high.
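
Something along these lines could serve as the automated part. This is only a sketch under my own assumptions (an absolute difference rather than a literal XOR, and made-up resource names), summing per-pixel error into a single counter that a test script compares against a threshold:
Code:
// Hypothetical sketch: accumulate the absolute per-pixel difference
// between the fp32 and fp16 captures into one counter.
Texture2D<float4> ImageFP32 : register(t0);
Texture2D<float4> ImageFP16 : register(t1);
RWByteAddressBuffer ErrorSum : register(u0);  // single uint at offset 0

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float3 d = abs(ImageFP32[id.xy].rgb - ImageFP16[id.xy].rgb);
    // Convert to integer "8-bit error units" so atomics can accumulate.
    uint err = (uint)(dot(d, float3(1.0, 1.0, 1.0)) * 255.0 + 0.5);
    ErrorSum.InterlockedAdd(0, err);
}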
 
Automated testing is a good idea. Pick important locations around levels and take fp16 & fp32 screenshots. If you already have a testing script that loads all levels and performs other tests (such as memory consumption), it is easy to integrate this as part of the same test. Having a key mapped to instantly switch between fp32<->fp16 shaders at runtime is a great feature as well. And another key to show the difference (error x4, for example). If you (or a tester) spot something ugly, you can just press a key to confirm whether it was caused by fp16 or something else.
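
The difference key could be as small as this; a sketch assuming the two captures are bound as textures (names hypothetical):
Code:
// Hypothetical sketch of the "show difference" mode: amplify the error
// so a tester can eyeball it (error x4, as suggested above).
Texture2D<float4> ImageFP32 : register(t0);
Texture2D<float4> ImageFP16 : register(t1);

float4 PSShowDiff(float4 pos : SV_Position) : SV_Target
{
    int2 p = int2(pos.xy);
    return abs(ImageFP32[p] - ImageFP16[p]) * 4.0;
}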

If you use ifdefs for half types, it is trivial to compile two versions of each shader.
Code:
// min16float requests at least 16-bit precision; the driver may still
// run it at full fp32 on hardware without native fp16 support.
#ifdef ENABLE_16_BIT_FLOAT
#define HALF min16float
#define HALF2 min16float2
#define HALF3 min16float3
#define HALF4 min16float4
#else
#define HALF float
#define HALF2 float2
#define HALF3 float3
#define HALF4 float4
#endif
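
To illustrate how the macros would be used (a sketch with a made-up tonemap function): the same body compiles to an fp32 shader normally, and to an fp16 shader when built with ENABLE_16_BIT_FLOAT defined, e.g. by compiling each shader twice with and without -D ENABLE_16_BIT_FLOAT.
Code:
// Hypothetical usage: one shader body, two compiled variants.
HALF3 ToneMap(HALF3 color)
{
    // Reinhard-style curve; well behaved at fp16 for sRGB8 output.
    return color / (color + HALF3(1.0, 1.0, 1.0));
}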
 
It wouldn't be surprising to see a driver optimization setting/replacement that demoted FP32 to FP16 based on some heuristics either, as a method of improving performance in older games.
 
It's not unprecedented, although the overriding heuristic was sometimes whether it made the vendor's card do better when competitively benchmarked, or something to that FX effect.
 
It does seem likely we'll be needing more IQ reviews in the near future. I'm kind of curious how much FP16 already exists that could simply execute more efficiently. I'm sure some devs have packed FP16 registers already. It stands to reason that compiling for FP16 execution, given FP16 registers, would be fairly straightforward.
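
On what already exists: manual packing of two fp16 values into one 32-bit value is common for storage, via the standard f32tof16/f16tof32 HLSL intrinsics (though that is packing for bandwidth, not packed fp16 execution). A sketch:
Code:
// Sketch: storage-level fp16 packing with standard HLSL intrinsics.
uint PackHalf2(float a, float b)
{
    // f32tof16 returns the half bit pattern in the low 16 bits.
    return f32tof16(a) | (f32tof16(b) << 16);
}

float2 UnpackHalf2(uint p)
{
    return float2(f16tof32(p & 0xFFFF), f16tof32(p >> 16));
}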
 