FP16? But it's the current year!

Yes, but an explicit half type still doesn't mean the GPU has to actually execute at fp16. Why would it, if fp32 is faster (on specific hardware)? It's also not something that's required for SM6.
Having an explicit half type, however, is a good idea. You wouldn't want to run stuff at int8 by accident.
 
Why? What makes you think the driver will even expose half precision to DX? As it stands now, this is a feature for CUDA. For DX, low precision hints are simply ignored.
The opposite: it's not that I think my performance will tank, but that I won't benefit from it even though I paid a hefty premium - for a feature that should have been there if this is going to be the future state.
 
I remember you once said that if a programmer knows what he is doing, even lower bit depth integers can do the job just fine for a lot of workloads. Maybe as the cheap wins in silicon become harder to achieve, architectural changes that open more doors for devs to shave bits and bytes off their code might be a great part of the performance gains of the future. All layman speculation over here though.
Personally I don't like 8 bit integer ALU or 10/12 bit limited range fixed point ALU. You have to be too careful and work around limited precision and range (simultaneously) too much. Small bit depth types are however perfect for memory storage.

For int math (ALU), 16 bit is enough for surprisingly many cases (int is lossless, so range is all that matters). Fp16 is also often enough for general purpose math (unlike 10/12 bit fixed point types which are highly situational). Range is rarely an issue for fp16 (+-65504), and precision is good enough when working with 8 bit inputs/outputs. Of course you need fp32 ALU for position math (transforms, etc) and modern lighting math, but fp16 is perfect for post processing, LDR math before lighting (colorize, etc) and for normal vector related math, etc, etc.
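The "precision is good enough when working with 8 bit inputs/outputs" claim is easy to check. A small sketch (my own, not from the post), using Python's struct module, which supports the IEEE half format via 'e': every 8-bit value survives a round trip through fp16, and the finite fp16 range really does top out at 65504.

```python
import struct

def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half (fp16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Largest finite fp16 value: bit pattern 0x7BFF encodes 65504.0.
assert struct.unpack('<e', b'\xff\x7b')[0] == 65504.0

# fp16 precision is enough when the inputs/outputs are 8 bit:
# every value x/255 survives a round trip through fp16.
for x in range(256):
    assert round(f16(x / 255) * 255) == x
```

The worst-case fp16 rounding error on a value in [0,1] is about 2^-12, which scaled back to 8 bits is well under half a quantization step, so the round trip is lossless.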
 
Shouldn't be difficult for the compiler to promote to FP32. Performance is obviously lower, but the compiler would have determined that to be the better solution. Even at half rate there should still be some bandwidth and memory savings.
Sure, but then you're back to losing the specific optimisations that the engine/post processing required from the FP16 operations.
Look at how physics can be broken in game engines (not the same thing, just an example, and I am not suggesting all physics for this). I guess it depends what the focus and implementation is; at a minimum, as hinted by Sebbbi, possibly messing up AA, and maybe lighting/shadow abnormalities, etc.
Cheers
 
Opportunity cost issue. You can't lose the optimizations if they never would have existed in the first place. I'm not sure we'll be seeing any effects that exist solely because of the existence of FP16. The current optimizations look to be doing the same task in less time with less resources. Promoting FP16 to FP32 should never mess up the rendering unless you somehow relied on lossy math for some sort of randomness. The results may look ever so slightly different, but the variation should be small. Maybe it explodes your register usage and performance tanks, but again what was the alternative? FP16 should always provide equal or better results considering alternatives.
 
Is there someplace I can read up on using FP16 correctly?
Read 32 bit float articles targeted towards scientific audience. For them fp32 is "half precision" compared to fp64. This article seems to have it all, but is way too deep for most rendering programmers: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

The most important thing in float math is to avoid catastrophic cancellation. This means subtracting two (big) values from each other that are close to each other. This operation commonly happens when you use world space coordinates in your math and calculate vectors between two points. For example localLight->surfacePixel, camera->surfacePixel, vertex1->vertex2 (edge math in world space), etc. The solution is to avoid doing math in world space. Just don't do it. It causes problems even in fp32 if your world is big enough. The first thing you should do is subtract the camera position from all world space data (*). Absolutely no math before this. This way your floating point error is localized around the camera. Closer to camera = less error, further away = more error. Perspective projection makes everything smaller at distance, normalizing the error, ensuring that no matter the distance, the error is always smaller than some subpixel fraction.
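A minimal sketch of the cancellation problem and the camera-relative fix (the positions are hypothetical, and fp32 register rounding is emulated with Python's struct module): two points 1 mm apart, 100 km from the origin, collapse to the same fp32 value, so their difference cancels to zero; subtracting the camera position first, before any other math, keeps the offset intact.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (double) to the nearest fp32, like a GPU register."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Hypothetical world-space positions ~100 km from the origin, 1 mm apart.
camera = 100000.0
vertex = 100000.001

# Math in world space: both positions round to the same fp32 value,
# so the camera->vertex vector cancels to exactly zero.
assert f32(vertex) - f32(camera) == 0.0

# Camera-relative: subtract the camera position first, at higher precision,
# before any other math. The small local offset fits fp32 easily.
local = f32(vertex - camera)
assert abs(local - 0.001) < 1e-6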

The previous trick is not enough for all fp16 cases. Instead of performing math in camera centered space, you'd sometimes want to perform math in surface local space. Subtract the surface coordinate from the other position data. Do this subtract in fp32 (to minimize catastrophic cancellation) and the rest of the operations in fp16.
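The same subtract-first rule, sketched for fp16 (positions are hypothetical; fp16 rounding emulated via struct's 'e' format): a few kilometers out, the fp16 step size is already 2.0, so an all-fp16 light-to-surface vector cancels, while an fp32 subtract followed by fp16 math is exact.

```python
import struct

def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half (fp16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Hypothetical positions a few km out: the fp16 step near 3000 is 2.0,
# so fp16 cannot tell these two positions apart.
light, surface = 3000.25, 3000.0

# All-fp16: the light->surface vector cancels to zero.
assert f16(light) - f16(surface) == 0.0

# Subtract at higher precision first, then the fp16 math is exact.
offset = light - surface
assert f16(offset) == 0.25
```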

In general you should avoid adding two floating point numbers together if their magnitudes differ a lot. Example: a time counter should never be floating point (fp32). The accumulated time gets large while the added time per frame stays small. The precision of the add gets lower and lower, and frame time (animations) starts to judder in a few hours. Similarly, if you are adding multiple light sources together, you should not simply add them one at a time in a loop. Instead you should first add lights pairwise, then these results pairwise, etc. This results in an equal number of add operations, but if we assume that the lights are roughly the same intensity, the adds are always performed between two numbers of roughly the same magnitude, reducing the floating point error. Or you could simply use an fp32 light accumulation counter and do the heavy math in fp16: you only perform a single fp32 add per light. Modern GGX lighting math does require fp32 precision in some places, but again you can carefully isolate the fp32 math from the fp16 math if you know the relative magnitudes of the operands. It is tricky if you borrow a lighting formula from a paper and don't understand exactly how it works.
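The pairwise trick is easy to demonstrate in the extreme (a made-up scenario with fp16 adds emulated via struct): summing 4096 equal light contributions one at a time in fp16 stalls at 2048, because 2048 + 1 rounds back to 2048, while pairwise summation keeps the operands similar in magnitude and gets the exact answer.

```python
import struct

def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half (fp16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def sum_sequential(values):
    total = 0.0
    for v in values:
        total = f16(total + v)  # every add rounded to fp16, like fp16 ALU
    return total

def sum_pairwise(values):
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return f16(sum_pairwise(values[:mid]) + sum_pairwise(values[mid:]))

lights = [1.0] * 4096                     # 4096 equal light contributions
assert sum_sequential(lights) == 2048.0   # stalls: 2048 + 1 rounds to 2048
assert sum_pairwise(lights) == 4096.0     # exact: operands stay similar in size
```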

Optimizing for fp16 is kind of similar to optimizing data storage. You need to know your data and do some analysis. People have been optimizing their storage (memory bandwidth) for ages. Slides 33-36 of this presentation are a good example (error analysis on a compressed normal): https://michaldrobot.files.wordpress.com/2014/05/gcn_alu_opt_digitaldragons2014.pptx

(*). Subtracting the camera position from world space positions itself causes catastrophic cancellation. But you do it only once and before any other math. If your world is large, I recommend using uint32 for positions instead of fp32. 3x uint32 (xyz) can represent the whole earth at a few millimeter precision (including all the space inside the earth). You only need a single integer ALU instruction (subtraction) to convert world space coordinates to camera space coordinates. Follow it by a single float multiply-add to scale the coordinate accordingly. Integer subtraction is full rate on all GPUs. No catastrophic cancellation at all.
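A sketch of the uint32 scheme (the Earth-circumference world size and the function name are my assumptions, not from the post): one wrapping integer subtract turns a uint32 world coordinate into a signed camera-relative offset, and one float multiply scales it to meters. Spreading 2^32 steps over ~40,075 km gives roughly 9.3 mm per step, matching the "few millimeter precision" claim.

```python
# Assumed world scale: one uint32 range spanning an Earth-circumference world.
WORLD_SIZE_M = 40_075_000.0       # ~Earth's circumference in meters
STEP = WORLD_SIZE_M / 2**32       # ~9.3 mm per integer step

def world_to_camera(pos: int, cam: int) -> float:
    """uint32 world position -> camera-space meters.
    One integer subtract (with wraparound), then one float multiply."""
    delta = (pos - cam) & 0xFFFFFFFF
    if delta >= 1 << 31:          # reinterpret the difference as signed
        delta -= 1 << 32
    return delta * STEP

assert STEP < 0.01                # sub-centimeter precision everywhere
cam = 3_000_000_000
assert world_to_camera(cam + 2, cam) == 2 * STEP
assert world_to_camera(cam - 2, cam) == -2 * STEP
```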
 
FP64 won't be needed for real time graphics for a long time, if ever.
Agreed. People should use int32 world space coordinates instead (bounded around the game world). Integer coordinates guarantee even precision across the game world. Float precision is only good in the middle of the game world; the further away you go, the lower the precision becomes. This is a testing nightmare. Int32 guarantees 256x better minimum precision than fp32 (regardless of the world size). This allows 256x256x256 (3d) larger game worlds than fp32 (that is 65536x more square miles of terrain), which is more than enough for most games. No need to program silly world-shifting hacks to make it work.
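Where the 256x figure comes from: fp32 carries a 24-bit significand, so across a 32-bit-sized coordinate range its worst-case spacing is 2^(32-24) = 256 units, while integer coordinates step uniformly by 1 everywhere. A quick check (fp32 rounding emulated with struct; the specific values are my own illustration):

```python
import struct

def f32(x: float) -> float:
    """Round a Python number to the nearest fp32 value."""
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

# Integer coordinates are exact everywhere in the 32-bit range...
assert (2**32 - 1) - (2**32 - 2) == 1

# ...but fp32 near the top of that range steps in units of 256:
assert f32(2**32 - 1) == 2**32            # the low bits round away entirely
assert f32(2**32 - 256) == 2**32 - 256    # multiples of 256 are representable
assert f32(2**32 - 300) == 2**32 - 256    # snaps to the nearest 256 step
```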
 
Now that I didn't know. Just to be 100% clear, I was referring to precision in shaders. Your info is definitely new to me, though! Comes with being a laywoman, etc.
 
Floating point is floating point everywhere: the same problems appear in float code on the CPU and on the GPU. Time counters are obviously mostly handled on the CPU side, but some engines have GPU-side animation based on time counters as well, and/or GPU-side particle simulation. Floating point (fp32) world space positions are problematic for both CPU and GPU when the world size increases. As I said earlier, shader code can sidestep this problem somewhat by using a camera centered coordinate space.
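The time counter judder mentioned above can be shown directly (my own numbers; fp32 emulated with struct): after four hours of accumulated time, the fp32 spacing near 14400 seconds is about 0.001, so a 60 Hz frame delta recovered from the counter is off by roughly 0.4%, which is visible in animation.

```python
import struct

def f32(x: float) -> float:
    """Round a Python number to the nearest fp32 value."""
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

dt = f32(1 / 60)             # one 60 Hz frame delta, as fp32
elapsed = f32(4 * 3600)      # an fp32 time counter after four hours (14400 s)

# The frame delta recovered from the accumulated counter is no longer dt:
recovered = f32(elapsed + dt) - elapsed
assert recovered != dt
assert abs(recovered - dt) > 1e-5   # ~0.4% error: visible animation judder
```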

But in general fp16 ALU is more applicable to shaders (rendering), as a big part of per-pixel work is done on LDR data in [0,1] range (or [-1,1] range for normal vectors). Even for HDR color processing, the final data output is 8 bits (or 10 bits for HDR10). Fp16 is sufficient for most of the math, but the developer has to be much more careful and aware of the numeric ranges. Fp16 optimization is definitely not easy to do right.

This is also the reason why programmers working on scientific simulations prefer pure fp64. You could get the same results with mixed fp32 & fp64 code, but it is much harder to get it right. Same is true for mixed fp16 & fp32 code.
 
All existing games (except a few HDR games) output the image at 8 bits per channel (RGB8). Input textures are also commonly 8 bits per channel (and BC compressed = lower quality than plain 8 bit).

Does this mean that HDR games will be able to take less advantage of FP16 in the future? That would be sadly ironic given that future hardware will enable double rate FP16 just as we're moving into the era of HDR games.
 
FP16 has 11 significant bits so it is enough for HDR10, not enough for HDR12. But then there's also INT16.
HDR10 and HDR12 are not linear. HDR curve is closer to floating point curve (logarithm) than linear (uint16 normalized). Fp output is definitely better than normalized integer output (linear brightness). Exponent bits matter (except for the highest bits as HDR standard max brightness doesn't reach 65504). Fp16 sign bit is obviously useless. Fp16 output should be fine for HDR12.

Fp16 math on HDR12 output is debatable. I would prefer at least 1 bit more ALU precision compared to storage/output precision (otherwise you get no rounding). Temporal supersampling (8x) + jittered rounding recover roughly 3 bits of color depth. Fp16 math before temporal pass and fp32 math after it should be fine for most cases.
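The "jittered rounding recovers color depth" idea can be sketched as a dither experiment (the 5-bit quantizer and the value 0.3 are my own illustration, not from the post): spreading 8 jitter offsets uniformly across one quantization step before rounding, then averaging the results, stands in for an 8x temporal accumulation pass and lands much closer to the true value than a single rounded sample.

```python
import math

LEVELS = 32   # a 5-bit output, for illustration

def quantize(x: float) -> float:
    """Quantize x in [0,1] to LEVELS evenly spaced steps."""
    return math.floor(x * (LEVELS - 1) + 0.5) / (LEVELS - 1)

v = 0.3
plain = quantize(v)

# 8 jitter offsets spread uniformly across one quantization step,
# then averaged, a stand-in for an 8x temporal accumulation pass.
jitters = [((i + 0.5) / 8 - 0.5) / (LEVELS - 1) for i in range(8)]
dithered = sum(quantize(v + j) for j in jitters) / 8

# Jittered rounding recovers sub-step precision that plain rounding loses.
assert abs(dithered - v) < abs(plain - v)
```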
 

I suppose there may be some blog and forum literature in the context of mobile games?
 
You're right. I was only thinking of the final output... Slight tangent here: what are actual display formats that HDR displays accept?
 
You create a DXGI_FORMAT_R16G16B16A16_FLOAT swap chain. 1.0 is mapped to 80 nits. The driver converts to the correct format (HDR10 / HDR12).

"On the software side, your application must be able to run in full-screen exclusive mode, and create an fp16 swap chain. These are necessary in order to provide the full precision data to the display driver, so that it can provide the high precision data to the display. If your application does not run in full-screen exclusive mode, the desktop compositor will strip the extra range and precision necessary for HDR. It is important to understand that this is a temporary restriction as Microsoft announced plans for OS support for HDR."

More info:
https://developer.nvidia.com/displaying-hdr-nuts-and-bolts
 
This touches on something I have thought about.
While it is relatively easy to just look at the number formats and proclaim that something is sufficient or not, the underlying application here is not numerical analysis, but gaming. And it seems to me that in order for a numerical error to actually matter, a pixel error has to be:
1. Large enough to be readily detectable
2. Be consistent over time, as errant pixels showing up in a single frame is extremely unlikely to get noticed
3. The pixel error pretty much needs to correlate with similar errors on its neighbours to create a larger area that is objectionably anomalous (and consistently so over time).

And to judge that, you need hands-on experience performing precision experiments with actual games. Just thinking about it, it would seem you could get away with a lot. But that's just being an armchair expert, no better than just performing the numerical analysis.
How does it play out in reality?
 
Thanks Sebbbi, I read that document a long time ago. Your other advice is appreciated.
 