DavidGraham
Veteran
These bits are very interesting: Advanced API Performance: Shaders | NVIDIA Technical Blog
This post covers best practices when working with shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Shaders play a critical… (developer.nvidia.com)
So NVIDIA states that Turing can actually benefit from FP16 math (it issues at twice the FP32 rate), but on Ampere (and, by extension, Ada) it makes no difference, probably because of their 2xFP32 design.
Don’t assume that half-precision floats are always faster than full precision and the reverse.
- On NVIDIA Ampere GPUs, it’s just as efficient to execute FP32 as FP16 instructions. The overhead of converting between precision formats may just end up with a net loss.
- NVIDIA Turing GPUs may benefit from using FP16 math, as FP16 can be issued at twice the rate of FP32.
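To put rough numbers on those two bullets, here is a toy issue-slot model in Python. It is only a sketch: the function name `fp16_speedup`, the instruction counts, and the assumption that each format conversion costs one FP32-rate issue slot are mine, not NVIDIA's. Only the rate ratios (2x on Turing, 1x on Ampere) come from the quoted post.

```python
def fp16_speedup(math_ops, conversions, fp16_rate):
    """Estimated speedup of an FP16 path over a pure FP32 path.

    math_ops    -- arithmetic instructions in the shader
    conversions -- extra FP32<->FP16 conversions the FP16 path introduces
    fp16_rate   -- FP16 issue rate relative to FP32
                   (2.0 for Turing, 1.0 for Ampere, per the quoted post)

    Assumption (hypothetical): each conversion costs one FP32-rate issue slot.
    """
    fp32_cost = math_ops
    fp16_cost = math_ops / fp16_rate + conversions
    return fp32_cost / fp16_cost


# Hypothetical shader: 100 math instructions, 10 conversions on the FP16 path.
turing = fp16_speedup(100, 10, fp16_rate=2.0)
ampere = fp16_speedup(100, 10, fp16_rate=1.0)
print(f"Turing-like: {turing:.2f}x, Ampere-like: {ampere:.2f}x")
```

Under these assumptions the Turing-like case comes out ahead until conversions reach half the math instruction count, while the Ampere-like case loses as soon as there is a single conversion. Real shaders also differ in register pressure and bandwidth, which this model ignores.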
It seems ROV remains expensive even on NVIDIA hardware. I am guessing that's why the feature didn't gain widespread adoption within DX12. It's far more expensive still on AMD hardware.
Don’t use raster order view (ROV) techniques pervasively.
- Guaranteeing order doesn’t come for free.