AVX-512 is awesome. 16-wide (32-bit) float/int vectors. Scatter + gather. Execution masks. AVX-512 BW adds 8-bit / 16-bit operations (64 wide, 32 wide), which is very useful for string processing, for example. AVX-512 CDI has lots of nice instructions that help in many algorithms and make autovectorization easier.
Would your assessment of potential usage in games be different if AVX-512 were universally supported and unified into a single version? Or do you think it's just not useful in games?
AVX-512 is a perfect fit for SPMD-style languages (ISPC). AVX had only full-width float ops. AVX2 added full-width integer ops (needed for address calculation, etc.) and gather, but still lacked scatter and execution masks. AVX-512 adds both and widens everything by 2x.
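To make the SPMD point concrete, here is a rough sketch in plain C of the kind of kernel I mean. The function name and parameters are made up for illustration, and whether a given compiler actually vectorizes it depends on the compiler and flags. Each loop iteration plays the role of one SPMD lane: the indirect read maps to a gather, the branch maps to an execution mask, and the indirect write maps to a scatter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: each iteration is one SPMD "lane".
 *   in[src[i]]    indirect read   -> gather
 *   if (...)      per-lane branch -> execution mask
 *   out[dst[i]]   indirect write  -> scatter
 * Assumption: the indices in dst are unique, so lanes never
 * write to the same location. */
void scale_selected(float *restrict out, const float *restrict in,
                    const int32_t *restrict src, const int32_t *restrict dst,
                    size_t n, float threshold, float scale)
{
    for (size_t i = 0; i < n; ++i) {
        float v = in[src[i]];        /* gather */
        if (v > threshold) {         /* lane is active only if this holds */
            out[dst[i]] = v * scale; /* masked scatter */
        }
    }
}
```

With AVX2 alone the gather part has a direct instruction, but the conditional indirect store does not, which is why the missing scatter and masks matter.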
AVX-512 is also much better for autovectorization. Full-width integers, gather, and scatter allow each loop iteration to read/write data from independent locations (not limited to linear loops with no pointers / indirections). AVX-512 CDI and execution masks further help to solve autovectorization problems. With AVX-512 it would, for the first time, actually be possible to autovectorize generic loops. AVX/AVX2 were only good for trivial cases.
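A small example of a loop that is not "trivial" in this sense is a histogram, sketched below in plain C (the function name is just for illustration). Two iterations may hit the same bin, so a vectorizer cannot simply gather, add, and scatter: lanes with duplicate indices would lose updates. The conflict detection instructions in AVX-512 CDI exist to find such duplicate indices within a vector. Whether a particular compiler vectorizes this loop is an assumption that depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Histogram of byte keys. hist must have at least 256 entries
 * if keys can take any byte value.
 * The write address hist[keys[i]] depends on the data, and two
 * iterations can target the same bin, so a vectorized version has
 * to detect duplicate indices within one vector of keys. */
void histogram(uint32_t *restrict hist, const uint8_t *restrict keys, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        hist[keys[i]] += 1; /* indirect read-modify-write */
    }
}
```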