Deleted member 13524
While it may not be dedicated hardware, it isn't necessarily using general compute shader cores. Considering the limitation (Turing and Ampere) it's far more likely that it's doing it on the Tensor cores. Something that doesn't generally get fully utilized in games.
Tensor cores consist of tiny ALUs designed to handle small operands (FP16 at most), aimed at ML inference. Why would those be any good at decompressing large datasets? Is there any research showing tensor cores being used successfully for data decompression?
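For context on what a tensor core actually executes, here's a minimal CUDA sketch using the standard WMMA API (the kernel name and the 16x16x16 tile size are just illustrative): the unit performs a fixed-shape FP16 matrix multiply-accumulate, which maps naturally onto NN inference but not onto byte-granular, branchy decompression work.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Illustrative only: one warp computes a single 16x16x16 matrix
// multiply-accumulate on a tensor core (FP16 operands, FP32 accumulate).
// The hardware is built around this fixed GEMM tile, not around
// byte-level, branch-heavy decoding.
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);              // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);          // load 16x16 FP16 tile A
    wmma::load_matrix_sync(b_frag, b, 16);          // load 16x16 FP16 tile B
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // C += A * B on the tensor core
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```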
Same goes for INT8 and INT4 processing. Quad-rate INT8 just means it performs four INT8 operations in parallel, not that a single INT8 operation runs 4x faster.
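To make that concrete, a minimal CUDA sketch of what quad-rate INT8 looks like at the instruction level (kernel and variable names are made up; assumes the __dp4a path available on sm_61 and newer): one instruction issues four INT8 multiply-accumulates at once, so throughput quadruples only when there are four independent INT8 values to feed it, and a single INT8 operation doesn't finish any sooner.

```cuda
// Illustrative only: __dp4a multiplies four packed int8 pairs and adds the
// results to an int32 accumulator in one instruction. That is 4 INT8 MACs
// issued together (higher throughput), not one INT8 op completing 4x faster.
__global__ void dot_int8(const int *a4, const int *b4, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // a4[i] and b4[i] each hold four packed signed 8-bit values
        out[i] = __dp4a(a4[i], b4[i], 0);
    }
}
```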
I have yet to see how data decompression can be made as massively parallel as graphics rendering, NN training, or NN inference.
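The obstacle is the serial dependency baked into most general-purpose formats. A rough sketch of the core copy loop of an LZ77/LZSS-style decoder (plain host-side C++ in the same CUDA source, function name hypothetical) shows why: each output byte may reference a byte the same loop just produced, so the work can't simply be sliced across thousands of GPU threads the way pixels or matrix tiles can.

```cuda
#include <cstddef>
#include <cstdint>

// Illustrative only: an LZ-style match says "go back `dist` bytes in the
// output already produced and copy `len` bytes forward". When dist < len
// (e.g. dist == 1, a run), each byte depends on one written an iteration
// earlier, so the loop is inherently sequential.
static void copy_match(uint8_t *out, size_t &pos, size_t dist, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        out[pos] = out[pos - dist];  // may read a byte this same loop just wrote
        ++pos;
    }
}
```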
For example, the Switch saw a significant reduction in loading times when Nintendo allowed the CPU cores to clock up to 1.9 GHz during loading screens. That improvement comes from higher single-threaded performance. If the TX1 GPU's 256 shader cores with 2x FP16 throughput were any good for data decompression, it would have been implemented, since the shader cores are AFAIK mostly idle during loading screens.
Another possibility is the dual issue fp32. Considering that may or may not get leveraged much in games, it's possible that NV could use some of that capability for DirectStorage without hindering game performance.

Dual issue FP32 just means more threads working in parallel, not higher single-threaded performance on FP32. The previous point still stands.