Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

While it may not be dedicated hardware, it isn't necessarily using general compute shader cores. Considering the limitation (Turing and Ampere) it's far more likely that it's doing it on the Tensor cores. Something that doesn't generally get fully utilized in games.

Tensor cores consist of tiny ALUs designed for handling small variables (maximum FP16), focused on ML inference. Why would these be good for decompression of large datasets? Is there any research pointing to successfully using tensor cores for data decompression?

Same goes for INT8 and INT4 processing. Quad-rate INT8 just means it performs four INT8 operations in parallel, not that a single INT8 operation runs 4x faster.
I'm yet to see how data decompression can become infinitely parallel like graphics rendering or NN learning or NN inference.
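
To make the throughput-versus-latency point concrete, here is a rough sketch in plain Python (not GPU code, and not how any particular GPU implements it). Four 8-bit lanes are packed into one 32-bit word and added together in a single pass. The function name `packed_add_int8x4` is made up for this example. The four additions finish together, but none of them finishes sooner than a lone 8-bit add would.

```python
def packed_add_int8x4(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed into 32-bit words.

    Each lane wraps around on overflow and never carries into its neighbour.
    """
    # Add the low 7 bits of every lane; the sum cannot spill past bit 7 of a lane.
    low7 = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    # Fold in the top bit of each lane with XOR, discarding the carry-out.
    return (low7 ^ ((a ^ b) & 0x80808080)) & 0xFFFFFFFF


if __name__ == "__main__":
    # Lanes (high to low): 0x01+0x10, 0x02+0x20, 0x03+0x30, 0x04+0x40
    print(hex(packed_add_int8x4(0x01020304, 0x10203040)))
```

This only helps when there are four independent values to process at once, which is the open question for decompression.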

For example, the Switch got a significant boost in loading times when Nintendo allowed the CPU cores to go up to 1.9GHz in loading screens. That's a boost coming from higher single-threaded performance. If the TX1 GPU's 256 shader cores with 2xFP16 throughput were any good for data decompression then it would have been implemented, as the shader cores are AFAIK mostly unused during loading screens.


Another possibility is the dual-issue FP32. Considering that it may or may not get leveraged much in games, it's possible that NV could use some of that capability for DirectStorage without hindering game performance.
Dual issue FP32 just means more threads working in parallel, not higher single-threaded performance on FP32. The previous point still stands.
 
I'm yet to see how data decompression can become infinitely parallel like graphics rendering or NN learning or NN inference.
Block-based techniques, or something along the lines of what this post might have been suggesting:
I was thinking in something like MKV for games. A package made of (code+shaders+audio/music+assets).
(MKV is a container format... I don't remember a lot about those.) Think of video compression, with its constant type frames and intermediate or dependent frames. In addition, I'm pretty sure some domain-specific schemes are easily parallelizable.
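
As a minimal sketch of the block-based idea, assuming plain zlib and a fixed block size (both chosen only for illustration, not taken from DirectStorage or any console): if every block is compressed independently, the blocks can be decompressed concurrently and stitched back together in order. The names `compress_blocks`, `decompress_blocks` and `BLOCK_SIZE` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

BLOCK_SIZE = 64 * 1024  # hypothetical block size, picked only for this example


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks and compress each one on its own."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def decompress_blocks(blocks: List[bytes], workers: int = 4) -> bytes:
    """Decompress independent blocks concurrently and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the output is reassembled correctly.
        return b"".join(pool.map(zlib.decompress, blocks))


if __name__ == "__main__":
    asset = bytes(range(256)) * 4096  # 1 MiB of dummy "asset" data
    blocks = compress_blocks(asset)
    assert decompress_blocks(blocks) == asset
```

The trade-off is that independent blocks cannot share history with each other, so the compression ratio is usually somewhat worse than compressing the whole stream at once.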

Tensor cores consist of tiny ALUs designed for handling small variables (maximum FP16), focused on ML inference.
Just wondering, are these fixed function or fully programmable?
 
I do not understand why you write that it not being dedicated hardware somehow makes it "worse". A different design achieving the same goal of accelerating decompression is not inherently worse just because it is different. If I were to take a guess, MS and Sony went the route of hardware decompression blocks because of the combined cost (thermal, monetary, yield, etc.) of utilising more generalised transistors on the GPU for the same purpose. If MS or Sony did it on the GPU like NV, it would mean fewer resources for graphics or, conversely, more die space or more heat and power usage directly on the SoC. Using a dedicated GPU within a system with swappable parts for such a purpose does not seem inferior, just a smart way to enforce a standard via a swappable part. Basically, it makes a lot of sense on the PC to do it differently and not with a HW block on a motherboard. Think of the difference between using an ASIC, an FPGA or a general-purpose CPU for the same task: one may make a lot more sense under a certain budget, manufacturing limitation, or based upon the actual computation necessary and the overhead.
I see no reason why it needs to be deemed inferior for achieving the same goal in a different manner.
You are ignoring parallelised queuing, priority control and latency. You're also ignoring GPU memory fragmentation problems.
 
When it said supported PCs, it could be talking about NVMe drives and their types, i.e. when it says some need fewer operations, etc.
Even if new hardware is required to take full advantage of DirectStorage, I would expect current systems with applicable DX12U GPUs to see a big boost in supported games.
Games will still be limited to fast loading for a while yet, I don't expect gameplay changes for a while.

So, don't feel down about recent upgrades yet....
 
I fear I'm in need of a new mobo.

It's the most work, but not the most expensive :p I'm going to wait at least, before plunging in. If I have to upgrade then there will be a 3090 24GB 40TF in there, and I'll sell the old stuff then. I wasn't planning this originally, but OK.
 
You are ignoring parallelised queuing, priority control and latency. You're also ignoring GPU memory fragmentation problems.
https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads. The DirectStorage programming model essentially gives developers direct control over that highly optimized hardware.
In addition, existing storage APIs also incur a lot of ‘extra steps’ between an application making an IO request and the request being fulfilled by the storage device, resulting in unnecessary request overhead. These extra steps can be things like data transformations needed during certain parts of normal IO operation. However, these steps aren’t required for every IO request on every NVMe drive on every gaming machine. With a supported NVMe drive and properly configured gaming machine, DirectStorage will be able to detect up front that these extra steps are not required and skip all the necessary checks/operations making every IO request cheaper to fulfill.
Emphasis is mine.
It seems like DirectStorage is about addressing existing IO "hang-ups" on PC and leveraging the special nature of NVMe drives. I see no reason to fret or assume it is directly worse.
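
The "many requests in flight at once" model from the quoted blog post can be illustrated loosely with ordinary file reads. This is only an analogy under stated assumptions: it uses standard Python file IO and a thread pool, not the DirectStorage API or real NVMe queues, and the names `Request`, `read_one`, `read_batch` and `queue_depth` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Request = Tuple[int, int]  # (offset, length); a hypothetical request shape


def read_one(path: str, request: Request) -> bytes:
    """Serve a single read request with its own file handle."""
    offset, length = request
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_batch(path: str, requests: List[Request],
               queue_depth: int = 8) -> List[bytes]:
    """Submit the whole batch up front; up to queue_depth reads run at once.

    Results come back in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        return list(pool.map(lambda r: read_one(path, r), requests))
```

A game would call something like `read_batch("assets.bin", [(0, 4096), (65536, 4096)])` (file name hypothetical) instead of issuing the reads one by one and waiting on each.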

What do you mean by GPU memory fragmentation problems?
 
I'm far from an expert, but it seems like NV has their own answer to the SSD tech in consoles. They have a stake in PC gaming and probably invested a lot into this tech. Frankly, just going by both Sony's and NV's PR showcases, the NV SSD tech seems more flexible and even faster.
 
It appears that only GPUs with tensor cores "RTX" will be able to accelerate DirectStorage. Microsoft only mentioned them for now, RDNA2 GPUs with ML instructions are not mentioned so far.
 
It appears that only GPUs with tensor cores "RTX" will be able to accelerate DirectStorage. Microsoft only mentioned them for now, RDNA2 GPUs with ML instructions are not mentioned so far.
You know it could just be that no RDNA2 GPUs are on the market yet so they can't mention any capabilities of them. But I understand your need to push everything Nvidia and RTX.
 
Yeah, I don't get this either. Surely there can be PCIe transfers between the NVMe device and the GPU directly.

P2P DMA is supported between any 2 PCIe devices on Zen/Zen+ platforms and presumably Zen 2 as well. I'm not sure about Intel.

However that's assuming the data is transferred via P2P DMA. I don't think that's been confirmed. While the diagrams do suggest completely cutting out the CPU and system memory, that may just be representative of the amount of overhead that is removed from that process.

The exact wording of the press release is open to interpretation. The sentence "DirectStorage will be supported on certain systems with NVMe drives" could simply mean you need an NVMe drive. Although I expect Windows 10 is also a requirement, probably a recent version of it. And possibly a DX12 capable GPU?
 
You know it could just be that no RDNA2 GPUs are on the market yet so they can't mention any capabilities of them. But I understand your need to push everything Nvidia and RTX.
No need to be snarky. Microsoft announced that Xe and RDNA2 GPUs support DX12U well before they reached the market; they didn't do the same here.
 