Deleted member 13524
While it may not be dedicated hardware, it isn't necessarily using general compute shader cores. Considering the limitation (Turing and Ampere) it's far more likely that it's doing it on the Tensor cores. Something that doesn't generally get fully utilized in games.
Tensor cores consist of tiny ALUs designed to handle small operands (FP16 at most), aimed at ML inference. Why would those be any good at decompressing large datasets? Is there any research showing tensor cores being used successfully for data decompression?
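For context on what a tensor core actually executes, here's a minimal CUDA sketch using the standard WMMA API (the kernel name and the 16x16x16 tile size are just illustrative): the unit performs a fixed-shape FP16 matrix multiply-accumulate, which maps naturally onto NN inference but not onto byte-granular, branchy decompression work.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Illustrative only: one warp computes a single 16x16x16 matrix
// multiply-accumulate on a tensor core (FP16 operands, FP32 accumulate).
// The hardware is built around this fixed GEMM tile, not around
// byte-level, branch-heavy decoding.
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);              // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);          // load 16x16 FP16 tile A
    wmma::load_matrix_sync(b_frag, b, 16);          // load 16x16 FP16 tile B
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // C += A * B on the tensor core
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```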
Same goes for INT8 and INT4 processing. Quad-rate INT8 just means it performs four INT8 operations in parallel, not that a single INT8 operation runs 4x faster.
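To make that concrete, a minimal CUDA sketch of what quad-rate INT8 looks like at the instruction level (kernel and variable names are made up; assumes the __dp4a path available on sm_61 and newer): one instruction issues four INT8 multiply-accumulates at once, so throughput quadruples only when there are four independent INT8 values to feed it, and a single INT8 operation doesn't finish any sooner.

```cuda
// Illustrative only: __dp4a multiplies four packed int8 pairs and adds the
// results to an int32 accumulator in one instruction. That is 4 INT8 MACs
// issued together (higher throughput), not one INT8 op completing 4x faster.
__global__ void dot_int8(const int *a4, const int *b4, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // a4[i] and b4[i] each hold four packed signed 8-bit values
        out[i] = __dp4a(a4[i], b4[i], 0);
    }
}
```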
I have yet to see how data decompression can be made as massively parallel as graphics rendering, NN training, or NN inference.
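The obstacle is the serial dependency baked into most general-purpose formats. A rough sketch of the core copy loop of an LZ77/LZSS-style decoder (plain host-side C++ in the same CUDA source, function name hypothetical) shows why: each output byte may reference a byte the same loop just produced, so the work can't simply be sliced across thousands of GPU threads the way pixels or matrix tiles can.

```cuda
#include <cstddef>
#include <cstdint>

// Illustrative only: an LZ-style match says "go back `dist` bytes in the
// output already produced and copy `len` bytes forward". When dist < len
// (e.g. dist == 1, a run), each byte depends on one written an iteration
// earlier, so the loop is inherently sequential.
static void copy_match(uint8_t *out, size_t &pos, size_t dist, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        out[pos] = out[pos - dist];  // may read a byte this same loop just wrote
        ++pos;
    }
}
```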
For example, the Switch saw a significant reduction in loading times when Nintendo allowed the CPU cores to clock up to 1.9 GHz during loading screens. That improvement comes from higher single-threaded performance. If the TX1 GPU's 256 shader cores with 2x FP16 throughput were any good for data decompression, it would have been implemented, since the shader cores are AFAIK mostly idle during loading screens.
Another possibility is the dual issue fp32. Considering that may or may not get leveraged much in games, it's possible that NV could use some of that capability for DirectStorage without hindering game performance.

Dual issue FP32 just means more threads working in parallel, not higher single-threaded performance on FP32. The previous point still stands.