Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Only for devices on the same CPU root port.

SSD and GPU are typically connected to the CPU on two different root ports, so the driver won't enable peer-to-peer for them.

But that's directly contradicted by the AMD GPU engineer's statement in the link I posted earlier, i.e.

"Many Root Complexes, including AMD ZEN, *do* support peer-to-peer DMA even between Root Ports. Add a whitelist and allow peer-to-peer DMA if both participants are attached to a Root Complex known to support it".
 
I think for now we don't have enough specificity to have a clear idea of what we'll be getting. The only clear aspect was that Microsoft would have some sort of internal checks done to know if they can safely bypass several layers of legacy in order to optimize operations.

This. Before the NV announcement, there were speculations that the PC would lag behind severely, for many years. And here we are. I think there's too much at stake for NV, AMD, Intel, MS and devs to not fix the 'SSD problem'.
 
"new APIs for fast loading and streaming directly from SSD to GPU memory" ... you can explain some slides being copied (which I still don't buy) but Jensen specifically stating that? Nah.
It's a high-level API to load data from SSD to GPU memory, but nobody said anything about peer-to-peer DMA transfers so far.

But that's directly contradicted by the AMD GPU engineers statement
I'm telling you how the actual P2PDMA component works, as reflected in the current source code and the official kernel documentation.
The code detects whether the P2P client and provider belong to the same upstream port (such as a switch). If this fails, it checks whether these devices belong to the same PCIe root port (host bridge in legacy PCI terms) on the whitelisted PCIe Root Complex.

EDIT: It turns out that PCI Host Bridge is an equivalent of the PCIe Root Complex, not PCIe Root Port, so P2P DMA is possible within the PCIe Root Complex.
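The two-step check described above can be modeled in a few lines. This is a hypothetical, simplified sketch of the decision flow, not the actual Linux pci_p2pdma code; the device names and whitelist entries are illustrative only.

```python
# Illustrative whitelist of root complexes known to route P2P traffic
# between root ports (entries are examples, not the kernel's real table).
WHITELISTED_ROOT_COMPLEXES = {("AMD", "Zen"), ("Intel", "SkyLake-E")}

class PciDevice:
    def __init__(self, name, upstream_port, root_complex):
        self.name = name
        self.upstream_port = upstream_port    # e.g. a shared switch or a root port
        self.root_complex = root_complex      # (vendor, family) of the host bridge

def p2pdma_supported(client, provider):
    # Step 1: both devices hang off the same upstream port (e.g. a switch),
    # so P2P traffic never has to traverse the root complex -- always allowed.
    if client.upstream_port == provider.upstream_port:
        return True
    # Step 2: devices sit on different ports -- allow only if they share a
    # root complex (host bridge) that is on the whitelist.
    if client.root_complex == provider.root_complex:
        return client.root_complex in WHITELISTED_ROOT_COMPLEXES
    return False

zen = ("AMD", "Zen")
ssd = PciDevice("nvme0", upstream_port="rp1", root_complex=zen)
gpu = PciDevice("gpu0", upstream_port="rp2", root_complex=zen)
print(p2pdma_supported(ssd, gpu))  # True: different root ports, whitelisted complex
```

On a non-whitelisted platform the same topology would return False, which matches the "SSD and GPU on different root ports" concern raised earlier in the thread.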
 
It's a high-level API to load data from SSD to GPU memory, but nobody said anything about peer-to-peer DMA transfers so far.
Fair enough. You're vastly more knowledgeable about this stuff than I am. I just can't see Nvidia being lazy and including a slide as well as directly saying "directly from SSD to GPU memory" if that wasn't somehow going to be the case.

Just have to wait and see what happens, I guess. Still very exciting developments. That marble demo loading in 1.62 s with GPU decompression vs 5.25 s is a nice improvement.
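For what it's worth, the quoted timings work out to roughly a 3.2x speedup; a quick check using the demo figures above:

```python
# Speedup implied by the marble demo load times quoted above.
cpu_time = 5.25  # seconds, CPU decompression
gpu_time = 1.62  # seconds, GPU decompression
speedup = cpu_time / gpu_time
print(f"{speedup:.2f}x")  # prints 3.24x
```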

 
It's a high-level API to load data from SSD to GPU memory, but nobody said anything about peer-to-peer DMA transfers so far.

Agreed that until someone specifically confirms that P2P DMA is how this is being done, we shouldn't assume it to be the case, given the support challenges involved. That said, Jensen does specifically talk about transferring data *directly* between SSD and GPU with a diagram very clearly showing the same. In light of that I think we should at least keep an open mind about the possibility that they've solved this in Windows for supported platforms.

I'm telling you how the actual P2P component works, as reflected in the current source code and the official kernel documentation.
The code detects whether the P2P client and provider belong to the same upstream port (such as a switch). If this fails, it checks whether these devices belong to the same PCIe root port (host bridge in legacy PCI terms) on the whitelisted PCIe Root Complex.

Thanks for the code sample, quite illuminating. Everything in there though says that P2PDMA is supported by Zen platforms provided it's in the whitelist. In fact it goes further than previous sources to specifically state that all AMD platforms since Zen support P2PDMA between different root ports, and provides a list of several Intel platforms that do the same. I'm on a phone at the moment so can't check the device IDs, but it'd be interesting to see what platforms those device IDs correspond to.
 
Everything in there though says that P2PDMA is supported by Zen platforms provided its in the whitelist. In fact it goes further than previous sources to specifically state that all AMD platforms since Zen support P2PDMA between different root ports

I gave the source code another look, and I admit I was wrong to assume that 'PCI Host Bridge' refers to a 'PCIe Root Port'. In fact, 'PCI Host Bridge' is an equivalent of the PCIe Root Complex, while PCIe Root Ports would appear as 'PCI/PCI Bridge' in the topology.


So you were right in quoting the comments - the Linux P2P DMA driver does support peer-to-peer DMA transfers between devices on the same PCIe Root Complex, and an August code update enabled P2P DMA on any Zen or higher processor.
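A blanket "any Zen or newer" rule could be expressed as a vendor-wide wildcard in the host-bridge whitelist. The sketch below is illustrative of that idea, not a copy of the kernel's real table; the Intel entry and the AMD device ID in the example are placeholders (only the well-known vendor IDs 0x1022 for AMD and 0x8086 for Intel are real).

```python
PCI_VENDOR_AMD = 0x1022
PCI_VENDOR_INTEL = 0x8086
ANY_DEVICE = None  # wildcard: match every device ID from this vendor

# (vendor_id, device_id) pairs; device_id None means "all parts from this
# vendor" -- one way a blanket "any Zen or newer" entry could be expressed.
HOST_BRIDGE_WHITELIST = [
    (PCI_VENDOR_AMD, ANY_DEVICE),   # hypothetical: all AMD Zen-era host bridges
    (PCI_VENDOR_INTEL, 0x2030),     # hypothetical single Intel entry
]

def host_bridge_whitelisted(vendor_id, device_id):
    for v, d in HOST_BRIDGE_WHITELIST:
        if v == vendor_id and (d is None or d == device_id):
            return True
    return False

# Any AMD host bridge matches via the wildcard entry:
print(host_bridge_whitelisted(PCI_VENDOR_AMD, 0x1480))  # prints True
```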


I just cant see Nvidia being lazy and including a slide as well as directly saying "directly from SSD to GPU memory" if that wasn't somehow going to be the case.
Yes, it looks like P2P DMA would be possible between SSD and GPU on recent AMD platforms, at least in Linux 5.9.

Placing a NIC between SSD and GPU is clearly an error though, this was copied from the GPUDirect slide as is.
 
I gave the source code another look, and I admit I was wrong to assume that 'PCI Host Bridge' refers to a 'PCIe Root Port'. In fact, 'PCI Host Bridge' is an equivalent of the PCIe Root Complex, while PCIe Root Ports appear as 'PCI/PCI Bridge' in the topology.

So you were right in quoting the comments - the Linux P2P DMA driver does support transfers between devices on the same PCIe Root Complex, and an August code update enabled P2P DMA on any Zen or higher processor.


Yes, it looks like P2P DMA would be possible between SSD and GPU on recent AMD platforms, at least in Linux 5.9.

Placing a NIC between SSD and GPU is clearly an error though, this was copied from the GPUDirect slide as is.

I wonder if the presentation itself is done by the hardware/software engineers themselves or some sort of PR team. Somehow I think it's the latter.
 
~7 GByte/s top-end SSD controllers in 2020 Q3

Phison PS5018-E18, Silicon Motion SM2264, Samsung 980 Pro

FYI, first wave of Phison E18 products is coming to market:

Sabrent Rocket 4 Plus
https://www.sabrent.com/rocket-4-plus/

Galax HOF Extreme SSD
https://www.techpowerup.com/271947/galax-announces-pcie-4-0-hall-of-fame-extreme-ssd
https://www.galax.com/en/ssd/hof.html (no product page yet)


Silicon Motion SM2264/ SM2267 products should also be coming, ADATA showed a few prototypes at CES-2020.


PS. ADATA announced two new Silicon Motion models:
https://www.techpowerup.com/272381/...mmix-s70-pcie-gen4-m-2-2280-solid-state-drive
https://www.techpowerup.com/272182/...s50-lite-pcie-gen4-m-2-2280-solid-state-drive


PPS. Samsung 980 Pro:
https://www.anandtech.com/show/16087/the-samsung-980-pro-pcie-4-ssd-review
 
Samsung doesn't think you need anything but the basic copper nameplate for those speeds, though.

It'll be interesting to see how this all plays out once they release and we have thermal throttling tests performed.
 
FYI, first wave of Phison E18 products is coming to market:

Sabrent Rocket 4 Plus
https://www.sabrent.com/rocket-4-plus/

Galax HOF Extreme PCIe 4.0 SSD
https://www.techpowerup.com/271947/galax-announces-pcie-4-0-hall-of-fame-extreme-ssd


There is still no official announcement for Samsung 980 Pro:
https://www.anandtech.com/show/16052/samsung-980-pro-briefly-listed-online

No word on Silicon Motion SM2264 and SM2267 products either; ADATA showed a few prototypes at CES-2020.

Going to be interesting to see what Sony end up doing for the PS5 to try to control the heat output on NVME SSDs like these.

Regards,
SB
 
Going to be interesting to see what Sony end up doing for the PS5 to try to control the heat output on NVME SSDs like these.

Regards,
SB
This is probably the real reason why they need SSDs with higher bandwidth to compensate. Temps will probably run hot and they'll need a buffer for potential performance degradation due to the heat.
 
This is probably the real reason why they need SSDs with higher bandwidth to compensate.

Cerny said it was because standard commercially available drives do not have as many priority channels as the internal Sony drive.
 
Cerny said it was because standard commercially available drives do not have as many priority channels as the internal Sony drive.
Yeah, I remember him saying that. I'm mostly just playing devil's advocate.

Still though, they have to have a reasonable handle on the heat to keep these drives from throttling. Next-gen games in practice will likely stress these drives more than typical PC usage, I would imagine, so that'll be something interesting to keep an eye on.
 
Let's see how much damage that could do, besides a lot for 16 TB ... An existing 4 TB NVMe at $500 times four is $2K, plus whatever the cost of the card is.
 