Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

Discussion in 'PC Hardware, Software and Displays' started by DavidGraham, May 18, 2020.

  1. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    I just remembered a theory I had from the console world. If I treat the 6GB of lower-speed RAM as 3 channels, maybe that pool is dedicated to the OS plus a 'bounce buffer' for the GPU to output to the high-speed memory. Maybe PC GPU/OS drivers could do the same thing?

    edit - Xbox Series X: 16GB RAM - 10GB high speed, 6GB lower speed.
     
    #341 Infinisearch, Sep 5, 2020
    Last edited: Sep 5, 2020
  2. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Why 'No'? It seems to me that it's the temporary data from decompression that is susceptible to memory fragmentation.
     
  3. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    191
    Likes Received:
    131
    So is this going to be the first time we see a mass cut-off or exodus because games would run horribly on HDDs/slow SSDs? I still remember playing Crysis on a Pentium.

    I think a lot of multi-plats might still target those much slower 560MB/s SATA SSDs next gen if those are the majority - the next step up from HDDs.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I was agreeing that an overprovisioned pool/chunk of memory is a potential solution :)
     
  5. Osamar

    Newcomer

    Joined:
    Sep 19, 2006
    Messages:
    231
    Likes Received:
    43
    Location:
    40,00ºN - 00,00ºE
    A collection of ideas and thoughts about DirectStorage (DS), put together just by reading and trying to understand. Corrections, pointers, explanations, etc. are welcome.

    - (DS) is an API built on NVMe capabilities that are underused at this moment, plus a new way of packaging assets.
    - The NVMe side is universal. The possibilities are there, but legacy APIs ignore them; consoles are free of this burden.
    - Asset handling can be fully hardware (Xbox Series), hardware-assisted (RTX IO, hopefully an AMD equivalent too) or pure software (CPU) unpacking and installing, like today's HDD games.
    - (DS) is part of DirectX. Can it be detached and used alone - Direct3D+(DS), Vulkan+(DS) - or is (DS) so embedded in Direct3D that the combo Direct3D+(DS) is required?

    The way I see it, (DS) is the DX12/Vulkan of disk storage access: a low-level I/O API controlled by the game engine to access a clearly defined and structured asset package.
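
    The "DX12/Vulkan of storage" analogy can be sketched in a few lines. This is illustrative pseudocode only - `StorageQueue`, `ReadRequest`, and every name here are made up, not the real DirectStorage API - but it shows the batched, explicit-destination request model being described:

    ```python
    # Illustrative sketch only: hypothetical names, NOT the real DirectStorage API.
    # The idea: the engine records many small asset reads with explicit
    # destinations, then submits them as one batch, instead of issuing one
    # blocking ReadFile-style call per asset.
    from dataclasses import dataclass

    @dataclass
    class ReadRequest:
        offset: int       # byte offset inside the asset package
        length: int       # length on disk
        destination: str  # e.g. a GPU buffer/texture handle (hypothetical)

    class StorageQueue:
        """Hypothetical batched request queue in the DX12 spirit:
        record many requests up front, submit once, wait on a fence."""
        def __init__(self, package: bytes):
            self.package = package
            self.pending: list[ReadRequest] = []
            self.completed: dict[str, bytes] = {}

        def enqueue(self, req: ReadRequest) -> None:
            self.pending.append(req)  # no I/O yet: just records intent

        def submit(self) -> int:
            # One submission services the whole batch; a real driver could
            # reorder/coalesce these to keep NVMe queue depth high.
            for req in self.pending:
                data = self.package[req.offset:req.offset + req.length]
                self.completed[req.destination] = data
            count, self.pending = len(self.pending), []
            return count

    package = bytes(range(256))
    q = StorageQueue(package)
    q.enqueue(ReadRequest(0, 4, "tex_albedo_mip0"))
    q.enqueue(ReadRequest(16, 8, "tex_normal_mip0"))
    print(q.submit())  # 2 requests serviced in one batch
    ```

    The point of the model is that the CPU cost per request becomes tiny (recording intent) while the expensive work is amortized over the whole batch.
    
    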
     
  6. Infinisearch

    Veteran

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Oh well, there goes my confidence in my reading comprehension. Any opinion on my stupid theory that I re-quoted below?

    Oh, and since we're talking about moving data fast, did anyone suggest mip chain compression as a piece of what they are doing? With potential special-case importance on finding commonalities between all textures in the game for the further-away mip slices.
     
  7. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    May I remind you that GPUDirect Storage / RDMA uses peer-to-peer DMA, which requires a PCIe switch in between NIC/SSD and GPU.

    So far such a setup is only found on Radeon Pro SSG (Solid State Graphics) cards and NVidia DGX-2 supercomputers - nothing like this would be possible on regular PCs without proper support from the chipset/CPU, and it's certainly not how DirectStorage actually works on the Xbox Series X, because its custom Ryzen APU only has system memory (though a very fast one) and no dedicated video memory.

    So unless upcoming mainstream desktop platforms from AMD and Intel support P2P DMA across separate PCIe root ports (which I really doubt), data from the NVMe SSD would still have to travel through the system RAM before it ends up in the video RAM.

    My thoughts too - supported/recommended SSDs would probably need to implement some block-size related NVMe 1.3 features in their firmware, like LBA Data Size (LBADS) for native 4 KByte sectors (4Kn instead of 512e) and maybe even larger sector sizes, plus I/O block alignment/granularity hints for the OS storage driver, like Namespace Optimal I/O Boundary (NOIOB), Namespace Optimal Write Size (NOWS), Namespace Preferred Write Granularity (NPWG), Write Alignment (NPWA), Deallocate Granularity (NPDG), and Deallocate Alignment (NPDA).
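
    As a back-of-the-envelope illustration of what those alignment hints buy, here is a small sketch of how a storage driver might use NPWG and NOIOB. All values are hypothetical, not taken from any real drive's firmware:

    ```python
    # Illustrative values only (not from a real drive): native 4K LBAs,
    # 64 KiB preferred write granularity, 1 MiB optimal I/O boundary.
    LBA_SIZE = 4096   # LBADS: native 4K sectors (4Kn)
    NPWG     = 16     # preferred write granularity, in LBAs (64 KiB)
    NOIOB    = 256    # optimal I/O boundary, in LBAs (1 MiB)

    def align_write(offset_lba: int, length_lba: int) -> tuple[int, int]:
        """Round a write out to NPWG multiples, as a driver might do to
        avoid read-modify-write cycles inside the SSD."""
        start = (offset_lba // NPWG) * NPWG
        end = -(-(offset_lba + length_lba) // NPWG) * NPWG  # ceil to NPWG
        return start, end - start

    def crosses_boundary(offset_lba: int, length_lba: int) -> bool:
        """True if the I/O straddles an optimal boundary (NOIOB) and may
        be split internally by the controller."""
        first = offset_lba // NOIOB
        last = (offset_lba + length_lba - 1) // NOIOB
        return first != last

    print(align_write(10, 20))        # (0, 32): padded to a 128 KiB write
    print(crosses_boundary(250, 10))  # True: straddles the 256-LBA boundary
    ```

    An engine packaging assets on NPWG-sized, NOIOB-aligned boundaries would let every read hit the drive at its preferred granularity.
    
    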
     
    #347 DmitryKo, Sep 7, 2020
    Last edited: Sep 8, 2020
    Kej, Jawed, iroboto and 1 other person like this.
  8. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    The entire diagram is an error. The actual RTX IO news article only talks about GPU decompression, with no mention of direct data transfer to video memory as in GPUDirect.


    I guess they had to present some visuals for product unveiling, and they've just reused the diagrams from the GPUDirect Storage presentation without giving it much thought.

    https://devblogs.nvidia.com/gpudirect-storage/#attachment_15420
    https://devblogs.nvidia.com/gpudirect-storage/#attachment_15426

    Note how the RTX IO slide copies the first GPUDirect slide, where a box in the same position is actually labeled 'PCIe Switch', and note how the left part of the second GPUDirect slide presents the same connection flow from SSD through the NIC - because it's actually networked storage that uses NVM Express over Fabrics (NVMe-oF).
    The RTX IO slide looks like an improper synthesis of these two GPUDirect slides.


    The actual API implementation is still in its early stages, since the PC version of DirectStorage has to co-exist with the driver stack in the Windows I/O Manager, which handles filesystems and disk devices.


     
    #348 DmitryKo, Sep 7, 2020
    Last edited: Sep 8, 2020
    Kej, Jawed, nutball and 7 others like this.
  9. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Yes, I suspect you may be right. I speculated further up the thread that the diagram may merely be representative of the reduction in CPU overhead rather than specifically depicting a P2P DMA. I've certainly not seen any mention of DMA in any of the literature around this, nor mention of any additional hardware requirements beyond an RTX GPU and Windows 10. Even the NVMe requirement is questionable, as I've seen mention of SATA SSDs and even mechanical drives, although those reports may be inaccurate.

    Well per our previous discussion it does seem that AMD has supported this since Zen, Intel is likely another matter though.

    Regardless of whether the data still needs to go via system memory or not, the reduction in CPU overhead outside of the decompression requirements is still very significant - from 2 cores down to 0.5 according to Nvidia. But that could easily be the DirectStorage effect.
     
    BRiT and PSman1700 like this.
  10. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    No, it's only supported for chipset ports that share the same CPU root port. It's not really possible to initiate P2P between different CPU root ports in the PCIe hierarchy, such as NVMe M.2 SSD on an x4 link and GPU on a different x16 link.


    To recap, P2P support has to be manifested with a PCIe Switch in the topology - a collection of one Upstream Port with multiple Downstream Ports. This typically concerns devices connected to the same root port, such as a chipset sharing an x4 link from the CPU with multiple PCIe devices like NICs/audio/M.2 SSDs and additional PCIe slots.

    AMD X470/X570 chipsets have a two-level hierarchy of switches for NIC ports, but the Linux driver only walks through the nearest upstream port and never reaches the upper-level upstream port.

    So when the Linux driver detects several devices on the same root port, but can't find a connection between them through the hierarchy of upstream/downstream ports, it would enable P2P using the whitelist of recent CPUs.

    EDIT: Actually it is possible to initiate P2P DMA transfers between root ports on the same PCIe Root Complex (PCI Host Bridge).
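
    The upstream-port walk described above can be sketched roughly like this. It's a toy model of the topology check only - the device names, the `parent` map, and the `whitelist_root` flag are invented for illustration, not the actual Linux pci_p2pdma code:

    ```python
    # Toy model of deciding P2P DMA eligibility from PCIe topology.
    # All names/topology are hypothetical, for illustration only.
    def upstream_path(device: str, parent: dict[str, str]) -> list[str]:
        """All bridges/ports from a device up to the root complex."""
        path, node = [], device
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    def p2p_ok(a: str, b: str, parent: dict[str, str],
               whitelist_root: bool) -> bool:
        pa, pb = upstream_path(a, parent), upstream_path(b, parent)
        # Nearest common ancestor is a switch upstream port below the
        # root: classic P2P-capable topology, always OK.
        common = [n for n in pa if n in pb]
        if common and common[0] != "root_complex":
            return True
        # Only meet at the root complex: OK only if this CPU is
        # whitelisted for P2P between its root ports.
        return whitelist_root and "root_complex" in pa and "root_complex" in pb

    # Hypothetical desktop: GPU on an x16 root port, SSD behind the chipset.
    parent = {
        "gpu": "root_port_0",
        "nvme": "chipset_switch",
        "chipset_switch": "root_port_1",
        "root_port_0": "root_complex",
        "root_port_1": "root_complex",
    }
    print(p2p_ok("gpu", "nvme", parent, whitelist_root=False))  # False
    print(p2p_ok("gpu", "nvme", parent, whitelist_root=True))   # True
    ```

    Two devices hanging off the same chipset switch would pass the first check regardless of the whitelist, which matches the "same root port" case above.
    
    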

    Then why is a Network Interface Card pictured in the data path? I'd think they just needed pretty graphics with some vague message of great things to come, so they reused the GPUDirect slide.

     
    #350 DmitryKo, Sep 8, 2020
    Last edited: Sep 9, 2020
    BRiT likes this.
  11. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    Jensen specifically states that there are 3 new advances with RTX I/O. "New I/O APIs for direct transfer from SSD to GPU memory" being one of them.

    I can't believe Nvidia would be so lazy and inept as to simply reuse slides that aren't even remotely representative of what is actually possible.

    You bring up a good point about them not mentioning it in the blogs on their site, but I think that's because they don't want to talk about hardware requirements at this time: IF it does require new hardware, people might be inclined not to upgrade until motherboards that support it are out.

    I mean, the fact that they've mentioned "certain NVMe drives" are required, should tell you that there's going to be some other requirements as well. If this was simply about GPU decompression, then there's no reason why certain NVMe drives would work, and others not.

    I think they'll be ready to talk about hardware requirements sometime mid next year.
     
    PSman1700 likes this.
  12. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    It does not mean peer-to-peer DMA between the SSD and GPU is involved. I was specifically answering the suggestion that RTX IO is similar to GPUDirect Storage as found on the DGX-2, with its two-level PCIe switch complex.

    I can find no other viable explanation of how the NIC block ended up on the RTX IO diagram.
     
    BRiT likes this.
  13. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    I just gave you one. New hardware will be required.. and they aren't ready to talk about that yet. Until then, it's all about the GPU decompression.
     
  14. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    How exactly is the network controller going to be required for the DirectStorage / RTX IO data path on the PC, other than having been taken from a different GPUDirect RDMA drawing where it belongs?

    Sorry, I don't follow the logic.

    They may require firmware support for certain NVMe 1.3 features, like 4K sector size and optimal I/O boundary hints. Or they may certify certain NVMe drives with specific minimum read/write/IOPS performance to match the Xbox Series X.
     
    BRiT likes this.
  15. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    You're looking at the picture too literally. "NIC" could refer to a new controller on the motherboard designed for the specific purpose of routing data from the SSD to the GPU. If they aren't ready to talk about it... of course they're going to use the same terminology as their GPUDirect implementation. Stating anything else at the moment would clue people in that they're going to need a new motherboard, and right now they most definitely have reasons not to do that.

    From the DirectStorage blog:
    "Certain systems with NVMe drives"... not "systems with certain NVMe drives". It implies something more is changing.
     
    PSman1700 and BRiT like this.
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    But that quote specifically states P2P DMA can be enabled using a whitelist. So the hardware does support that functionality. It's Linux and the Linux driver that don't natively support it. AMD also states the hardware supports it here: https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.2-AMD-Zen-P2P-DMA

    If Microsoft wanted to support that hardware capability natively in Windows 10 through something like Direct Storage, then I'm not sure why they'd be unable to.

    Even if the slide doesn't accurately reflect the data flow, it can still be representative of the vastly reduced CPU load resulting from RTX IO.
     
    PSman1700 likes this.
  17. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    The chances for this are close to zero.

    NIC is an established acronym for "Network Interface Card/Controller"; the term has been in wide use since the early 1980s. There are no alternative meanings for this acronym.

    Storage controllers are typically referred to as "NVMe/SATA/SAS/SCSI controller".

    "New Interface Controller" would be a terrible name for any device.


    It makes no meaningful difference. The NVMe drive is part of the system, so NVMe drive requirements are part of the system requirements.


    They further elaborate on these "certain systems" in the blog post: With a supported NVMe drive and properly configured gaming machine, DirectStorage etc...

    This would probably require a certain class of SSDs, but is unlikely to require an entirely new motherboard.

    They are not using "the same terminology"; in fact, the GPUDirect Storage news and the RTX IO news have almost nothing in common textually.
     
    #357 DmitryKo, Sep 8, 2020
    Last edited: Sep 8, 2020
    BRiT likes this.
  18. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    Only for devices on the same CPU root port. The SSD and GPU are typically connected to the CPU on two different root ports, so the driver won't enable peer-to-peer for them.

    EDIT: Actually, P2P DMA is enabled for devices on the same PCIe Root Complex (PCI Host Bridge in Linux terms) and even between different Root Complexes (in multiprocessor systems and NUMA nodes).

    Can't really see how this diagram could be interpreted in terms of reduced CPU load.
     
    #358 DmitryKo, Sep 8, 2020
    Last edited: Dec 12, 2020
    BRiT likes this.
  19. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    Ok... but Jensen specifically stating during the presentation "new APIs for fast loading and streaming directly from SSD to GPU memory"... you can explain some slides being copied (which I still don't buy), but Jensen specifically stating that? Nah.
     
    PSman1700 likes this.
  20. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,502
    Likes Received:
    24,399
    I think for now we don't have enough specificity to have a clear idea of what we'll be getting. The only clear aspect was that Microsoft would have some sort of internal checks done to know if they can safely bypass several layers of legacy in order to optimize operations.
     
    PSman1700 and Remij like this.