Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Discussion in 'Console Technology' started by Shortbread, Sep 18, 2020.

  1. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    842
    Likes Received:
    879
    I wouldn't go that far. Just look at the already quite good loading times achievable from an HDD. Some PS4 games, for example, were optimized for loading times after the PS5 was out, so it was already possible to improve loading times before; it just was never the focus.
    In GTA V, for example, someone in the community found an easy fix for the loading times years ago, but even that easy fix was never implemented in the console versions. Now, with the next-gen version, they finally seem to have implemented it, so loading times are much better, purely because a bad implementation (loading a config file) was fixed.
    Before this generation, loading times were always just "good enough" to play the game; they were never the focus. That GDC video above shows what happens when loading times do become the focus. Even HDDs benefit from those optimizations.

    Yes, much more could be done with the big bandwidth improvement, but the SATA SSD numbers already show there is not much more to win here, as bandwidth is simply no longer the bottleneck, even without the DirectStorage API.

    The PC has always had one big advantage: additional main memory. So even if you need a bit more startup time, you can cache a lot of data in main memory, which is much, much faster than any SSD. But as SSD speed is not really the bottleneck anymore, this might not be too big an advantage.
    E.g. the AMD driver with an 8 GB Vega card already shows that main memory can be enough to hold textures without bottlenecking the GPU: Far Cry 6 with the texture pack runs quite well on Vega cards (when the feature in the driver is activated) while it stutters on RTX 3070 cards.
    But I guess this is more a software problem with the game's engine than anything else. More memory just papers over the bad implementation.
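    As a back-of-envelope illustration of the point above, here is a toy load-time model. All figures are assumed for the sketch, not measurements: once there is a fixed chunk of CPU-side work (parsing, world setup), the gain from SATA to NVMe shrinks, because the read itself stops being the bottleneck.

```python
# Toy load-time model (illustrative numbers, not measurements):
# total load time = raw read time + fixed CPU-side work (parsing, world setup).

LEVEL_SIZE_GB = 10   # assumed asset payload for one level
CPU_WORK_S = 8.0     # assumed fixed CPU-side setup cost

drives_gb_per_s = {
    "5400rpm HDD": 0.10,
    "SATA SSD":    0.55,
    "NVMe Gen4":   7.0,
}

def load_time(bandwidth_gb_s, size_gb=LEVEL_SIZE_GB, cpu_s=CPU_WORK_S):
    return size_gb / bandwidth_gb_s + cpu_s

for name, bw in drives_gb_per_s.items():
    print(f"{name:12s}: {load_time(bw):6.1f} s")
```

    With these assumed numbers the HDD-to-SATA jump saves over 80 seconds, while SATA-to-NVMe saves only about 17, and the fixed 8 seconds of CPU work starts to dominate.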
     
    #761 Allandor, Mar 26, 2022
    Last edited: Mar 26, 2022
    BRiT, DSoup and PSman1700 like this.
  2. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    I tried to explain that as well, but you present it much better. Anyway, maybe it is better to leave it at that; it is nothing more than platform warring at this point, meant or accidental. Like with FSR2 vs DLSS, just await the results; sometimes people get surprised by what can be done.
     
  3. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Agreed. In terms of PS4 games benefiting from faster drives, I recall - after some quick googling - this Digital Foundry article in which they replaced the stock HDD with an SSD. It demonstrated, for the most part, not as massive a difference as you might expect, and highlighted that what is often considered 'loading' is actually waiting for the CPU to build bits of the game world.

    The Verge article quotes the Forsaken developer who also mentions that when you remove I/O bottlenecks you reveal others.

    There are things developers can be proactive about, including arranging and storing data in a format that facilitates quick loading, and the ability of the engine to swiftly use the loaded data. A question that remains is how much effort is required for the developer to leverage modern technologies. Based on the disparity in loading times between some first-party current-generation console titles and third-party efforts, I assume some effort is required.

    The PC has the tremendous ability to adopt new techniques faster than consoles, but if the goal is to move decompression entirely to GPUs, that means a bit of existing GDDR needs to be reserved for ephemeral storage of compressed and decompressed data. This may not be much of an issue on modern cards, but if we accept the Steam hardware surveys, the vast majority of gamers are using more modest hardware.
     
    Allandor likes this.
  4. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    283
    Likes Received:
    474
    You sure about that? The point of DirectStorage is direct access to GPU memory from the SSD. Meaning, the compressed data would likely be decompressed on the fly by the GPU while still being on the SSD, so only the decompressed data would be in GDDR, where it gets used as textures, for example. Is the GPU able to do that, though - to use the SSD as extended GDDR? I am not sure.

    Otherwise this technology would be unusable on every GPU that is not a 12 GB+ VRAM card by AMD. I can't imagine the DirectX engineers were not thinking about that.

    How is it handled on console then, especially on Series S with its low amount of memory? The decompression blocks do not have their own memory subsystem, do they?
     
    PSman1700 likes this.
  5. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    I am relying on what Microsoft have published, which according to their devblog is:

    When you say "the compressed data will likely be decompressed on the fly by the GPU while still being on the SSD", how would that happen using PC hardware? How do you envisage the GPU decompressing data that is on the SSD? How would the GPU access (read/write) the data on storage via the graphics driver, the north bridge, possibly the south bridge, the drive controller and the NAND itself?

    It's already slow enough that games avoid having the GPU access data in main RAM (and vice versa) because of the latency. Data can, of course, be moved across the bus, but accessing it in place is not something that can be done quickly. This is what differentiates local-bus and central-bus technologies.

    ¯\_(ツ)_/¯
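    The gap between "moving data across the bus" and "accessing it in place" can be sketched numerically. The latency and bandwidth figures below are assumed, purely illustrative round numbers, and the model deliberately ignores request pipelining; the point is only the orders of magnitude involved.

```python
# Why devices copy data over the bus instead of accessing it in place
# (illustrative latency/bandwidth figures; ignores pipelining of requests).

PCIE_ROUND_TRIP_S = 1e-6   # assumed ~1 us per small remote access
PCIE_BULK_GB_S = 16.0      # assumed effective bandwidth for a large DMA copy
CACHE_LINE = 64            # bytes per fine-grained access

def remote_access_time(total_bytes):
    """Fetch data one cache line at a time across the bus."""
    return (total_bytes / CACHE_LINE) * PCIE_ROUND_TRIP_S

def bulk_copy_time(total_bytes):
    """One large DMA transfer at full bus bandwidth."""
    return total_bytes / (PCIE_BULK_GB_S * 1e9)

one_gb = 1 << 30
print(f"fine-grained: {remote_access_time(one_gb):8.2f} s")
print(f"bulk DMA:     {bulk_copy_time(one_gb):8.3f} s")
```

    Even with generous assumptions, touching a gigabyte one cache line at a time is a couple of orders of magnitude slower than one bulk copy, which is why the data gets moved rather than accessed remotely.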
     
  6. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    I've been saying this since Sony first talked about the PS5's SSD speed. My PC has a SATA SSD, an NVMe Gen 3 SSD, and a 5400 RPM platter drive. I've tested games on all of those drives, and there are differences in loading in some games, but not in others. And I do not know exactly why, but I used to have a SATA SSD in there that one game would load slower on than it did on the 5400 RPM drive. If I turned off real-time protection in Windows Defender, it would load faster. That makes sense when comparing that drive to itself, but I don't know why it affected that SSD and not the HDD or my other SSDs, as they showed little to no difference with real-time protection on/off.

    This isn't a new thing, either. I remember when I was playing Half-Life back in the day: I had a K6-2 and my cousin got a pre-built Pentium III machine. His drive was 5400 RPM while mine was 7200, but his game would load faster. I/O and drive speed are only part of the equation.
     
    chris1515, DSoup and swaaye like this.
  7. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    I honestly do not believe that 2 seconds vs 1 second matters at all. PS5 could be 2x as efficient... and I don't believe it will matter in the end. Look at the Win32 results already... What we're seeing here is a developer ACTUALLY design around having a fast efficient SSD to begin with. That was the biggest bottleneck previously holding loading performance back on PC. Their goal of literally 1 second of loading is commendable.. and they're already extremely close.. regardless of API they use.

    No, we already know that for the moment data must still go to RAM, and that the CPU must still copy the compressed data to GPU for decompression. Games which properly utilize DirectStorage+GPU decompression will actually likely have VRAM requirements reduced.
     
    PSman1700 and pjbliverpool like this.
  8. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Hopefully one of these days we move to a newer solution as per Nvidia GTC
    https://developer.nvidia.com/gpudirect

    But somehow a direct bus from storage to GPU memory would need to become part of a specification for all motherboards, or something like that.
     
    DSoup likes this.
  9. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,312
    Yes, in theory. Since work is already required to implement DirectStorage, it would seem logical that resources compressed for the CPU would be packaged differently from resources compressed for the GPU.

    At that point data would go from SSD -> CPU (modern CPUs have the north bridge basically integrated), and then CPU data would be routed to main memory while GPU data is routed to GPU memory. This, of course, assumes that CPUs are capable of routing data streams before they hit main memory.
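    A minimal sketch of that packaging idea: tag each asset with its destination so the I/O layer can split one package into per-memory-pool streams. The package format, field names and asset names here are entirely made up for illustration.

```python
# Sketch: split a package into CPU-destined and GPU-destined streams at load
# time. All names and the format itself are hypothetical, for illustration.

from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    dest: str        # "cpu" (e.g. game logic data) or "gpu" (e.g. textures)
    payload: bytes   # compressed bytes, codec chosen per destination

def route(package):
    """Partition assets so each stream can be sent to the right memory pool."""
    streams = {"cpu": [], "gpu": []}
    for asset in package:
        streams[asset.dest].append(asset)
    return streams

package = [
    Asset("collision_mesh", "cpu", b"..."),
    Asset("albedo_4k",      "gpu", b"..."),
    Asset("normal_4k",      "gpu", b"..."),
]
streams = route(package)
print([a.name for a in streams["gpu"]])   # textures headed for VRAM
```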

    Regards,
    SB
     
    Remij likes this.
  10. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,312
    This was already known: Star Citizen showed that massive increases in loading speed are achievable with the existing I/O on Windows (when using an SSD) just by re-architecting how the game handles I/O to take advantage of SSDs. DirectStorage streamlines that further, removing or reducing some inefficiencies within the Windows storage stack, and perhaps gives developers guidelines on how to achieve results similar to, or better than, what the Star Citizen engineers accomplished via changes to their engine and data structures.

    It's the same on PS5 and XBS-X. Just using the faster I/O isn't enough to reap the full benefits of faster I/O. The developer must still architect their game and game assets in such a way as to be able to take advantage of the faster I/O. If you look at loading on PS5 when not specifically coding for the faster I/O you see basically similar loading speeds to PC games which aren't specifically coded to take advantage of SSDs.
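    One concrete example of "re-architecting for SSDs" is issuing many independent reads concurrently, since SSDs thrive on queue depth, instead of streaming a file serially. This is only a toy sketch with threads; a real engine would use async OS I/O (e.g. io_uring or Windows overlapped I/O) and its own file layout.

```python
# Toy sketch of queue-depth-friendly loading: read a file in independent
# chunks from a thread pool instead of one serial stream.

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 256 * 1024  # 256 KiB read granularity (arbitrary choice)

def read_chunk(path, offset):
    # Each worker opens its own handle so the reads are independent.
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(CHUNK)

def parallel_read(path, workers=8):
    size = os.path.getsize(path)
    offsets = range(0, size, CHUNK)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda off: read_chunk(path, off), offsets)
        return b"".join(chunks)

# Demo: round-trip a 1 MiB file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    data = os.urandom(1024 * 1024)
    f.write(data)
assert parallel_read(f.name) == data
os.remove(f.name)
```

    On an HDD this pattern would cause seek thrashing; on an SSD the controller can service the outstanding requests in parallel, which is exactly why the access patterns need rethinking per device class.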

    Regards,
    SB
     
  11. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Yes, we're in agreement here. I only made the comment because you seemed to be suggesting that there would be bandwidth issues over PCIe on the PC specifically, which obviously wouldn't be the case relative to the PS5, given that the narrowest of those PCIe buses is also present in the PS5.


    I certainly didn't mean to take you out of context. If I misunderstood what you were trying to say with the statement I quoted then that's my bad.

    That's all obviously correct. My argument is with the level of importance you're assigning to it. Certainly having to copy data multiple times between different RAM pools comes with a CPU overhead, which is undesirable. But that's what DirectStorage is designed to address (insofar as it reduces the cost of those operations to the point where they're trivial, as opposed to removing the need for them). The actual added latency from those operations, measured in nanoseconds or at worst the low microseconds, is insignificant compared to the multiple milliseconds of an average frame. So this really isn't something that's going to prevent PCs with "much faster drives" from actually realising much higher data throughput. What it may do, particularly in systems that aren't using DirectStorage, is increase CPU requirements. And if the CPU is the actual bottleneck for the load time, then that could be a disadvantage for the PC compared with a console. Although faster CPUs, etc...

    Yes, it is at the moment, and that absolutely could give the PS5 an advantage until GPU decompression is in place. But my statement was in the context of the full implementation of DirectStorage, where the decompression is taken off the CPU and moved to the GPU, where it would not be a bottleneck. At that point you only have the I/O management on the CPU, which is fairly trivial under DirectStorage with modern multicore CPUs. Even then the CPU may still be the bottleneck, but the workload on the CPU will be very similar between PC and PS5. The PS5 will likely still have a small advantage in that area, but faster CPUs, etc...

    I think there's a general overemphasis on how impactful these differences are in real-world scenarios, at least under a full DirectStorage implementation, which is specifically designed to mitigate the issues by significantly reducing the CPU overhead. The differences aren't really as significant as you seem to think. The main components are largely the same. The two primary differences are that on PC the GPU has its own memory pool, which sits at the other end of a very wide PCIe bus, meaning data copies are required back and forth over that bus to main memory, and that decompression is done within that memory pool rather than before hitting main memory as on the consoles. Mapping the routes the data takes looks something like this:

    PC
    NVMe -> Main Memory -> GPU Memory (decompression) -> Main memory (CPU data only)
    or

    NVMe -> Main Memory (CPU data decompression) -> GPU Memory (GPU data decompression)

    PS5
    NVMe -> Decompression Block -> Main Memory

    EDIT: Just watched the GDC presentation for Forespoken which clarifies the above. Some great info in there!

    So in the PC's case you're talking either 1 or 2 extra hops (depending on whether CPU and GPU data are split up for decompression by those units, or it's all done on the GPU and the CPU data then sent back to main memory), but the bus those hops go over is much wider than the console's NVMe-to-main-memory bus, and the added latency incurred by those hops is minuscule.

    Chris already pointed out that the PS5 hardware decompression block uses 256KB for this function. So it's a complete non-issue for any GPU.
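    The hop argument above can be made concrete with assumed round numbers (a Gen4 x4 drive on both platforms, and an illustrative effective PCIe 4.0 x16 figure for RAM-to-VRAM copies):

```python
# Rough model of the extra-hop cost (all figures assumed/illustrative).

NVME_GB_S = 7.0        # Gen4 x4 drive, same class on both platforms
PCIE_X16_GB_S = 28.0   # assumed effective PCIe 4.0 x16 for RAM<->VRAM copies

def console_path(size_gb):
    # NVMe -> decompression block -> unified memory: one drive-speed transfer
    return size_gb / NVME_GB_S

def pc_path(size_gb, extra_hops=1):
    # NVMe -> main memory, plus hop(s) across the much wider x16 bus to VRAM
    return size_gb / NVME_GB_S + extra_hops * (size_gb / PCIE_X16_GB_S)

for gb in (2, 8):
    print(f"{gb} GB: console {console_path(gb):.2f} s, "
          f"PC {pc_path(gb):.2f} s (1 extra hop)")
```

    Because the x16 bus is several times wider than the drive link, each extra hop adds only a fraction of the drive-read time in this sketch (and this ignores that the copies can be overlapped with the reads, which would shrink the difference further).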
     
    #771 pjbliverpool, Mar 26, 2022
    Last edited: Mar 26, 2022
    Dampf and PSman1700 like this.
  12. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Not only that, adding a direct bus between the GPU and storage that bypasses the north/south bridges and the operating system adds a whole host of problems, not least that the OS has no visibility of any of this activity. If you want to involve the OS, then the CPU needs to be involved as well. Unless Nvidia are also proposing something like a secure satellite I/O processor on the GPU, and having Microsoft re-engineer the Windows kernel to delegate certain I/O processes to that processor, which would also keep the operating system in the loop. ¯\_(ツ)_/¯

    There is a reason Microsoft and Sony designed the hardware architecture of their consoles the way they did - virtually identically - to achieve the same performance goals: that is the best way to do it. But it's a closed architecture with very limited expansion and scalability options, which would not suit the PC.
     
  13. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Yeah, it's to save cost, to begin with.
     
  14. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    Yes, there is a reason: it's the most efficient way to do it, considering how cost-effective consoles have to be. On PC there are other ways of mitigating those differences, by actually programming games for the PC architecture's strengths: wide buses and high capacities. PCs will never have console-level efficiencies in design; we know that. There's more latency getting data off the disk, and more latency processing that data to get it to its destination ready for use, but they can move larger amounts of data at one time, and both pools of memory can hold far more data, requiring less fetching from disk.

    Short of not having a unified memory architecture, it's actually a bonus having a much LARGER pool of high-bandwidth, lower-latency system RAM connected to much larger amounts of VRAM. On PC they'll make do with what they have; I have no doubts about that. We're a long way away from needing absolutely INSTANT loading. Using Forspoken as an example, if PC is 2 seconds and PS5 is 1 second, I'm good with that. It will make literally no difference. Developers can, and will, design their game loading/transitions/streaming around what the hardware can do, and thus if my black fade-out and back in takes 1 second longer, I'll deal with it.
     
    PSman1700 and Allandor like this.
  15. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,762
    Likes Received:
    2,639
    Location:
    Maastricht, The Netherlands
    Both pools can hold far more data? That depends on how you look at it. At its most extreme the PS5 basically has 14GB of VRAM. Not that many GPUs have that yet.
     
  16. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    That last part is one of the reasons why AF still isn't used all that much on consoles (per DF). Also, shared memory can induce its own disadvantages: latency, memory bandwidth contention, amount of memory (quite limited compared to 16 GB VRAM GPUs, etc.). Some are only seeing the disadvantages of one platform while ignoring the other platform's disadvantages.
    We will see soon enough what the differences are between PS5 and PC loading/streaming performance. I think some are going to be surprised.
     
  17. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    That isn't what is being described in the Nvidia link, and the capability to do this already exists in the hardware of modern consumer PCs.

    Nvidia are describing the transfer of data from SSD to GPU memory over the existing PCIe fabric and via the CPU's root complex. Most modern CPUs (from Zen upwards for AMD; I'm not sure about Intel) already implement the hardware capability, which is an optional part of the PCI Express spec. The difference from how it works now is that the data doesn't get copied to main memory first, and I believe the data copy is handled by the NVMe drive's own DMA engine rather than by the CPU. The CPU (and OS) would still send the request to the DMA engine, though.

    This seems to be what Nvidia's RTX IO is proposing, which in turn appears to be based on their GPUDirect Storage described in the link iroboto posted above. I'm pretty sure it's also what the consoles, or at least the PS5, are doing - using, of course, the Zen 2's root complex.
     
    PSman1700 likes this.
  18. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    To add to my previous response, the north and south bridges play no part here: the north bridge because it doesn't exist in modern systems, and the south bridge because a system's primary NVMe drive should already bypass it and link directly to the CPU - just like the consoles.
     
    PSman1700 likes this.
  19. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    Amount of memory and memory bandwidth contention are not limits of shared memory but of console cost. It is technically possible to have 32 GB of faster GDDR6 on a 512-bit bus, but the cost is too high for a console.
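    The bandwidth side of that trade-off is simple arithmetic: peak GDDR bandwidth is bus width times per-pin data rate, divided by 8 to convert bits to bytes.

```python
# Peak GDDR bandwidth: bus width (bits) x per-pin data rate (Gbps) / 8.

def gddr_bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8

# PS5-class configuration: 256-bit bus of 14 Gbps GDDR6
print(gddr_bandwidth_gb_s(256, 14))   # 448.0 GB/s
# Hypothetical 512-bit console using the same chips
print(gddr_bandwidth_gb_s(512, 14))   # 896.0 GB/s
```

    Doubling the bus width doubles the bandwidth, but it also roughly doubles the number of memory chips and the PHY/board complexity, which is exactly the console cost problem described above.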
     
  20. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Nvidia's description seems to describe a different model:

    GPUDirect Storage enables a direct data path between local or remote storage, such as NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. It avoids extra copies through a bounce buffer in the CPU’s memory, enabling a direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory — all without burdening the CPU.​

    Since around 2011, AMD and Intel have incorporated the north- and south-bridge controllers onto the main CPU die itself. The bus controllers very much still exist, and their features still advance to support new I/O models. The distinct logic blocks and interconnects continue to exist on-die, i.e. there are still bus controllers for the different devices that can be connected. The integration is why CPU pin counts exploded: a lot more of the motherboard's signals suddenly crowded into one chip.

    edit: deleted some errant text.
     
    #780 DSoup, Mar 27, 2022
    Last edited: Mar 27, 2022