Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Discussion in 'Console Technology' started by Shortbread, Sep 18, 2020.

  1. JPT

    JPT
    Veteran

    Joined:
    Apr 15, 2007
    Messages:
    2,505
    Likes Received:
    943
    Location:
    Oslo, Norway
    Yeah, I guess it's the same as with the PS5 supha dupa speeds everybody thought would magically happen. To get better results, you need to tailor for the solution. It will be interesting to see, going forward, how much performance gain they will "scrape" out of it.
     
  2. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    To be fair to anyone who believed this, Sony's pre-launch messaging included statements by an unnamed Sony rep claiming loading screens were going to be a "thing of the past" and Cerny claiming you could load in data as fast as you could turn your character around.
     
    RootKit and PSman1700 like this.
  3. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,762
    Likes Received:
    2,639
    Location:
    Maastricht, The Netherlands
    Cerny’s claims stand though as far as I am concerned.
     
  4. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    It's PR just like any other; it was up to each person to believe it all or not. Anyway, great to see the PC platform being up there with the consoles in NVMe/IO tech. Forspoken loading in under 2 seconds is quite a good start, as well as the 5000 MB/s read speed test using DS.
     
    RootKit likes this.
  5. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    The thing is that the latency involved in passing the data from one bus to the next is so small compared with the time it takes to transfer the data that it makes virtually no difference to the real world result. i.e. if you're waiting a second to load your data, what's a few extra microseconds (or less) to pass it between different controllers? The bottleneck remains how quickly you can pull the data off the SSD, or how quickly you can do all of the other processing on the CPU - by a very wide margin.
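
    As a rough back-of-envelope sketch of the argument above: the payload size, bandwidth, hop count and per-hop latency below are illustrative assumptions, not measurements from any real system, and the model assumes one large pipelined transfer where each hop's latency is paid once.

```python
def transfer_time_s(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream a payload at a sustained bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


def hop_overhead_fraction(size_bytes: float, bandwidth_bytes_per_s: float,
                          hops: int, per_hop_latency_s: float) -> float:
    """Share of total load time spent on bus-to-bus hop latency.

    Assumes the latency is paid once per hop for a single pipelined transfer.
    """
    stream = transfer_time_s(size_bytes, bandwidth_bytes_per_s)
    latency = hops * per_hop_latency_s
    return latency / (stream + latency)


# Assumed numbers: 5 GB at 5 GB/s (one second of streaming),
# 3 extra hops at 2 microseconds each.
fraction = hop_overhead_fraction(5e9, 5e9, hops=3, per_hop_latency_s=2e-6)
print(f"hop latency share of load time: {fraction:.6%}")
```

    Under these assumptions the hop latency is a few millionths of the total. The picture changes if a load is made of many small dependent requests that each pay the latency separately.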

    And you're correct that GPU decompression would be using the GPU memory and caches for that work. But I'm not clear how that is disadvantageous to the hardware block of the consoles which is effectively doing the same thing in hardware with local caches. The key factor is whether that acts as a bottleneck to the throughput or not. Which in both cases we're told that it won't.
     
    PSman1700 likes this.
  6. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    GPU will only decompress things like texture and geometry data. CPU will continue to decompress other things like audio.

    There shouldn't be any issue with throughput. During "load times" you're not rendering anything and the GPU can go full out on decompression maximizing throughput.. and during streaming, you're not going to be requiring that bandwidth at a constant rate.

    I see no reason why it would be worse on the GPU than it already is on the CPU.
     
    RootKit and PSman1700 like this.
  7. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Bandwidth is also important. Being able to move smaller amounts of data quickly is what a general bus is designed for, but when you're trying to move large amounts of data quickly then you may hit bus arbitration issues. The PCI bus is good at moving data really quickly one way, but if you're shuttling data from storage to GPU for decompression and then on to RAM in an asynchronous flow, then arbitration may be the cause of the numbers here. Maximum PCI bandwidth is almost always quoted in burst mode, and burst mode is largely predicated on periods of synchronous I/O.

    It's not disadvantageous, it's just different. Current generation consoles can pull compressed data from the drive and decompress a number of formats supported by the I/O controller. There is no moving data into RAM local to the GPU or CPU or any need to potentially move data elsewhere after decompression. You may read 2GB of data from the SSD and have 6GB delivered into RAM without the GPU or CPU having been involved at all. This is all managed using cache on the I/O controller itself. One unified RAM pool also simplifies the model.
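
    The "read 2GB, deliver 6GB" example works out as simple arithmetic. In this small sketch the raw drive speed is an assumed placeholder, not a quoted spec, and the function name is made up for illustration.

```python
def effective_throughput_gb_s(raw_gb_per_s: float,
                              compressed_gb: float,
                              decompressed_gb: float) -> float:
    """Decompressed data delivered per second when decompression keeps pace
    with the drive, so the drive's raw read speed is the only limit."""
    ratio = decompressed_gb / compressed_gb
    return raw_gb_per_s * ratio


# 2 GB read from the SSD becomes 6 GB in RAM (a 3:1 ratio, from the post).
# The 5.0 GB/s raw read speed is an assumption for illustration.
print(effective_throughput_gb_s(5.0, compressed_gb=2.0, decompressed_gb=6.0))
```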

    This is part of the issue, surely? If you have a .PAK (zip) file that includes a mix of data types, like textures for the GPU and audio and geometry data for the world, where does it go? This is why loading takes a while now. The data needs to be loaded, picked up and re-directed.
     
    #747 DSoup, Mar 25, 2022
    Last edited: Mar 25, 2022
  8. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    I remember two years ago how you discussed IO on the PC platform. Didn't really turn out to be that way.

    Anyway, we will see how this all compares to the PS5 (or Xbox) when we can test, play and benchmark things against each other. And then you can conclude how bad it all is.
     
  9. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Can you provide some quotes? I'm not sure what posts you are referring to or what data you think contradicts them. I do recall being skeptical that a Windows API could negate innate PC architecture design, i.e. how individual components are connected via hardware buses.
     
  10. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    Don't we know where it goes? There will be a new class of compression technology which will likely require things to be packaged differently. It will continue to go to system memory first, but now the CPU will only decompress things such as audio assets into RAM while it copies the compressed geometry and texture data to VRAM for decompression by the GPU.
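
    A minimal sketch of the split described above. Everything in it is hypothetical: `route_assets`, the asset kinds and the upload queue are made-up names, zlib stands in for whatever compression format ends up being used, and nothing here is a DirectStorage API call.

```python
import zlib

# Per the post: texture and geometry data go to the GPU, the rest stays on the CPU.
GPU_KINDS = {"texture", "geometry"}


def route_assets(assets):
    """Split loaded assets by where they get decompressed.

    assets: iterable of (name, kind, compressed_bytes) already in system memory.
    Returns (system_ram, vram_upload_queue).
    """
    system_ram = {}
    vram_upload_queue = []
    for name, kind, blob in assets:
        if kind in GPU_KINDS:
            # Left compressed; queued for copy to VRAM where the GPU decompresses it.
            vram_upload_queue.append((name, blob))
        else:
            # CPU path: decompress directly into system memory.
            system_ram[name] = zlib.decompress(blob)
    return system_ram, vram_upload_queue


sample = [
    ("footsteps", "audio", zlib.compress(b"pcm" * 100)),
    ("rock_albedo", "texture", zlib.compress(b"texels" * 100)),
    ("rock_mesh", "geometry", zlib.compress(b"verts" * 100)),
]
ram, queue = route_assets(sample)
print(sorted(ram), [name for name, _ in queue])
```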
     
    Silent_Buddha and PSman1700 like this.
  11. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Yes, bandwidth is important. That was my point: bandwidth for moving data is more important than the relatively minuscule latencies involved with passing that data between different buses. Are you suggesting that PCIe bandwidth is somehow a limiting factor here though? Because if you are, I'm really not following. On the PC the data will move from NVMe to root complex over a PCIe 4x bus, same as the PS5. After that it moves on to the GPU over a 16x bus. Why would the 16x PCIe bus from root complex to GPU in the PC be a limiting factor when the 4x PCIe bus from NVMe to root complex in the PS5 wouldn't be?

    And for that matter, if a PCIe 4x interface (let alone a 16x interface) were a limiting factor in data transfer speeds, then why do benchmarks demonstrate throughput improvements, even with small block sizes, in NVMe drives all the way up to 7GB/s and beyond?

    Yes there is. The shared memory of the console is RAM local to the GPU. The only difference here is that on PC the data moves over more busses to get there. The narrowest bus on the PC is still equivalent in width to the single bus on the PS5 though. The rest are much wider. So while this adds latency, it's insignificant to the overall result of x MB transferred in x ms.
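
    To put numbers on the bus widths: the sketch below assumes PCIe 4.0 on every link (16 GT/s per lane with 128b/130b encoding) and ignores protocol overhead and any contention from other traffic on a shared link, so these are theoretical ceilings only.

```python
PCIE4_GT_PER_S = 16.0             # PCIe 4.0 signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def link_bandwidth_gb_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 4.0 link in GB/s."""
    return PCIE4_GT_PER_S * ENCODING_EFFICIENCY * lanes / 8


def path_bandwidth_gb_s(lane_counts) -> float:
    """A chain of links can sustain no more than its narrowest link."""
    return min(link_bandwidth_gb_s(lanes) for lanes in lane_counts)


pc_path = [4, 16]     # NVMe -> root complex (x4), root complex -> GPU (x16)
console_path = [4]    # NVMe -> SoC (x4)

print(f"x4  link: {link_bandwidth_gb_s(4):.2f} GB/s")
print(f"x16 link: {link_bandwidth_gb_s(16):.2f} GB/s")
print(path_bandwidth_gb_s(pc_path) == path_bandwidth_gb_s(console_path))
```

    In this model the x4 hop from the drive sets the ceiling on both paths, and the x16 hop has about four times the headroom.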

    This is true, but why does it matter if the CPU or GPU are not a bottleneck? All that matters is that the processing is completed without holding anything else up. The problem with PCs sans DirectStorage is that too much pressure is put on the CPU in terms of IO management and decompression at a time when those CPU cycles are needed for other things like world setup. But if you move much of that work to the GPU, which has more than enough capacity to pick it up, then this has no performance penalty vs doing it on a dedicated hardware block, which itself is also going to add latency to the process.
     
    PSman1700 likes this.
  12. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    I'm not saying any of it was impossible, but I never expected it to be what is always being done. And that's sort of the point I was making. People who are perhaps a bit less PR literate than the average B3D poster could easily read those comments and expect there to never be loading and that RAM was only being populated with what's on screen.
     
    PSman1700 likes this.
  13. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    To be clear, I'm not suggesting that PCIe bandwidth is the issue. In your reply to me you mentioned only latency. I am making clear that both latency and bandwidth are required for moving a lot of data quickly. PCIe bandwidth is finite, but the limiting factor in Forspoken using DirectStorage is the CPU, because that is where the decompression occurs. On current generation consoles, decompression of supported data formats takes place in realtime as data is read. Supported data formats include zlib (which is what most games use for .PAK files), Kraken (PS5) and BCPack (Xbox Series).

    Did you seriously just quote half a sentence just to make it appear like I'm wrong? Come on, man, that's really disingenuous and not the kind of thing you expect to see in the technical forums. Quoting what I wrote in full:


    The difference between the PC approach compared to the consoles, is that on the PC you need to read the compressed data and write that into one of the RAM pools. Then the CPU or the GPU needs to read that data and write decompressed data. Data that needs to be in the other RAM pool then needs copying there.

    At the risk of repeating myself, the current generation consoles have cache built into the I/O controller. Compressed data isn't read into RAM; it is temporarily read into super-fast on-chip cache on the I/O controller and written out uncompressed to the single RAM pool.

    The CPU is still the bottleneck. Have you read the Verge article which quotes the Forspoken developer?
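
    The two flows described above can be compared with a simple tally of bytes read from or written to RAM. This is only a model of the description in this post, with made-up function names; it says nothing about which step is actually the bottleneck in practice.

```python
def pc_memory_traffic_gb(compressed_gb: float, decompressed_gb: float,
                         copy_to_other_pool: bool) -> float:
    """RAM traffic for the PC flow as described in the post."""
    traffic = compressed_gb            # compressed data written into a RAM pool
    traffic += compressed_gb           # read back by the CPU or GPU decompressor
    traffic += decompressed_gb         # decompressed output written to RAM
    if copy_to_other_pool:
        traffic += 2 * decompressed_gb # read from one pool, written to the other
    return traffic


def console_memory_traffic_gb(compressed_gb: float, decompressed_gb: float) -> float:
    """RAM traffic when decompression happens in on-chip cache on the I/O
    controller: only the decompressed output is written to RAM."""
    return decompressed_gb


# Using the 2 GB compressed / 6 GB decompressed example from earlier in the thread.
print(pc_memory_traffic_gb(2, 6, copy_to_other_pool=False))
print(pc_memory_traffic_gb(2, 6, copy_to_other_pool=True))
print(console_memory_traffic_gb(2, 6))
```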
     
  14. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain

    Yes, there is SRAM in the PS5 where they decompress the data. Fabian Giesen from RAD Game Tools did a tweet on it, saying the data is decompressed in chunks of 256KB, but on PS5 this is transparent to the dev regardless of how they package the game data.

    In R&C Rift Apart the CPU is the bottleneck too for the moment because of the way they initialize the entities in a level.
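
    Chunked decompression through a small buffer can be sketched in software like this. It is only an analogy: zlib stands in for Kraken, the 256KB figure is applied here to the compressed input reads (the post doesn't say which side of the decompressor it refers to), and `stream_decompress` is a made-up helper name.

```python
import io
import zlib

CHUNK = 256 * 1024  # chunk size mentioned in the post


def stream_decompress(source, chunk_size: int = CHUNK) -> bytes:
    """Decompress a zlib stream from a file-like object one chunk at a time.

    At most `chunk_size` bytes of compressed input are held at once, so the
    whole compressed file never has to sit in memory.
    """
    decomp = zlib.decompressobj()
    out = bytearray()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out += decomp.decompress(chunk)
    out += decomp.flush()
    return bytes(out)


payload = bytes(range(256)) * 8192          # 2 MiB of sample data
compressed = zlib.compress(payload)
# A small chunk size is used here so the loop runs more than once on this tiny input.
restored = stream_decompress(io.BytesIO(compressed), chunk_size=512)
print(restored == payload)
```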
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
  16. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Don't tell me, tell the other guy. I think there is generally a lack of appreciation of how complex the PC is architecturally, and this is not a flaw or a design oversight, this is necessary for the PC to be extensible and scalable.

    You could definitely architect a PC to work like a modern console, but not if you want to use an external graphics card because the I/O controller will not have a direct hardware bus to the GPU's RAM pool. Different choices, different designs, different pros and cons. It does feel like it's becoming a platform war argument which is why I am really coming to fucking hate these forums. :yep2:
     
    MrSpiggott likes this.
  17. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,762
    Likes Received:
    2,639
    Location:
    Maastricht, The Netherlands
    I don’t know anymore where I posted it and who replied but I stand corrected on those DirectStorage API benefits …
     
  18. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    842
    Likes Received:
    879
    Well, actually this presentation shows more that it isn't really DirectStorage that makes the big loading difference. It is more how you design your game to load the stuff. Yes, DirectStorage reduced the CPU usage a bit (and allows you to effectively use more bandwidth), but the loading times are already quite short. So they already optimized their engine for that stuff.
    E.g. even the HDD loading times with the Win32 API are quite good. That happens if you optimize your engine for loading just the stuff you need.
     
    DSoup likes this.
  19. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    DirectStorage will definitely bring benefits but it's not going to negate the need to move data from this bit of a PC to this bit to this bit. I have not been following it that closely either and I thought what had recently been soft-launched was the final implementation but that is yet to come. That will bring loading compressed data over to the GPU where it can be decompressed faster than on the CPU.

    What will continue to be a thing on PC is that there are two places where you can undertake decompression, the CPU and the GPU, and each has its own pool of RAM. Ultimately you will still need to load compressed data into some RAM, decompress it into some RAM, and for any data that is needed in the other pool of RAM, you'll need to copy it there.

    The ultimate endgame is to have a programmable I/O controller that decompresses compressed data as it hits the north-bridge and directs it automatically to main RAM and GDDR. That will bring greater benefits, but the work of getting loaded data into a usable form, e.g. generating a massive game world and populating the NPCs, vehicles, critters, physics, weather, effects, audio and so on, is still going to be mostly on the CPU. DirectStorage seems to be a bet that ultimately moving decompression to the GPU, including all the to'ing and fro'ing of data, frees up the CPU so that the latter stage is faster and, overall, things are better.

    Forspoken definitely does show improvement in load times, but it's not earth-shattering. And ultimately, like everything with the PC, what you will experience will depend on your combination of OS, hardware drivers, graphics card, PCI chipset, CPU and software. Will DirectStorage be smart enough to not have the GPU decompress if the CPU is the faster option (calling for all the data movement)?

    Which is the same situation as games built for current generation consoles. You have Spider-Man loading in around six seconds on PS5, and Astro's Playroom in about two seconds, because both have obviously been designed to make the most of the I/O pipeline. You're not seeing that with Assassin's Creed or GTA V.
     
    #759 DSoup, Mar 26, 2022
    Last edited: Mar 26, 2022
    Allandor likes this.
  20. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Again, we will see how much better the consoles are compared to the PC platform in loading/IO performance. Going by the NV presentation, the PC arguably has the numbers going for it, which probably has more impact than some small 'additional latency routes' and other claims of disadvantage. You need to see the advantages as well: the decompression on the GPU is scalable/programmable and much more capable at the same time.
    Also, remember that games that are fast loading on your PS5 are probably designed around the IO as well; improvements are being made and seen on the PS4 too in that regard.

    As said, we will see how badly the PC performs relative to the consoles. All this platform warring might be for nothing if they end up loading/streaming about as fast, despite the architectural differences in IO.
     