Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Discussion in 'Console Technology' started by Shortbread, Sep 18, 2020.

  1. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Watch the videos posted above; it is quite similar to the PS5's idea, as far as we can believe NV, their slides and the various explanations given.
     
  2. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,312
    If it does go to the GPU for decompression, then it doesn't have to go back to the CPU before going to system RAM. The only reason that it has to hit the CPU die when reading from a storage device is that all storage PCIe lanes go through the I/O complex on the CPU. Data from the GPU can be written directly to system RAM.

    If we were to think of the GPU as the point where everything is decompressed, then the only difference from the PS5 I/O chain is that, when reading, data needs to be "bounced" off the CPU. Once that is accomplished, it's roughly the same from the decompression step forwards, except that the GPU is acting as the decompressor. On PS5 it would go from the decompressor to RAM. On PC, it would go from the decompressor to either VRAM or system RAM.
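
    The comparison above can be sketched as a toy hop count. This is only an illustration, assuming the stage names below: they paraphrase the post and are not measured or vendor-documented data paths. `PS5_PATH`, `PC_PATH_VRAM`, `PC_PATH_SYSRAM` and the helper functions are hypothetical names.

```python
# Toy model only: stage names paraphrase the post; they are assumptions,
# not a documented description of either platform's hardware.
PS5_PATH = ["SSD", "hardware decompressor", "RAM"]
PC_PATH_VRAM = ["SSD", "CPU I/O complex", "GPU decompressor", "VRAM"]
PC_PATH_SYSRAM = ["SSD", "CPU I/O complex", "GPU decompressor", "system RAM"]


def decompress_index(path):
    """Return the position of the decompression stage in a path."""
    for i, stage in enumerate(path):
        if "decompressor" in stage:
            return i
    raise ValueError("path has no decompression stage")


def hops_before_decompress(path):
    """Number of transfers needed to reach the decompressor."""
    return decompress_index(path)


def hops_after_decompress(path):
    """Number of transfers from the decompressor to the final memory pool."""
    return len(path) - 1 - decompress_index(path)


if __name__ == "__main__":
    paths = [
        ("PS5", PS5_PATH),
        ("PC -> VRAM", PC_PATH_VRAM),
        ("PC -> system RAM", PC_PATH_SYSRAM),
    ]
    for name, path in paths:
        print(name, hops_before_decompress(path), hops_after_decompress(path))
```

    Under these assumptions the PC paths differ from the PS5 path only by the one extra hop through the CPU I/O complex before decompression; the count says nothing about the bandwidth or latency of any hop.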

    But yes, we currently don't know whether the GPU would handle all game related decompression tasks or not.

    Regards,
    SB
     
    BRiT, PSman1700 and nutball like this.
  3. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    No it's not, I've never said this. :???:
     
  4. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    Then wouldn't the bandwidth from the drive to the CPU and from the CPU to the GPU be the bottleneck at some point?

    I think the next step is to just attach the storage to the GPU instead of the CPU.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,104
    Likes Received:
    16,896
    Location:
    Under my bridge
    Then what exactly has all this fuss been over? If the Northbridge has no impact on the IO performance, why have people spent so many words arguing over what and where it is? :???:
     
    PSman1700 and pjbliverpool like this.
  6. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Please quote where I referenced Sandybridge and Westmere in that way? That's right, I did not. In the context of discussing modern PC architectures you claimed, and I quote, "the way the processor part of the die connects to the external bus part of the die is not that different from when the separate processor chip was connected to the separate northbridge via the FSB". To which I responded that the bandwidth and latency of that connection are orders of magnitude better than the old FSB, and thus it is very different. I made absolutely no claims about the real-world performance implications of that interconnect improvement because it had absolutely no bearing on the core argument.

    You trying to twist that into me claiming that "Sandybridge was obviously "orders of magnitudes" faster and better than Westmere" is disingenuous at best. It's also entirely irrelevant to the core argument which was your incorrect implication that the Northbridge was some obstacle that the PC had to contend with which the consoles did not.

    Here's what you said:

    You're clearly framing the journeys over the Northbridge as an additional step the PC has to manage that the console does not. Again, this is wrong. You misrepresent both data flows above to make one seem significantly more complicated than the other. Here is what the console flow would look like if written in exactly the same context as you used for the PC flow:

    "your data is read off the storage cells by the drive controller, passed to the decompression unit where it's written to the local cache, decompressed and then written back to that cache before being directed across the north-bridge, to main memory"

    Why the inconsistency? If you're going to mention the Northbridge in one description, why miss it from the other?

    Again, stop trying to wildly misrepresent what I've said. Clearly I've never tried to claim that the functions that used to be handled by the Northbridge "no longer exist". I have referenced multiple times over the past several posts how they were integrated into the CPU, including in the very first post on this subject. The issue here is you framing the Northbridge as something the PC has to deal with, and the console does not. Which, once again, is wrong. Why are you so resistant to just admitting this and moving on? These last several pages of argument have been completely unnecessary. A simple acknowledgment that the "Northbridge" functionality/requirement is the same in both console and modern PC is all that was required, rather than endless posts arguing about increasingly off-topic minutiae.

    No, it isn't written direct to memory. In your own framing it is directed across the Northbridge to main memory. I realise that's just semantics but it's the framing that's important because you're trying to represent the simplicity of one data flow vs the complexity of another.

    To illustrate this, take a PC that's using an APU instead. By your description above the data undergoes the same 2 stages, simply in reverse. Data is written direct to memory from the SSD where it is decompressed by the GPU (or CPU) ready for use. Just as simple as the console route you describe above. For PCs with a dGPU there is just one additional step - moving the GPU data from main memory to the GPU. And RTX-IO may even address that. Far from "moving data around a bunch of places" as you describe it above.

    Also, what evidence do you have to support your suggestion that writing the data to GPU memory for decompression introduces any kind of performance penalty compared to doing it in the local memory of the console's hardware decompressor? And if you're not saying that, then why are you mentioning it at all? We've been told that the GPU decompression is more than capable of keeping up with the fastest available NVMe drives, so why does it matter what memory pool is being used for that? Except of course to frame it as somehow more complicated/inferior/likely to mitigate the benefits of faster components.
     
    PSman1700 likes this.
  7. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    As far as I understand it these are handled by dedicated cores in the SSD controller itself. There was a blog post by someone from RAD Game Tools a while back which explained all these extra bits in the PS5 as basically standard components of a regular NVMe drive, just with a custom firmware. We do know for example that regular NVMe controllers can feature multiple ARM cores.

    Based on the Forspoken GDC presentation it seems CPU and GPU data will be separated and sent to their respective memory pools for parallel decompression. So I think it's fair to say there will probably still be more load on the PC CPU compared to the console CPU. But as the CPU side is much smaller than the GPU side (MS claims about 80% of streamed data is textures), the CPU load is still massively reduced.
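
    As a rough worked example of that split: the sketch below assumes the roughly 80% texture share quoted above all goes to the GPU, and uses a hypothetical 5 GB/s stream rate chosen purely for illustration. `split_decompression_load` is a made-up helper, not part of DirectStorage or any other API.

```python
def split_decompression_load(stream_gb_per_s, gpu_fraction=0.8):
    """Split a streaming rate into (cpu_share, gpu_share) in GB/s.

    gpu_fraction is the assumed share of streamed data decompressed on the
    GPU; 0.8 mirrors the ~80% texture figure quoted in the post.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_share = stream_gb_per_s * gpu_fraction
    cpu_share = stream_gb_per_s - gpu_share
    return cpu_share, gpu_share


if __name__ == "__main__":
    # 5.0 GB/s is a hypothetical stream rate, not a measured figure.
    cpu, gpu = split_decompression_load(5.0)
    print(f"CPU decompresses {cpu:.2f} GB/s, GPU decompresses {gpu:.2f} GB/s")
```

    With those assumed numbers the CPU is left with about a fifth of the decompression throughput it would otherwise have to sustain.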

    But is there a real-world benefit to this apart from being easier to draw on paper? If 80+% of the decompression workload has now been removed from the CPU, and the IO overhead has been reduced to a tiny fraction of what it is today, would the CPU even be a bottleneck anymore? Could it be argued that the higher decompression capacity of the combined CPU and GPU capabilities is a better trade-off because it gives headroom to work with much faster NVMe drives?

    This is pretty much the crux of my argument. But going more fundamental than this, even if the Northbridge did have an impact on performance, it would be having the same impact on the console too. Because what we're referring to as the Northbridge here is actually just the CPU's integrated memory controller. And obviously to copy data from an SSD to main memory, you have to go via the memory controller.
     
    PSman1700 likes this.
  8. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    I have seen you using this phrase before. It isn't all that needed, I think; it's a bit too much aimed at the user instead of the discussion.
     
  9. davis.anthony

    Regular

    Joined:
    Aug 22, 2021
    Messages:
    423
    Likes Received:
    147
    So from what I can see it would seem that RTX I/O is GPUDirect Storage that Nvidia debuted in 2019, or it's based on that work as some of the diagrams are very similar.
     
    #809 davis.anthony, Mar 29, 2022
    Last edited: Mar 30, 2022
  10. davis.anthony

    Regular

    Joined:
    Aug 22, 2021
    Messages:
    423
    Likes Received:
    147
    The decompression work can be removed from the CPU, but what about everything else that follows?

    As Mark Cerny pointed out, handling the memory writes and file 'check ins' for 100 MB worth of data coming off the disk is nothing for a CPU, but change that to several gigabytes and it becomes a huge task.

    How has that part improved? Do we all need to be rocking 10-core CPUs for next gen?

    Forspoken is hugely impressive but it's still a game that can work on a SATA III SSD, so it's not going to show how hard the CPU and GPU will get hit by I/O in the next 2-3 years.

    Do we have CPU utilisation results for Forspoken?
     
  11. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    This is already dealt with by DirectStorage. The overhead associated with this IO management is reduced from multiple CPU cores to 10% of a single core according to Microsoft.

    Forspoken isn't the best example anyway as it's not using GPU decompression yet. So CPU utilisation is going to be much higher than it would be when that is used.
     
    PSman1700 and BRiT like this.
  12. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    Wouldn't the next step be to add dedicated DSPs / co-processors just for these steps? MS created dedicated hardware to do this on the Xbox Series. I would imagine that the next step would be MS licensing it out to interested parties so it can be integrated into the CPU, or I would imagine it would be more useful at the disk drive or disk drive controller.
     
    PSman1700 likes this.
  13. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    Yeah, what I'm getting from all of this is that the 2 main benefits of RTXIO are the (theoretical) saving from only transferring compressed data over the bus, and that decompression of data needed by the GPU can be done on the GPU, thus saving CPU cycles for those specific assets. These are both positive things, of course, but I don't know if we are really at a point where bus bandwidth is limiting things in a way that is user-facing. I'm not sure this is a magic bullet for eliminating loading like some people hope it will be, but it's a step in the right direction.
     
    pjbliverpool, PSman1700 and Remij like this.
  14. davis.anthony

    Regular

    Joined:
    Aug 22, 2021
    Messages:
    423
    Likes Received:
    147
    Was that on the Series consoles or Windows?

    I said last year that I wouldn't be surprised if we see something like PS5's I/O complex included in CPUs in the future.
     
  15. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    I don't think it's about doubling the bandwidth of the bus, but rather sending the data in compressed form in around half the time.

    Any way you look at it.. it just makes sense to send compressed data over any buses when you can.
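
    The "around half the time" point is just arithmetic on the bytes that cross the bus. A minimal sketch, assuming a 2:1 compression ratio and hypothetical sizes and bus speeds (`transfer_seconds` is a made-up helper, and the model ignores latency and decompression time):

```python
def transfer_seconds(uncompressed_gb, bus_gb_per_s, compression_ratio=1.0):
    """Time to move an asset across a bus, in seconds.

    compression_ratio is uncompressed size divided by compressed size, so
    1.0 means the data is sent uncompressed. Only bus transfer is modelled.
    """
    if bus_gb_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("bus speed and compression ratio must be positive")
    bytes_on_bus_gb = uncompressed_gb / compression_ratio
    return bytes_on_bus_gb / bus_gb_per_s


if __name__ == "__main__":
    # 8 GB of assets over a 4 GB/s link; both figures are illustrative.
    raw = transfer_seconds(8.0, 4.0)
    packed = transfer_seconds(8.0, 4.0, compression_ratio=2.0)
    print(f"uncompressed: {raw} s, compressed 2:1: {packed} s")
```

    The bus itself is no faster; it simply carries half as many bytes for the same decompressed result, which is why the saving scales with whatever ratio the compressor actually achieves.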
     
    PSman1700 likes this.
  16. Remij

    Regular

    Joined:
    May 3, 2008
    Messages:
    677
    Likes Received:
    1,256
    Tim Sweeney seems to think the PS5 architecture will have a strong influence on PCs in the future, so I wouldn't be surprised.

    In the meantime, I think MS are doing an admirable job of improving where they can.. given the existing state of things.
     
    PSman1700 likes this.
  17. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Maybe, but an evolution of some sort of it, if anything. With things as they are now it's not really needed either; the PC I/O is competitive enough for this generation, probably faster.
     
  18. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    You tell me. It began with my post on how compressed data read off SSD becomes decompressed data usable by GPU and CPU on a PC. Whether you want to refer to the logic block as the 'northbridge' or 'system agent' (nomenclature since Sandy Bridge), and whether it's on- or off-die, is neither here nor there and it changes nothing. ¯\_(ツ)_/¯
     
    davis.anthony likes this.
  19. davis.anthony

    Regular

    Joined:
    Aug 22, 2021
    Messages:
    423
    Likes Received:
    147
    And that is based on seeing one game (Forspoken) built to run on HDDs?

    What does it run like on the average PC with a 6 core CPU and SATA III SSD?

    And moving everything to the CPU via dedicated fixed function hardware would be a better option as it would offer a more efficient approach.
     
  20. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,777
    Likes Received:
    12,691
    Location:
    London, UK
    Well, that's another debate. Even on consoles, where there is unified memory and an APU containing both CPU and GPU, both Microsoft and Sony chose to put the decompression block off-chip.

    On PC, if you commit to doing decompression on the CPU then you are stuck with the situation that you are still routing all compressed data via the CPU. I'd argue it makes sense to have a smarter controller elsewhere, more like the traditional northbridge, so that data read off the SSD can be decompressed there and then routed directly to main RAM for the CPU, or to GDDR for the GPU.

    Having to route data for the CPU via the GPU, or data for the GPU via the CPU, for the purpose of doing basic decompression, is less efficient. People here are thinking about gaming and whether there is enough bandwidth, but PCs are used for incredibly heavy data I/O tasks, and moving data around just to get it to the right place is inefficient.
     
