Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

You can solve almost any problem by throwing money at it.

The solutions used by Microsoft and Sony on current-generation consoles are simple, effective, and cheap to implement. Throwing RAID at the problem, or putting decompression on GPUs that cost more than a whole console (not your post), is a bit of a mental argument.

If money is no object, I don't doubt that both Microsoft and Sony could design a console architecture with literally no load times. No boot times either.

What money? Anyone gaming on PC is probably doing so with a dedicated GPU, so no added money is necessary. That "RAID" card, as noted, is just a bog-standard low-end NV GPU. For consumer gaming purposes you wouldn't need nearly as much GPU power as the SupremeRAID card uses.

The only reason I even mentioned that "RAID" card is that it's an actual released product using a GPU to offload data transfer tasks from the CPU.

And as I mentioned, it's entirely possible that the iGPU on most CPUs (most Intel CPUs and some AMD CPUs) could potentially be leveraged for this.

So, basically zero additional dollars needed for this. In other words, no additional money need be thrown at it, well, other than MS needing to spend money to implement it into DirectStorage.

Regards,
SB
 
Nvidia show 12x.

There's a lot of numbers in that RTX I/O presentation that don't add up.

Announcing AMD support for DirectStorage launch - GPUOpen

"When storage performance was the bottleneck to loading performance, CPU utilization time from asset loading was considered less important because it was much smaller than the storage transfer time. Today, we find ourselves in the opposite situation. Code which processes assets at load time is now the most common bottleneck.

Use multiple threads for loading assets and interacting with the DirectStorage API. Processing loading on a single thread will create a performance bottleneck."
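
To make that advice concrete, here's a rough sketch of the multi-threaded pattern against the public dstorage.h API. To be clear, this is my own illustration, not code from the AMD article: the thread count, the one-queue-per-thread layout, and the whole-file system-memory reads are all assumptions, and error handling plus the fence wait on completion are left out for brevity.

Code:
// Sketch: N worker threads, each with its own DirectStorage queue,
// reading whole files into system memory. Link against dstorage.lib.
#include <windows.h>
#include <dstorage.h>
#include <wrl/client.h>
#include <cstddef>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void LoadFilesThreaded(const std::vector<const wchar_t*>& paths,
                       std::vector<std::vector<std::byte>>& out)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    out.resize(paths.size());
    const unsigned threads = 4; // assumption: tune per title/machine
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back([&, t] {
            DSTORAGE_QUEUE_DESC desc{}; // one queue per thread
            desc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
            desc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
            desc.Priority   = DSTORAGE_PRIORITY_NORMAL;
            desc.Device     = nullptr;  // system-memory destinations only

            ComPtr<IDStorageQueue> queue;
            factory->CreateQueue(&desc, IID_PPV_ARGS(&queue));

            std::vector<ComPtr<IDStorageFile>> files; // keep open until done

            // Stride over the file list so the threads share the work.
            for (size_t i = t; i < paths.size(); i += threads)
            {
                ComPtr<IDStorageFile> file;
                factory->OpenFile(paths[i], IID_PPV_ARGS(&file));

                BY_HANDLE_FILE_INFORMATION info{};
                file->GetFileInformation(&info);
                const UINT32 size = info.nFileSizeLow; // sketch: <4 GiB files
                out[i].resize(size);

                DSTORAGE_REQUEST req{}; // CompressionFormat defaults to NONE
                req.Options.SourceType      = DSTORAGE_REQUEST_SOURCE_FILE;
                req.Options.DestinationType = DSTORAGE_REQUEST_DESTINATION_MEMORY;
                req.Source.File.Source        = file.Get();
                req.Source.File.Size          = size;
                req.Destination.Memory.Buffer = out[i].data();
                req.Destination.Memory.Size   = size;
                req.UncompressedSize          = size;
                queue->EnqueueRequest(&req);
                files.push_back(file);
            }
            queue->Submit();
            // Real code would EnqueueSignal() a fence here and wait on it
            // before letting 'files' close.
        });

    for (auto& w : workers) w.join();
}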
 
What money? Anyone gaming on PC is probably doing so with a dedicated GPU, so no added money is necessary. That "RAID" card, as noted, is just a bog-standard low-end NV GPU. For consumer gaming purposes you wouldn't need nearly as much GPU power as the SupremeRAID card uses.
The Steam Hardware Survey shows a lot of people are gaming on laptops. How do you just add a RAID SSD to a GPU? Can I add an SSD to my 3080? If so, how? How do I add an SSD to my desktop 3080? Or my ASUS Zephyrus G14 (3060)? Where do I get a free SSD?
 
The Steam Hardware Survey shows a lot of people are gaming on laptops. How do you just add a RAID SSD to a GPU? Can I add an SSD to my 3080? If so, how? How do I add an SSD to my 3080? Where do I get a free SSD?

Laptops also have dedicated GPUs. And why are you so focused on RAID? The "only" reason I mentioned the "RAID" card is that it's just a standard graphics card with a standard GPU.

If software RAID running on a GPU can cut the CPU load of RAID-0 data transfers in excess of 80 GB/s (so minimal processing required) down to 1-3%, versus double-digit percentages when the CPU does the same work, then the data transfers of a single non-RAID NVMe SSD at 7+ GB/s would be absolutely trivial compared to the CPU load they currently generate when gaming.
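
To put rough numbers on it: 7 GB/s is under a tenth of 80 GB/s, so if the offloaded path costs 1-3% of the CPU at 80 GB/s, a straight linear extrapolation (my assumption, not a measured figure) would put a single drive somewhere around 0.1-0.3%.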

And the whole post was in response to someone implying that CPU load is still going to be significant even after MS implements decompression on the GPU.

The whole point was to show that once the GPU is involved in DirectStorage, there won't necessarily be any significant CPU load related to data transfers in games to speak of.

Also, if you really REALLY wanted to, you could split the SSD in a laptop into two partitions, then software-RAID those partitions. Why? No idea, but you could if you wanted to, without spending extra money on another SSD. :p

Regards,
SB
 
The Steam Hardware Survey shows a lot of people are gaming on laptops. How do you just add a RAID SSD to a GPU? Can I add an SSD to my 3080? If so, how? How do I add an SSD to my desktop 3080? Or my ASUS Zephyrus G14 (3060)? Where do I get a free SSD?

I think you misunderstood SB's explanation of the 'RAID card'. He's primarily trying to explain what DS/RTX IO etc. are doing in 'technical layman' terms.
 
Announcing AMD support for DirectStorage launch - GPUOpen

"When storage performance was the bottleneck to loading performance, CPU utilization time from asset loading was considered less important because it was much smaller than the storage transfer time. Today, we find ourselves in the opposite situation. Code which processes assets at load time is now the most common bottleneck.

Use multiple threads for loading assets and interacting with the DirectStorage API. Processing loading on a single thread will create a performance bottleneck."

All that does is explain why the I/O load increases, which you, I and everyone else already know.

It doesn't explain why the RTX I/O figure is so much higher than everything else.

Andrew Goossen from Microsoft, when discussing storage, decompression and I/O on XSX:

We found doing decompression in software to match the SSD rate would have consumed three Zen 2 CPU cores

So that's 3 cores @ 4.8 GB/s.

So you're talking 9-10 CPU cores to get to the 14 GB/s on the RTX I/O slide, a huge difference from the 24 the slide shows :runaway:
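
(Working that through: assuming the load scales linearly with throughput and the codecs are comparable, which is my assumption, 14 ÷ 4.8 × 3 ≈ 8.75 cores, so call it 9-10 once you round up and allow some overhead.)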

So as I said previously, are they using Intel Atoms for the decompression work?

Have they used a slow CPU to make RTX I/O seem better than it is? Marketing that your GPU does the same work as 24 cores is better than saying it beats 9-10 cores.

So to me that slide is not right, and it isn't something I'm using for anything.
 
Why should the way things are done now mean things will always be that way as you suggest? Either nVidia are outright lying, or that's the future intention.

Or... Nvidia have just rebranded GPUDirect Storage (from 2019) into RTX I/O.

The system diagrams are an exact match.

It also explains why that NIC block is present: GPUDirect Storage uses NICs for server and drive access.

GPUDirect Storage also uses a PCIe switch, which is what that PCIe block in the RTX I/O slide is; it's not there to show that data doesn't go through the CPU, as everyone thought.

It's an actual, physical, separate PCIe switch.

So it seems that whole slide is a waste of time.
 
So that's 3 cores @ 4.8 GB/s.

So you're talking 9-10 CPU cores to get to the 14 GB/s on the RTX I/O slide, a huge difference from the 24 the slide shows :runaway:

So as I said previously, are they using Intel Atoms for the decompression work?

Have they used a slow CPU to make RTX I/O seem better than it is? Marketing that your GPU does the same work as 24 cores is better than saying it beats 9-10 cores.

So to me that slide is not right, and it isn't something I'm using for anything.

The key there is "if" you are doing the decompression on the CPU.

RTX-IO is doing that on the GPU.

RTX IO: GPU Accelerated Storage Technology | GeForce News | NVIDIA

...NVIDIA RTX IO, a suite of technologies that enable rapid GPU-based loading and game asset decompression...

That, of course, isn't currently in DirectStorage, although we know MS are working to implement GPU-based decompression.

Regards,
SB
 
You're not understanding what I'm saying.

Why are Nvidia claiming it takes 24 CPU cores to achieve 14 GB/s when, using what others have stated, it should only take around 9-10?

Where are you getting 24 cores from? Their graph that has 24 on it relates to read bandwidth: 24 GB/s when using a Gen4 SSD with compressed game assets.

[edit] Oh wait, now I see.

So, yeah, I guess the question would be what CPU were they using for that graph and what compression algorithm were they using.

Regards,
SB
 
How does that fit in with nVidia's slide showing RTX I/O bypassing the CPU completely?
I guess 'bypass completely' is a bit too much; 'it won't use many resources' would be better. The CPU (and the OS) must still give the OK that the GPU can access those parts of memory, for security reasons. And as the PCIe root complex is still part of the CPU, data will always flow through the CPU ^^ ;)
But it won't take many CPU cycles to validate what the GPU gets; the data itself shouldn't need much checking. I guess you get higher CPU usage if you have encrypted storage, since the CPU must decrypt before the GPU can get the data. Btw, this is also what we have on Xbox: everything is encrypted and signed. I guess those checks must be done before the data is decompressed, so at that point the CPU will always read the data into memory first, and then the GPU can do whatever it wants with it. Security breaks the neck of many possible optimizations.

Choosing not to implement popular decompression techniques in hardware because something new will come along at some point means you'll never implement decompression in hardware. Something new will eventually come along. But zlib decompression hardware was standard in last-generation consoles, and both Microsoft and Sony have really focused their architectures on making supported decompression have zero impact on CPU or GPU.
Current CPUs can manage gigabytes per second of Kraken decompression; this is really not a big problem. CPUs in PCs are more than good enough. It only becomes a problem in a closed system, or a system with a power limit (a laptop).
E.g. for PCs there have already been PCI expansion cards (even PCIe cards) that could handle hardware (de)compression, but CPUs almost always evolved faster and the cards became useless (except for some edge cases) quite quickly.
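
For a rough sense of scale you can measure this yourself. Below is a quick throwaway benchmark of my own, using zlib's uncompress() as a stand-in for the proprietary Kraken codec (so absolute numbers will differ, and the synthetic test data is far more compressible than real game assets): it decompresses the same buffer on every hardware thread in parallel and reports aggregate GB/s.

Code:
// Crude CPU decompression throughput test. Build with: -O2 -lz -pthread
#include <zlib.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const size_t kSize = 16 * 1024 * 1024;   // 16 MiB of synthetic data
    std::vector<unsigned char> raw(kSize);
    for (size_t i = 0; i < kSize; ++i) raw[i] = (unsigned char)(i % 251);

    // Compress the test buffer once up front.
    uLongf compSize = compressBound(kSize);
    std::vector<unsigned char> comp(compSize);
    compress(comp.data(), &compSize, raw.data(), kSize);

    const unsigned threads =
        std::max(1u, std::thread::hardware_concurrency());
    const int iters = 20;                    // repeats per thread
    auto t0 = std::chrono::steady_clock::now();

    // Every thread inflates its own copy of the buffer repeatedly.
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            std::vector<unsigned char> out(kSize);
            for (int i = 0; i < iters; ++i) {
                uLongf outSize = kSize;
                uncompress(out.data(), &outSize, comp.data(), compSize);
            }
        });
    for (auto& th : pool) th.join();

    double secs = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - t0).count();
    double gbps = (double)kSize * iters * threads / secs / 1e9;
    std::printf("%u threads: ~%.1f GB/s decompressed\n", threads, gbps);
    return 0;
}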
 
I guess you get higher CPU usage if you have encrypted storage, since the CPU must decrypt before the GPU can get the data. Btw, this is also what we have on Xbox: everything is encrypted and signed. I guess those checks must be done before the data is decompressed, so at that point the CPU will always read the data into memory first, and then the GPU can do whatever it wants with it. Security breaks the neck of many possible optimizations.

Unless decryption is done on the GPU too... Though I'd imagine the CPU would/could have dedicated silicon for handling that, which isn't entirely counted as CPU resources -- kind of like the DMA blocks or embedded "northbridge". Or is any part of the decryption done by the "storage controller"?
 
Unless decryption is done on the GPU too... Though I'd imagine the CPU would/could have dedicated silicon for handling that, which isn't entirely counted as CPU resources -- kind of like the DMA blocks or embedded "northbridge". Or is any part of the decryption done by the "storage controller"?
If your storage controller does the decryption, you're open to external attacks, so your decryption logic must be as near as possible to the CPU. If the GPU decrypts, you must give it the decryption key, and you get more security issues than before.

Even signing hurts performance. E.g. we've seen this from Windows 7 to Windows 8: Windows 7 does not check for signed DLLs etc., Windows 8 does, which is why Windows 8 runs horribly on HDDs and the Xbox One update process takes half an hour.
 
If your storage controller does the decryption, you're open to external attacks, so your decryption logic must be as near as possible to the CPU. If the GPU decrypts, you must give it the decryption key, and you get more security issues than before.
Microsoft have a solution to this, which is 'TPM-based certificate storage'. This does rely on the TPM and the drive controller supporting this function, but I think the implementation is that encrypted data is decrypted by the controller using keys supplied by the TPM, as authorised by the CPU and OS.

I don't know to what extent it's used though. ¯\_(ツ)_/¯
 