Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

I wonder if it means that we will be able to replace the internal SSD when faster SSDs arrive on the market.

The Xbox SSD is "slow", so SSDs as fast or faster have been on the market for years already.

So speed isn't the issue in finding one as fast or much faster; compatibility is. Someone will probably test soon what happens with different SSDs.
 
The removable SSD is probably good from a console repairability perspective, but I'm not sure there is much of an advantage from a consumer perspective. That M.2 form factor isn't widely available....
 
The removable SSD is probably good from a console repairability perspective, but I'm not sure there is much of an advantage from a consumer perspective. That M.2 form factor isn't widely available....

Yeah, not much from a consumer perspective except for a lower overall product price; it's said Microsoft uses that form factor in their Surface products, so that should net them better prices.
 
Bandwidth between CPU and VRAM is a tiny fraction of that between CPU and local RAM, which is the reason CPU and GPU have their own local RAM pools in the first place.

On a typical Zen 2 system you're likely to be looking at around 51.2GB/s between the CPU and system RAM. CPU to GPU bandwidth is about 14-15GB/s in each direction over PCIe 3.0. Over PCIe 4.0 it's double that, so around 30GB/s in a single direction.

So I wouldn't really describe it as a tiny fraction. Granted, latency will be rubbish compared to local memory, but we're not talking about the CPU rendering out of VRAM here; we're talking about the time to copy what is likely a few hundred MB of game data from where it is decompressed in VRAM to where it needs to be in main RAM for the CPU to work on it (assuming that's how DirectStorage/RTX-IO even works, which is far from given). And all this is to be done at a loading screen, so we're talking about timescales in full seconds, not the microseconds of latency that are added by having to work over a PCIe bus rather than from local memory.
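For what it's worth, here's a back-of-envelope sketch of where those figures come from; these are theoretical peaks (my own numbers, not vendor specs), and real-world transfers land a bit lower:

```python
# Back-of-envelope peak bandwidth figures (assumptions, not vendor specs).

# Dual-channel DDR4-3200: 2 channels * 8 bytes/transfer * 3200 MT/s
ram_gbps = 2 * 8 * 3.2                 # 51.2 GB/s

# PCIe 3.0/4.0 run at 8/16 GT/s per lane with 128b/130b encoding
pcie3_lane = 8 * (128 / 130) / 8       # ~0.98 GB/s per lane per direction
pcie4_lane = 16 * (128 / 130) / 8      # ~1.97 GB/s per lane per direction

print(f"CPU <-> RAM:   {ram_gbps:.1f} GB/s")
print(f"PCIe 3.0 x16:  {16 * pcie3_lane:.2f} GB/s per direction")
print(f"PCIe 4.0 x16:  {16 * pcie4_lane:.2f} GB/s per direction")
```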

Tell me how you propose splitting the PCIe resource and I'll math it up for you. You're presumably wanting as many lanes moving raw data as fast as possible to the GPU for decompression, while reserving enough to carry decompressed data to main RAM, while reserving enough for all the other devices that rely on low-latency PCIe bus access to function, like audio and networking.

You have 4 lanes coming from SSD to GPU via the SSD<->CPU link and then the CPU<->GPU PCIe link. So essentially 4 of your 16 lanes from CPU->GPU are taken up by that. You still have 16 lanes going back from GPU to CPU to move any data that needs to be in system RAM back. Given that data is now decompressed, and thus potentially twice as large as when it came over, those 16 lanes should still be double what you need to keep up with the maximum speed from the SSD into the GPU. Not that you're likely to need anything like that maximum speed, as the data required by the CPU would only be a very small proportion of the total data streaming in from the SSD. MS say 80% of streamed game data is textures, so at the very most you're only looking at 20% of what you stream to the GPU having to go back over the x16 PCIe bus into main memory.
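A minimal sketch of that worst case, using the post's own numbers (the 80/20 texture split and the 2:1 compression ratio are assumptions carried over from above):

```python
# Worst-case VRAM -> system RAM write-back, per the reasoning above.
ssd_raw_gbps   = 5.0    # compressed data arriving over the x4 SSD link
comp_ratio     = 2.0    # assume decompression roughly doubles the volume
cpu_share      = 0.20   # MS: ~80% of streamed data is textures, so <=20% is CPU-bound
pcie4_x16_gbps = 31.5   # theoretical peak, per direction

writeback = ssd_raw_gbps * comp_ratio * cpu_share   # 2.0 GB/s of decompressed CPU data
print(f"Write-back: {writeback:.1f} GB/s "
      f"({writeback / pcie4_x16_gbps:.0%} of the x16 return path)")
```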

As an example, let's say at game load you need to pull 10GB off the SSD and into memory. 2GB of that is for the CPU and 8GB is for the GPU. To keep things simple, let's say you have a 5GB/s SSD with an effective throughput of 10GB/s with compression.

Provided you load and decompress the CPU data first, that will be in VRAM and decompressed within the first 0.4 seconds. You then push that back over the CPU<->GPU PCIe link (4GB of it now, once decompressed) at a rate of ~30GB/s, so it takes about 0.13 seconds to put that decompressed data into system RAM. Meanwhile you're still spending the next 1.6 seconds bringing the remainder of the GPU data into VRAM from the SSD.

So I'm not seeing why the PCIe bridge between CPU and GPU is acting as a bottleneck in any way in this scenario. Even if you didn't transfer the CPU data from the SSD first and push it back in parallel with streaming the remainder of the GPU data, you're still adding at worst 0.13 seconds to your 2-second timeframe.
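Putting the example on a timeline (a sketch only; it assumes serial SSD reads and the round numbers above):

```python
# Load timeline for the 10 GB example above.
ssd_raw   = 5.0    # GB/s raw; ~10 GB/s effective with 2:1 compression
pcie_back = 30.0   # GB/s, decompressed data VRAM -> system RAM
cpu_gb, gpu_gb = 2.0, 8.0              # compressed sizes read off the SSD

t_cpu_read  = cpu_gb / ssd_raw         # 0.40 s: CPU data into VRAM and decompressed
t_writeback = (cpu_gb * 2) / pcie_back # 0.13 s: 4 GB decompressed, pushed back to RAM
t_gpu_read  = gpu_gb / ssd_raw         # 1.60 s: remaining GPU data into VRAM

# The write-back overlaps the ongoing GPU read, so it adds no wall-clock time.
total = t_cpu_read + max(t_writeback, t_gpu_read)
print(f"CPU read {t_cpu_read:.2f}s, write-back {t_writeback:.2f}s, "
      f"GPU read {t_gpu_read:.2f}s -> total {total:.2f}s")
```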

A congested PCIe bus will be terrible for audio and networking. PCIe has a maximum theoretical bandwidth, but you can only approach it if you're willing to sacrifice low-latency priority devices.

You're doing this at a load/transition screen. Why would PCIe traffic between the CPU and GPU be heavily utilised at that point by audio and networking? More to the point, why would that be impacting the CPU<->GPU PCIe link at all? Those functions sit on the southbridge, which has its own separate PCIe link to the CPU.
 
On a typical Zen 2 system you're likely to be looking at around 51.2GB/s between the CPU and system RAM. CPU to GPU bandwidth is about 14-15GB/s in each direction over PCIe 3.0. Over PCIe 4.0 it's double that, so around 30GB/s in a single direction.
I think your whole post is based on the I/O consistently achieving close to its theoretical maximum throughput on modern hardware, with data arranged so optimally that it needs only one PCIe exchange (SSD to GPU) or two (SSD to GPU to CPU) at most. That may be the case for some game packages built in future to work on this exact arrangement, but not for every game shipped to date, nor shipping in the next 6-12 months. And probably not many after that, if Ubisoft's lazy arse port of Valhalla on next-gen consoles is anything to go by.

Let's see if this is actually the case once DirectStorage's released.

You're doing this at a load/transition screen. Why would PCIe traffic between the CPU and GPU be heavily utilised at that point by audio and networking? More to the point, why would that be impacting the CPU<->GPU PCIe link at all? Those functions sit on the southbridge, which has its own separate PCIe link to the CPU.

What else is your computer doing?
 
I think your whole post is based on the I/O consistently achieving close to its theoretical maximum throughput on modern hardware, with data arranged so optimally that it needs only one PCIe exchange (SSD to GPU) or two (SSD to GPU to CPU) at most. That may be the case for some game packages built in future to work on this exact arrangement, but not for every game shipped to date, nor shipping in the next 6-12 months. And probably not many after that, if Ubisoft's lazy arse port of Valhalla on next-gen consoles is anything to go by.

Let's see if this is actually the case once DirectStorage's released.

Yes, agreed. My post was just to illustrate the lack of hardware bottlenecks associated with the CPU<->GPU PCIe link in this scenario (game loading/fast travel). It's down to the developers and DirectStorage to make the best use of the hardware and to enable anything close to these maximum throughput levels. Although I expect the SSD transfer speeds to be more susceptible to not hitting near their peak rates than transfers between VRAM and system RAM over the PCIe link.


What else is your computer doing?

No doubt plenty, but in terms of moving gigabytes of data across these specific links during a game loading screen, I wouldn't expect there to be anything that would be heavily impacted. Taking my example above, on a PCIe 4.0 system you still have more than a full PCIe 3.0 x16 link of free bandwidth between the CPU and GPU even when the SSD is running at full pelt and data is being transferred back from VRAM to system RAM, so there's more than enough to spare.
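As a rough check on that spare-bandwidth claim (theoretical per-direction peaks; the 5 and 2 GB/s figures are the assumptions from my example above):

```python
# Spare CPU <-> GPU bandwidth while loading, per the scenario above.
pcie4_x16 = 31.5   # GB/s per direction (full duplex)
pcie3_x16 = 15.75  # GB/s per direction, for comparison
down_used = 5.0    # compressed SSD data routed CPU -> GPU
up_used   = 2.0    # decompressed CPU data routed GPU -> CPU

print(f"Free downstream: {pcie4_x16 - down_used:.1f} GB/s")   # 26.5, > a PCIe 3.0 x16 link
print(f"Free upstream:   {pcie4_x16 - up_used:.1f} GB/s")     # 29.5
```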
 
XSX SSD format and layout

On the surface of it, there shouldn't be much trouble upgrading to a larger size.
It should be especially interesting for the XSS in future.

The biggest headache is probably getting at them inside the console.
 
Where do the dedicated SSD decompressors reside? The GPU for both systems? How are they different from each other?
 
Where do the dedicated SSD decompressors reside? The GPU for both systems? How are they different from each other?

Both have dedicated decompression hardware. For the PlayStation we know that it's a separate block on the SoC. For the Xbox I don't think we know exactly where it is, but most likely it's a separate block as well. And by separate I don't mean a separate chip, just a specific portion of the silicon dedicated to it. When they talk about hardware decompression, neither of them means GPU-based decompression.
 
I would guess in the SoC itself. I don't know if decompression happens before or after decryption, but either way, for the PS5, they don't have enough PCIe lanes to handle it off-SoC (at least, an M.2 slot doesn't have enough PCIe lanes to do it), and the Xbox doesn't seem to either. MSFT claims 8 PCIe 4.0 lanes on the SoC itself. Not all are used at gen4 speeds, since 1 lane goes directly to the GbE controller (RTL8111HM - I am assuming it is a RTL8111H MAC+PHY with the extra M standing for Microsoft) and what looks like 2 lanes to the southbridge. This leaves 5 lanes, of which two are taken by the internal SSD and, I'm guessing, two by the external slot. Either way, not enough leftover lanes to support an external decompression path.
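A quick tally of that lane budget (the counts are this post's reading of the board, with the external slot's two lanes explicitly a guess):

```python
# PCIe 4.0 lane accounting for the Series X SoC, as described above.
total_lanes = 8
used = {
    "GbE controller (RTL8111HM)": 1,
    "southbridge link":           2,
    "internal SSD":               2,
    "external expansion slot":    2,   # guessed
}
spare = total_lanes - sum(used.values())
print(f"Spare lanes: {spare}")   # 1 -- not enough for an off-SoC decompression path
```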
 
Both have dedicated decompression hardware. For the PlayStation we know that it's a separate block on the SoC. For the Xbox I don't think we know exactly where it is, but most likely it's a separate block as well. And by separate I don't mean a separate chip, just a specific portion of the silicon dedicated to it. When they talk about hardware decompression, neither of them means GPU-based decompression.

Didn't MS do a press conference for their chip? No info from there on what part it resides in? How certain is it that the Xbox isn't using GPU-based decompression?
 
These will be on the APU for cost and security reasons.

Sony confirmed the I/O is on the APU:

[attachment 5053: Sony slide showing the I/O complex on the APU]

MS have decompression and decryption on the APU:

[attachment 5054: MS Hot Chips slide showing decompression and decryption on the APU]


Hey, the Hot Chips slides had a leak on them!

They only announced Microsoft Pluton, which is a hardware security processor that's going to get built into future CPUs, last month.
But it's right there under the Security and Decompression bullet point, with HSP/Pluton.
 
Hey, the Hot Chips slides had a leak on them!

They only announced Microsoft Pluton, which is a hardware security processor that's going to get built into future CPUs, last month.
But it's right there under the Security and Decompression bullet point, with HSP/Pluton.
Pluton has been public and acknowledged by MSFT officials since at least last year, and probably before that. Basically before Hot Chips 32. At least one MSFT security engineer has described the work they did on the Xbox One as the genesis of AMD PSP and Pluton.
 
This has nothing to do with the "compression" itself. It is just how the build pipeline creates the packages for the game. The Xbox version works on every Xbox console of the current and last gen, so they still use the "old" way to package the game there. The PS5 version is just for the PS5.
Just as an example: if they must replace one file inside a 1GB package, they will ship the whole 1GB package again. But if your game consists of many, many more small packages (or even individual files), only those parts will be replaced instead of one big package.

So far, one Xbox game that has adapted to the new loading system is Gears 5, as MS uses this game to trial their tech. Completely different packages are delivered to Xbox One and Xbox Series consoles, with the consequence that the game is much smaller on the current gen.
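A toy illustration of the packaging point (the sizes are hypothetical): replacing one file in a monolithic package re-ships the whole package, while fine-grained packages re-ship only the affected piece.

```python
# Hypothetical patch sizes for the two packaging strategies described above.
content_gb = 1.0                    # same 1 GB of content in both cases
small_pkgs = 1000                   # content split into 1000 small packages

monolithic_patch = content_gb               # whole package re-shipped
granular_patch   = content_gb / small_pkgs  # only the one affected package

print(f"Monolithic patch: {monolithic_patch:.3f} GB")
print(f"Granular patch:   {granular_patch:.3f} GB")
```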
 
The Xbox version works on every Xbox console of the current and last gen, so they still use the "old" way to package the game there. The PS5 version is just for the PS5.

Nope, it's a different package with different file sizes between Xbox One and Xbox Series.
 
Even if the Series X was literally using no compression at all (aside from normal GPU-native texture compression), the PS5 using Kraken still wouldn't be 3x smaller, according to all available information on Kraken compression ratios.
I think we mentioned this before in one of our DF Directs - but devs have said the PS5 build pipeline does compression easily and by default - Xbox One currently does not, tmk!
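For a rough sanity check on the 3x claim (using Sony's public 5.5 GB/s raw vs ~9 GB/s typical-with-Kraken figures; these are averages, not per-game measurements):

```python
# What a "3x smaller" PS5 package would imply about Kraken's ratio.
raw_rate     = 5.5   # GB/s, PS5 SSD raw throughput
typical_rate = 9.0   # GB/s, Sony's typical figure with Kraken decompression
kraken_ratio = typical_rate / raw_rate   # ~1.64:1 on average game data

needed = 3.0   # ratio needed to be 3x smaller than an uncompressed build
print(f"Typical Kraken: ~{kraken_ratio:.2f}:1; a 3x size gap would need {needed:.0f}:1")
```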
 