Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

hur hur hur.
Never say never =P
But when paired with the GPU power of this generation, I agree 32GB of vram makes no sense.
But as we move further into dynamic lighting, which opens up the world of dynamic environments and destruction, all that geometry, texturing, decals etc. could require a significant lift in memory requirements.

And why do you want to have this in memory? The next generation will see faster SSDs, at least PCIe 5.0 (max ~25 GB/s of uncompressed data?) and maybe PCIe 6.0 (max ~50 GB/s of uncompressed data) if the consoles release in 2027/2028. They can stream this from storage too if needed. At 60 fps a PCIe 6.0 SSD would be able to load 1.6 GB of data in 2 frames, for example. If the transition from PCIe 5.0 to PCIe 6.0 takes as long as the last one, the first consumer PCIe 6.0 SSDs will arrive in 2026; if it takes a bit longer, 2027.

This is the concept of useful RAM and loading just in time. You never know if you will need this new geometry, texturing and decals, so load it when it is needed.
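To make the arithmetic above concrete, here is a quick back-of-the-envelope sketch in Python; the PCIe 5.0/6.0 rates are the speculative figures from this post, not confirmed specs.

```python
# Back-of-the-envelope: how much data a drive could deliver within a couple
# of frames at the (speculative) link speeds discussed above.

def data_per_frames(bandwidth_gb_s: float, fps: int, frames: int) -> float:
    """GB deliverable in `frames` frames at the given frame rate and bandwidth."""
    return bandwidth_gb_s * frames / fps

for label, bw in [("PCIe 5.0-class (~25 GB/s, speculative)", 25.0),
                  ("PCIe 6.0-class (~50 GB/s, speculative)", 50.0)]:
    print(f"{label}: ~{data_per_frames(bw, fps=60, frames=2):.2f} GB in 2 frames at 60 fps")

# ~50 GB/s over two 16.6 ms frames works out to roughly 1.6-1.7 GB,
# matching the figure quoted above.
```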
 
And why do you want to have this in memory? The next generation will see faster SSDs, at least PCIe 5.0 (max ~25 GB/s of uncompressed data?) and maybe PCIe 6.0 (max ~50 GB/s of uncompressed data) if the consoles release in 2027/2028. They can stream this from storage too if needed. At 60 fps a PCIe 6.0 SSD would be able to load 1.6 GB of data in 2 frames, for example. If the transition from PCIe 5.0 to PCIe 6.0 takes as long as the last one, the first consumer PCIe 6.0 SSDs will arrive in 2026; if it takes a bit longer, 2027.

This is the concept of useful RAM and loading just in time. You never know if you will need this new geometry, texturing and decals, so load it when it is needed.

I wonder how that affects bandwidth. When you currently have 2.5-10 GB/s of bandwidth being used by the SSD, you leave the CPU and GPU to contend with the other 97-99%. But 50-100 GB/s of bandwidth being used by the SSD creates a third major user of the total bandwidth offered by RAM and an extra layer of contention.
 
I wonder how that affects bandwidth. When you currently have 2.5-10 GB/s of bandwidth being used by the SSD, you leave the CPU and GPU to contend with the other 97-99%. But 50-100 GB/s of bandwidth being used by the SSD creates a third major user of the total bandwidth offered by RAM and an extra layer of contention.

I hope we will have GDDR7 inside the consoles in 2027 or 2028. With a 256-bit bus and 32 Gbps or 36 Gbps modules, that means around 1 TB/s of bandwidth inside a console. If they release the next-generation consoles that late, it leaves extra time for GDDR7 to become more affordable, probably 3 to 4 years after the memory's release.
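For reference, the ~1 TB/s figure falls out of simple arithmetic (bus width times per-pin data rate); a small sketch, assuming the 256-bit bus and 32/36 Gbps modules mentioned above:

```python
# Peak bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte.

def gddr_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

for rate in (32, 36):
    print(f"256-bit bus @ {rate} Gbps: {gddr_bandwidth_gb_s(256, rate):.0f} GB/s")
# -> 1024 GB/s and 1152 GB/s respectively, i.e. roughly 1 TB/s.
```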

 
And why do you want to have this in memory? The next generation will see faster SSDs, at least PCIe 5.0 (max ~25 GB/s of uncompressed data?) and maybe PCIe 6.0 (max ~50 GB/s of uncompressed data) if the consoles release in 2027/2028. They can stream this from storage too if needed. At 60 fps a PCIe 6.0 SSD would be able to load 1.6 GB of data in 2 frames, for example. If the transition from PCIe 5.0 to PCIe 6.0 takes as long as the last one, the first consumer PCIe 6.0 SSDs will arrive in 2026; if it takes a bit longer, 2027.

This is the concept of useful RAM and loading just in time. You never know if you will need this new geometry, texturing and decals, so load it when it is needed.
Well the idea here is that when you get into destructible environments, you're in a situation where you're fighting around a collapsed building that was destroyed dynamically, but what's currently saved on the NVMe drive is just the original model. You have to keep everything that was destroyed in memory while playing in it, because we are assuming here you can no longer stream this geometry from the NVMe drive. As you continue to play out in destroyed territory, you have to treat it like a traditional level, as in, everything needs to stay loaded in memory so you don't have to recalculate everything and things just run smoothly.

Once you get into this situation, you're back to traditional non-streaming rendering styles, and during destruction any new textures, new decals and deformation that are made must stay resident, etc., unless you are writing to the drive. That might be a future innovation (since today's consoles cannot support fast write speeds), but then you're going to get into some VERY interesting hardware asks about constantly reading and writing to the drive and the heat levels that could come from that type of heavy saturation.

So the reason we need VRAM is to support interactivity and engagement. If you want broken bottles and boxes to stay resident instead of disappearing, they need to stay in memory. If you want cars and vehicles to immolate and destroy themselves without switching to a generic dead-vehicle model, then you're going to have to keep all that in memory too. If you want to shoot bullet holes through cars and bricks, have light come through them and have them all stay, you need to keep them stored in memory. If you want to destroy parts of a house or a building, or do any of those crazy sandbox things, all of that needs to stay in memory.
 
Well the idea here is that when you get into destructible environments, you're in a situation where you're fighting around a collapsed building that was destroyed dynamically, but what's currently saved on the NVMe drive is just the original model. You have to keep everything that was destroyed in memory while playing in it, because we are assuming here you can no longer stream this geometry from the NVMe drive. As you continue to play out in destroyed territory, you have to treat it like a traditional level, as in, everything needs to stay loaded in memory so you don't have to recalculate everything and things just run smoothly.

Once you get into this situation, you're back to traditional non-streaming rendering styles, and during destruction any new textures, new decals and deformation that are made must stay resident, etc., unless you are writing to the drive. That might be a future innovation (since today's consoles cannot support fast write speeds), but then you're going to get into some VERY interesting hardware asks about constantly reading and writing to the drive and the heat levels that could come from that type of heavy saturation.

So the reason we need VRAM is to support interactivity and engagement. If you want broken bottles and boxes to stay resident instead of disappearing, they need to stay in memory. If you want cars and vehicles to immolate and destroy themselves without switching to a generic dead-vehicle model, then you're going to have to keep all that in memory too. If you want to shoot bullet holes through cars and bricks, have light come through them and have them all stay, you need to keep them stored in memory. If you want to destroy parts of a house or a building, or do any of those crazy sandbox things, all of that needs to stay in memory.

This is not how it works. In games, destruction is most of the time artist-driven, for multiple reasons: it requires less processing power and it makes the physics predictable. The worst outcome would be a player blocked in a game because he destroyed some object. In many games with destruction, for example pillars, objects are made of pre-fractured destructible meshes; you can keep them in memory or stream them when needed, and at runtime the pre-made pieces are broken apart depending on where the object is shot and the force of the impact.


Fortnite on UE5 uses the Niagara particle system and VFX to do the destruction without using pre-fractured meshes, but this is not the case for every game. And it is still using some assets too; it is not generating random assets. You can decide to have the assets in memory or stream them in the future. And this is a current-gen game with dynamic lighting.
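A minimal sketch of the pre-fractured approach described above, purely illustrative (the class names, thresholds and structure are hypothetical, not taken from any particular engine): the intact mesh is swapped for artist-authored chunks, and only the chunks near a strong enough impact are handed to the physics engine.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    position: tuple           # chunk centroid, authored offline with the fracture
    break_threshold: float    # impulse needed to knock this chunk loose
    simulating: bool = False  # has it been handed over to the physics engine?

@dataclass
class Destructible:
    intact_mesh: str                             # streamed from storage like any asset
    chunks: list = field(default_factory=list)   # pre-fractured pieces
    fractured: bool = False

    def apply_impact(self, point: tuple, impulse: float, radius: float = 1.0):
        if not self.fractured:
            self.fractured = True        # swap the intact mesh for the chunk set
        for chunk in self.chunks:
            dist = sum((a - b) ** 2 for a, b in zip(chunk.position, point)) ** 0.5
            if dist <= radius and impulse >= chunk.break_threshold:
                chunk.simulating = True  # this debris is now owned by the physics sim
```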

 
How are they dumb?

Many people think they're a good idea.
Because their spot just gets taken over by the next-generation machine to begin with, on top of them not being strong enough to provide a measurable leap in quality that justifies an entirely new purchase, while shifting company resources away from other things tbh...

It also, to some degree, shifts priority away from the base machines in order to deliver a marginally better experience on the higher tier, when devs would otherwise be focusing on pushing everything they can out of the base machines.

It happened with the Pro and the X, and it doesn't need to be so
 
For more realistic destruction, like the bridge in the Matrix Awakens demo, they can use pre-fractured geometry and bake the physics simulation into a cache to replay it.


The Chaos Destruction system is a collection of tools that can be used to achieve cinematic-quality levels of destruction in real time. In addition to great-looking visuals, the system is optimized for performance, and grants artists and designers more control over content creation and the fracturing process by using an intuitive nonlinear workflow.

The system allows artists to define exactly how geometry will break during the simulation. Artists construct the simulation assets using pre-fractured geometry and utilize dynamically-generated rigid constraints to model the structural connections during the simulation. The resulting objects within the simulation can separate from connected structures based on interactions with environmental elements, like Physics Field and collisions.

The destruction system relies on an internal clustering model which controls how the rigidly attached geometry is simulated. Clustering allows artists to initialize sets of geometry as a single rigid body, then dynamically break the objects during the simulation. At its core, the clustering system will simply join the mass and inertia of each connected element into one larger single rigid body.

The destruction system uses a new type of asset called a Geometry Collection as the base container for its geometry and simulation properties. A Geometry Collection can be created from static and skeletal mesh sources, and then fractured and clustered using UE5's Fracture Mode.

At the beginning of the simulation a connection graph is initialized based on each fractured rigid body's nearest neighbors. Each connection between the bodies represents a rigid constraint within the cluster and is given initial strain values. During the simulation, the strains within the connection graph are evaluated. These connections can be broken when collision constraints or field evaluations apply an impulse on the rigid body that exceeds the connection's limit. Physics Fields can also be used to decrease the internal strain values of the connections, resulting in a weakening of the internal structure.

For large-scale destruction simulations, Chaos Destruction comes with a new Cache System that allows for smooth replay of complex destruction at runtime with minimal impact on performance.

Chaos Destruction easily integrates with other Unreal Engine systems, such as Niagara and Audio Mixer, to spawn particles or play specific sounds during the simulation.
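To make the connection-graph/strain idea from the quoted documentation concrete, here is a toy sketch; it is not the Chaos API, just the concept of connections carrying strain budgets that break when an impulse exceeds them.

```python
class ConnectionGraph:
    """Toy strain model: chunks are nodes, connections carry a strain budget."""

    def __init__(self):
        self.connections = {}  # (chunk_a, chunk_b) -> remaining strain capacity

    def connect(self, a: str, b: str, strain_limit: float):
        self.connections[(a, b)] = strain_limit

    def apply_impulse(self, chunk: str, impulse: float):
        """Weaken every connection touching `chunk`; break the ones that give out."""
        broken = []
        for pair, capacity in list(self.connections.items()):
            if chunk in pair:
                capacity -= impulse
                if capacity <= 0:
                    broken.append(pair)
                    del self.connections[pair]
                else:
                    self.connections[pair] = capacity
        return broken  # the caller lets these pieces simulate as free rigid bodies

graph = ConnectionGraph()
graph.connect("pillar_base", "pillar_mid", strain_limit=100.0)
graph.connect("pillar_mid", "pillar_top", strain_limit=60.0)
print(graph.apply_impulse("pillar_mid", 75.0))  # only the weaker joint breaks
```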

KZ2 wall destruction with a pre-fractured wall

Like in many games, for example the pillars in Killzone 2, the destruction using pre-fractured geometry is baked into the object; at runtime the destruction itself is calculated using the physics engine, but the debris is made of pre-fractured geometry stored on disk and then held in memory. I was also reading a presentation about Gears 5, and they said to pay attention to the level of destruction, because some parts used a mix of the physics engine and an Alembic cache animation for the destruction, and this was heavy in memory and probably too heavy to be streamed from storage.

Here in Gears 5, this is not the Alembic cache system but the Swift destruction system.
 
Because their spot just gets taken over by the next-generation machine to begin with, on top of them not being strong enough to provide a measurable leap in quality that justifies an entirely new purchase, while shifting company resources away from other things tbh...

It also, to some degree, shifts priority away from the base machines in order to deliver a marginally better experience on the higher tier, when devs would otherwise be focusing on pushing everything they can out of the base machines.

It happened with the Pro and the X, and it doesn't need to be so

But that's your opinion on them; just because you feel they're dumb doesn't make them so for everyone.
 
I hope we will have GDDR7 inside the consoles in 2027 or 2028. With a 256-bit bus and 32 Gbps or 36 Gbps modules, that means around 1 TB/s of bandwidth inside a console. If they release the next-generation consoles that late, it leaves extra time for GDDR7 to become more affordable, probably 3 to 4 years after the memory's release.

It may still be problematic. The PS4's CPU bandwidth usage had a disproportionate effect on the bandwidth available to the GPU. With no CPU bandwidth, the GPU could get ~135 GB/s (76% of the theoretical max) from GDDR. Just 10 GB/s devoted to the CPU caused overall bandwidth to fall to 125 GB/s, which let GPU access fall to 115 GB/s. That's like a 2-for-1 tradeoff.

For every 1 GB/s you give to the PS4 CPU, you take 2 GB/s from the GPU. If AMD did nothing to improve this in future gens, or if the SSD as a third major client adds an extra layer of contention that recreates this tradeoff, it may present a circumstance that's better served by more RAM, not higher SSD speeds. The HDD of the PS4 was limited to 50 MB/s, not 100 GB/s. 50-100 GB/s of bandwidth used by the CPU and 50-100 GB/s used by the SSD, with a 2:1 tradeoff, can leave the GPU with just ~500-600 GB/s out of 1 TB/s of max GDDR bandwidth (factoring in the 76% of theoretical max).

If MS and Sony are targeting 4080-like performance in 2027-2028, it probably won't be a big deal, but it still might take a relatively hefty increase to caches to compensate.
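A hedged sketch of the contention model being described, using this post's own hypothetical numbers (2:1 penalty, 76% efficiency, 1 TB/s GDDR); none of these figures are confirmed for future hardware.

```python
# Hypothetical contention model from the post: every 1 GB/s a non-GPU client
# uses costs the GPU ~2 GB/s, and the GPU only reaches ~76% of the peak anyway.

def effective_gpu_bw(peak_gb_s: float, cpu_gb_s: float, ssd_gb_s: float,
                     efficiency: float = 0.76, penalty: float = 2.0) -> float:
    usable = peak_gb_s * efficiency
    return max(0.0, usable - penalty * (cpu_gb_s + ssd_gb_s))

# 1 TB/s GDDR, 50 GB/s for the CPU and 50 GB/s for the SSD (low end of the
# ranges above) leaves roughly 560 GB/s for the GPU, in the ~500-600 GB/s band.
print(f"{effective_gpu_bw(1000, 50, 50):.0f} GB/s left for the GPU")
```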

 
But that's your opinion on them; just because you feel they're dumb doesn't make them so for everyone.
Haven't you heard? My opinion is law 😎

But seriously though, there is no NEED for iterative machines, even if there will be a number of people who buy them if they happen to exist. I think they just clog the marketplace and make things too messy when it isn't necessary
 
It may still be problematic. The PS4's CPU bandwidth usage had a disproportionate effect on the bandwidth available to the GPU. With no CPU bandwidth, the GPU could get ~135 GB/s (76% of the theoretical max) from GDDR. Just 10 GB/s devoted to the CPU caused overall bandwidth to fall to 125 GB/s, which let GPU access fall to 115 GB/s. That's like a 2-for-1 tradeoff.

For every 1 GB/s you give to the PS4 CPU, you take 2 GB/s from the GPU. If AMD did nothing to improve this in future gens, or if the SSD as a third major client adds an extra layer of contention that recreates this tradeoff, it may present a circumstance that's better served by more RAM, not higher SSD speeds. The HDD of the PS4 was limited to 50 MB/s, not 100 GB/s. 50-100 GB/s of bandwidth used by the CPU and 50-100 GB/s used by the SSD, with a 2:1 tradeoff, can leave the GPU with just ~500-600 GB/s out of 1 TB/s of max GDDR bandwidth (factoring in the 76% of theoretical max).

If MS and Sony are targeting 4080-like performance in 2027-2028, it probably won't be a big deal, but it still might take a relatively hefty increase to caches to compensate.


I know, but first, this problem is reduced in the PS5 and Xbox Series because AMD has a patent for UMA where they give priority to CPU memory calls, which reduces memory request collisions. This is logical: the CPU is more sensitive to memory latency. This is not only for consoles but for AMD APUs in general.
And I expect around 22-25 GB/s for PCIe 5.0, up to a maximum of around 44-50 GB/s if it is PCIe 6.0, not 100 GB/s. In the PS5, if someone pushes the system, it can go up to 11 GB/s of uncompressed data in a burst too, or more if the data is more compressible, with only 448 GB/s of bandwidth shared between the CPU, GPU and the Tempest engine. And the Tempest engine doesn't have the same mechanism to reduce memory call collisions.


The Tempest engine itself is, as Cerny explained in his presentation, a revamped AMD compute unit, which runs at the GPU's frequency and delivers 64 flops per cycle. Peak performance from the engine is therefore in the region of 100 gigaflops, in the ballpark of the entire eight-core Jaguar CPU cluster used in PlayStation 4. While based on GPU architecture, utilisation is very, very different.

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time."
 
Well the idea here is that when you get into destructible environments, you're in a situation where you're fighting around a collapsed building that was destroyed dynamically, but what's currently saved on the NVMe drive is just the original model. You have to keep everything that was destroyed in memory while playing in it, because we are assuming here you can no longer stream this geometry from the NVMe drive. As you continue to play out in destroyed territory, you have to treat it like a traditional level, as in, everything needs to stay loaded in memory so you don't have to recalculate everything and things just run smoothly.
Games like Red Faction (PS2) and even Bethesda's Creation Engine have a simple solution to this problem, which is keeping the default world state on the drive and saving the differences in the save file. In older Elder Scrolls games that would typically only be corpses and loot differences, but Fallout 4 shows that Bethesda's engine holds up extremely well when storing any changes made to the environment.

With Fallout 4 mods you can destroy/edit/create new structures anywhere in the game, and the engine tracks and saves the changes, then applies them to the default world state when loading/streaming areas into memory.
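A minimal, hypothetical sketch of that "default world on disk, diffs in the save file" idea (not Bethesda's actual save format): the pristine cell streams from storage as usual and the player's recorded changes are overlaid on load.

```python
def record_change(save_diffs: dict, cell_id: str, object_id: str, new_state: dict):
    """Store only what differs from the shipped world data."""
    save_diffs.setdefault(cell_id, {})[object_id] = new_state

def load_cell(default_world: dict, save_diffs: dict, cell_id: str) -> dict:
    """Stream the pristine cell, then apply the player's saved differences."""
    cell = dict(default_world[cell_id])       # default data, read from storage
    cell.update(save_diffs.get(cell_id, {}))  # overlay destroyed/edited objects
    return cell

default_world = {"sanctuary": {"house_01": {"state": "intact"}}}
diffs = {}
record_change(diffs, "sanctuary", "house_01", {"state": "destroyed"})
print(load_cell(default_world, diffs, "sanctuary"))
# Only `diffs` goes into the save file; the full world data stays on the drive.
```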
 
DirectStorage is intended to make loading times in PC games a thing of the past and enable larger, more detailed game worlds. For this purpose, access to the data storage is modernized - with impressive results in the first PCGH benchmarks with Intel Arc, AMD Radeon and Nvidia Geforce.

 
DirectStorage is intended to make loading times in PC games a thing of the past and enable larger, more detailed game worlds. For this purpose, access to the data storage is modernized - with impressive results in the first PCGH benchmarks with Intel Arc, AMD Radeon and Nvidia Geforce.


Interesting article. I do disagree with one of their final conclusions though: "It is also confirmed that an SSD connected via SATA cannot benefit from direct storage, as the NVME protocol is an elementary part of the feature."

I don't think what we're seeing here is the inability of SATA SSDs to benefit from DirectStorage, as in fact we know (and their screenshots show it too) that GPU decompression will work on any drive. What we're actually seeing here is that a 12900K is fast enough to max out the throughput of a SATA drive, and so the GPUs are unable to show a speed advantage. It doesn't show you how much the 12900K is still being offloaded by the GPU though, or how a slower CPU might fare that could perhaps not keep up with the SATA SSD's max throughput.

Also worth noting that Intel used this same test on the same hardware and got much higher results. I'm not sure if it has options to tweak but I'm curious to know why the difference exists:

 
Interesting article. I do disagree with one of their final conclusions though: "It is also confirmed that an SSD connected via SATA cannot benefit from direct storage, as the NVME protocol is an elementary part of the feature."

Microsoft's most recent developer blogs focus only on NVMe drives enabling DirectStorage APIs on Windows. DirectStorage is more than just lightening the I/O workloads and increasing bandwidth, it's about the flexibility to re-prioritise I/O needs in realtime. PATA/SATA is simply orders of magnitude behind on this front.

If support of SATA/PATA connected drives were on the horizon, Microsoft would be talking about this, just like they were talking about DirectStorage long before they announced the hardware requirements.
 
Microsoft's most recent developer blogs focus only on NVMe drives enabling DirectStorage APIs on Windows. DirectStorage is more than just lightening the I/O workloads and increasing bandwidth, it's about the flexibility to re-prioritise I/O needs in realtime. PATA/SATA is simply orders of magnitude behind on this front.

If support of SATA/PATA connected drives were on the horizon, Microsoft would be talking about this, just like they were talking about DirectStorage long before they announced the hardware requirements.

From that article:

"Storage Device: DirectStorage enabled games will work on all devices (. You’ll need an NVMe SSD, where the bandwidth capabilities are much higher and the storage media itself is faster, to see the significant improvements of DirectStorage. We highly recommend ensuring your game files are saved to an NVMe to get the best gaming experience."

NVMe is needed to see the full advantage of DS, particularly from the lowered overhead on the API side, which I believe relies heavily on the NVMe protocol, but GPU decompression is supported on any device, with its associated reduction in CPU load.

Also, the screenshots from that PCGH article show GPU decompression is being used in the SATA benchmarks. We've seen the same with HDDs previously too.
 
So I ran several benchmarks: DirectStorage on CPU and GPU, via SATA SSD, SATA HDD, and NVMe drives hooked up directly to the CPU or through the B550 chipset, et al.

5950X@CO -30@ALL CORES, 4*8GB 3600MHz CL14,
Palit RTX 4090 GameRock OC @2745MHz/24000MHz, 16*PCIe 4.0
Win11 Home 22H2, Driver 528.02,

DS @CPU, Samsung 970PRO 512GB, 4*PCIe 3.0, Directly to CPU


DS @GPU, Samsung 970PRO 512GB, 4*PCIe 3.0, Directly to CPU


DS @GPU, Kingston KC3000 2TB, 4*PCIe 4.0, Directly to CPU


DS @GPU, HikVision E2000 2TB, 4*PCIe 3.0, Through B550 Chipset


DS @GPU, Samsung 850EVO 256GB, SATA 3.0, Through B550 Chipset


DS @GPU, Western Digital Blue 4TB, SATA 3.0, Through B550 Chipset
 