Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

By the end, the Jaguar CPU could be used up to 6.75 cores - the OS reserved only 25% of Core #7 - much as on PS5.
The last update anyone talked about brought it up to 6.5 cores for non-VR games. If the game was VR, it had less CPU to use, at 6.4 cores.

PS4
Around late 2015, Sony changed system reservations from 100% of CPU Core #7 to 50% (or 60% for VR), giving developers use of 50% of Core #7, or 40% if VR.

 
The last update anyone talked about brought it up to 6.5 cores for non-VR games. If the game was VR, it had less CPU to use.

But in the end it changed nothing: they need a thread to work with the Tempest Engine, which manages 3D audio from the OS side, not the game side.
 
The person who said PC can't stream in new data per frame is calling me wrong? 😂

Not from the secondary storage. And that is what's talked about here troublemaker.

When working with current-gen asset geometry and textures, which are large and varied, it is better to have 16 GB of hUMA with console-like I/O than an incongruous 16 GB of VRAM plus 32 GB of system RAM, assuming you have the APIs and engines to properly leverage it; no question PlayStation studios will have them.
 
Not from the secondary storage. And that is what's talked about here troublemaker.

When working with current-gen asset geometry and textures, which are large and varied, it is better to have 16 GB of hUMA with console-like I/O than an incongruous 16 GB of VRAM plus 32 GB of system RAM, assuming you have the APIs and engines to properly leverage it; no question PlayStation studios will have them.
No it isn't... not when you have proper APIs and engines to leverage them on the PC side... Like DirectStorage.
 
You are a baby crying for your results.
Hell of a debut here kiddo.

Regarding the "Spiderman not loading some textures proves the PC CPU's can't keep up": This explanation would only really make sense if this kept occurring in every scene with different random textures that took longer to pop in. However, that's usually not how it operates. This was first noticed in one of the earlier opening cutscenes, where the cop's radio and the labelling on the side of the police car never transition into their proper full-res versions - which do work perfectly well in many other cutscenes. It makes no sense that the game would load 98% of the textures with no delay in these scenes and then just...stop. When it occurs overlooking the city, you can jump in and see 95% of the textures loaded in barely 2 seconds - but sometimes, one specific building face will take 5+ secs, or never transition until you get closer. Why would the entire city be rendered almost instantaneously when you load in, but one or two out of 50+ building faces take 4X as long to show up?

It's a bug. A pernicious one that should have been fixed by now, but a bug.

You would also expect these texture errors to occur more often as CPU load ramps up if this were a CPU bottleneck - for example, more texture pop-in when using RT. But you don't. You can get them when running at 120+ fps without RT, or under 60. You don't see any appreciably higher number of these texture errors on a Core i3 vs a Core i9 - no matter how powerful the system, those cutscenes will always have those specific texture errors. If this were purely a CPU decompression issue, then the Steam Deck experience would have the texture detail of Virtua Fighter.

Maybe the additional complexity of managing this within a non-UMA architecture, and having to code decompression to use the CPU, has made this a bottleneck in some way; it's apparently not a simple fix (albeit I don't know if Miles Morales suffers from this). The strange way it manifests doesn't really lend much credence to your theory that the 'overhead' of using the CPU and transferring data from RAM to VRAM is the culprit, though.
 
Not from the secondary storage. And that is what's talked about here troublemaker.

When working with current-gen asset geometry and textures, which are large and varied, it is better to have 16 GB of hUMA with console-like I/O than an incongruous 16 GB of VRAM plus 32 GB of system RAM, assuming you have the APIs and engines to properly leverage it; no question PlayStation studios will have them.

It's better to have large pools of physical RAM.
 
So at the end of the conversation, does it make sense for a PC to need some extra RAM to swap in assets? 32 GB of system memory still seems excessive compared to a console that only has 16 GB of RAM total.
 
So at the end of the conversation, does it make sense for PC to need some extra ram to swap in assets?
In the absence of an SSD, yes, going by the recommended specs for Returnal. In the presence of an SSD, no, going by the game's minimum specs. All arguments for/against at this point are (wild) speculation and we just need to wait for Returnal to release so we can evaluate the differences and make informed deductions about the impact of the I/O systems. At that point, hopefully those who are woefully misguided are 'man enough' to own up and eat humble pie...
 
In the absence of an SSD, yes, going by the recommended specs for Returnal. In the presence of an SSD, no, going by the game's minimum specs. All arguments for/against at this point are (wild) speculation and we just need to wait for Returnal to release so we can evaluate the differences and make informed deductions about the impact of the I/O systems. At that point, hopefully those who are woefully misguided are 'man enough' to own up and eat humble pie...
At the baseline, I have wondered if Returnal was "next gen", i.e. whether its structure was something made much easier by an SSD than by relying on the PS4's stock drive.

Would you say it's warranted? Or would a PS4 equipped with an HDD still be able to do a similar structure of game easily?
 
Not from the secondary storage. And that is what's talked about here troublemaker.

Back this up.

Where is the bottleneck specifically in that data transfer that makes it impossible in the millisecond timescales of a frame?

NVMe read latencies are in the microsecond range - three orders of magnitude faster.

PCs can support NVMe drives with even greater throughput than the PS5's, and the PCIe 4.0 x4 interface is the same in both systems.

So where is the bottleneck and what evidence do you have to support its existence?
 
So at the end of the conversation, does it make sense for a PC to need some extra RAM to swap in assets? 32 GB of system memory still seems excessive compared to a console that only has 16 GB of RAM total.
Some, yes given the wide variety of PC builds. It's easier to have your bases covered by specifying higher than needed.

Now as to the amount of memory: console games only have access to 12.5 GB for GPU and CPU use. Being extra conservative, on the PC, once you remove OS use etc., that 32 GB is down to 28 GB (overly conservative numbers). So now you're looking at something like 28 GB of CPU memory plus 8 GB of GPU memory in that situation.

There are so many ways to attempt to break down the use of this memory, that I'm not sure where to even start. Does one pick a starting point for how much memory is used for the program itself and work out how much GPU and leftover is available?

For example, if the game uses 4.5 GB of RAM for executable code, then the theoretical memory usage looks something like this:
Console footprint: 4.5 GB code use, 8 GB GPU, 0 GB available memory
PC footprint: 4.5 GB code use, 8 GB GPU, 23.5 GB available memory

A 23.5 GB general memory pool sure can buffer a metric ton of game assets.

With only 16 GB of memory instead of 32, the same hypothetical would start from 12 GB instead of 28 GB, leaving a 7.5 GB available general memory pool instead of 23.5 GB.
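The arithmetic above can be sketched as a quick calculation. The OS-reservation and code-size figures are the hypothetical numbers from this post, not measured values:

```python
# Hypothetical memory budgets from the discussion above (all figures illustrative).

def available_pool(total_ram_gb, os_reserve_gb, code_gb, gpu_gb):
    """General-purpose memory left after OS reserve, executable code, and any
    GPU working set carved out of the same pool."""
    return total_ram_gb - os_reserve_gb - code_gb - gpu_gb

# Console: 16 GB unified pool, ~3.5 GB OS reserve -> 12.5 GB usable,
# minus 4.5 GB code and 8 GB GPU working set.
console_free = available_pool(16, 3.5, 4.5, 8)

# PC with 32 GB system RAM: the 8 GB GPU working set lives in separate VRAM,
# so only the ~4 GB OS reserve and 4.5 GB code come out of system RAM.
pc32_free = available_pool(32, 4, 4.5, 0)

# Same PC build with only 16 GB of system RAM.
pc16_free = available_pool(16, 4, 4.5, 0)

print(console_free, pc32_free, pc16_free)  # -> 0.0 23.5 7.5
```

These match the 0 GB / 23.5 GB / 7.5 GB figures above; change any of the assumed reserves and the leftover pool shifts one-for-one.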
 
Some, yes given the wide variety of PC builds. It's easier to have your bases covered by specifying higher than needed.

Now as to the amount of memory: console games only have access to 12.5 GB for GPU and CPU use. Being extra conservative, on the PC, once you remove OS use etc., that 32 GB is down to 28 GB (overly conservative numbers). So now you're looking at something like 28 GB of CPU memory plus 8 GB of GPU memory in that situation.

There are so many ways to attempt to break down the use of this memory, that I'm not sure where to even start. Does one pick a starting point for how much memory is used for the program itself and work out how much GPU and leftover is available?

For example, if the game uses 4.5 GB of RAM for executable code, then the theoretical memory usage looks something like this:
Console footprint: 4.5 GB code use, 8 GB GPU, 0 GB available memory
PC footprint: 4.5 GB code use, 8 GB GPU, 23.5 GB available memory

A 23.5 GB general memory pool sure can buffer a metric ton of game assets.

With only 16 GB of memory instead of 32, the same hypothetical would start from 12 GB instead of 28 GB, leaving a 7.5 GB available general memory pool instead of 23.5 GB.
So in short.... 🤔
 
So in short.... 🤔

It's still early for me, just waking up and gathering my thoughts. With that disclaimer in place...

I think 16 GB of system memory can work even from "rust spinner" hard drives. It only needs some prefetching hints. At this point, developers should know the actual level layout, which assets are needed before they're used, and at what point it makes sense to prefetch and buffer them.

Now this requires more development time to put in place if they didn't already have something like it. E.g.: the player is in Section X, with adjacent sections Y, Y+1, and Y+2 that have random monsters M1, M3, and M9 in them.
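The hint-based prefetch idea sketched above (Section X with adjacent sections Y, Y+1, Y+2) could look something like this minimal sketch. The adjacency map, asset lists, and `load_asset` routine are all invented for illustration:

```python
# Hypothetical sketch of hint-based prefetching for a level-streaming system.
# Section names, asset names, and the loader are made up for this example.

adjacency = {
    "X": ["Y", "Y+1", "Y+2"],     # sections reachable from X
}

section_assets = {
    "X":   ["env_X"],
    "Y":   ["env_Y", "monster_M1"],
    "Y+1": ["env_Y1", "monster_M3"],
    "Y+2": ["env_Y2", "monster_M9"],
}

resident = set()  # assets currently buffered in RAM

def load_asset(name):
    """Stand-in for an asynchronous read from the hard drive; a real engine
    would queue the I/O and continue simulating while it completes."""
    resident.add(name)

def enter_section(section):
    # Assets for the current section must be in memory before they're used.
    for asset in section_assets[section]:
        load_asset(asset)
    # Prefetch hint: start buffering adjacent sections' assets early, hiding
    # the HDD's seek latency behind the time the player spends in this section.
    for neighbor in adjacency.get(section, []):
        for asset in section_assets[neighbor]:
            load_asset(asset)

enter_section("X")
print(sorted(resident))
```

The point is that with a known level layout, the slow drive's latency is paid during gameplay rather than at a loading screen, at the cost of holding neighbors' assets in that larger RAM pool.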
 
It's still early for me, just waking up and gathering my thoughts. With that disclaimer in place...

I think 16 GB of system memory can work even from "rust spinner" hard drives. It only needs some prefetching hints. At this point, developers should know the actual level layout, which assets are needed before they're used, and at what point it makes sense to prefetch and buffer them.

Now this requires more development time to put in place if they didn't already have something like it. E.g.: the player is in Section X, with adjacent sections Y, Y+1, and Y+2 that have random monsters M1, M3, and M9 in them.
Thank you!
 
The question here is not whether you can get more out of the current-generation consoles than out of PCs - you can, because more can be done with a properly used low-level API, fine-grained memory movement, and a game engine built for sampler feedback streaming. The question is when, and to what extent, MS will use these modern techniques. Everyone can be quite sure that the installed base of HDD-era PCs significantly delays putting today's consoles' real capabilities to use. Once almost all PCs sold have an SSD, it will become possible to take advantage of the enormous data transfer speed that the I/O capability of the consoles' SSDs, together with DirectStorage, can provide.

As an example, 16 GB of RAM can be more than enough for super-res textures with spectacular GI and ray tracing if the SFS data-movement method is used effectively. Such games will probably start arriving from 2023.
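As a rough back-of-envelope illustration of why tile-granular residency of the kind SFS enables can shrink texture memory: the 16K texture size, the ~1 byte/texel compression rate, and the 5% visible-tile fraction below are assumptions for the sake of the sketch, not measurements (the 64 KiB tile is the standard D3D tiled-resource size).

```python
# Back-of-envelope comparison: whole-mip residency vs. tile-granular residency.
# All numbers are illustrative assumptions.

TILE_BYTES = 64 * 1024  # 64 KiB, the standard tiled-resource tile size

def full_mip0_bytes(width, height, bytes_per_texel=1):
    # Block-compressed texture at ~1 byte/texel (e.g. BC7 is 16 bytes per
    # 4x4 block, i.e. 1 byte per texel).
    return width * height * bytes_per_texel

def resident_bytes(width, height, visible_fraction, bytes_per_texel=1):
    # Only the tiles the sampler actually touched need to be resident.
    total_tiles = full_mip0_bytes(width, height, bytes_per_texel) // TILE_BYTES
    needed_tiles = int(total_tiles * visible_fraction)
    return needed_tiles * TILE_BYTES

full = full_mip0_bytes(16384, 16384)          # a 16K "super-res" texture
needed = resident_bytes(16384, 16384, 0.05)   # assume ~5% of tiles sampled
print(full // 2**20, "MiB full vs", needed // 2**20, "MiB resident")
```

Under these assumptions the top mip alone drops from 256 MiB resident to about 12 MiB, which is the mechanism behind the "16 GB can be enough" claim.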
 
Maybe some things to help clear up some inaccuracies I'm seeing.

Single large pool of memory is better than split pools of memory. Yes and No. In extremely simple terms.
  • Single pool of memory
    • Advantage that everything is in a single pool of memory and your CPU and GPU have easy access to all data therein.
    • Disadvantage is that there is memory contention every time the CPU and GPU need to access memory at the same time.
      • Effectively reduces the bandwidth below the theoretical.
      • Hypothetical example, 500 GB/s bandwidth. GPU can use up to 500 GB/s, CPU can use up to 100 GB/s.
        • If both access memory at the same time overall bandwidth is reduced to 400 GB/s in this hypothetical example.
        • GPU is only getting a hypothetical 323 GB/s to use
        • CPU is getting a hypothetical 75 GB/s to use
        • Hardware can obviously be designed to give greater priority to one or the other when contention happens, but there is always a penalty.
        • This means careful management of memory access by GPU and CPU is necessary to try to avoid memory contention as much as possible.
  • Two pools of memory
    • Disadvantage that data is held in separate pools of memory.
      • Generally everything is held in main memory and anything that GPU needs has to be copied to VRAM over whatever bus is being used before it is accessible to the GPU.
        • With an x16 PCIe 4.0 slot that would be 31.5 GB/s
      • Sounds crippling except it isn't necessarily. It just means it's slightly more complex as a developer has to manage memory access.
    • Advantage is that if both the CPU and GPU access data simultaneously then both get full unrestricted access to memory as long as the data is in its data pool.
      • If the GPU has 500 GB/s access to VRAM then it practically always has access to 500 GB/s of memory access.
      • If the CPU has 100 GB/s access to RAM then it practically always has access to 100 GB/s of memory access.
      • If both access their pools of memory simultaneously that's 600 GB/s of bandwidth available to be used.
        • Because they are split you end up with overall significantly more bandwidth available
        • IF the data needed to be used immediately is in their respective RAM pools.
      • This means careful management of memory pools in order to avoid having the GPU needing to copy critical data into VRAM from RAM when it's needed.
  • Overly simplistic summary
    • It's a tradeoff between having to more carefully manage memory accesses (single pool) or having to manage your memory pools (two pools ... or more)
Ah, but what about SSD?
  • Single pool
    • Treat it as a 2nd pool of memory (single pool case).
    • Compared to a 2 pool solution (above) this is limited not only in bandwidth but also in latency.
    • Bandwidth is worse than transferring data from RAM to VRAM in the 2 pool.
    • Latency is an order of magnitude worse than transferring data from RAM to VRAM.
    • This is basically a worse way to stream in data than if it was being streamed in from RAM.
    • However, for some forms of data access it can act as a substitute for data being immediately accessible (in RAM).
  • Two pool case
    • Treated as "another" 2nd pool of memory (DirectStorage)
      • Same advantages and disadvantages as Single Pool case.
      • You effectively now have 2x data sources for VRAM to pull data from. RAM or SSD.
    • Treated as a 3rd pool of memory (no DirectStorage)
      • Data must be copied from storage into RAM before it is available to use
End of overly simplistic overview and maybe people can talk about this while being on roughly the same page. If I got anything wrong feel free to correct me. I'm still drinking my coffee and waking up. :)
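The single-pool contention numbers above can be turned into a toy model. This is a minimal sketch: the 400 GB/s contended total and the demand-proportional split are assumptions, and a priority-weighted arbiter would land nearer the 323/75 GB/s figures in the example than this even split does.

```python
# Toy model of shared-bus contention using the hypothetical figures above:
# GPU peak 500 GB/s, CPU peak 100 GB/s, and an assumed contention penalty
# that caps combined throughput at 400 GB/s when both masters are active.

BUS_PEAK = 500.0
CONTENDED_TOTAL = 400.0  # assumed effective total under contention

def effective_bandwidth(gpu_demand, cpu_demand):
    """Split the contended total in proportion to each client's demand."""
    if cpu_demand == 0:
        # No contention: the GPU gets whatever it asks for, up to bus peak.
        return min(gpu_demand, BUS_PEAK), 0.0
    total = gpu_demand + cpu_demand
    scale = min(1.0, CONTENDED_TOTAL / total)
    return gpu_demand * scale, cpu_demand * scale

gpu_bw, cpu_bw = effective_bandwidth(500, 100)
print(round(gpu_bw), round(cpu_bw))  # -> 333 67
```

The qualitative takeaway is the same as in the list above: once both clients contend, each sees noticeably less than its peak, so scheduling accesses to avoid overlap is worth real bandwidth.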

Regards,
SB
 
Maybe some things to help clear up some inaccuracies I'm seeing.

Single large pool of memory is better than split pools of memory. Yes and No. In extremely simple terms.
  • Single pool of memory
    • Advantage that everything is in a single pool of memory and your CPU and GPU have easy access to all data therein.
    • Disadvantage is that there is memory contention every time the CPU and GPU need to access memory at the same time.
      • Effectively reduces the bandwidth below the theoretical.
      • Hypothetical example, 500 GB/s bandwidth. GPU can use up to 500 GB/s, CPU can use up to 100 GB/s.
        • If both access memory at the same time overall bandwidth is reduced to 400 GB/s in this hypothetical example.
        • GPU is only getting a hypothetical 323 GB/s to use
        • CPU is getting a hypothetical 75 GB/s to use
        • Hardware can obviously be designed to give greater priority to one or the other when contention happens, but there is always a penalty.
        • This means careful management of memory access by GPU and CPU is necessary to try to avoid memory contention as much as possible.
  • Two pools of memory
    • Disadvantage that data is held in separate pools of memory.
      • Generally everything is held in main memory and anything that GPU needs has to be copied to VRAM over whatever bus is being used before it is accessible to the GPU.
        • With an x16 PCIe 4.0 slot that would be 31.5 GB/s
      • Sounds crippling except it isn't necessarily. It just means it's slightly more complex as a developer has to manage memory access.
    • Advantage is that if both the CPU and GPU access data simultaneously then both get full unrestricted access to memory as long as the data is in its data pool.
      • If the GPU has 500 GB/s access to VRAM then it practically always has access to 500 GB/s of memory access.
      • If the CPU has 100 GB/s access to RAM then it practically always has access to 100 GB/s of memory access.
      • If both access their pools of memory simultaneously that's 600 GB/s of bandwidth available to be used.
        • Because they are split you end up with overall significantly more bandwidth available
        • IF the data needed to be used immediately is in their respective RAM pools.
      • This means careful management of memory pools in order to avoid having the GPU needing to copy critical data into VRAM from RAM when it's needed.
  • Overly simplistic summary
    • It's a tradeoff between having to more carefully manage memory accesses (single pool) or having to manage your memory pools (two pools ... or more)
Ah, but what about SSD?
  • Single pool
    • Treat it as a 2nd pool of memory (single pool case).
    • Compared to a 2 pool solution (above) this is limited not only in bandwidth but also in latency.
    • Bandwidth is worse than transferring data from RAM to VRAM in the 2 pool.
    • Latency is an order of magnitude worse than transferring data from RAM to VRAM.
    • This is basically a worse way to stream in data than if it was being streamed in from RAM.
    • However, for some forms of data access it can act as a substitute for data being immediately accessible (in RAM).
  • Two pool case
    • Treated as "another" 2nd pool of memory (DirectStorage)
      • Same advantages and disadvantages as Single Pool case.
      • You effectively now have 2x data sources for VRAM to pull data from. RAM or SSD.
    • Treated as a 3rd pool of memory (no DirectStorage)
      • Data must be copied from storage into RAM before it is available to use
End of overly simplistic overview and maybe people can talk about this while being on roughly the same page. If I got anything wrong feel free to correct me. I'm still drinking my coffee and waking up. :)

Regards,
SB

AMD have a patent to reduce the problem of memory contention between the CPU and GPU in an APU. It gives priority to CPU memory calls and seems to greatly reduce the problem compared to the PS4, for example. I don't think APU memory bandwidth is a problem other than an economic one: if consoles had bigger budgets they could brute-force the bandwidth with a 384-bit bus, for example, and/or higher-speed memory. But consoles need to be cost-effective.

The game engine's job is to build a frame in 8.3 ms for 120 fps, 16.6 ms for 60 fps, or 33.3 ms for 30 fps. That is two to three orders of magnitude longer than the latency of an NVMe SSD, which sits between 0.01 ms and 0.225 ms. So latency is not a problem with an SSD either: data can be delivered very fast within a single frame's generation.
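A quick sanity check of the frame-budget arithmetic above, using the NVMe latency range quoted in this post:

```python
# Frame budgets vs. NVMe read latency, in milliseconds.
frame_ms = {120: 1000 / 120, 60: 1000 / 60, 30: 1000 / 30}
nvme_latency_ms = (0.01, 0.225)  # range quoted above for NVMe reads

for fps, budget in frame_ms.items():
    worst = nvme_latency_ms[1]
    # Even at the slow end of the latency range, dozens of fully serialized
    # round-trips fit inside a single frame's budget.
    print(f"{fps} fps: {budget:.1f} ms budget, "
          f"~{int(budget // worst)} worst-case NVMe reads per frame")
```

So even a 120 fps frame has room for roughly 37 worst-case serialized reads, and real streaming issues many requests in parallel, making the headroom far larger still.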

[Image: SATA vs SAS vs NVMe storage latency comparison]


And you forget one disadvantage of having more RAM: it takes longer to fill. The advantage of having less RAM and faster storage with DirectStorage is faster loading times.
 
You can easily increase RAM and not have a single change to load times.

With an HDD, because that is the debate: more RAM with slower storage, or less RAM with faster storage? Returnal's PC minimum requirement is 16 GB of RAM with a recommended SSD; with 32 GB of RAM, they don't even recommend an SSD.

EDIT: And this is with the same amount of memory to fill...

[Image: Forspoken DirectStorage load-time comparison]
 
With an HDD, because that is the debate: more RAM with slower storage, or less RAM with faster storage? Returnal's PC minimum requirement is 16 GB of RAM with a recommended SSD; with 32 GB of RAM, they don't even recommend an SSD.

Games require 'X' amount of memory at load time; this won't change if a user suddenly doubles their system RAM or upgrades to a GPU that has more VRAM.

We also don't know what improvements or changes they've made to the game, art or assets to even begin talking about their recommended specs for the game.
 