Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Not sure; I wasn't sure if there was a way to bypass PCIe and go straight to Thunderbolt, but I guess you confirmed you cannot.

Thunderbolt is a connector standard that carries PCIe, USB and DisplayPort. It doesn't have any underlying bus technology of its own.

So I'm not understanding why Thunderbolt came into the discussion around bandwidth when it's clearly going to have less than PCIe?

In this post you mentioned "TB+" which I thought was a reference to Thunderbolt!?! :???:
 
So I'm not understanding why Thunderbolt came into the discussion around bandwidth when it's clearly going to have less than PCIe?
I feel like you're conflating terms again.

Thunderbolt has exactly the same bandwidth as PCIe, because it is PCIe. What you're trying to say, and missing (the full context is important!), is that Thunderbolt only provides four lanes of PCIe, where an internal socket on the motherboard could provide as many as 16 lanes.

Which is great, if you have an accessible motherboard. How about laptops? How about mini desktops? Thunderbolt is PCIe, so connecting an outboard GPU over modern Thunderbolt is a viable option for machines with no "full sized" PCIe sockets available.
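To put rough numbers on the lane-count point, here's a quick back-of-envelope calc using the published per-lane PCIe rates. The Thunderbolt figure assumes the nominal x4 PCIe 3.0 tunnel of Thunderbolt 3/4; real controllers often cap usable PCIe data somewhat lower, so treat it as an upper bound:

```python
# Approximate payload throughput per PCIe lane (GB/s), after 128b/130b encoding.
PCIE_PER_LANE_GBPS = {
    "3.0": 8.0 * (128 / 130) / 8,    # 8 GT/s  -> ~0.98 GB/s per lane
    "4.0": 16.0 * (128 / 130) / 8,   # 16 GT/s -> ~1.97 GB/s per lane
}

def link_bandwidth(gen: str, lanes: int) -> float:
    """One-direction payload bandwidth in GB/s for a PCIe link."""
    return PCIE_PER_LANE_GBPS[gen] * lanes

# Thunderbolt 3/4 tunnels a x4 PCIe 3.0 link; a full-size GPU slot is x16.
print(f"Thunderbolt (PCIe 3.0 x4): {link_bandwidth('3.0', 4):.1f} GB/s")
print(f"Slot (PCIe 3.0 x16):       {link_bandwidth('3.0', 16):.1f} GB/s")
print(f"Slot (PCIe 4.0 x16):       {link_bandwidth('4.0', 16):.1f} GB/s")
```

Same protocol, same per-lane speed; the difference is purely lane count.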
 
I feel like you're conflating terms again.

Thunderbolt has exactly the same bandwidth as PCIe, because it is PCIe. What you're trying to say, and missing (the full context is important!), is that Thunderbolt only provides four lanes of PCIe, where an internal socket on the motherboard could provide as many as 16 lanes.

Which is great, if you have an accessible motherboard. How about laptops? How about mini desktops? Thunderbolt is PCIe, so connecting an outboard GPU over modern Thunderbolt is a viable option for machines with no "full sized" PCIe sockets available.
It's a viable option for laptops, but not many ML setups use eGPUs. Right now Apple's M1 offers pretty decent performance, but yeah, CUDA is out. With laptops I would just pay cloud pricing for GPUs.
 
With laptops I would just pay cloud pricing for GPUs.
Well, this too makes assumptions on A: what you intend to do with that GPU, and B: the quality of your internet connection. Here's my personal anecdote:

My personal preference is for first-person video games, such as the Fallout and Elder Scrolls series, the Borderlands and Outer Worlds series, and I also enjoy some RTS stuff like Age of Empires, Endless Space and Civilization. Also, around September of 2020 I purchased a lightly used 33-foot Class-A RV to travel the countryside while waiting for the pandemic to burn itself out. While the kids were in virtual school and I was at virtual work, our family travelled 16,000 miles through 22 states and countless national and state parks for nearly a year, coming home perhaps four or five times to swap out wardrobes for the season or to make minor repairs (I replaced one of the roof-mounted AC + heat pump units in January when it died). We finally returned home in late August so the kids could return to in-person school.

During these 11 months on the road, our only access to the internet was some form of cellular hotspot. My wife and I both have "unlimited plans" (man, that's a bullshit term, at least in the US) through AT&T via our cell phones, and my employer gave me an "unlimited plan" Verizon hotspot device as well. Because AT&T and Verizon operate on different RF spectrum, we had pretty decent luck getting at least some internet in most of the places we visited. Still, it's cell coverage, so performance varied wildly: near metro areas we might see as much as 100 Mbit with latency in the teens, but in the deep country where most state and national parks are, you'd get single-digit megabits with latencies in the 100+ ms range.

In probably 75% of the places we stayed (everywhere we went was at least a week's stay, many times two weeks or even more), the internet was insufficient to comfortably play a first-person shooter. Maaayyybe the RTS games could've worked, if there was enough bandwidth; however, the "unlimited" plans still have data caps and throttle to ~256 kbps after you exceed them. And when that happened, it seriously impacted my kids' and my own ability to do school and work remotely, because there was simply not enough bandwidth for virtual meetings (the kids HAD to be on video, and I HAD to be on video as well).

So no, virtual GPUs in the cloud are not a solve-all, despite what some people who never leave their in-city apartment or house might otherwise think. My Gigabyte Aero 15X v8 with a Core i7-8750H (UHD 630 iGPU) and the Optimus-switched 1070 made quick, easy work of every game I wanted to play on its 15", 1080p 144 Hz display. And in doing so, I never had to worry about whether I had cell coverage, or whether the cell coverage sucked in the back bedroom but was actually good at the front of the truck where my kids were sleeping (yes, this is a thing that really happens in real life).
 
Well, this too makes assumptions on A: what you intend to do with that GPU, and B: the quality of your internet connection.
Oh, sorry, I wasn't referring to gaming; I was referring strictly to the resolution/clarity app I'm building, and to data science in general: essentially, set your environment up off-premises and run the job there, and then available bandwidth isn't really even needed; just kick it off, close the laptop, and come back when the job is complete.

But with gaming, yeah, you have quite a few more options available if you want an external GPU. GeForce Now is reasonable for cloud gaming, I think, especially if you have something extremely weak.
I've never looked into eGPU for gaming, but I agree it's a reasonable compromise for certain situations.

With respect to the original topic of whether we need ever-faster PCIe, to the point that it outpaces fast I/O: I mentioned my program does indeed run into a PCIe bottleneck. I discovered this when I sent it over to DF to trial; the codecs they use can only be decoded CPU-side, but the work I need done is GPU-side, so there's a lot of PCIe transferring back and forth. This is a separate but tangential discussion from I/O speeds in games.

I think there's definitely some benefit to be had by continually raising the PCIe bus speed even if I/O doesn't necessarily need to follow suit. There are definitely other possible benefits in reducing the bottleneck between CPU and GPU.
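For what it's worth, the usual way to soften that CPU-decode to GPU-work shuffle is to overlap the PCIe uploads with the GPU compute. Here's a minimal sketch of the idea, assuming PyTorch and a CUDA GPU; decode_next_frame_on_cpu and process_on_gpu are hypothetical stand-ins, not my actual code:

```python
import torch

def decode_next_frame_on_cpu() -> torch.Tensor:
    # Stand-in for a codec that can only be decoded CPU-side.
    return torch.randint(0, 256, (3, 2160, 3840), dtype=torch.uint8)

def process_on_gpu(frame_gpu: torch.Tensor) -> torch.Tensor:
    # Stand-in for the GPU-side work (clarity/upscaling pass, etc.).
    return frame_gpu.float().div_(255.0)

device = torch.device("cuda")
copy_stream = torch.cuda.Stream()
prev_gpu = None

for _ in range(8):
    # Pinned host memory lets the PCIe upload run as an asynchronous DMA copy.
    frame_cpu = decode_next_frame_on_cpu().pin_memory()
    with torch.cuda.stream(copy_stream):
        next_gpu = frame_cpu.to(device, non_blocking=True)  # upload over PCIe
    if prev_gpu is not None:
        process_on_gpu(prev_gpu)  # GPU work on the previous frame overlaps the upload
    torch.cuda.current_stream().wait_stream(copy_stream)
    prev_gpu = next_gpu

torch.cuda.synchronize()
```

It doesn't remove the PCIe traffic, it just stops the GPU from sitting idle while the next frame crosses the bus.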
 
https://gamingbolt.com/ps5s-ssd-is-...roke-martha-is-dead-levels-during-development
GamingBolt interviewed LKA's Tommaso Bonanni on how the team took advantage of the PS5 SSD's 5.5 GB/s raw read bandwidth and 8-9 GB/s compressed bandwidth, and how it compared to the Xbox Series X.

Bonanni said, “We are all amazed by the performance of the PS5 SSD, which in itself is a technical gem. However, the element that really makes the difference here is the PS5’s data reading and writing system, which is extremely efficient. Even on some occasions we have found the fact that PS5 loads some scenes too quickly compared to other consoles and PCs, literally broke the loading of the levels because it was totally unexpected, so we had to get our hands on the code to counterbalance this excessive loading speed.”
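A quick back-of-envelope calc shows why loading code tuned around slower drives can fall over at those rates; the 10 GB working set here is made up, and the drive rates are ballpark figures:

```python
# Time to stream an illustrative 10 GB working set at various effective read rates.
working_set_gb = 10
rates_gbps = {
    "PS5 SSD, compressed (~8-9 GB/s)": 8.5,
    "PS5 SSD, raw (5.5 GB/s)": 5.5,
    "SATA SSD (~0.5 GB/s)": 0.5,
    "7200 rpm HDD (~0.15 GB/s)": 0.15,
}
for name, rate in rates_gbps.items():
    print(f"{name}: {working_set_gb / rate:.1f} s")
```

Roughly a second on PS5 versus tens of seconds on an HDD, so anything scripted around the old timings can finish "too early", as Bonanni describes.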
 
Something like Forbidden West will be a really interesting comparison point given that it's likely designed to take full advantage of the PS5's IO system, but is also likely to get a PC port. Possibly even a DirectStorage enabled one if it ever comes out.
That's the question: whether a cross-gen game fully utilizes the I/O system.
 
Something like Forbidden West will be a really interesting comparison point given that it's likely designed to take full advantage of the PS5's IO system, but is also likely to get a PC port. Possibly even a DirectStorage enabled one if it ever comes out.
How could it? It's a PS4 game designed to run on an HDD. Few games properly use the PS5 I/O. I'd say there's Ratchet & Clank and Demon's Souls.
 
How could it? It's a PS4 game designed to run on an HDD. Few games properly use the PS5 I/O. I'd say there's Ratchet & Clank and Demon's Souls.

This got me thinking... Doesn't that depend on how "properly use PS5 I/O" is defined?

Does simply making loading instantaneous mean it properly uses the PS5 I/O? Or does it need a game design that would be impossible to implement without the ridiculously fast SSD?

Or simply not needing to do RAM trickery (e.g. the Destiny 1 and 2 portals) because things can be streamed directly from the SSD?
 
This got me thinking... Doesn't that depend on how "properly use PS5 I/O" is defined? Does simply making loading instantaneous mean it properly uses the PS5 I/O? Or does it need a game design that would be impossible to implement without the ridiculously fast SSD?

Yup. Just using the native APIs and actually changing how game data is organised and stored on disc are very different things. Assassin's Creed Valhalla and Demon's Souls on PS5 both have around a 20-25 second load, Ratchet & Clank is around 7 seconds (with no more loading ever), Spider-Man (original and Miles Morales) is around 6 seconds. Godfall is around 2 seconds and Astro's Playroom is basically instant.

It feels like AC and Demon's Souls were designed conventionally and the other games were all designed to leverage the PS5's I/O.
 
Yup. Just using the native APIs and actually changing how game data is organised and stored on disc are very different things. Assassin's Creed Valhalla and Demon's Souls on PS5 both have around a 20-25 second load, Ratchet & Clank is around 7 seconds (with no more loading ever), Spider-Man (original and Miles Morales) is around 6 seconds. Godfall is around 2 seconds and Astro's Playroom is basically instant.

It feels like AC and Demon's Souls were designed conventionally and the other games were all designed to leverage the PS5's I/O.
My experience with Demon's Souls is instant outside of the initial game load. I abused the hell out of farming 1-4 lol
 
My experience with Demon's Souls is instant outside of the initial game load. I abused the hell out of farming 1-4 lol

I am only focusing on the initial load; I think this is the use case where you will mostly experience waiting for I/O. My experience of most other PS5 games is that even for those that have an initial load, when it comes to fast travel or reloading a save, it's so close to instant, or just a few seconds, that it doesn't register.
 
I am only focusing on the initial load; I think this is the use case where you will mostly experience waiting for I/O. My experience of most other PS5 games is that even for those that have an initial load, when it comes to fast travel or reloading a save, it's so close to instant, or just a few seconds, that it doesn't register.
What do you think it's doing in those 6 seconds? It seems weird that the initial load is so slow but subsequent world travel is so fast.
It is as though the streaming system is perfectly aligned with the SSD, so I'm curious to see what changed with the initial load.
 
What do you think it's doing in those 6 seconds? It seems weird that the initial load is so slow but subsequent world travel is so fast. It is as though the streaming system is perfectly aligned with the SSD, so I'm curious to see what changed with the initial load.
For Spider-Man? I would imagine that a lot of common textures, models, shaders and sound effects are pre-loaded into memory, and likely a great deal of world state data (animations, AI, pathing etc.) is calculated at slower-than-realtime speed.

When you're in game, this is already running and the game only needs to generate new data for distant areas of the map in the direction you're moving, or, if you're not moving, remove things that are disappearing and create new things that are moving into the boundaries of the simulated world.

Because things are vastly more dense close up, this is probably beyond what can be done in realtime, and this is perhaps also why subsequent loads and fast travel still take ~1-2 seconds. Plotting hundreds/thousands of pedestrians, cars and other objects (animals like squirrels, birds flying etc.) into the world has got to be resource- and computationally intensive. The spatial distribution of anything moving with AI needs to make sense, i.e. you wouldn't have 50 people in one spot on one street, and vehicles have to obey traffic laws and so on. There is probably a great deal of simulation data that needs to be in motion before you pop into the game world, most of which will require its own objectives (i.e. where is this car/pedestrian/bird going?), collision tables and any amount of other things.
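To make that split concrete, here's a toy sketch (Python for brevity) of a heavier initial load that pre-seeds the ambient simulation, followed by a per-frame streaming update that only touches the ring of chunks entering or leaving the player's bubble. All names and sizes are made up to illustrate the idea, not taken from any actual engine:

```python
import math
from dataclasses import dataclass, field

CHUNK_SIZE = 128.0   # world units per streamable chunk (illustrative)
STREAM_RADIUS = 3    # chunks kept resident around the player (illustrative)

@dataclass
class StreamedWorld:
    resident: dict = field(default_factory=dict)  # (cx, cy) -> loaded chunk payload

    def initial_load(self, spawn_xy):
        # Heavier up-front work: load everything around spawn AND pre-seed the
        # ambient simulation (pedestrians, traffic, birds) so it is already "in
        # motion" when the player pops in. This is where a multi-second boot
        # load plausibly spends its time.
        for key in self._chunks_around(spawn_xy):
            self.resident[key] = self._load_chunk(key)
        self._preseed_simulation()

    def stream_update(self, player_xy):
        # Per-frame work: only the chunks entering/leaving the bubble change,
        # so the I/O per frame is a small fraction of a full load.
        wanted = set(self._chunks_around(player_xy))
        for key in wanted - self.resident.keys():
            self.resident[key] = self._load_chunk(key)  # stream in
        for key in set(self.resident) - wanted:
            del self.resident[key]                      # evict / despawn

    def _chunks_around(self, xy):
        cx, cy = (math.floor(c / CHUNK_SIZE) for c in xy)
        return [(cx + dx, cy + dy)
                for dx in range(-STREAM_RADIUS, STREAM_RADIUS + 1)
                for dy in range(-STREAM_RADIUS, STREAM_RADIUS + 1)]

    def _load_chunk(self, key):
        return {"geometry": f"chunk{key}", "agents": []}  # placeholder payload

    def _preseed_simulation(self):
        pass  # placeholder: place pedestrians/vehicles, warm up pathing, etc.
```

The point is just the asymmetry: the initial load does world-state setup plus a big burst of reads, while steady-state streaming only pays for the edge of the bubble.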
 
I don't know if it's new or not, but Forspoken might be the first game to support the DirectStorage API.

This session will cover the collaboration between AMD and Luminous Production on their upcoming title: Forspoken. This partnership resulted in the implementation of various AMD technologies into the game, including screen-space ambient occlusion, screen-space reflections, raytraced shadows and AMD FidelityFX Super Resolution. Along with a short presentation of those techniques, this talk will detail the process to integrate them into the Luminous Engine and how AMD and Luminous Productions worked hand in hand to optimize it across a wide range of AMD GPUs. Forspoken is also supporting the new Microsoft DirectStorage API. A part of the session will be dedicated to its addition to the game highlighting the challenges the studio faced and the benefits it is bringing to the title.
https://schedule.gdconf.com/session...nologies-of-forspoken-presented-by-amd/886052
 
I don't know if it's new or not, but Forspoken might be the first game to support the DirectStorage API.


https://schedule.gdconf.com/session...nologies-of-forspoken-presented-by-amd/886052

some more info, I think

https://arstechnica.com/gadgets/202...es-in-pc-game-demo-but-hardware-matters-most/

The Forspoken demo ultimately showed that the speed of the storage you're using still has a lot more to do with how quickly your games load than DirectStorage does.
 
The Forspoken demo ultimately showed that the speed of the storage you're using still has a lot more to do with how quickly your games load than DirectStorage does.

There will still be some nuance to the DirectStorage implementation in software (the game), but Microsoft's API cannot change the fundamental data flow across hardware on your average PC, for which there are two effective setups:

1) For drives connected over legacy (e.g. IDE) buses: your data is read off the storage cells by the drive controller, passed to the bus controller in the south-bridge, and then routed to either main memory or the graphics card's memory via the north-bridge. If the GPU is decompressing data, it's doing that from GDDR, then writing it back to GDDR for graphics use or redirecting it across the north-bridge controller to main memory for use by the CPU.

2) For drives using NVMe/PCIe connections: your data is read off the storage cells by the drive controller, passed to the bus controller in the north-bridge, and then routed to either main memory or the graphics card's memory. If the GPU is decompressing data, it's doing that from GDDR, then writing it back to GDDR for graphics use or redirecting it across the north-bridge controller to main memory for use by the CPU.

Current-generation consoles have very simple (and limited) architectures: data is read off the storage cells by a single I/O controller that decompresses it automatically, and it is written to one pool of shared memory. So even where PC components and drives are much faster, they are still moving data around a lot more.
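To put the "moving data around a lot more" point in concrete terms, here's a toy hop count for the two paths described above. The hop lists are a simplification of those routes, purely illustrative rather than a measurement:

```python
# Toy model: how many bus/controller hops a compressed asset's bytes make before
# they are usable. Steps are simplified from the paths described above.
PATHS = {
    "PC, NVMe drive + GPU decompression, CPU-destined asset": [
        "NAND -> drive controller",
        "drive controller -> north-bridge (PCIe)",
        "north-bridge -> GPU GDDR (PCIe)",          # upload compressed data
        "GDDR -> GDDR (decompress on the GPU)",
        "GDDR -> north-bridge -> main RAM (PCIe)",  # ship the result back to the CPU
    ],
    "Console, unified memory + I/O-block decompression": [
        "NAND -> I/O controller (decompresses inline)",
        "I/O controller -> shared RAM",
    ],
}
for name, hops in PATHS.items():
    print(f"{name}: {len(hops)} hops")
    for hop in hops:
        print(f"  - {hop}")
```

GPU-destined assets on PC skip the last hop, but they still make at least one more trip across PCIe than the console path does.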
 