Velocity Architecture - Limited only by asset install sizes

Yup. I think devs will spend a lot of time rethinking not just how data is organised in storage but how it's packaged and what format it is in. It may be advantageous for performance (loading and running) if data is stored in formats which take a little more space but are quicker to utilise. I can also envisage entirely new next-gen approaches, e.g. as you're turning your car maybe you don't want to load in a street's worth of data before you start generating the world geometry; instead maybe you want to load it in smaller chunks, so that world generation is racing to keep up with I/O as data is loaded. Much of the work I did on servers was akin to this.
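
To make the "smaller chunks" idea concrete, here is a minimal sketch in Python of a loader feeding a world builder through a small bounded queue, so generation starts as soon as the first chunk lands instead of waiting for the whole street. Every name here (load_chunks, build_world, stream_street) is made up for illustration, and the "data" is a placeholder string; it is not how any console SDK exposes this.

```python
import queue
import threading

def load_chunks(chunk_ids, out_q):
    """Stand-in for the I/O side: 'reads' each chunk and hands it over."""
    for cid in chunk_ids:
        data = f"raw-{cid}"  # hypothetical placeholder for bytes read from storage
        out_q.put((cid, data))  # blocks if the builder has fallen behind
    out_q.put(None)  # sentinel: no more chunks

def build_world(in_q):
    """Consumer: generates geometry for each chunk as soon as it arrives."""
    built = []
    while True:
        item = in_q.get()
        if item is None:
            break
        _cid, data = item
        built.append(f"geometry({data})")
    return built

def stream_street(chunk_ids, max_in_flight=2):
    """Run loading and world generation concurrently, chunk by chunk."""
    q = queue.Queue(maxsize=max_in_flight)  # bounded so the loader cannot run far ahead
    loader = threading.Thread(target=load_chunks, args=(chunk_ids, q))
    loader.start()
    built = build_world(q)
    loader.join()
    return built

street = stream_street(["block-1", "block-2", "block-3"])
```

The bounded queue is the design choice that matters: it keeps memory use flat and makes whichever side is slower (I/O or generation) set the pace.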

On PS5 Spider-Man is a 6 second load and Astrobot is a 3 second load. I think in 2-3 years, devs will have shaved a good fraction off such times.


I'm pretty sure this is the reason COD is so massive. IIRC it's because they couldn't rely on the CPU having enough free processing capacity to decompress textures from a more compact format on the fly.
 
That's interesting; I knew MS's controller wasn't exactly off-the-shelf but I didn't know it was also capable of higher bandwidth than the standard one. I'm curious what other customizations were done on the controller.

About the latency stuff due to having more ICs involved in the process, I've seen that brought up once before, and it could be something worth looking out for. Especially considering that the I/O block (technically applicable to both systems, but likely less of an issue on Series systems since some slice of the I/O stack processing is still done on the CPU) is described as "equivalent" to a number of Zen 2 cores, but it's not like it literally has 13 Zen 2 cores in there. I figure all of the talk of being comparable to such and such many Zen 2 cores is similar to the way both Sony and MS have described their audio solutions as being analogous to prior system CPUs: more for illustrative purposes, to give a picture of rough peak performance capability, but not much beyond that.

Another interesting thing is that indeed Series X is performing a lot closer to PS5's SSD I/O than most probably expected, yet at the same time games are just scratching the surface of XvA. Granted, that could probably be said for PS5's SSD as well, but I'm honestly not expecting the delta between them to grow any larger than it already is in this regard. If anything, it will probably shrink even more, especially with 1P games. When you're averaging load time differences of a literal second or two and have equally performant latency/file I/O for asset streaming, it all basically becomes a moot point.



This is true, although I think the performance we're seeing with games currently between the two in terms of load times will generally stay true once the optimization process begins. It's ironic because it was actually the recent PS4 firmware update that's convinced me you can technically do a LOT more with less, considering that system's interface standard and general design regarding I/O, yet games like TLOU and Until Dawn are pulling load times there comparable with BC titles on PS5 and Series X. That says a lot IMHO.

However if, once that optimization process starts, we do see the delta start to grow between the two some (particularly with 3P titles), then I think it'll come down more to Sony's I/O solution being the "easier" of the two to leverage in a shorter span of time, since a lot of that hardware is there to automate tons of the process. MS's approach seems a bit more flexible but there are parts of it which have a higher learning curve, like parts of SFS (to my knowledge), and there could be cases where getting even lower load times comes down to parts of the game code which might have to be adjusted to accommodate that. Not all 3P titles would likely have the means to dedicate that type of resource, but MS's own API tools being readily available (and technical support for 3P devs; they seem to be very good with this) can resolve a good deal of that likely.

Exactly! Unless the PS5 has more RAM, the disk I/O subsystem in both systems performs about the same despite the PS5 having twice the throughput. Looking at MK 11 Ultimate, which is upgraded for both systems, loading into gameplay takes the same amount of time. Loading into the game from the main menu is faster on the PS5, but that's down to the Series X showing so many screens on start up. So when games get really optimized it will be more like 2 second vs 4 second load times, or even less. No discernible difference.
 

The only monkey wrench that might get thrown into this is ballooning game sizes as the gen goes on, but I think that's actually more a worry for storage space than loading times. Which means prices on PS5-compatible SSDs and expansion cards (Series systems) had better come down sooner rather than later!

I'm really interested if games start leveraging upscaling techniques and therefore can go with smaller texture files in their packages. Ninja Theory have talked about work from their end on this so I'm definitely looking forward to Hellblade II on that note (and visual mastery note as well, considering how good the first game still looks going seven years later).
 
Given the number of gamers who said they see no benefit with Quick Resume, to go between multiple games, they really don't need more than 300 GB of space, since by their own statements they only ever play one game maybe two during the same period.

For everyone else, using an external SSD as cold-storage for games needing to be on internal NVME to play is a solid experience. The copy speeds between the two are quite fast. You get 2x the space for the same cost (2TB SSD vs 1TB Storage Card).
 

Here is a link to a recent Dirt 5 dev interview. The Series X can do 10GB in 2 seconds without the decomp block. With only about 13.5GB RAM for games, it's more than enough for instant load times once they use the decomp block. So it means most games are being held back by game designs based off current gen, otherwise they would be utilizing these speeds.
 

Yes I saw that and thought what a complete load of BS it was. How can you load 5GB raw data per second from a drive that peaks at 2.4GB/s?

Heck, I think the XSX may even be limited to 2 PCIe 4.0 lanes, which means even if the SSD itself did somehow magically double in speed, the bus with which it communicates to the rest of the system couldn't carry that much data.

More likely the source is just confused and he's talking about compressed data.
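
Putting rough numbers on that objection (a back-of-envelope sketch; the function name is mine, and the 2.4GB/s figure is the advertised raw rate mentioned above):

```python
def required_rate_gb_s(size_gb, seconds):
    """Average throughput needed to move size_gb in the given time."""
    return size_gb / seconds

DRIVE_PEAK_GB_S = 2.4  # advertised raw rate of the Series X SSD, per the thread

claimed = required_rate_gb_s(10, 2)     # the "10GB in two seconds" claim
shortfall = claimed / DRIVE_PEAK_GB_S   # how many times faster than the drive that is

print(f"claim needs {claimed:.1f} GB/s, {shortfall:.2f}x the drive's raw rate")
```

So the claim only works as raw throughput if the drive is running at a bit more than double its rated speed.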
 
I saw that a dev wrote that, I didn't even consider it worth my time to see what was said.
Even if it was possibly lost in translation.
 

I do not believe he mentioned whether it was compressed or not when he first says this. Then he mentions it again, but I think "compressed" there is in the context of using the full suite of the Velocity Architecture: BCPack or other techniques a developer may utilise.

LZ compression, I would assume, is basically raw as far as a developer sees things. All data will go through that block regardless; I would think the GDK LZ-compresses and encrypts any asset you include.

The same for PS5 which is why Cerny says devs have nothing to do to take advantage of the hardware. Your only challenge is throwing stuff at it to saturate the system.
 
He's a technical lead on a major title with years of experience, so I don't think he misspoke, but he wasn't clear how it's achieved. I agree he wasn't clear, and the interviewers didn't press him to clarify because they lack knowledge about basic computer architecture. It should be clarified how it's possible without the decomp block. I'm guessing there is a separate wider bus link between the SoC and the RAM. Because in any case, how does the decomp block work with the DMA to send large amounts of decompressed data into RAM? The CPU could theoretically do the same, right?
 

Here's the exact quote:

Dirt 5 Technical Director David Springate said:
We look at all of these things all of the time. I can't promise which stuff is going to come in future patches, because we have to balance loads of different things, but it might. That's the best I can do. In terms of fast storage on Series X, I think that hardware is great, I worked on it with Microsoft early on and provided some feedback to them. I looked at the speed that we could get from it, you can get 10GB in two seconds in my personal early tests, it may well be able to do way better than that.

And that was without the compression in the hardware, that was just raw.

I've highlighted a couple of bits that make me wonder if they were maybe using faster hardware in the early days. The interface is PCIe 4.0, and in the timescales they are talking about, the fastest PCIe 4.0 drives were around 5GB/s on a 4x interface. Those may have been simple off-the-shelf models plugged into PCs that were spec'd similarly to the final XSX hardware. So if he was testing DirectStorage on early hardware, it's conceivable that he was seeing 5GB/s throughput in those early tests.

I can't see how it's possible on the current final hardware though. We know the interface is 2 lanes of PCIe 4.0 thanks to this confirmation from Seagate. That interface maxes out at around 3.75GB/s, so it's literally physically impossible to be getting 5GB/s from SSD to APU without using compression, even if the drive itself were capable of more than 2.4GB/s.
 


Maybe the decompression happens transparently to the game dev? So even though the drive is delivering 2.4GB/s, it's no different to the dev than a drive that's 4.8GB/s? It could be that he's talking about how, once games start getting made for it, they will be able to do effectively 5GB/s, or thereabouts. It's very possible that Codemasters have been playing around with the tech on internal demos just to get a feel for it and how best to implement things like SFS and the Velocity Architecture.
 
Maybe the VA has a significant role to play in this.
 

I guess it is possible that he didn't even realise the data was compressed, but I'd kinda expect a bit more from a game's Technical Director! He was pretty explicit that he didn't think compression was being used:

"And that was without the compression in the hardware, that was just raw."


The Velocity Architecture is just the marketing name given to the XSX IO system that we're already discussing. We already understand pretty well how it works.
 
"And that was without the compression in the hardware, that was just raw."

Ah, I missed that quote.
Maybe he was being generous with the '2 seconds'? If it was 2.9 seconds that would check out. Microsoft said that the 2.4GB/s is the sustained performance, so I assume it can peak higher than that, so 10/2.9 ≈ 3.45GB/s? Seems plausible.
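
For anyone who wants to play with the "how generous was the two seconds" question, a tiny sketch (helper names are mine; 2.4 and 3.75 are the drive and controller figures quoted elsewhere in this thread, not measurements):

```python
def transfer_seconds(size_gb, rate_gb_s):
    """Time to move size_gb at a steady rate."""
    return size_gb / rate_gb_s

def implied_rate_gb_s(size_gb, seconds):
    """Steady rate implied by moving size_gb in the given time."""
    return size_gb / seconds

SIZE_GB = 10
at_sustained = transfer_seconds(SIZE_GB, 2.4)    # advertised sustained rate
at_controller = transfer_seconds(SIZE_GB, 3.75)  # controller ceiling quoted in the thread
rate_if_2_9s = implied_rate_gb_s(SIZE_GB, 2.9)   # the "maybe it was 2.9 seconds" case

print(f"{at_sustained:.2f}s at 2.4 GB/s, {at_controller:.2f}s at 3.75 GB/s")
print(f"2.9s would imply {rate_if_2_9s:.2f} GB/s")
```

In other words, at the advertised 2.4GB/s the same 10GB takes a little over four seconds, so "two seconds" would need quite a lot of rounding.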
 
If you send 5GB of compressed data (that goes through the ASIC block) in two seconds, you'll get 10GB of uncompressed data in two seconds.

Maybe that's all he's saying. But I don't know.
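
A rough way to check that reading, assuming a flat compression ratio (real ratios vary by asset, and the function names here are just for illustration):

```python
def effective_rate_gb_s(raw_rate_gb_s, ratio):
    """Uncompressed output rate when the drive streams compressed data."""
    return raw_rate_gb_s * ratio

def ratio_needed(target_gb_s, raw_rate_gb_s):
    """Compression ratio required to reach a target output rate."""
    return target_gb_s / raw_rate_gb_s

RAW_GB_S = 2.4  # advertised raw rate

at_2_to_1 = effective_rate_gb_s(RAW_GB_S, 2.0)  # the 4.8 GB/s case
needed_for_5 = ratio_needed(5.0, RAW_GB_S)      # ratio for 10GB in two seconds
compressed_read_rate = 5 / 2                    # 5GB of compressed data in 2s

print(f"2:1 gives {at_2_to_1:.1f} GB/s; 5 GB/s needs about {needed_for_5:.2f}:1")
```

Note that reading 5GB of compressed data in two seconds is itself 2.5GB/s off the drive, so even this reading sits a touch above the advertised 2.4GB/s unless the ratio is slightly better than 2:1.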
 

When they say sustained, I take them to mean that the drive can sustain its peak throughput, probably due to robust cooling. The peak should just be a function of the memory speed and the controller. We know the controller is capable of 3.75GB/s, but if it's using slower memory then the peak will be 2.4GB/s. However, there's no reason it couldn't sustain that under the right operations if cooling is sufficient.
 

The interface maxes out at 3.983GB/s. The E-19 controller maxes out at 3.75GB/s. The custom E-19 controller in the Series X has an extra ASIC such that it maxes out at 3.983GB/s. So it's a highly custom SSD, not like others that share the same controller. And the technical director for Dirt 5 was very clear when he said,

"And that was without the compression in the hardware, that was just raw."

So whatever figures he gave, with the decompression hardware it would perform better.

Here's what I suspect, the SSD in the Series X can sustain 2.4GB/s constantly. But due to the custom controller and custom firmware it can go higher than this up to 3.983GB/s and maybe sometimes lower.

But again, it would have been good if he had been pressed about whether the data was decompressed first or just sent to RAM immediately for testing purposes. If it was using the CPU for decompression, then it's even more impressive, because the decompression block can definitely do better.

That's why he gave the following statement:
"it may well be able to do way better than that."
 
But according to him, it would be going through the pipeline without using the decompression algorithms from the decomp block (maybe using the CPU?). And that's what makes it super impressive, because it would mean that with the decomp hardware acceleration it can definitely outperform that.
 

I think you've misunderstood what that link was saying. The Western Digital SN530 is a 2.4GB/s SSD that uses a PCIe 3.0 (4x) interface. The custom ASIC used on the Xbox drive merely allows it to use a PCIe 4.0 (2x) interface instead (likely to make future expansion cards easier to manufacture).

The drive speed is still the same at 2.4GB/s, which is a factor of the controller and the memory speed. The controller is capable of 3.75GB/s if you pair it with the fastest compatible memory. The SN530 as used in the Series X uses a slower tier of memory, which gives you the 2.4GB/s. That's why Microsoft advertise it as such.

The 3.983 GB/s is merely the theoretical max throughput of a PCIe 3.0 4x or PCIe 4.0 2x interface. You're still limited by the controller on the XSX to 3.75GB/s and then limited again by the memory connected to that controller to 2.4GB/s.
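
For what it's worth, working the link rate out from the published PCIe signalling rates (8 GT/s per lane for 3.0, 16 GT/s for 4.0, both with 128b/130b encoding) gives roughly 3.94GB/s rather than 3.983GB/s, so the figure being passed around may have two digits swapped. The sketch below is just that arithmetic plus the "slowest stage wins" chain described above; the function name is mine, and it ignores packet and protocol overhead, so real-world numbers would be a bit lower still.

```python
def pcie_gb_s(gt_per_s, lanes, encoding=128 / 130):
    """Theoretical one-way PCIe bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * encoding * lanes / 8  # 8 bits per byte

gen3_x4 = pcie_gb_s(8, 4)    # PCIe 3.0, four lanes
gen4_x2 = pcie_gb_s(16, 2)   # PCIe 4.0, two lanes (the Series X link)

CONTROLLER_GB_S = 3.75  # controller ceiling quoted in this thread
NAND_GB_S = 2.4         # advertised drive rate, set by the memory tier

# The deliverable raw rate is capped by the slowest stage in the chain.
bottleneck = min(gen4_x2, CONTROLLER_GB_S, NAND_GB_S)

print(f"link {gen4_x2:.3f} GB/s, bottleneck {bottleneck} GB/s")
```

Either way the conclusion is the same: the link and the controller both have headroom, and the memory behind them is what sets the 2.4GB/s.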
 