Velocity Architecture - Limited only by asset install sizes

Yesterday I realized how fast the SSD really is. If you use Quick Resume, just start a game like Gears 5, then start another one (so the progress is saved). Close the second game and start the game with the saved state. Then you're only limited by the read speed of the SSD and not the write speed.

It felt like just 2-3s to get back into the game. The question is, does Gears use the whole available memory, or is it still limited to 9 GiB? Another question would be: is compression used when saving a game's state to the disk (it would make sense)? Because it is just a little bit too fast for that SSD otherwise.
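The resume-time math can be sketched out quickly. The 2.4 GB/s raw and 4.8 GB/s compressed rates are from the official spec; the snapshot size and the 2:1 ratio are assumptions for the example, not measured figures.

```python
# Back-of-envelope Quick Resume timing. Assumption: the snapshot is
# roughly the game-usable RAM footprint, read back at either the raw
# (2.4 GB/s) or compressed-effective (4.8 GB/s) rate.

def resume_time(snapshot_gib: float, read_rate_gbps: float) -> float:
    """Seconds to read a snapshot of snapshot_gib GiB at read_rate_gbps GB/s."""
    return snapshot_gib * 1.073741824 / read_rate_gbps  # GiB -> GB

print(round(resume_time(9, 2.4), 1))  # 9 GiB at raw speed: ~4.0 s
print(round(resume_time(9, 4.8), 1))  # same snapshot stored 2:1 compressed: ~2.0 s
```

So a 2-3 s resume really does point at either a compressed snapshot or a state smaller than the full 9 GiB.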
 
My take too. It's not secret sauce, it's not some weird conspiracy; he either misremembered or something changed in development.

It's definitely not a conspiracy, and assuming it's not merely a mistake (e.g. momentarily mixing up PS5 and early XSX kit bandwidth), then I think it's just that something changed. And that would most likely be going from an early mock-up to the final SSD + decompression hardware, getting basically the same results for next-gen games.

The XSX uses the Phison PS5019-E19T in conjunction with the decompression block to get ~4.8 GB/s. However, before this was ready, you could have got very similar raw results by using something like the Phison PS5016-E16, which has a peak raw throughput of - funnily enough - 5 GB/s. Here's the link on Phison's website, including datasheet: https://www.phison.com/en/solutions/consumer/pc-laptop/pcie/973-ps5016-e16

The press release dates back to Jan 2019 and Tom's had a reference card for testing by early summer. So while they were working with Phison on the XSX / XSS drive and its firmware, they could have basically substituted an existing Phison controller, albeit one that's more complex, expensive, power hungry, and uses DDR4 to boot. But in an early dev kit, cost probably doesn't matter.

Actually, the Phison E16 is pretty interesting because of how similar its performance is to the PS5's. So MS certainly had the option of going with that fast an SSD, but given how many years the XSX SoC would have been in development, it's clear that their post-decompression performance (and things like SFS) was very carefully thought out a long time in advance.
 
Seeing the real world performance, you actually make a lot of sense. They're way too close in SSD performance; I had expected a much larger difference (due to all the hype and the seemingly worlds-apart SSD specs).
 

One talking point that's still used by some people to essentially "write off" Series system SSD I/O is that it's not actually about load times, it's about asset streaming and latency. Except I doubt these same people have done any reading into the FlashMap papers, because latency is a critical issue exhaustively addressed in those.

Also, it's not like NAND is particularly great at low latencies to begin with, not compared to actual volatile memories, let alone cache. I could maybe see an argument if talking about PCIe, but even there some people seem to mistakenly think the Series systems are using PCIe 3.0 because of the raw bandwidth of the drives, when that has nothing to do with it. You could have a PCIe 4.0 drive with 500 MB/s raw if you wanted; that doesn't suddenly make it PCIe 2.0 in terms of standard/spec. And AFAIK, PCIe 4.0 has even better latency than 3.0, not to mention there are standards that can be layered on top of it for cache coherence like CXL (I would like to know if MS are using this, and maybe if there's some part of the flash memory controller customized to handle any slight overhead).

I do still think there will be instances where the PS5's SSD has the advantage, because that should be obvious given the specs, but MS's is no slouch either, and it's performing well beyond what most were saying it would going off some paper specs (similarly to how PS5 is performing beyond what some were saying it would in terms of graphics/framerate, going off just a few paper specs).
 
The NVMe solutions are closer than expected, and the same goes for the GPU/CPU etc. One camp was shouting SSD!, the other GPU!. But in the end they're close enough that it doesn't really matter. Something like CP2077 (or any true next-gen title) would run equally well on both, if not hampered by software of course.
 
Hi... Game packages and assets are compressed to minimize download times and the amount of storage needed for each individual game. With hardware-accelerated support for both the industry-standard LZ decompressor and a brand-new, proprietary algorithm specifically designed for texture data named BCPack, Xbox Series X delivers the best of both worlds for developers to achieve massive savings with no loss in quality or performance. As texture data makes up a large portion of the total overall size of a game, having a purpose-built algorithm optimized for texture data in addition to the general-purpose LZ decompressor means both can be used in parallel to reduce the overall size of a game package. Delivering similar levels of decompression performance in software would require more than 4 Zen 2 CPU cores.
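As a rough sketch of why running a texture-specific codec alongside general LZ shrinks the package: the compression ratios below are invented for the example, not published BCPack or LZ figures.

```python
# Illustrative only: package size when texture data goes through a
# texture-specific codec (BCPack-style) and the rest through general LZ.
# The ratios are invented for the example, not published figures.

def package_size_gb(total_gb, texture_fraction, tex_ratio, lz_ratio):
    textures = total_gb * texture_fraction / tex_ratio    # texture codec
    other = total_gb * (1 - texture_fraction) / lz_ratio  # general-purpose LZ
    return textures + other

# 100 GB of raw assets, 60% textures, hypothetical 2.5:1 and 1.5:1 ratios:
print(round(package_size_gb(100, 0.6, 2.5, 1.5), 1))  # ~50.7 GB on disk
```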
 
Although I already said he must be wrong, since that data would exceed the SSD read capacity, there is something to be considered.
The Sampler Feedback Streaming tech allows Xbox to fetch only the required parts of a texture, reducing the data that needs to be read. Compared to a system without this tech, it would take 2 to 3x more data read to place the same textures in memory.
This was explained by Microsoft, who stated the effective SSD bandwidth can be increased 2 to 3x with this tech.
Although Sampler Feedback Streaming works by predicting texture needs, based on motion vectors and data from previous frames (in other words, a technology that requires world streaming and active gameplay), maybe he was stating that Xbox can use this tech in other ways.
Maybe if a dev knows exactly what to load at a given moment, they can fetch those partial textures the same way SFS does, loading an effective equivalent of 5 GB/s, or about 2x the SSD bandwidth!
If this were the case, it would explain his words, and it would be good enough for loading the first scene of a streaming game with all textures loaded at once. In his case, since their game is a racing game with a set scenario on the starting grid, that could potentially work.
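The arithmetic behind that guess can be sketched as follows. The 2-3x range is Microsoft's public SFS claim; applying the same multiplier to a fixed "know exactly what to load" scenario is the speculation from the post above, not anything confirmed.

```python
# SFS treated as an effective-bandwidth multiplier. The 2-3x range is
# Microsoft's public claim; the raw rate is the official XSX figure.

def effective_rate(raw_gbps: float, multiplier: float) -> float:
    return raw_gbps * multiplier

raw = 2.4  # XSX raw SSD rate, GB/s
for m in (2.0, 2.5, 3.0):
    print(m, "->", round(effective_rate(raw, m), 1), "GB/s equivalent")
```

At the bottom of that range (2x), you land on the ~5 GB/s equivalent the post speculates about.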

Not sure I made myself clear...
 
How do you explain that this claim would see the Series X significantly exceeding the theoretical limit of the PCIe 4.0 x2 link which we know connects the SSD to the APU?
I think the most likely explanation is that the Dirt 5 dev was using different hw. It can't possibly load 10GB in 2 seconds without decompression hw like he said, unless the I/O bandwidth from the SSD was 5GB/s or higher, which is not the case. So he should have been asked to clarify by someone more curious.
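A quick sanity check of the claim, using the rated raw figure:

```python
# Does "10 GB in 2 s without decompression hardware" fit the rated raw speed?
required = 10 / 2   # GB/s of raw reads the claim implies
rated_raw = 2.4     # XSX rated raw SSD rate, GB/s

print(required > rated_raw)  # True: the claim exceeds the raw rating
# 5 GB/s only fits if the 10 GB was compressed-equivalent data
# (4.8+ GB/s effective) or the dev was on different hardware.
```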
 
Yes and under ideal circumstances the peak can presumably be sustained. The reality is that the peak is determined by the controller capabilities and memory speed though, and Microsoft have based their specification on that (as have Sony).

Let's say the XSX SSD really was operating with the fastest, most expensive memory that its Phison E19T controller was able to handle, meaning that it could transfer 3.75GB/s, or 7.5GB/s with BCPACK compression. Do you seriously think that Microsoft's multimillion dollar marketing machine wouldn't pick up on that fact and use it? And that they'd rather use the estimated real-world throughput while Sony are by contrast using the theoretical maximum throughput?

I think MSFT has been clear about their figures and the fact they're conservative. The 2.4GB/s is what they guarantee it can sustain, but in fact it is a conservative figure. The max raw throughput is still going to be 3.75GB/s but at a minimum devs should expect to dump 2.4GB into RAM per second before the decomp block kicks in to make it 4.8GB. This is why Andrew Goossen was quoted saying with decompression it is higher than 6GB/s.

"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s,"

In the end, the combination of the SSD I/O and decomp block means the 4.8GB of decompressed data in RAM is a conservative figure.
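The compression ratios implied by those quoted figures, for reference (the figures themselves are the ones cited in this thread):

```python
# Compression ratios implied by the quoted figures.
raw = 2.4          # guaranteed sustained raw, GB/s
typical = 4.8      # "with compression" figure, GB/s
peak_quote = 6.0   # Goossen's "over 6 GB/s" decompressor quote

print(round(typical / raw, 2))     # 2.0: the conservative typical ratio
print(round(peak_quote / raw, 2))  # 2.5: ratio needed to hit the 6 GB/s quote
```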
 

Yes, the lower latency would definitely improve I/O performance. If there's any aspect of the two systems' architecture that is close, it's the disk I/O, then maybe the CPU. Having 2x the I/O throughput would have made a huge difference if the PS5 had more RAM, but that is not the case. Filling up ~13GB of RAM in one second vs two isn't going to make a difference because of locality. You'd be reloading the same areas in a scene over and over again. Better to cache some data and only have a portion which is constantly refreshed. I think this is why SFS is so impressive and truly augments memory. I think people have been unintentionally misled into believing that RAM works best if it's constantly updated with new data, yet it's just a cache and needs to exploit locality. The people that talk about streaming of assets don't seem to be aware of this.
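A toy sketch of the locality argument (all numbers invented): with a skewed access pattern, a small resident working set already serves most requests, so doubling the refill rate buys little.

```python
# Toy illustration of locality. A small "hot" working set kept resident
# in RAM serves the overwhelming majority of accesses, so most reads
# never hit the SSD at all. Distribution and sizes are made up.

import random
random.seed(0)

ASSETS = 1000          # distinct streamable assets
HOT = set(range(100))  # the 10% hot working set kept resident

hits = 0
N = 10_000
for _ in range(N):
    # 90% of accesses show locality (hot set); the other 10% go anywhere
    if random.random() < 0.9:
        asset = random.randrange(100)
    else:
        asset = random.randrange(ASSETS)
    hits += asset in HOT

print(round(hits / N, 2))  # ~0.91: most reads are served from RAM
```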
 
The SFS technique primarily freed up a significant amount of memory bandwidth. This leaves a third (!) of the bandwidth for the textures and plenty of room for bandwidth-bound special effects such as ray tracing.
 

The XBSX SSD is a Western Digital SN530 which is rated at 2.4GB/s max read speed.

https://www.tweaktown.com/news/7613...as-custom-asic-to-support-pcie-4-0/index.html
https://www.westerndigital.com/products/commercial-internal-drives/industrial-ix-sn530-nvme-ssd
 
The XBSX SSD is a Western Digital SN530

There are multiple models as it is multi-sourced and has different specifications than the "standard PC SN530".

According to Western Digital, the Xbox Series X's WD SN530 SSD isn't a stock OEM drive that's limited to PCIe Gen3 x4 performance. Instead, the drive has been outfitted with a special ASIC that enables both PCIe Gen3 x4 and Gen4 x2 performance, which allows for up to 3.938 GB/s of max throughput. For reference, the Series X targets 2.4GB/sec in uncompressed data transfers.
 

I think it's completely wrong to compare the off-the-shelf card to the one in the Series X, as the reporters in the article you linked learned. One would be led to believe it's a PCIe Gen 3 SSD with a max read speed of 2.4GB/s. In reality, the speed of the SSD will be determined by the PCIe Gen 4.0 controller (3.7GB/s max SR & W) and the firmware (which in this case is custom for the Xbox).

If anything, the custom firmware is the biggest advantage the Series X has over other SSDs with the same controller. It was completely rewritten by Andrew Goossen's team at MSFT, according to information from Hot Chips. This is why we say the 2.4GB/s is sustained, and that is the language from MSFT as well. There has been no indication anywhere else that the 2.4 figure is a theoretical max. MSFT's goal was to ensure the figures they provided in the spec sheet were sustained, from the processor clock speeds to the disk I/O throughput.
 
I think it's completely wrong to compare the off-the-shelf card to the one in the Series X, as the reporters in the article you linked learned. One would be led to believe it's a PCIe Gen 3 SSD with a max read speed of 2.4GB/s.

The SSD in the Series X is a cache-less Western Digital SN530 SSD with a custom ASIC for PCIe Gen 4 support. The PC version of that SSD is rated at 2.4GB/s on a PCIe Gen 3 x4 interface, whereas the Series X is on Gen 4 x2, but both max out at 3.938GB/s.
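The lane math behind those matching numbers, for anyone curious (both Gen3 and Gen4 use 128b/130b encoding):

```python
# Per-lane PCIe throughput: transfer rate (GT/s) * 128/130 encoding / 8 bits.
# Gen3 x4 and Gen4 x2 land on the same ~3.938 GB/s figure WD quotes.

def link_gbps(gt_per_s: float, lanes: int) -> float:
    return gt_per_s * (128 / 130) / 8 * lanes

print(round(link_gbps(8, 4), 3))    # PCIe 3.0 x4 -> 3.938
print(round(link_gbps(16, 2), 3))   # PCIe 4.0 x2 -> 3.938
```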

In reality, the speed of the SSD will be determined by the PCIe Gen 4.0 controller (3.7GB/s max SR & W) and the firmware (which in this case is custom for the Xbox).

The max throughput of the controller as rated by Phison is 3.75GB/s. The max speed of an SSD is limited by the slowest of its parts, not by the max rated capability of the controller or the PCIe lanes. You have to take into account the rated speed of the NAND as well.
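In other words, something like this (link and controller figures are the ones discussed in this thread; the NAND aggregate is the value implied by the 2.4 GB/s rating, not a published spec):

```python
# An SSD's sustained read is capped by the slowest element in the chain,
# not the sum of them. Link and controller figures as discussed in this
# thread; the NAND aggregate is implied by the 2.4 GB/s rating.

def ssd_max_read(pcie_link_gbps, controller_gbps, nand_gbps):
    return min(pcie_link_gbps, controller_gbps, nand_gbps)

print(ssd_max_read(pcie_link_gbps=3.938,   # Gen4 x2 link
                   controller_gbps=3.75,   # Phison E19T rating
                   nand_gbps=2.4))         # implied NAND aggregate
# -> 2.4: the NAND config, not the controller or the link, sets the ceiling
```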

If anything, the custom firmware is the biggest advantage the Series X has over other SSDs with the same controller. It was completely rewritten by Andrew Goossen's team at MSFT, according to information from Hot Chips. This is why we say the 2.4GB/s is sustained, and that is the language from MSFT as well. There has been no indication anywhere else that the 2.4 figure is a theoretical max. MSFT's goal was to ensure the figures they provided in the spec sheet were sustained, from the processor clock speeds to the disk I/O throughput.

They cannot guarantee a minimum of 2.4GB/s, as that is entirely dependent on the type and size of file you are reading or writing. They can market their sustained speed, as does every storage manufacturer, hence 2.4GB/s raw and 4.8GB/s compressed. A game written for last-gen consoles, for example, will not be able to take advantage of that SSD the same way a game written for current gen can. Here is an experiment you can try if you have a Series X and the accompanying expansion card. Copy a game from the internal SSD to the external SSD and time how long the copy takes (a 50GB game should take ~20 sec at 2.4GB/s). You will find you won't even get close to 1GB/s copy speed, owing to the drive being cache-less, which incurs a huge penalty on SSD speed. That is just the OS doing a read and write operation, so it is a straight stress test of the SSD without the clever decompression and sampler feedback file-management multipliers games will use to gain higher effective speed.
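The expected timings for that experiment, as a quick sketch (2.4 GB/s is the rated sustained read; ~1 GB/s is the post's rough cache-less copy estimate, not a measurement):

```python
# Expected timings for the internal-to-expansion-card copy experiment.

def copy_seconds(size_gb: float, rate_gbps: float) -> float:
    return size_gb / rate_gbps

print(round(copy_seconds(50, 2.4), 1))  # ~20.8 s if the copy ran at the rated read speed
print(round(copy_seconds(50, 1.0), 1))  # 50.0 s at the ~1 GB/s the post predicts
```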
 

I've actually been thinking about this locality point while drumming up some spec speculation for 10th-gen systems, and there is one thing I have considered faster decompression bandwidths (that greatly outstrip the capacity limit of the main memory) being potentially useful for: rapid streaming of unique assets into a space of VRAM acting as a framebuffer, processed by the GPU at the bandwidth of the VRAM. But in order to fully take advantage of that, you would need a LOT of unique assets to stream in, easily hitting hundreds of gigabytes if not more, and at that point storage capacity becomes the bottleneck, because it's not like you can leave all that unique data sitting on the Blu-ray; otherwise BR drive access would become the bottleneck instead (or, better to say, an additional bottleneck).

The reason I've been thinking about that so much has to do with the argument of diminishing returns; I don't think we've hit that point yet, actually. Yes, overall fidelity and IQ have increased gen-over-gen, but one of the biggest advantages a lot of high-quality CG films (or CG-heavy films) have over games is simply having a crapton of unique, high-quality, large-sized assets that can stream in and be processed on farms of systems with lots of RAM. If there is any hope of partially recreating that in a gaming environment, it will come from even larger raw storage bandwidths pushing decompression rates to many multiples of the VRAM capacity, so that VRAM can act as a framebuffer for a steady stream of new data for the GPU to process in a scene. But that will also mean a need for greater storage capacities, more powerful decompression hardware, more efficient compression/decompression algorithms and, most importantly, some shift in how game assets are created that leverages heavy use of GPT-style AI for asset model generation (but in an ethical way, so entire human workforces aren't being replaced by AI).
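A rough sketch of why capacity becomes the wall in that scenario (all numbers invented for illustration):

```python
# Rough storage math for "stream mostly unique assets": feeding the GPU
# fresh data at a given decompressed rate for a whole sequence needs that
# much (compressed) data sitting on disk. All numbers invented.

def unique_assets_gb(decompressed_gbps, seconds, compression_ratio):
    return decompressed_gbps * seconds / compression_ratio

# A 10-minute sequence fed at 20 GB/s decompressed, stored 2:1 compressed:
print(unique_assets_gb(20, 600, 2))  # 6000.0 GB of assets on disk
```

Even a modest sequence at those rates demands terabytes of unique data, which is why storage capacity (or the optical drive) becomes the next bottleneck.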

I think once that becomes a reality, is when we'll truly start hitting the point of diminishing returns in terms of graphical fidelity for gaming, which is something I think the 10th-gen systems will be able to accomplish. What we're going to see from the 9th-gen systems barely scratches the surface there IMHO, but it's a start. And that brings it back to your point in a sense: games this upcoming gen won't be able to do the sort of stuff I was just talking about, so there won't really be a design paradigm shift in that way by the industry at large. Therefore as strong as PS5's SSD I/O design is, I don't see any game design concepts that would be possible there that suddenly become impossible on the Series X or Series S. I'd still like to know exactly how the reserved 100 GB block MS referred to before works in practice, a few of you guys like function and iroboto had some pretty good ideas there (and also the idea that part of the system's reserved 2.5 GB GDDR6 for OS is maybe being used as a temp cache and mapping space for SSD data), because you'd think that alone would be a giveaway that Sony aren't the only ones who have designed a SSD solution with more than just game loading times in mind.

On that note yes Microsoft and Sony's approaches are built a bit differently (MS's focuses more on scalability for starters), but in the end they'll do pretty similar things. Sony's might have some advantages here and there but the differences will likely be comparable to the differences we're seeing from both systems right now insofar as certain resolution differences: not too much to really be noticed except by people dedicated to documenting that type of stuff. And IMO the reason why is because neither system have TOO major a setup regarding their SSD I/O that would enable some fundamentally radical game design paradigm shift that's just leagues beyond what we got with the 8th-gen systems, let alone that being the case for one system but not the other.

...That said I DO want to see some more 1P games from Microsoft that tap more into their storage solution capabilities, because I'm really looking forward to some of Sony's games that'll be doing that next year like Ratchet & Clank: Rift Apart.

The max throughput of the controller as rated by Phison is 3.75GB/s. The max speed of an SSD is limited by the slowest of its parts, not by the max rated capability of the controller or the PCIe lanes. You have to take into account the rated speed of the NAND as well.

Dunno if this is exactly true all the time, because usually max speeds just refer to the rated bandwidths of the NAND modules, their configuration in parallel, and the bandwidth the flash memory controller presents to the system over however many PCIe lanes of whatever type. At least, I think that's the case.

Also, about the Phison bandwidth mentioned there: I figure that also includes overhead? If you just knock some off for encoding, then it would be closer to 3.9 GB/s, which lines up with what Brit posted here:

According to Western Digital, the Xbox Series X's WD SN530 SSD isn't a stock OEM drive that's limited to PCIe Gen3 x4 performance. Instead, the drive has been outfitted with a special ASIC that enables both PCIe Gen3 x4 and Gen4 x2 performance, which allows for up to 3.938 GB/s of max throughput. For reference, the Series X targets 2.4GB/sec in uncompressed data transfers.

So maybe the bandwidth you're quoting is for the regular version of that controller, but MS & WD redesigned it to allow a higher peak range.
 