Xbox Series X [XBSX] [Release November 10 2020]

There seem to be a lot of misconceptions about the Xbox Velocity Architecture. The PS5's and the Series X's I/O implementations share the same goal, namely to increase the complexity of the content presented on screen without a corresponding increase in load times/memory footprint, but they go about it in totally different ways. Since the end of the cartridge era, an increase in geometry/texture complexity was usually accompanied by an increase in load times. This was because while RAM bandwidth might be adequate, the throughput of the link feeding the RAM from the HDD was not. Hence, HDDs and the associated I/O architecture were the bottleneck.
One way to address this issue was to "cache" as much as possible in RAM so as to get around the aforementioned bottleneck. However, this solution comes with its own problem: the memory footprint just keeps ballooning ("MOAR RAM"). This is brilliantly explained by Mark Cerny in his GDC presentation with the 30s-of-gameplay paradigm. PlayStation's answer to this problem is to increase the throughput to the RAM in an unprecedented way. Thus, instead of caching for the next 30s of gameplay, you might only need to cache for the next 1s of gameplay, which results in a drastic reduction in memory footprint. Indeed, the point of it all is that for a system with the old HDD architecture to achieve the same jump in texture and geometry complexity, either the amount of RAM needed for caching would have to be exorbitant, or frametime would have to be increased to allow enough time for textures to stream in (low framerates), or gameplay design would have to be changed to allow for texture loading (long load times). The PS5 supposedly achieves all of this with none of those drawbacks thanks to alleviating the bottleneck between persistent storage and RAM (the bottleneck still exists because RAM is still quicker than the SSD, but it is good enough for the PS5's rendering capacity and hence doesn't matter anyway. You just don't load textures from the SSD straight to the screen.)
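To put rough numbers on that 30s-vs-1s point, here is a back-of-the-envelope sketch. The per-second asset demand is a figure I made up purely for illustration; only the ratio of the two caching windows matters:

```cpp
#include <cstdio>

// Back-of-the-envelope sketch (illustrative numbers, not official figures):
// the RAM set aside for streaming is roughly the data a scene consumes per
// second multiplied by how far ahead the engine has to cache.
int main() {
    const double consumed_per_second_gb = 0.5; // hypothetical asset demand of a scene
    const double hdd_window_s = 30.0;          // cache ~30 s ahead on a slow HDD (Cerny's example)
    const double ssd_window_s = 1.0;           // cache ~1 s ahead on a fast SSD

    std::printf("HDD-era cache: %.1f GB of RAM\n", consumed_per_second_gb * hdd_window_s);
    std::printf("SSD-era cache: %.1f GB of RAM\n", consumed_per_second_gb * ssd_window_s);
    // 15 GB vs 0.5 GB for the same on-screen detail: the footprint shrinks by
    // the ratio of the two caching windows, nothing about the renderer changes.
    return 0;
}
```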

We can now see why the throughput from the SSD to RAM has become the one-and-only metric for judging the I/O capability of next-gen systems in the minds of gamers. After all, it does make perfect sense. BUT... is there an alternative way of doing things? Microsoft went in a completely different direction. Is the persistent memory to RAM throughput still the bottleneck? Yes! Why is more throughput needed? To stream more textures, evidently. The defining question is then how much of it is actually needed. After careful research by assessing how games actually utilised textures on a per frame basis, MS seems to have come to a surprising answer: not that much actually.

Indeed, by loading more detailed MIPs than necessary while keeping the storage-to-RAM throughput constant, load times and memory footprint are increased. Let's quote Andrew Goossen in the Eurogamer deep dive for reference:

"We observed that typically, only a small percentage of memory loaded by games was ever accessed," reveals Goossen. "This wastage comes principally from the textures. Textures are universally the biggest consumers of memory for games. However, only a fraction of the memory for each texture is typically accessed by the GPU during the scene. For example, the largest mip of a 4K texture is eight megabytes and often more, but typically only a small portion of that mip is visible in the scene and so only that small portion really needs to be read by the GPU."

The upshot of it all is that by knowing what MIP levels are actually needed on a per-frame basis and loading only that, the amount needed to be streamed is radically reduced and so is the throughput requirement of the SSD-RAM link as well as the RAM footprint. Can this just-in-time streaming solution be implemented in software? MS acknowledges that it is possible to do so but concedes that it is very inaccurate and requires changes to shader/application code. The hardware implementation for determining residency maps associated with partially resident textures is sampler feedback.
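To get a feel for the numbers behind Goossen's 8 MB figure, here is a small illustrative sketch. The compression rate and the count of 64 KB tiles actually touched are my own assumptions, not MS data:

```cpp
#include <cstdio>

// Size of each mip of a 4096x4096 texture at 0.5 bytes/texel (BC1-class block
// compression), versus how little is streamed if only a handful of 64 KB tiles
// of mip 0 turn out to be visible. Purely illustrative numbers.
int main() {
    const double bytes_per_texel = 0.5;
    double total_mb = 0.0;
    for (int mip = 0, dim = 4096; dim >= 1; ++mip, dim /= 2) {
        const double mb = double(dim) * double(dim) * bytes_per_texel / (1024.0 * 1024.0);
        total_mb += mb;
        if (mip <= 3) std::printf("mip %d: %4d x %-4d = %6.2f MB\n", mip, dim, dim, mb);
    }
    std::printf("full mip chain: ~%.2f MB\n", total_mb); // ~10.7 MB

    // Hypothetical frame: sampler feedback reports that only 12 of mip 0's
    // 64 KB tiles were sampled, so only those need to come off the SSD.
    const double streamed_mb = 12 * 64.0 / 1024.0;
    std::printf("actually streamed this frame: ~%.2f MB\n", streamed_mb); // 0.75 MB
    return 0;
}
```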

While sampler feedback is great, it is not sampler feedback streaming. You now need a hardware implementation to do the following (a rough sketch of the idea follows the list):

(1) transition from a lower MIP level to a higher one seamlessly;
(2) fall back to a lower MIP level if the requested one is not yet resident in memory, and blend back to the higher one when it becomes available a few frames later.
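Here is a minimal CPU-side sketch of that fallback/blend behaviour. In the console the mechanism lives in the texture filtering hardware; the frame counter and blend window below are knobs I invented for illustration:

```cpp
#include <algorithm>
#include <cstdio>

// Minimal CPU-side sketch of the fallback/blend idea described above; the real
// thing is done by the texture filters, this just illustrates the logic.
struct MipChoice {
    int   sampleMip;   // mip level actually safe to sample (0 = finest)
    float blend;       // 0 = coarse fallback, 1 = fully on the requested mip
};

MipChoice chooseMip(int requestedMip, int finestResidentMip, int framesSinceArrival) {
    const int kBlendFrames = 4;  // blend back in over a few frames (invented knob)
    if (requestedMip < finestResidentMip) {
        // Requested detail not resident yet: fall back to the best mip we do have.
        return { finestResidentMip, 0.0f };
    }
    // Requested mip is resident: ramp the blend so the transition is seamless.
    const float t = std::min(1.0f, framesSinceArrival / float(kBlendFrames));
    return { requestedMip, t };
}

int main() {
    MipChoice a = chooseMip(/*requested*/0, /*resident*/2, /*framesSinceArrival*/0);
    MipChoice b = chooseMip(0, 0, 2);
    std::printf("miss: sample mip %d, blend %.2f\n", a.sampleMip, a.blend);
    std::printf("arrived 2 frames ago: sample mip %d, blend %.2f\n", b.sampleMip, b.blend);
    return 0;
}
```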

Microsoft claims to have devised a hardware implementation for doing just that. These are the so-called "texture filters" described by James Stanard. Do we have more information about Microsoft's implementation? Of course we do. SFS is patented hardware technology and is described in patent US10388058B2, titled
"Texture residency hardware enhancements for graphics processors", with co-inventors Mark S Grossman and... Andrew Goossen.

Combined with DirectStorage (presumably a new API that revamps the file system, but information about it is sparse) and the constant high throughput of the SSD, this is how Microsoft claims to achieve a 2x-3x increase in efficiency. Hence, the "brute force" meme about the Series X is wildly off-base.

As for which of the PS5 or Series X I/O systems is better? I say let the DF face-offs begin.
 
Good post and you highlight something I didn't appreciate in MS's solution.

Your description of PS5's solution is an oversimplification. Although the system is fast enough to force-feed data, that's not its only usable implementation, as seen in Epic's virtualised geometry (and texturing).

After careful research by assessing how games actually utilised textures on a per frame basis, MS seems to have come to a surprising answer: not that much actually.
We've known this since the PS360 era and the birth of John Carmack's MegaTexture. Sebbbi, creator of Trials HD, was using something like 7 MB/s of textures for 720p with his virtual texture solution.

The upshot of it all is that by knowing what MIP levels are actually needed on a per-frame basis and loading only that, the amount needed to be streamed is radically reduced and so is the throughput requirement of the SSD-RAM link as well as the RAM footprint.
That's an interesting solution. LOD1 will be a quarter of the resolution and data throughput. Distant objects will require very little texture detail, such that on a platform loading raw textures, one at full res and one at LOD1, the latter will require 1/4 the BW for the same detail.

Combined with DirectStorage (presumably a new API that revamps the file system, but information about it is sparse) and the constant high throughput of the SSD, this is how Microsoft claims to achieve a 2x-3x increase in efficiency.
Will it play nice with virtual assets? Because that's probably the way things are headed now that low-latency access is standard. Both these systems are rooted in solving current-engine bottlenecks, but both need to extend to future solutions, and those future solutions will be cross-platform.
 
Good post and you highlight something I didn't appreciate in MS's solution.

Your description of PS5's solution is an oversimplification. Although the system is fast enough to force-feed data, that's not its only usable implementation, as seen in Epic's virtualised geometry (and texturing).

We've known this since the PS360 era and the birth of John Carmack's MegaTexture. Sebbbi, creator of Trials HD, was using something like 7 MB/s of textures for 720p with his virtual texture solution.

That's an interesting solution. LOD1 will be a quarter of the resolution and data throughput. Distant objects will require very little texture detail, such that on a platform loading raw textures, one at full res and one at LOD1, the latter will require 1/4 the BW for the same detail.

Will it play nice with virtual assets? Because that's probably the way things are headed now that low-latency access is standard. Both these systems are rooted in solving current-engine bottlenecks, but both need to extend to future solutions, and those future solutions will be cross-platform.

I will quote your own thoughts on the matter as a response (from the UE5 thread):

"The moment the data is arranged this way, we can see how virtualised textures would also apply conceptually to the geometry in a 2D array, along with how compression can change from having to crunch 3D data. You don't need to load the whole texture to show the model, but only the pieces of it that are viewable, which is the same problem as picking which texture tiles with virtual texturing.

Very clever stuff."

The Unreal Engine team has devised a software solution for a problem that Microsoft has resolved in hardware.
 
Actually it's 2.4 GB/s, and 4.8 GB/s with compression. Also, I'm pretty sure those are just the speeds MS actually guarantees you'll get all the time; I'd be surprised if they didn't allow the SSD to push further when thermals allow it. The controller goes up to 3.75 GB/s in both directions.

@MrFox had some interesting thoughts on this iirc. Basically there were speed bins of flash (flash certified for a particular speed presumably with a known max power consumption) that fit almost exactly with what MS were offering.

So while the controller can go faster, it may be paired with a more affordable speed bin on the flash itself, and it could also be that MS's cooling for the flash and the controller is based around this arrangement.

With a power-saving shrink on the controller and cooler flash, it does seem that MS could potentially provide a boost to their SSD capability at some point in the future, because as you say the controller is rated for ~50% greater speeds. Perhaps any benefits from this would be reduced by the capabilities of the hardware decompression blocks though... /shrug
 
The Unreal Engine team has devised a software solution for a problem that Microsoft has resolved in hardware.
MS has a partial solution that's different. At first I thought it was virtual texturing in hardware, but your post describes it as selective MIP loading, which is different. Virtual texturing allows only a small fraction of a whole texture, maybe a 64th, to need loading in order to draw that part of the texture onto the visible geometry. MS's solution loads the entire texture, but at a smaller MIP level if the largest isn't needed (which is most of the time), and has the whole texture in RAM to draw the small part of it that's actually needed; it just never loads the texture at its full resolution, using a lower MIP level instead.

With a 4096x4096 texture of a building at moderate range where a small part of it is visible, virtual texturing may load a 128x128 tile whereas MS's solution would load the 1024x1024 MIP-level and draw part of that texture.

It's a solution that sits halfway between loading the whole texture and virtual texturing's loading of a single tile within the texture. I'd also guess it doubles texture sizes on disk, as you'd need the MIP levels to be prebaked, whereas typically I think these are derived from the source texture when loaded and kept in RAM.
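For a rough sense of scale in that example, assuming a block-compressed format at about 1 byte per texel (my assumption, purely illustrative):

```cpp
#include <cstdio>

// Bytes loaded under the three approaches discussed above, assuming a
// BC7-class format at ~1 byte per texel (an assumption for illustration).
int main() {
    const double bytes_per_texel = 1.0;
    const double kb = 1024.0;
    std::printf("full 4096x4096 texture : %8.0f KB\n", 4096.0 * 4096.0 * bytes_per_texel / kb); // 16384 KB
    std::printf("1024x1024 mip level    : %8.0f KB\n", 1024.0 * 1024.0 * bytes_per_texel / kb); //  1024 KB
    std::printf("single 128x128 VT tile : %8.0f KB\n",  128.0 *  128.0 * bytes_per_texel / kb); //    16 KB
    // The mip-level approach is ~16x cheaper than the full texture;
    // a single virtual-texture tile is another ~64x cheaper still.
    return 0;
}
```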
 
The Unreal Engine team has devised a software solution for a problem that Microsoft has resolved in hardware.
hm... the mip blending sounds similar to what they did in UE3 for Gears of War 2 - or at least, you could see the lower res mips blend into the higher res assets over time.

Guess that's as good a reason as any to start experimenting, since there was the HDD-less SKU to worry about (and just how UE manages texture memory in general. o_O)
 
MS has a partial solution that's different. At first I thought it was virtual texturing in hardware, but your post describes it as selective MIP loading, which is different. Virtual texturing allows only a small fraction of a whole texture, maybe a 64th, to need loading in order to draw that part of the texture onto the visible geometry. MS's solution loads the entire texture, but at a smaller MIP level if the largest isn't needed (which is most of the time), and has the whole texture in RAM to draw the small part of it that's actually needed; it just never loads the texture at its full resolution, using a lower MIP level instead.

With a 4096x4096 texture of a building at moderate range where a small part of it is visible, virtual texturing may load a 128x128 tile whereas MS's solution would load the 1024x1024 MIP-level and draw part of that texture.

It's a solution that sits halfway between loading the whole texture and virtual texturing's loading of a single tile within the texture. I'd also guess it doubles texture sizes on disk, as you'd need the MIP levels to be prebaked, whereas typically I think these are derived from the source texture when loaded and kept in RAM.

You shouldn't get too hung up on my example with MIP levels. It captures only one problem of texture streaming: LOD changes concomitant with camera movement. Sampler feedback in truth answers two questions (a rough sketch at the end of this post shows how a streamer might consume the answers):
(1) What MIP level was ultimately sampled (the LOD problem), i.e. what MIP level to load next.
(2) Where exactly in the resource it was sampled (which tiles were sampled), based on what's visible to the camera, i.e. which tiles to load next.

SFS is the streaming of only the visible assets at the correct level of detail. So yeah, a software implementation of a solution already found in hardware.
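To make (1) and (2) concrete, here is a rough sketch of what a streamer could do once a feedback map has been read back to the CPU. This is not the D3D12 API itself; the per-tile "min mip" array and the 0xFF sentinel for never-sampled tiles are assumptions I'm making for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Sketch of the decision a streamer makes over a hypothetical resolved
// feedback map: one byte per tile holding the finest mip the GPU asked for
// last frame (0xFF = tile never sampled). Not the real D3D12 interface.
struct TileRequest { int tileX, tileY, mip; };

std::vector<TileRequest> buildStreamingList(const std::vector<uint8_t>& minMipMap,
                                            int tilesX, int tilesY,
                                            const std::vector<uint8_t>& residentMip) {
    std::vector<TileRequest> toLoad;
    for (int y = 0; y < tilesY; ++y) {
        for (int x = 0; x < tilesX; ++x) {
            const int i = y * tilesX + x;
            const uint8_t wanted = minMipMap[i];
            if (wanted == 0xFF) continue;            // tile wasn't sampled at all
            if (wanted < residentMip[i])             // GPU wanted finer data than we hold
                toLoad.push_back({ x, y, wanted });  // queue exactly that tile/mip
        }
    }
    return toLoad;
}

int main() {
    // 2x2-tile toy texture: GPU sampled mips {0, 2, unsampled, 1}; mip 3 is resident everywhere.
    const std::vector<uint8_t> feedback = { 0, 2, 0xFF, 1 };
    const std::vector<uint8_t> resident = { 3, 3, 3, 3 };
    for (const auto& r : buildStreamingList(feedback, 2, 2, resident))
        std::printf("load tile (%d,%d) at mip %d\n", r.tileX, r.tileY, r.mip);
    return 0;
}
```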
 
If they can perform the virtualisation in hardware, that is indeed an impressive piece of design, although it's only really of value when games don't support texture virtualisation in-engine.
 
Everything remains to be seen. ;) But if games are already virtualising their texture streams, how could MS's system help?
 
If they can perform the virtualisation in hardware, that is indeed an impressive piece of design, although it's only really of value when games don't support texture virtualisation in-engine.

It's also not exactly an impressive piece of design. It's just that the technology to make what was once opaque to the developer (texture sampling) transparent is now available. I guess MS, being an API developer, had a leg up on this particular technology and simply anticipated its use cases.
 
Everything remains to be seen. ;) But if games are already virtualising their texture streams, how could MS's system help?

As far as I can tell from MS's notes, using sampler feedback it's easier to know exactly what you need, don't need, and probably will need. It's an improvement on existing PRTs and tiled resources.

https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html

Independently of whether sampler feedback is available, Direct3D12-based applications have, through tiled resources, the ability to progressively load the mip chain. There’s also the status bit available through any sample which can be plugged in to CheckAccessFullyMapped so that the app can detect tile residency, and opportunities for streaming in more texture data.

However, the strategy of tiled resources + CheckAccessFullyMapped or other residency determination mechanisms, to detect and load non-resident tiles on demand has room for improvement. That’s where sampler feedback comes in. Sampler feedback streamlines the process of writing out “ideal mip levels”, since the detection of the “ideal mip level” and writing-it-out can be done all in one step, allowing the driver to optimize this process.

There's a rather more easily digestible breakdown here:

https://devblogs.microsoft.com/dire...edback-some-useful-once-hidden-data-unlocked/

Perhaps you use a texture streaming system; perhaps it uses tiled resources to keep those gigantic 4K mip 0s non-resident if you don’t need them. Anyway, you have a shader which samples a mipped texture using A Very Complicated sampling pattern. Pick your favorite one, say anisotropic.

The sampling in this shader has you asking some questions.

What mip level did it ultimately sample? Seems like a very basic question. In a world before Sampler Feedback there’s no easy way to know. You could cobble together a heuristic. You can get to thinking about the sampling pattern, and make some educated guesses. But 1) You don’t have time for that, and 2) there’s no way it’d be 100% reliable.

Where exactly in the resource did it sample? More specifically, what you really need to know is— which tiles? Could be in the top left corner, or right in the middle of the texture. Your streaming system would really benefit from this so that you’d know which mips to load up next. Yeah while you could always use HLSL CheckAccessFullyMapped to determine yes/no did-a-sample-try-to-get-at-something-nonresident, it’s definitely not the right tool for the job.

More details at the link, but suffice to say it could offer some large savings in memory and transfers over current tiled resources implementations. It's the complete opposite of the idea that MS are looking for "brute force" solutions because they lack finesse.

Of course, PS5 might have something like this too. Only SFS is something we know for sure is a bespoke piece of MS hardware (to hide misses), with sampler feedback being part of DX12 Ultimate and coming to PC RDNA2 hardware later this year. Then again, they might not. We don't know yet!
 
^^^^An example of SFS Posting? :oops:

:unsure:

 