Xbox Series X [XBSX] [Release November 10 2020]

I am also a bit puzzled at the idea of mid-frame usage of something from the SSD. Like, how small is that chunk or texture that could realistically be used mid-frame? For that matter, within a 16.6 or 8.3 ms frame for that game (60 Hz/120 Hz)?

Tying a mid-frame piece of used data to the slowest, highest-latency hardware sub-component?

Isn't 80MB per frame quite a lot though?

Even, just for the sake of argument, assuming a relatively meagre 40MB per frame is dedicated to streaming textures, that's surely enough for damage levels to be streamed in per frame whilst cars are jostling?
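
For reference, here's the back-of-the-envelope maths behind those figures (a sketch using the publicly quoted XSX SSD numbers of 2.4 GB/s raw and roughly 4.8 GB/s effective with compression):

```cpp
#include <cstdio>

int main() {
    // Publicly quoted Series X SSD throughput: 2.4 GB/s raw,
    // ~4.8 GB/s effective after hardware decompression.
    const double raw_gbps = 2.4;
    const double compressed_gbps = 4.8;

    for (double hz : {60.0, 120.0}) {
        // MB per frame = (GB/s * 1000 MB/GB) / frames per second
        printf("%3.0f Hz (%4.1f ms frame): %3.0f MB raw, %3.0f MB compressed\n",
               hz, 1000.0 / hz,
               raw_gbps * 1000.0 / hz, compressed_gbps * 1000.0 / hz);
    }
}
```

That's where the 80MB (compressed, 60fps) and 40MB (raw) per-frame figures come from; at 120Hz they halve to 40MB and 20MB.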
 
Plus, in a game where the camera is moving quickly, I can imagine there being texture LODs that will - potentially - only be needed for one frame.

For example unique track detail at the highest LOD for just in front of the car, or a stage marker flag, or maybe even some kind of animated texture. You know you'll need it that frame and not the next, so why not load it, use it, and dump it ready for whatever you'll need next?
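
In pseudocode, the idea would be something like this (all names hypothetical; just a sketch of the load-use-dump pattern, not any real engine API):

```cpp
// Hypothetical engine calls -- a sketch of "load it, use it, dump it".
struct Texture;
Texture* streamFromSsd(const char* assetId);  // read into a transient scratch slot
void draw(const char* mesh, Texture* tex);
void releaseScratch(Texture* tex);            // frees the slot for next frame's request

void renderFrame() {
    // Highest-LOD track detail that's only valid for this frame's camera position.
    Texture* trackDetail = streamFromSsd("track_segment_042_lod0");
    draw("track_near_geometry", trackDetail);
    releaseScratch(trackDetail);  // don't keep it resident; re-request only if needed
}
```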
 
Zen2 does seem to perform better than Intel offerings in multithreading relative to its single-threaded performance in PC benchmarks. I assume that's what he's referring to, although it seems quite a strange way to frame it. It's not as if the XSX is using a custom CPU that handles SMT differently to its desktop counterpart, not that we've been made aware of anyway.

AMD's SMT is more efficient than Intel's HT.

In reality, the CPUs in consoles have been terrible in general, so anything modern probably feels like a dream. Keep in mind what they're coming from.
 
It's also possible that the XSX scheduler is a lot better suited for games. In Windows, threads hop around a lot for power and thermal reasons, and developers have little control over what runs where.

With two CCXs, having direct control over which thread runs on which core and on which CCX is likely to lead to better results and increased output. Reduced inter-CCX latency versus chiplet-based processors probably helps somewhat too.
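
For what it's worth, on Windows you can only hint at placement; a minimal Win32 sketch of pinning a thread to a core (the cores 0-3 / 4-7 CCX split is an assumption about the layout):

```cpp
#include <windows.h>
#include <thread>

// Request that the current thread stay on one logical core. On Windows this
// is a hint the scheduler generally honours, but power/thermal management can
// still interfere; on console the mapping can be fixed outright.
void pinCurrentThreadToCore(int core) {
    SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << core);
}

int main() {
    std::thread worker([] {
        pinCurrentThreadToCore(2);  // keep this job on CCX0 (assumed cores 0-3)
        // ... game job work ...
    });
    worker.join();
}
```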
 
@iroboto virtual texture streaming, extra compression on disk, low-latency access to disk, low-latency decompression to RAM, probably sampler feedback to help reduce memory consumption.
Yea. But when you do all your updates and render updates, you only have so much time to dump your textures in and begin rendering. It's like trying to render before the textures have arrived at all. I guess it's doable, but it's hard to believe they can do a mid-frame dump nearly twice. There's just so little time for that to happen; the SSD is nowhere near as fast as RAM.
I was skeptical that PS5 could do it with 5.5GB/s. But they had done so much on the whole stack, I was okay with it. Finding out XSX can also do it just added confusion. At this point in time I don’t know what’s going on.
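
To put rough numbers on "so little time" (a sketch that assumes the drives hit their quoted sequential rates and ignores request latency and queueing, which are the real killers):

```cpp
#include <cstdio>

int main() {
    // Pure transfer time for a mid-frame chunk: size / bandwidth.
    const double chunks_mb[] = {1.0, 4.0, 16.0};
    const double drives_gbps[] = {2.4, 4.8, 5.5};  // XSX raw, XSX compressed, PS5 raw

    for (double mb : chunks_mb)
        for (double gbps : drives_gbps)
            printf("%4.0f MB at %.1f GB/s: %5.2f ms of a 16.6 ms frame\n",
                   mb, gbps, mb / (gbps * 1000.0) * 1000.0);
}
```

A few MB is only a millisecond or two of raw transfer, so bandwidth itself isn't the problem; it's getting the request issued, completed and decompressed inside the frame that's hard.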
 
If Sony/MS took the opportunity to delay a year, citing production delays due to Covid, they could have Zen 3 without inter-CCX latency. Not going to happen though.
 
Is it possible or sensible to layer alpha textures to represent damage?

So, for example, the car bonnet has its standard texture (which is never swapped out) and at varying levels of destruction, a new alpha texture is streamed from the SSD.

Would that be cheaper in terms of bandwidth?
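
Blend-wise that would just be a decal composite over the base texture; a minimal sketch of the idea (names made up, standard "over" blending):

```cpp
struct RGBA { float r, g, b, a; };

// Composite a streamed-in damage decal over the never-evicted base texel
// using the standard "over" operator. Only the decal has to come off the
// SSD, and it can be small since most of it is fully transparent.
RGBA applyDamageDecal(RGBA base, RGBA decal) {
    const float a = decal.a;
    return { decal.r * a + base.r * (1.0f - a),
             decal.g * a + base.g * (1.0f - a),
             decal.b * a + base.b * (1.0f - a),
             1.0f };
}
```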
 
Is it possible or sensible to layer alpha textures to represent damage?

So, for example, the car bonnet has its standard texture (which is never swapped out) and at varying levels of destruction, a new alpha texture is streamed from the SSD.

Would that be cheaper in terms of bandwidth?
It's not about the bandwidth, but as Alex said, the reliance on the slowest part of your system to do rendering is just baffling. Imagine every single frame having to keep doing that mid-frame texture injection repeatedly.

I mean, it might be a 'wow, I'm messing around with the SSD and this is doable' kind of thing, but I'd be surprised if I see someone attempt to do this in the game. If they do, more details need to be explained. It just doesn't make a lot of sense; it sounds like the textures are going from the SSD directly into the registers, bypassing memory.
 
Yea. But when you do all your updates and render updates, you only have so much time to dump your textures in and begin rendering. It's like trying to render before the textures have arrived at all. I guess it's doable, but it's hard to believe they can do a mid-frame dump nearly twice. There's just so little time for that to happen; the SSD is nowhere near as fast as RAM.
I was skeptical that PS5 could do it with 5.5GB/s. But they had done so much on the whole stack, I was okay with it. Finding out XSX can also do it just added confusion. At this point in time I don’t know what’s going on.

Well, Mark Cerny said the PS5 could do it, then Epic said both consoles can do it with UE5 including geometry, and now the Dirt devs are saying they can do it. Remember Xbox has hardware for filtering between mip levels. Their console is designed to stream in low detail mips first and then filter to the high detail if it arrives late.
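
Conceptually that's a residency clamp, something like the sketch below (the real thing happens in the sampler hardware / SFS, not in C++ like this):

```cpp
#include <algorithm>

// Mip 0 is the finest level. Finer mips may still be in flight from the SSD;
// everything from finestResident down to the coarsest is guaranteed in memory.
struct TextureResidency {
    int desiredMip;      // what the view distance actually wants
    int finestResident;  // finest mip currently in memory
};

// Sample the best mip we actually have. When the finer mip lands a frame or
// two later, the same lookup starts returning it -- "filter up" after the fact.
int mipToSample(const TextureResidency& t) {
    return std::max(t.desiredMip, t.finestResident);
}
```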
 
Well, Mark Cerny said the PS5 could do it, then Epic said both consoles can do it with UE5 including geometry, and now the Dirt devs are saying they can do it. Remember Xbox has hardware for filtering between mip levels. Their console is designed to stream in low detail mips first and then filter to the high detail if it arrives late.
Right, but that's only if SFS is being used, which wasn't confirmed either. His example was that mid-frame he brought in a texture, consumed it, and replaced it with another texture. That pretty much ensures that first texture isn't in resident memory, so next frame he'll have to do that again, and then the slowest part of your system is directly tied to rendering speed.
 
Right, but that's only if SFS is being used, which wasn't confirmed either. His example was that mid-frame he brought in a texture, consumed it, and replaced it with another texture. That pretty much ensures that first texture isn't in resident memory, so next frame he'll have to do that again, and then the slowest part of your system is directly tied to rendering speed.

The exact quote is that he can load "data" in the middle of a frame, consume it and then replace it. I'm not sure if he means textures specifically. He follows that with "How much texture data can I load now?" or something like that. I'm not sure that he's talking about fully unloading all texture data and reloading it each frame. But the idea that they can stream in some texture data per frame, to be consumed immediately, is real, I think.

Also, you can do texture streaming without sampler feedback. My understanding of sampler feedback is it's costly, so you don't query for sampler feedback on every texture sample. You query some random distribution of samples to make better decisions of what texture data you're going to evict and stream. It'll help save bandwidth and memory, but it's not a requirement for streaming textures. I still don't have a full grasp of it, because I've never gotten into that level of detail on virtual texture streaming.
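
The "query only some samples" part could look something like this (a sketch of the idea; the ~1% rate and the hash are assumptions):

```cpp
#include <cstdint>

// Cheap hash-based decision: record feedback for roughly 1 in 100 texture
// samples rather than all of them, then use the accumulated feedback map to
// decide what to stream in and what to evict. (The 1% rate is an assumption.)
bool shouldWriteFeedback(uint32_t pixelX, uint32_t pixelY, uint32_t frame) {
    uint32_t h = pixelX * 374761393u + pixelY * 668265263u + frame * 2246822519u;
    h ^= h >> 13; h *= 1274126177u; h ^= h >> 16;
    return (h % 100u) == 0u;  // ~1% of samples carry the cost
}
```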
 
The exact quote is that he can load "data" in the middle of a frame, consume it and then replace it. I'm not sure if he means textures specifically. He follows that with "How much texture data can I load now?" or something like that. I'm not sure that he's talking about fully unloading all texture data and reloading it each frame. But the idea that they can stream in some texture data per frame, to be consumed immediately, is real, I think.

Also, you can do texture streaming without sampler feedback. My understanding of sampler feedback is it's costly, so you don't query for sampler feedback on every texture sample. You query some random distribution of samples to make better decisions of what texture data you're going to evict and stream. It'll help save bandwidth and memory, but it's not a requirement for streaming textures. I still don't have a full grasp of it, because I've never gotten into that level of detail on virtual texture streaming.
Yea, I was expecting most VT systems not to include SFS (a big rewrite very early in the generation), under the assumption that SFS might help improve things rather than actually slow them down. But interesting nonetheless.

You might be right that it's just data; perhaps updating a bunch of objects mid-frame using a pointer that is cycling through an array of stuff that lives on the SSD.
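
If it really is just generic data, a ring of staging slots would fit that description; purely a speculative sketch:

```cpp
#include <array>
#include <cstddef>

// Speculative sketch: a fixed ring of staging slots in RAM. Each frame the
// cycling pointer hands out the oldest slot to receive a fresh SSD read, so
// only N chunks are ever resident at once -- old data is implicitly evicted.
template <std::size_t N, std::size_t SlotBytes>
struct StreamRing {
    std::array<std::array<unsigned char, SlotBytes>, N> slots{};
    std::size_t next = 0;

    unsigned char* acquire() {
        unsigned char* p = slots[next].data();
        next = (next + 1) % N;
        return p;  // caller reads the next SSD chunk into this slot
    }
};
```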
 
Imagine every single frame having to keep doing that mid-frame texture injection repeatedly.

That's why I'm imagining something like damage. You know, something aesthetic, where it doesn't really matter if the engine faces certain circumstances under which it has to skip a few frames.

If, in a 60fps game, there are so many cars on screen at once, getting damaged, that it takes 10 frames to stream in a new damage texture, no-one's going to notice or complain. But if the game and its engine are designed to be able to cope with the worst case scenarios, I'd expect there to be ideal situations in which mid frame texture streaming becomes possible. Even if it's not all that frequent.

For the sake of easy maths, let's say the 40MB budget I mentioned previously (pulled out of my arse, as a straightforward half of the SSD's bandwidth) is able to cope with the streaming requirements of 10 cars all slamming into each other at once. So an average of 4MB, per car, per frame.

Doesn't it stand to reason that in a best case scenario of only two jostling cars, they then each have a budget of 20MB per car, per frame, or 10MB per car, per half frame?
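
Carrying that arithmetic through (same made-up 40MB budget):

```cpp
#include <cstdio>

int main() {
    const double budget_mb = 40.0;  // assumed per-frame streaming budget
    for (int cars : {10, 2, 1})
        printf("%2d car(s): %4.1f MB per car per frame, %4.1f MB per half frame\n",
               cars, budget_mb / cars, budget_mb / cars / 2.0);
}
```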
 
That's why I'm imagining something like damage. You know, something aesthetic, where it doesn't really matter if the engine faces certain circumstances under which it has to skip a few frames.

If, in a 60fps game, there are so many cars on screen at once, getting damaged, that it takes 10 frames to stream in a new damage texture, no-one's going to notice or complain. But if the game and its engine are designed to be able to cope with the worst case scenarios, I'd expect there to be ideal situations in which mid frame texture streaming becomes possible. Even if it's not all that frequent.

For the sake of easy maths, let's say the 40MB budget I mentioned previously (pulled out of my arse, as a straightforward half of the SSD's bandwidth) is able to cope with the streaming requirements of 10 cars all slamming into each other at once. So an average of 4MB, per car, per frame.

Doesn't it stand to reason that in a best case scenario of only two jostling cars, they then each have a budget of 20MB per car, per frame, or 10MB per car, per half frame?
I mean, I guess I would ask: why not just leave it in memory? Unless you ran out of memory entirely, in which case you're going to need to start relying on the SSD to stretch the limits of your VRAM.
 
That sort of on-demand streaming requires your drawing to be highly optimised. You want to know the order you'll draw things and have the texture data present. It's quite a curious situation. I guess from the off, you could ditch the UI and bring those graphics in at the last step. You could also trust the road material to be rendered before the cars.
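
Speculatively, that ordering might look something like this (a sketch; the arrival estimates and names are made up):

```cpp
#include <algorithm>
#include <vector>

// Speculative sketch: order the frame so items whose streamed data arrives
// latest are drawn last -- road first, cars next, UI at the very end.
struct DrawItem {
    const char* name;
    double dataReadyMs;  // estimated time this item's streamed data lands
};

void sortForLateArrivals(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return a.dataReadyMs < b.dataReadyMs;  // earliest-ready first
              });
}
```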
 
Yea, I was expecting most VT systems not to include SFS (a big rewrite very early in the generation), under the assumption that SFS might help improve things rather than actually slow them down. But interesting nonetheless.

You might be right that it's just data; perhaps updating a bunch of objects mid-frame using a pointer that is cycling through an array of stuff that lives on the SSD.

The way it works with D3D12 Ultimate is you query for sampler feedback. The IHV provides sampler feedback in some vendor-specific format that is decoded into one of two types of feedback maps about mip selection. I think the recommendation is writing sampler feedback on roughly 1% of texture requests for effective performance.
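
For reference, the D3D12 shape of that, as far as I understand it (fragments, not a complete program; assumes the device, command list and resources were created elsewhere, and shows the MIN_MIP variant of the two map types):

```cpp
#include <d3d12.h>

// Sketch: pair an opaque sampler-feedback map with a texture, then decode it.
// feedbackTex uses DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE; decodedTex is
// a plain DXGI_FORMAT_R8_UINT texture the CPU-side streamer can read back.
void bindAndDecodeFeedback(ID3D12Device8* device,
                           ID3D12GraphicsCommandList1* cmdList,
                           ID3D12Resource* texture,
                           ID3D12Resource* feedbackTex,
                           ID3D12Resource* decodedTex,
                           D3D12_CPU_DESCRIPTOR_HANDLE uavSlot) {
    // Pair the feedback resource with the texture it describes; the shader
    // then writes into it via WriteSamplerFeedback() in HLSL.
    device->CreateSamplerFeedbackUnorderedAccessView(texture, feedbackTex, uavSlot);

    // ... draw calls that write feedback go here ...

    // Decode the vendor-specific opaque layout into a readable min-mip map.
    cmdList->ResolveSubresourceRegion(decodedTex, 0, 0, 0,
                                      feedbackTex, 0, nullptr,
                                      DXGI_FORMAT_R8_UINT,
                                      D3D12_RESOLVE_MODE_DECODE_SAMPLER_FEEDBACK);
}
```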
 
The way it works with D3D12 Ultimate is you query for sampler feedback. The IHV provides sampler feedback in some vendor-specific format that is decoded into one of two types of feedback maps about mip selection. I think the recommendation is writing sampler feedback on roughly 1% of texture requests for effective performance.
Got me interested. Started following this video:

 
Zen2 does seem to perform better than Intel offerings in multithreading relative to its single-threaded performance in PC benchmarks. I assume that's what he's referring to, although it seems quite a strange way to frame it. It's not as if the XSX is using a custom CPU that handles SMT differently to its desktop counterpart, not that we've been made aware of anyway.
Could it be the way the OS is restricted to specific cores and threads, making it a lot better than on PC?
Add a minor tweak to a couple of cache/register sizes, like the One X had, and the overall effect could be a lot better than an equivalent PC CPU.

Easily beaten with a better explanation by @function
 
Zen2 does seem to perform better than Intel offerings in multithreading relative to its single-threaded performance in PC benchmarks. I assume that's what he's referring to, although it seems quite a strange way to frame it. It's not as if the XSX is using a custom CPU that handles SMT differently to its desktop counterpart, not that we've been made aware of anyway.
Actually, that's not quite true for gaming. RPCS3, the PS3 emulator, recommends an 8700K minimum over Ryzen because the CCX latency has a significant effect on performance. Maybe it can be mitigated in a closed-box environment like a console, though.
 
It's also possible that the XSX scheduler is a lot better suited for games. In Windows, threads hop around a lot for power and thermal reasons, and developers have little control over what runs where.

With two CCXs, having direct control over which thread runs on which core and on which CCX is likely to lead to better results and increased output. Reduced inter-CCX latency versus chiplet-based processors probably helps somewhat too.

I can certainly see the scheduler in the XBO being better optimised for game code, or simply not having to deal with as many non-gaming processes, but that doesn't really align with the statement "the SMT in the Series X processor is far more capable than usual CPUs", which suggests a hardware difference.

I also can't see the move away from the chiplet approach having that much of an impact given it's coupled with a halving of the L3. My understanding is that the move to a monolithic approach for APUs was more down to reducing power consumption for the mobile market (removing the need for a separate IO die). Also, the memory controller has been decoupled from the IF (in Renoir at least), which may increase latency vs Matisse.

Could it be the way the OS is restricted to specific cores and threads, making it a lot better than on PC?
Add a minor tweak to a couple of cache/register sizes, like the One X had, and the overall effect could be a lot better than an equivalent PC CPU.

Easily beaten with a better explanation by @function

If a minor tweak to a couple of cache/register sizes could result in "far more capable SMT", I think we'd have seen that already on the desktop.

I suspect he's simply comparing to non-Zen2-based processors in the PC space. He says "usual CPUs", so that probably means Intel quad cores.
 