Xbox Series X [XBSX] [Release November 10 2020]

If a minor tweak to a couple of cache/register sizes could result in "far more capable SMT" I think we'd have seen that already on the desktop.
Not if it doesn't benefit general use, you wouldn't.
I'm saying all those things added together could make a reasonable difference.

Also keep in mind the usual bigging up of the platform you happen to be talking about.
 
I can certainly see the scheduler in XBO being better optimised for game code, or simply not having to deal with as many non-gaming processes, but that doesn't really align with the statement "the SMT in the Series X processor is far more capable than usual CPUs", which suggests a hardware difference.

I agree the wording could very well suggest hardware, but maybe he was just being a little loose with the wording given that he was speaking to the Xbox blog? Or maybe some PR filter got involved? I'm not sure. Like you I wouldn't be expecting big changes over Zen 2, but I guess there could be some tweaks to the control logic on the CPU (I'll add more thoughts on that below).

I also can't see the change to the chiplet approach having that much of an impact given it's coupled with a halving of the L3. My understanding is that the move to a monolithic approach for APUs was more down to reducing power consumption for the mobile market (removing the need for a separate IO die). Also, the memory controller has been decoupled from the IF (in Renoir at least), which may increase latency vs Matisse.

Ripping some charts from Anandtech's Renoir review, we can see that L3 latency within the same CCX is the same as Matisse (a bit unexpected given that for AMD smaller has often meant a touch faster), but latency across CCXs within the same chip is about 25% lower for the monolithic APU.

https://www.anandtech.com/show/1570...k-business-with-the-ryzen-9-4900hs-a-review/2

[Image: G14%20Bounce_575px.png]

[Image: 3950x_575px.png]

Chiplet to chiplet is worst of all, but that's not really a fair comparison as it only comes into effect beyond 8 cores. That said, it could explain why the 3800X edges out the double chiplet 3900X and 3950X in some games.
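For anyone wondering how those "bounce" numbers get measured, here's a minimal, hypothetical sketch of the usual approach (not AnandTech's actual tool): two threads pinned to different logical cores pass a token back and forth through an atomic, and you time the round trips. The core indices and iteration count are placeholders; which cores sit on which CCX depends on the actual CPU.

```cpp
// Rough sketch of a core-to-core "bounce" latency test: two threads pinned to
// chosen cores ping-pong a token through an atomic. Core numbers 0 and 4 are
// placeholders - pick cores on the same or different CCXs to see the difference.
#include <windows.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<int> token{0};

void bouncer(unsigned core, int my_value, int other_value, int iterations)
{
    SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << core);
    for (int i = 0; i < iterations; ++i)
    {
        while (token.load(std::memory_order_acquire) != other_value) { /* spin */ }
        token.store(my_value, std::memory_order_release);
    }
}

int main()
{
    const int iterations = 1'000'000;
    const auto start = std::chrono::steady_clock::now();

    std::thread a(bouncer, 0u, 1, 0, iterations);   // e.g. a core on CCX 0
    std::thread b(bouncer, 4u, 0, 1, iterations);   // e.g. a core on CCX 1
    a.join();
    b.join();

    const long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start).count();
    std::printf("~%lld ns per one-way hop\n", ns / (2LL * iterations));
}
```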

Interestingly there's also this comment on the same page of the Anandtech Renoir / 4900H review:

"AMD makes a thread leave the chiplet, even when it’s speaking to another CCX on the same chiplet, because that makes the control logic for the entire CPU a lot easier to handle. This may be improved in future generations, depending on how AMD controls the number of cores inside a chiplet."

Again this is talking about multiple chiplets, but it does highlight that there are probably gains to be had by controlling which threads share a core and a CCX / L3. Maybe there are changes to the control logic that would make sense in a console, that wouldn't be worth it on PC or weren't ready when Matisse launched. So I'm pretty confident that the consoles will have much better control over a thread's core / CCX residency, and I think there may be some tweaks to the control logic to facilitate this. It's just a thought, but it might fit with what this Codemasters fella is saying.

(There's also the thing about main memory access being a little faster on Renoir in some circumstances, though obviously this would be offset to whatever extent by more L3 misses!)
 
Yes, I think so too. What does he mean by 'normal CPUs'? The CPUs most people have in a Windows environment? That's far more likely than significant hardware level changes to Zen 2.

Yeah, I've just been reading this article on TechPowerUp about a Zen 2 custom power plan with gaming enhanced thread scheduling rules. It's actually written by the guy that made the plan.

https://www.techpowerup.com/review/1usmus-custom-power-plan-for-ryzen-3000-zen-2-processors/

As far as I can tell, you'd simplify OS controlled scheduling on console an awful lot by:

- Not juggling threads to hit maximum boost (on Zen 2 different cores and core counts have different boost limits)
- Not bouncing threads around every 40 ms to reduce peak core temperatures
- Not trying to arrange threads so you can shut cores down to save power

As far as I can tell from what that smart chap has written, Zen 2 controls clock speed and voltage in its hardware, but Windows has control over which SW thread goes on which HW thread (though it doesn't particularly care which, and the same goes for which CCX threads bounce between - Intel doesn't use clusters in this way, so MS have probably been slow to care on the PC). So maybe the control logic on Zen 2 wouldn't need big changes for XSX... maybe it wouldn't need any.

So even without developer control, MS can probably come up with something better for console, and if you have the right profilers and developer control over SW-to-HW thread mapping you should probably be able to exceed any automatic system. Maybe the most useful thing for MS / Sony to add would actually be low overhead, real time feedback so a game could intelligently decide which HW threads to run new jobs on, or whether to move existing ones - e.g. is AI the bottleneck? Has it now moved to physics after an explosion with lots of shared data?
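Just to make the "developer control over SW-to-HW thread mapping" idea concrete, here's a hypothetical sketch of pinning worker threads to a chosen CCX on Windows so that jobs which share a lot of data also share an L3. The core-to-CCX numbering and the physics/AI split are assumptions for illustration, not anything MS or Sony has described.

```cpp
// Hypothetical sketch: keep workers that share data on one CCX so they share
// an L3, rather than letting the scheduler bounce them around. The layout
// (8 logical processors per CCX on an SMT-enabled 8-core Zen 2) is an
// assumption - real topology should be queried, not hard-coded.
#include <windows.h>
#include <chrono>
#include <thread>

// Affinity mask covering one CCX worth of logical processors.
DWORD_PTR ccx_mask(unsigned ccx_index, unsigned logical_per_ccx = 8)
{
    DWORD_PTR mask = 0;
    for (unsigned i = 0; i < logical_per_ccx; ++i)
        mask |= DWORD_PTR(1) << (ccx_index * logical_per_ccx + i);
    return mask;
}

void physics_job() { /* crunch shared physics data */ }
void ai_job()      { /* crunch AI data */ }

void run_on_ccx(unsigned ccx, void (*job)())
{
    std::thread([=] {
        SetThreadAffinityMask(GetCurrentThread(), ccx_mask(ccx));
        job();
    }).detach();                       // sketch only; a real job system would join
}

int main()
{
    run_on_ccx(0, physics_job);        // physics workers share CCX 0's L3
    run_on_ccx(1, ai_job);             // AI workers get CCX 1
    // A console runtime could do this automatically, ideally driven by the
    // kind of low-overhead profiler feedback mentioned above.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```

Nothing clever, but it's the sort of explicit mapping a console API could expose cheaply.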
 
Hmm, interesting. I guess I was wrong about dumping data in mid-frame. Then again, just because you can, doesn't mean you should. But interesting. Curious to see whether there's more development in this area, and whether it becomes an ideal solution or design paradigm that everyone follows, or not.

wtf. lol.

I just don't get it anymore. I tried. I honestly tried. But until someone can hand-hold me through this process: his current setup is fast enough to just fly in textures on demand mid-frame? We haven't even gotten to talk about SFS.
I am also a bit ??? at the idea of mid-frame usage of something from the SSD. Like, how small is that chunk or texture that could realistically be used mid-frame? For that matter, in a 16.6 or 8.3 ms frame for that game (60 Hz / 120 Hz)?

Tying a bit of data used mid-frame to the slowest, highest-latency hardware sub-component?
The streaming strategy of MS is a bit more nuanced than that of Sony. Instead of literally caching data for the next few seconds of gameplay by using a high-bandwidth SSD with exceptionally low latency, the goal is to literally target data on a nearly per-frame basis. By being very selective about what needs to be streamed, MS can still achieve comparatively similar results. They have optimised their entire pipeline to this end with BCPACK, hardware decompression and a new prefetching method to ensure coherency and cache hits in GPU caches.

Ivan Nevraev, James Stanard, Andrew Goossen and Mark Grossman, all members of Xbox ATG, are the people behind the Xbox Velocity Architecture.
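Purely to illustrate the "only stream what the GPU actually touched" idea, and not the real SFS / Velocity Architecture interface, a toy residency loop might look something like this; the tile key packing, the eviction window and all the names are made up for the sketch.

```cpp
// Toy, API-agnostic sketch of feedback-driven texture streaming: the GPU
// reports which texture tiles/mips it sampled last frame, and the streamer
// loads only those and evicts what hasn't been touched for a while.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

using TileKey = uint64_t;   // texture id, mip and tile coords packed together

TileKey make_key(uint32_t tex, uint32_t mip, uint32_t x, uint32_t y)
{
    return (uint64_t(tex) << 48) | (uint64_t(mip) << 40) | (uint64_t(x) << 20) | y;
}

class TileStreamer {
public:
    // 'sampled' = tiles the GPU feedback says were actually used last frame.
    void update(const std::vector<TileKey>& sampled, uint64_t frame)
    {
        for (TileKey t : sampled) {
            auto it = resident_.find(t);
            if (it == resident_.end()) {
                std::printf("requesting load of tile %#llx\n", (unsigned long long)t);
                resident_[t] = frame;      // pretend the async read + decompress completed
            } else {
                it->second = frame;        // refresh last-used frame
            }
        }
        for (auto it = resident_.begin(); it != resident_.end(); ) {
            if (frame - it->second > 120)  // ~2 s at 60 fps without being sampled
                it = resident_.erase(it);
            else
                ++it;
        }
    }
private:
    std::unordered_map<TileKey, uint64_t> resident_;   // tile -> last frame used
};

int main()
{
    TileStreamer streamer;
    streamer.update({ make_key(3, 2, 10, 7) }, 1);      // hypothetical tile
}
```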
 
The streaming strategy of MS is a bit more nuanced than that of Sony. Instead of literally caching data for the next few seconds of gameplay by using a high-bandwidth SSD with exceptionally low latency, the goal is to literally target data on a nearly per-frame basis. By being very selective about what needs to be streamed, MS can still achieve comparatively similar results. They have optimised their entire pipeline to this end with BCPACK, hardware decompression and a new prefetching method to ensure coherency and cache hits in GPU caches.

Ivan Nevraev, James Stanard, Andrew Goossen and Mark Grossman, all members of Xbox ATG, are the people behind the Xbox Velocity Architecture.

Again, sad to say, the SSDs are just not usable in this fashion. SSD flash memory just doesn't have the latency to target a during-frame, or even next-frame, response time. You can get extremely tight bounds on what you're streaming, even relative to the camera moving very quickly. But let's take that Ratchet and Clank gameplay. Most likely they're pre-loading each shift and area step by step to ensure it's ready for rendering before there's even the possibility of calling those assets.

It's a lot better than the slow-as-hell HDDs devs had to put up with this gen; you might only need to be one step ahead and don't have to wait that long to take that step. But they're not some drop-in substitute for RAM either. Even checking the UE5 demo, you can see the lag from tiles loading on things like the one camera cut, and no doubt there's a bunch of complex predictive streaming and backup LODs/texture tiles going on in the code to hide latency issues as well.
 
https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine, 58m mark. "We pay no attention to cache... it was more productive to make sure the cores are at least trying to do useful work all the time."

I wouldn't expect much CCX topology optimization unless things have progressed since then.

I would. The latency in terms of cycles is somewhere around doubled for main memory access. On top of that, you have twice as many threads per core (potentially).

As the PC has proven with benchmarked examples, ignorance of CCX topology can be a hindrance, and once you've worked out how to keep cores busy, being ignorant of cache use/hits is going to hurt you.
 
Again, sad to say, the SSDs are just not usable in this fashion. SSD flash memory just doesn't have the latency to target a during-frame, or even next-frame, response time. You can get extremely tight bounds on what you're streaming, even relative to the camera moving very quickly. But let's take that Ratchet and Clank gameplay. Most likely they're pre-loading each shift and area step by step to ensure it's ready for rendering before there's even the possibility of calling those assets.

It's a lot better than the slow-as-hell HDDs devs had to put up with this gen; you might only need to be one step ahead and don't have to wait that long to take that step. But they're not some drop-in substitute for RAM either. Even checking the UE5 demo, you can see the lag from tiles loading on things like the one camera cut, and no doubt there's a bunch of complex predictive streaming and backup LODs/texture tiles going on in the code to hide latency issues as well.
And yet, a developer just told you the exact opposite. Expect more surprises.
 
Again, sad to say, the SSDs are just not usable in this fashion. SSD flash memory just doesn't have the latency to target a during-frame, or even next-frame, response time. You can get extremely tight bounds on what you're streaming, even relative to the camera moving very quickly. But let's take that Ratchet and Clank gameplay. Most likely they're pre-loading each shift and area step by step to ensure it's ready for rendering before there's even the possibility of calling those assets.

It's a lot better than the slow-as-hell HDDs devs had to put up with this gen; you might only need to be one step ahead and don't have to wait that long to take that step. But they're not some drop-in substitute for RAM either. Even checking the UE5 demo, you can see the lag from tiles loading on things like the one camera cut, and no doubt there's a bunch of complex predictive streaming and backup LODs/texture tiles going on in the code to hide latency issues as well.
Yeah, the R and C demo showed it off well; there's essentially a loading screen between worlds where it streams things in as well (after the portal is entered and the main character spins around). In addition to that, there's the pre-loading of each shift, I presume, since they know the path the player is taking on the levels shown off, which were indeed "on rails".

It was interesting to see performance getting stuttery as the portals were being neared, entered and exited.
 
And yet, a developer just told you the exact opposite.
Bit of context - it's an interview on XBox.com about the next XBox, so it might involve a large degree of PR fluff. eg...

"the SMT in the Series X processor is far more capable than usual CPUs."

Really? Not just slightly better, after decades of AMD work in CPUs culminating in Zen2 getting so far with SMT, but far more capable now that MS have stepped in and told AMD where they're going wrong and how to design SMT properly?

The whole thing reads like a PR infomercial. It's a butt-load of one-liner marketing remarks scripted to highlight all the USPs of XBSX. The text also isn't the same as the real interview text - I've been listening to the interview to find the actual talk about the SSD.

Given the biased narration, we have to appreciate this isn't a technical talk for engineers. That's not saying the narrator is inherently platform biased, but given the marketing of his game and positive relationships with MS, we can't expect the choice of words to be completely frank - faced with a loaded question like "what does the SSD bring?", the guy can't reply "not a lot" if that's the case; basically he's required to respond to every question with "it's faster and better than everything out there." Case in point, Larry Hryb asks what he likes about XBSX. Springate says, "everything, it's all good." Hryb then hits him with their selling points to get some sound bites - "what about ray-tracing?" Springate's face was very interesting when presented with that RT question; big smile and a long, "yeah...." He then says they're looking into it and can't commit to everything, but that's not in the interview text.

Hang on. To everyone in this discussion - has anyone watched the YT interview? Springate never says "load data mid frame." The text is a fabrication.

I started this post commenting on the text, and I've had the interview playing in the background, and realised that the text is basically PR crap. It's not at all what Springate says. The only mention of the SSD is the additional hardware decompression allowing faster transfer rates than the drive's rated performance. He never says the CPU SMT is better than other CPUs.

As far as I'm concerned now, XBox.com is a completely unreliable source and not worth quoting. Let's stick with DF etc.

Transcript of real words: From 9:40

H: I think it's important to point out...with Dirt 5, I mean, there is a, you've got this amazing CPU, cos you, and you kinda alluded to it earlier, with next-gen it's not just about fast read/write speed which of course Xbox Series X had, but its the entire, as the engineers would say, it's the pipeline, right. It's making sure the CPU and all of these things are there to really give this next gen experience. So it's really important to look at all pieces of the puzzle. Would you say that that's accurate?

S: Oh completely. And even then, it's not as simple just as the NVMe drive in XBox is fast - and oh boy is it fast - what am I going to do with the other pieces of the puzzle as well. So we can have data using hardware decompression so we compress our data like textures and things like that while we're building the game at the office and writing that to your disk. And that means we can decompress it in hardware when loaded basically for free. That's means I can get even faster throughput than what the NVMe drive can deliver. So, it's a beast! I've got no other way of putting it. Its really, really fast. Um, but yeah, that means that we're able to just deliver better and more beautiful experiences than we could before. And that excites me. I thinking, how, now that I have all these tools laid out on the table, how do I make use of all of these features to deliver an amazing video game, an amazing experience for the gamers.
 
Hang on. To everyone in this discussion - has anyone watched the YT interview? Springate never says "load data mid frame." The text is a fabrication.

I started this post commenting on the text, and I've had the interview playing in the background, and realised that the text is basically PR crap. It's not at all what Springate says. The only mention of the SSD is the additional hardware decompression allowing faster transfer rates than the drive's rated performance. He never says the CPU SMT is better than other CPUs.

As far as I'm concerned now, XBox.com is a completely unreliable source and not worth quoting. Let's stick with DF etc.

2 different interviews.
 
Even if we accept that (there's less in the video than the text, so it's not offering a deeper dive...), it's still not a reliable reference for a technical discussion. The piece doesn't match at all what we see in Springate's responses to similar questions, with his responses being less one-sided (as you'd expect for an engineer not trying to sell a console ;)) - e.g. 120Hz is a target but a difficult one to work towards and they'll possibly leverage adaptive framerates, versus the article's "Dirt 5 runs at 4K120" presentation.

When asked by Hryb about the SSD, Springate's only remark was the high bandwidth assisted by compression. If mid-frame load was something he was doing now, he'd have shared that. He's not using mid-frame data load in his current project, and the article states it's something to try in the future, for which we have no idea what actual words Springate used to describe that possibility and how realistically usable it is.

Sticking to a technical discussion on the data we do have: if we consider an optimal 5 GB/s transfer speed, 1 ms is about 5 MB of data. At 60 fps, you could swap in somewhat less than 90 MB of data per frame. In a 16 GB machine, that's really not a lot. By caching that data you have guaranteed low-latency access when you need it. Trying to develop a system that can fetch 90 MB (or less) mid-frame just seems like unnecessary work to me. If RAM isn't that tight, why go to all that effort? Prefetching a few frames in advance makes a lot more sense.
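As a quick sanity check on those per-frame numbers (the flat 5 GB/s and a full 16.7 ms spent streaming are the same optimistic assumptions as above):

```cpp
// Back-of-envelope per-frame streaming budget at an assumed 5 GB/s.
#include <cstdio>

int main()
{
    const double mb_per_s  = 5.0 * 1000.0;         // 5 GB/s ~= 5000 MB/s (decimal)
    const double mb_per_ms = mb_per_s / 1000.0;     // ~5 MB per millisecond
    const double frame_ms  = 1000.0 / 60.0;         // ~16.7 ms at 60 fps

    std::printf("%.1f MB per ms, %.0f MB per 60 fps frame\n",
                mb_per_ms, mb_per_ms * frame_ms);   // ~5 MB/ms, ~83 MB/frame
}
```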

I guess also with asynchronous load, you could be loading texture data LODs and updating the same data mid-frame, so the newer LOD is available to draw this frame without having to wait for the next one. But I doubt you could design a game around using the SSD as virtual RAM, because it plainly isn't fast enough.
 
Sticking to a technical discussion on the data we do have: if we consider an optimal 5 GB/s transfer speed, 1 ms is about 5 MB of data. At 60 fps, you could swap in somewhat less than 90 MB of data per frame. In a 16 GB machine, that's really not a lot. By caching that data you have guaranteed low-latency access when you need it. Trying to develop a system that can fetch 90 MB (or less) mid-frame just seems like unnecessary work to me. If RAM isn't that tight, why go to all that effort? Prefetching a few frames in advance makes a lot more sense.
Speaking generally, I think it depends on the mip level here, however. If you're asking for a mip 0 texture directly, that's a big pull. But if you're several mips out, you might be able to pull in quite a large number of textures. Which is generally why moving forward is really good for SVT, but strafing is really terrible.
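For a feel of how fast those pulls shrink with mip level, here's a tiny worked example; the 4096x4096 size and BC7's one byte per texel are just illustrative assumptions:

```cpp
// Each mip level is a quarter of the data of the one below it, so a few mips
// out the per-texture cost collapses. 4096x4096 at 1 byte/texel (BC7) assumed.
#include <cstdio>

int main()
{
    unsigned dim = 4096;
    for (int mip = 0; dim >= 64; ++mip, dim /= 2) {
        double mb = double(dim) * dim / (1024.0 * 1024.0);   // 1 byte per texel
        std::printf("mip %d: %ux%u ~= %.2f MB\n", mip, dim, dim, mb);
    }
    // mip 0 ~16 MB, mip 1 ~4 MB, mip 2 ~1 MB, mip 3 ~0.25 MB ...
}
```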
 
You could indeed pull in lots of textures. However, the point is: why bother, when those same textures occupy such a small amount of RAM? Why not just preload those tiles some frames in advance in a 90 MB cache? Wanting to stream data mid-frame only makes sense if RAM is really tight and you need another 50 MB from somewhere, so you can swap out 50 MB of texture data every single frame. It's also a requirement that goes against virtualised assets, where the RAM footprint is minimised to what's actually drawn on screen.
 
You could indeed pull in lots of textures. However, the point is: why bother, when those same textures occupy such a small amount of RAM? Why not just preload those tiles some frames in advance in a 90 MB cache? Wanting to stream data mid-frame only makes sense if RAM is really tight and you need another 50 MB from somewhere, so you can swap out 50 MB of texture data every single frame. It's also a requirement that goes against virtualised assets, where the RAM footprint is minimised to what's actually drawn on screen.
I agree. Having thought about this a lot more, my answer now would be along the lines of 'it depends'.

If you are doing SVT, I would agree with your answer.

If you are doing something closer to runtime procedural texturing, or some sort of decal system where you are layering textures on top of one another to generate new textures, then the SSD affords you a split second's worth of dynamic texturing if you need to call a texture that you didn't have in memory. So imagine it opening up options for texturing damage on people, etc. You stream in the assets you need, but during runtime something happens and you need a texture there: either you don't include it, or, with this faster retrieval system, you could just stream it in mid-frame. If you need it for each following frame, hold it in memory; but if you only need it for, say, a couple of frames for an effect or something, mid-frame stream it and then it's gone. Particle effects could work in this manner as well - anything that may need a blip of time, say 1-3 seconds, and then it's gone.
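A toy sketch of that "stream it in for a couple of frames, then drop it" idea; the class, asset name and fixed frame lifetime are all made up for illustration:

```cpp
// Toy sketch of transient, short-lived texture streaming for effects: request
// an asset for N frames, then drop it automatically when the lifetime expires.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct TransientTexture {
    std::string asset;        // e.g. a damage decal or burst effect texture
    uint32_t    frames_left;  // evict when this hits zero
};

class TransientStreamer {
public:
    void request(const std::string& asset, uint32_t lifetime_frames)
    {
        // In a real engine this would kick off an async SSD read + decompress;
        // here we just pretend it's resident immediately.
        live_.push_back({asset, lifetime_frames});
        std::printf("streamed in %s for %u frames\n", asset.c_str(), lifetime_frames);
    }

    void end_frame()
    {
        for (auto it = live_.begin(); it != live_.end(); ) {
            if (--it->frames_left == 0) {
                std::printf("dropping %s\n", it->asset.c_str());
                it = live_.erase(it);
            } else {
                ++it;
            }
        }
    }
private:
    std::vector<TransientTexture> live_;
};

int main()
{
    TransientStreamer streamer;
    streamer.request("explosion_scorch_decal", 3);   // hypothetical asset
    for (int frame = 0; frame < 4; ++frame)
        streamer.end_frame();
}
```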
 