Xbox Series S [XBSS] [Release November 10 2020]

that's what puzzles me. 224GB/s for the GPU is the entire bandwidth of a PS4 Pro. The CPU has 56GB/s for it alone, which could be enough to perform loads of drawcalls (at 60fps at least). XoX has a total bandwidth of 326GB/s; minus 56GB/s, this Series S is not too far off, optimisations aside :?:
It’s a 1080p machine? It doesn’t need all the bandwidth and memory size to hold 4K textures because that isn’t the targeted resolution. But it Still needs bandwidth for compute and ray tracing etc to stay modern.
 
that's what puzzles me. 224GB/s for the GPU is the entire bandwidth of a PS4 Pro. The CPU has 56GB/s for it alone, which could be enough to perform loads of drawcalls (at 60fps at least). XoX has a total bandwidth of 326GB/s; minus 56GB/s, this Series S is not too far off, optimisations aside :?:

You're speaking as if the memory bus is segregated into two different pools between the GPU and CPU here, and I don't think that's the case - it's still an APU. You can't 'add' the 224 + 56GB/s of the Series S against the total 326GB/s of the One X. It means best-case, the Series S will have 224GB/s, but a portion of it could run far slower if it's actually needed (the OS will likely be residing in that 2GB, so it's unlikely devs will touch it), and the One X has 326GB/s for all of its memory.
 
I'm not saying it's going to run BC games in One X mode, but I think people are focusing too literally on paper specs. If it is more performant than the One X and there is a certain degree of virtualization with regards to BC, then it's possible.

I definitely expected Series S to be turning in better real-world performance than the One X, but looking at the bandwidth numbers I'm not quite as certain as I was. There are still a boatload of architectural improvements that I feel will carry it, but the BC issue is interesting.

First, let's be precise; the Xbox One does not have backwards compatibility with original Xbox and Xbox 360 games, Microsoft have ported code on a game-by-game basis. So I'm wondering if Microsoft have been re-optimising all those old games for the new consoles or if it'll just be running the existing Xbox One build, which isn't optimised for the hardware but should run better because the hardware is more efficient. Which begs the question, how much better could BC be on nextgen Xbox if Microsoft did revisit BC titles and optimise them!

Red Dead Redemption at 1440p in 120fps anybody? :runaway:
 
You're speaking as if the memory bus is segregated into two different pools between the GPU and CPU here, and I don't think that's the case - it's still an APU. You can't 'add' the 224 + 56GB/s of the Series S against the total 326GB/s of the One X. It means best-case, the Series S will have 224GB/s, but a portion of it could run far slower if it's actually needed (the OS will likely be residing in that 2GB, so it's unlikely devs will touch it), and the One X has 326GB/s for all of its memory.
did I understand it wrong then? Sigh.... I don't get it. If you have 10GB, of which 2GB have a unique bandwidth of 56GB/s, it shouldn't touch the 224GB/s of the other 8GB, right? I thought that was a key for the extra performance of the Series X.

It’s a 1080p machine? It doesn’t need all the bandwidth and memory size to hold 4K textures because that isn’t the targeted resolution. But it Still needs bandwidth for compute and ray tracing etc to stay modern.
if so, then hopefully AMD is going to have a DLSS-like solution; 1080p looks blurry on 1440p+ screens
 
I definitely expected Series S to be turning in better real-world performance than the One X, but looking at the bandwidth numbers I'm not quite as certain as I was. There are still a boatload of architectural improvements that I feel will carry it, but the BC issue is interesting.

First, let's be precise; the Xbox One does not have backwards compatibility with original Xbox and Xbox 360 games, Microsoft have ported code on a game-by-game basis. So I'm wondering if Microsoft have been re-optimising all those old games for the new consoles or if it'll just be running the existing Xbox One build, which isn't optimised for the hardware but should run better because the hardware is more efficient. Which begs the question, how much better could BC be on nextgen Xbox if Microsoft did revisit BC titles and optimise them!

Red Dead Redemption at 1440p in 120fps anybody? :runaway:
the confusing part talking about BC is whether we mean BC with the XBO generation, or BC with the 360 and OG generations.
It's clear it can't run the Xbox One X variants of this generation's games. But the details around the 360 and OG generations are still hidden. I'm not sure if they just run the 1S variant or the 1X variant there. I assume the former, to keep things clean-cut between S and X. The lead platform is X in this case.
 
did I understand it wrong then? Sigh.... I don't get it. If you have 10GB, of which 2GB have a unique bandwidth of 56GB/s, it shouldn't touch the 224GB/s of the other 8GB, right? I thought that was a key for the extra performance of the Series X.


if so, then hopefully AMD is going to have a DLSS-like solution; 1080p looks blurry on 1440p+ screens
You'll need to wait on details of their 'advanced hardware scaler'.
 
Just to clarify for those not following along, no source code was used or referenced in getting original Xbox and Xbox 360 games running on Xbox One. What I believe @DSoup to mean is how Microsoft targeted specific games to get running in the VM executables that wrap each game's distribution media (for lack of a better name for it).
 
I definitely expected Series S to be turning in better real-world performance than the One X, but looking at the bandwidth numbers I'm not quite as certain as I was. There are still a boatload of architectural improvements that I feel will carry it, but the BC issue is interesting.

I don't believe we've seen anything definitive yet, so I'm hoping they will release new details on the BC improvement efforts soon. They certainly teased a few nifty concepts such as increased framerate and supporting HDR for non-HDR games that go beyond the previous resolution or texture increases.
 
did I understand it wrong then? Sigh.... I don't get it. If you have 10GB, of which 2GB have a unique bandwidth of 56GB/s, it shouldn't touch the 224GB/s of the other 8GB, right? I thought that was a key for the extra performance of the Series X.

Yeah I think you're confused. The bandwidth is 'unique' to the extent it can only be accessed at that lower speed, but the architecture is still a unified memory pool. The way you're presenting it, it's as if the CPU cores in the APU have a dedicated bus strictly to that 2GB of 56GB/s memory that can run in parallel with the GPU's bus accessing the 8GB at 224GB/s - that's how a PC CPU+GPU works, but definitely not a console APU. It's one bus.

Like I said, the OS will likely reside in that slower RAM so most devs won't even have to think about it, but the 224GB/s is the best-case scenario for bandwidth in the Series S. In certain circumstances, it has less - but you can never 'add' the slower + faster RAM to come up with a total bandwidth.

The closest scenario I can think of is basically the GeForce 970's bus: of the 4GB, 3.5GB ran at the advertised speed, but 0.5GB was much slower. As long as your VRAM allocation stayed within that 3.5GB you were fine, but if it spilled over into the 'slower' 0.5GB you would see performance drop. It's one chip; it didn't have a separate bus to that slower memory, devs just had to avoid it.
 
Don't forget contention issues. When accessing the 2GB, you can't access the other 3 chips.

Huh? Sure you can.

did I understand it wrong then? Sigh.... I don't get it. If you have 10GB, of which 2GB have a unique bandwidth of 56GB/s, it shouldn't touch the 224GB/s of the other 8GB, right? I thought that was a key for the extra performance of the Series X.

The way modern AMD memory controllers work, there are 4 separate 32-bit channels here. All accesses are done as 64-byte cache lines, and the entire cache line is always stored in a single channel, over 16 clocks.

So the total bandwidth of the system is 224GB/s. If you are running a load that is evenly distributed between all 4 channels, you get 224GB/s. If you are running a load that is only localized on one channel, you get 56GB/s. If you are simultaneously running two loads, one of which uses only one channel, and the other is distributed evenly, your useful total bandwidth might fall below 224GB/s, as one load hogs one channel and causes the other to wait on it. If the scheduling for requests is even, at worst it halves the effective bandwidth for the load that is using all 4 channels, as it completes the other loads faster and builds up a buffer of loads for that fourth channel until it runs out of capacity to buffer and starts stalling.

Generally, you don't want to be doing a lot of accesses to that unbalanced extra memory. The only reason the system is reasonable is that most of the time, you aren't going to do that, as you first stuff the OS reserve there, and then on XSX you maybe stuff some of your housekeeping stuff that you are not using much most of the frame.
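To put rough numbers on the contention scenario described above, here's a quick Python sketch. The fair-arbitration model and the pacing assumption (an evenly striped load runs at the rate of its most contested channel) are simplifications for illustration, not documented behaviour of the actual memory controller:

```python
def effective_bandwidth(load_channels, other_channels,
                        channel_bw=56.0, num_channels=4):
    """GB/s seen by a load striped over `load_channels` while another load
    occupies `other_channels`, with fair arbitration on shared channels.
    Assumption: an evenly striped load is paced by its slowest channel."""
    rates = []
    for ch in load_channels:
        sharers = 1 + (ch in other_channels)  # fair split on contested channels
        rates.append(channel_bw / sharers)
    return min(rates) * len(load_channels)

# GPU load striped over all 4 channels, running alone:
print(effective_bandwidth({0, 1, 2, 3}, set()))   # 224.0
# Same load while another client hammers the channel behind the slow 2GB:
print(effective_bandwidth({0, 1, 2, 3}, {0}))     # 112.0
# The one-channel load itself only ever sees:
print(effective_bandwidth({0}, {0, 1, 2, 3}))     # 28.0
```

Under those assumptions you recover the figures from the post: 224GB/s uncontested, halved to 112GB/s when one of the four channels is being hogged.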
 
the confusing part talking about BC is whether we mean BC with the XBO generation, or BC with the 360 and OG generations. It's clear it can't run the Xbox One X variants of this generation's games. But the details around the 360 and OG generations are still hidden. I'm not sure if they just run the 1S variant or the 1X variant there. I assume the former, to keep things clean-cut between S and X. The lead platform is X in this case.

I'm talking about OG and 360, we know how Microsoft port games because they were remarkably open with Digital Foundry about this. I very much expect Series S and X to run Xbox One games natively. We know some games will get an optimisation patch, like Cyberpunk 2077.

Just to clarify for those not following along, no source code was used or referenced in getting original Xbox and Xbox 360 games running on Xbox One. What I believe @DSoup to mean is how Microsoft targeted specific games to get running in the VM executables that wrap each game's distribution media (for lack of a better name for it).

Yes, it's mostly recompilation to modern x86, but they've explained how they tune other parts of some games when they're able to do this. They didn't go into details, but they did explain that they run the original OS stack (again likely highly optimised in modern x86) and in any given emulated VM they can probably apply title-by-title tweaks like forcing anisotropic filtering or just hacking their own APIs to artificially boost graphical quality in other ways.
 
did I understand it wrong then? Sigh.... I don't get it. If you have 10GB, of which 2GB have a unique bandwidth of 56GB/s, it shouldn't touch the 224GB/s of the other 8GB, right? I thought that was a key for the extra performance of the Series X.


if so, then hopefully AMD is going to have a DLSS-like solution; 1080p looks blurry on 1440p+ screens

Yes, you understood it wrong.
When accessing the 2GB at 56GB/s you can't access the other 8GB, because 1/4 of your bus is being contested, thus you will have to idle the other 3/4 of the bus.
We saw it to a lesser extent in the Series X too, so people had a good idea how it works.
 
Huh? Sure you can.



The way modern AMD memory controllers work, there are 4 separate 32-bit channels here. All accesses are done as 64-byte cache lines, and the entire cache line is always stored in a single channel, over 16 clocks.

So the total bandwidth of the system is 224GB/s. If you are running a load that is evenly distributed between all 4 channels, you get 224GB/s. If you are running a load that is only localized on one channel, you get 56GB/s. If you are simultaneously running two loads, one of which uses only one channel, and the other is distributed evenly, your useful total bandwidth might fall below 224GB/s, as one load hogs one channel and causes the other to wait on it. If the scheduling for requests is even, at worst it halves the effective bandwidth for the load that is using all 4 channels, as it completes the other loads faster and builds up a buffer of loads for that fourth channel until it runs out of capacity to buffer and starts stalling.

Generally, you don't want to be doing a lot of accesses to that unbalanced extra memory. The only reason the system is reasonable is that most of the time, you aren't going to do that, as you first stuff the OS reserve there, and then on XSX you maybe stuff some of your housekeeping stuff that you are not using much most of the frame.


Let's frame it better.
On any given cycle, you have a choice of accessing the 2GB bank with that 32-bit channel or accessing the 8GB bank with the full 128-bit bus.

Thus when you access the 2GB, you get 56GB/s, and when you access the 8GB, you get 224GB/s. You can't do both at the same time, unless you are magically making the 4GB chip send twice the information.
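For anyone checking the arithmetic behind those two figures, they fall straight out of the bus width and the GDDR6 per-pin data rate. The 14Gbps pin rate below is an assumption, chosen because it reproduces the numbers quoted in this thread:

```python
def gddr6_bandwidth_gbs(bus_width_bits, pin_rate_gbps=14.0):
    """Peak bandwidth in GB/s for a GDDR6 bus at the given per-pin data rate.
    Each data pin moves pin_rate_gbps gigabits/s; divide by 8 for bytes."""
    return bus_width_bits * pin_rate_gbps / 8

print(gddr6_bandwidth_gbs(128))  # full 128-bit bus -> 224.0
print(gddr6_bandwidth_gbs(32))   # one 32-bit channel -> 56.0
```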
 
As I understand it...they do not touch game code at all for BC optimization.

They do, they recompile it. I linked to the DF article produced with heavy input from Microsoft. In case people are missing the links, the DF article is here:

https://www.eurogamer.net/articles/...x-one-x-back-compat-how-does-it-actually-work

This is three years old, so this has been a known quantity for ages. It's a great read and gives you a real appreciation for how much work goes into porting. :yes:
 
So the system has its memory in clamshell mode right?

That's how we got 10GB with a 128-bit bus; there are 8 chips.

Otherwise did the GDDR6 densities get doubled again? I'm not seeing evidence it's doubled.
 
Yes, you understood it wrong.
When accessing the 2GB at 56GB/s you can't access the other 8GB, because 1/4 of your bus is being contested, thus you will have to idle the other 3/4 of the bus.

That's not how it works.

They most likely have 8 x 16-bit channels (going by XSX), and the channels can be accessed independently. Depending on how your data is striped across those channels, you could easily be accessing something in the 'slow' 2GB and something else in the 'fast' 8GB using other channels.

Key point is the bus is split into channels.
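A toy model of that striping, purely illustrative: the real interleave granularity and address mapping aren't public, so the 256-byte stripe size and simple round-robin scheme below are assumptions:

```python
INTERLEAVE = 256   # bytes per stripe - assumed, not a published figure
NUM_CHANNELS = 8   # 8 x 16-bit channels, as speculated above

def channel_for_address(addr):
    """Which channel services a physical address under simple
    round-robin interleaving of fixed-size stripes."""
    return (addr // INTERLEAVE) % NUM_CHANNELS

# Consecutive stripes rotate across all 8 channels, so two buffers at
# different addresses usually hit different channels at any instant:
print([channel_for_address(a) for a in range(0, 2048, 256)])
# [0, 1, 2, 3, 4, 5, 6, 7]
```

The point the sketch makes is that a single access only ever occupies one channel at a time, which is why the other channels remain free for other clients.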
 
Huh? Sure you can.



The way modern AMD memory controllers work, there are 4 separate 32-bit channels here. All accesses are done as 64-byte cache lines, and the entire cache line is always stored in a single channel, over 16 clocks.

So the total bandwidth of the system is 224GB/s. If you are running a load that is evenly distributed between all 4 channels, you get 224GB/s. If you are running a load that is only localized on one channel, you get 56GB/s. If you are simultaneously running two loads, one of which uses only one channel, and the other is distributed evenly, your useful total bandwidth might fall below 224GB/s, as one load hogs one channel and causes the other to wait on it. If the scheduling for requests is even, at worst it halves the effective bandwidth for the load that is using all 4 channels, as it completes the other loads faster and builds up a buffer of loads for that fourth channel until it runs out of capacity to buffer and starts stalling.

Generally, you don't want to be doing a lot of accesses to that unbalanced extra memory. The only reason the system is reasonable is that most of the time, you aren't going to do that, as you first stuff the OS reserve there, and then on XSX you maybe stuff some of your housekeeping stuff that you are not using much most of the frame.

Yes, you understood it wrong.
When accessing the 2GB at 56GB/s you can't access the other 8GB, because 1/4 of your bus is being contested, thus you will have to idle the other 3/4 of the bus.
We saw it to a lesser extent in the Series X too, so people had a good idea how it works.
I see.... Well, what confused me was the fact that PS5 presented the bandwidth as a total number, 400+GB/s, but Xbox has always shown the bandwidth as if they were using some kind of special controller splitting off part of the memory for the CPU at a slower speed, with extra bandwidth for the GPU. During the new-generation console presentations I got the sensation that the XSX would be a lot more powerful than expected (20-30% was the expected number) compared to the PS5, 'cos of the huge extra bandwidth for the GPU alone.

So in effect both the GPU and CPU have the same access speed to the RAM but one of them uses less memory hence less bandwidth?
 
So the system has its memory in clamshell mode right?

That's how we got 10GBs with 128 bit bus, there are 8 chips.

Otherwise did the GDDR6 densities get doubled again? I'm not seeing evidence it's doubled.

Yeah, it's probably something like 6 x 1GB chips and 2 x 2GB chips. Could also be 5 x 2GB I guess, but with only 2 chips in clamshell mode .. assuming that wouldn't break anything.
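A quick sanity check on those two chip mixes, treating each GDDR6 chip as contributing 32 bits in normal mode and 16 bits when clamshelled (two chips sharing a channel). The configurations are the speculation from the posts above, not confirmed specs:

```python
def config(chips_x32_gb, chips_x16_gb):
    """Total capacity (GB) and bus width (bits) given per-chip capacities
    for chips in normal x32 mode and in x16 clamshell mode."""
    capacity = sum(chips_x32_gb) + sum(chips_x16_gb)
    bus_bits = 32 * len(chips_x32_gb) + 16 * len(chips_x16_gb)
    return capacity, bus_bits

# 8 chips, all clamshelled: 6 x 1GB + 2 x 2GB
print(config([], [1] * 6 + [2] * 2))  # (10, 128)
# 5 x 2GB with only two chips clamshelled:
print(config([2] * 3, [2] * 2))       # (10, 128)
```

Both mixes land on 10GB over a 128-bit bus, which is why the thread can't distinguish them from the headline numbers alone.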
 