Xbox Series S [XBSS] [Release November 10 2020]

So in effect both the GPU and CPU have the same access speed to the RAM, but one of them uses less memory and hence less bandwidth?

Speed is not determined by the CPU or GPU; it's determined by the memory location.

Though CPUs typically don't exceed 100 GB/s (some statement made about the Series X, maybe in the DF deep dive) while GPUs can utilize as much as they can.
 
That's not how it works.

They most likely have 8 x 16-bit channels (going by XSX) and channels can access independently. Depending on how your data is striped across those channels, you could easily be accessing something from the 'slow' 2GB and something else in the 'fast' 8GB using other channels.

Key point is the bus is split into channels.

So you're saying they're splitting the 10 gigs into effectively three banks and writing stuff into them as needed.

Bank 1: slow memory that can only use that 4GB chip. 56GB/s, size is 2GB+alpha

Bank 2: other three chips that can also be accessed while the 2 gigs is being accessed, striped across 3 chips. 56GB/s*3= 168GB/s, size is alpha*3

Bank 3: stuff that you'll be using when you're sure you won't be reading from Bank 1, and is striped across all 4 chips, 224GB/s, size is 10GB-2GB-4*alpha.

You sure about that?

If they can physically access the 2GB and the 8GB at the same time using other channels, then we'll have, in fact, 10GB at 224+56 = 280GB/s, and you'd see twice the pins on the 4GB chip.

Remember, I'm talking about any given cycle. If averaged across several cycles, of course you'll see stuff in between.
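For what it's worth, the per-bank numbers above can be sanity-checked with a bit of arithmetic. A minimal sketch in Python, assuming 14 Gbps GDDR6 and 32 bits of bus per chip (the speed grade is my assumption, not confirmed in this thread; only the 56/168/224 GB/s figures are):

```python
# Rough per-bank bandwidth arithmetic for the speculated XSS setup:
# a 128-bit bus split into four 32-bit per-chip groups.
PIN_SPEED_GBPS = 14          # Gbps per pin (assumed GDDR6 speed grade)
BITS_PER_CHIP_GROUP = 32     # pins routed to each chip (or clamshell pair)

per_group = PIN_SPEED_GBPS * BITS_PER_CHIP_GROUP / 8   # GB/s per chip group

bank1 = per_group            # 'slow' extra 2GB, one chip only
bank2 = per_group * 3        # the other three chips in parallel
bank3 = per_group * 4        # all four chips striped together

print(bank1, bank2, bank3)   # 56.0 168.0 224.0
```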
 
But the details around 360 and OG generation are still hidden. I'm not sure if they just run the 1S variant or the 1X variant there. I assume the former, to keep things clean-cut between S and X. The lead platform is X in this case.
I wouldn't be so sure about that.
I wouldn't be surprised if the XSX gets additional features, and if anything the XSS runs similar to the 1X.

Don't really need to worry about differentiation where those BC titles are concerned, I wouldn't be surprised if they push the boat out for both models as BC is their thing.
 
Speed is not determined by the CPU or GPU; it's determined by the memory location.

Though CPUs typically don't exceed 100 GB/s (some statement made about the Series X, maybe in the DF deep dive) while GPUs can utilize as much as they can.
thanks for the explanation!
That's not how it works.

They most likely have 8 x 16-bit channels (going by XSX) and channels can access independently. Depending on how your data is striped across those channels, you could easily be accessing something from the 'slow' 2GB and something else in the 'fast' 8GB using other channels.

Key point is the bus is split into channels.
This might explain it, but I am not sure; @Strange is very probably right.

However, there is always this doubt: why not present the full bandwidth as a single number à la Sony instead of ambiguous split numbers?
 
Let's frame it better.
On any given cycle, you have a choice of accessing the 2GB bank with that 32 bit channel or accessing the 8GB bank with the full 128 bit channel.

Thus when you access the 2GB you get 56GB/s; when you access the 8GB you get 224GB/s. You can't do both at the same time, unless you are magically making the 4GB chip send twice the information.

No, you are wrong, because the 224GB/s bus is not something you access as a single unit. It consists of 4 channels, which each operate independently. You usually stripe accesses over it so that you access it evenly, but this is not something where a single access occupies all of it.
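A toy illustration of the striping idea, assuming four independent channel groups and a 256-byte stripe granularity (both values are illustrative; the real address mapping isn't public):

```python
# Toy sketch of how striping lets channels operate independently.
# Addresses are interleaved across channels at some stripe granularity,
# so two accesses to different stripes can be serviced by different
# channels in the same cycle; only same-channel accesses serialize.
STRIPE = 256      # bytes per stripe (illustrative, not confirmed)
CHANNELS = 4

def channel_for(addr: int) -> int:
    """Which channel services the stripe containing this address."""
    return (addr // STRIPE) % CHANNELS

# Two accesses 256 bytes apart land on different channels and can
# proceed in parallel.
a, b = 0x1000, 0x1100
print(channel_for(a), channel_for(b))   # 0 1
```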

Yeah, it's probably something like 6 x 1GB chips and 2 x 2GB chips. Could also be 5 x 2GB I guess, but with only 2 chips in clamshell mode .. assuming that wouldn't break anything.

Looking at the bandwidth, I think it's possible that they are using a 128-bit bus, and 3x2GB + 1x4GB.
 
So you're saying they're splitting the 10 gigs into effectively three banks and writing stuff into them as needed.

Bank 1: slow memory that can only use that 4GB chip. 56GB/s

Bank 2: other three chips that can also be accessed while the 2 gigs is being accessed, striped across 3 chips. 56GB/s*3= 168GB/s

Bank 3: stuff that you'll be using when you're sure you won't be reading from Bank 1, and is striped across all 4 chips, 224GB/s

You sure about that?

I'm not sure there's a 4GB chip involved (think they've probably gone the clamshell route but I could be wrong), but we know there are two controllers, and on XSX each controller has four 16-bit channels (two channels per chip or per pair of chips in clamshell mode).

So any channels that aren't being used to access the 'slow' memory are free to access the 'fast' memory. And that will be a matter of having accesses lined up that you can fulfil based on what's spread across the channels that are free.

I wouldn't describe it as three banks though. It's one pool split into two areas, with one (the slow one) that can only be accessed using a specific 1/4 of the bus.

At the level of an individual chip, it doesn't care about slow or fast memory.
 
No, you are wrong, because the 224GB/s bus is not something you access as a single unit. It consists of 4 channels, which each operate independently. You usually stripe accesses over it so that you access it evenly, but this is not something where a single access occupies all of it.



Looking at the bandwidth, I think it's possible that they are using a 128-bit bus, and 3x2GB + 1x4GB.

So you're saying it's 4 banks of 56GB/s each? To do what you say, in practice you don't stripe stuff across chips at all, making it even harder to hit the max bandwidth, because you never know what data you'll want to access together, and you certainly don't duplicate data across chips, because RAM is real estate you don't waste.
To effectively use all 128 pins you want a single access to hit all 128 pins. The best way would be to store data across all chips evenly. When you do exactly that, you make the data striped across all four chips inaccessible whenever you allocate 1/4 of the bus to read from one chip.

https://www.anandtech.com/show/4221/nvidias-gtx-550-ti-coming-up-short-at-150/2
 
I love this little guy. As an engineer I have a thing for small, efficient devices. The idea of dramatically lowering resource and cooling costs by targeting a 1080p (native) resolution is incredibly appealing.

I do worry about whether its ray tracing prowess will be able to keep up with its older sibling, even with the reduced resolution. Ray/box and ray/triangle intersections are largely independent of target resolution, correct? If so, they will consume a much larger fraction of the available hardware resources (flops, bandwidth, and whatever fixed-function RT resources exist) on the XSS relative to the XSX. Thoughts?
 
Just to clarify for those not following along, no source code was used or referenced in getting original Xbox and Xbox 360 games running on Xbox One. What I believe @DSoup to mean is how Microsoft targeted specific games to get running in the VM executables that wrap each game's distribution media (for lack of a better name for it).

Aren't One games also running on a VM, or did I misunderstand something there?
 
However, there is always this doubt: why not present the full bandwidth as a single number à la Sony instead of ambiguous split numbers?

MS are trying not to be ambiguous about their particular setup, hence the two numbers. :)

MS can't just use a single number because their memory setup has two speeds. This is because they have different amounts of memory on some channels. They have an area of memory that you can access across all channels, and another area you can only access from some channels.

For XSX this was about increasing bandwidth for the GPU beyond a common 256-bit bus by increasing to a 320-bit bus while sticking with 16 GB of RAM. For XSS OTOH this is about putting 10 GB of RAM on a 128-bit bus.

In both cases, it's about getting the amount of ram you want on the bus width you want, and creating an area of memory that can go full tilt across all channels with regular distribution across those channels.

Sony have the same size chips across all channels. Their bus size and ram quantity happen to align nicely in that sense.
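As a sketch of why one number can't capture that setup (region sizes and bandwidth figures from this thread; the channel counts are inferred from the XSX layout, so treat them as an assumption):

```python
# Sketch of why MS quotes two numbers: the XSS pool isn't uniform.
# 8 GB is striped across the full 128-bit bus; the remaining 2 GB
# lives behind only 32 bits of it.
FULL_BUS_GBPS = 224   # GB/s when all four channel groups hit the 8 GB area
SLOW_GBPS = 56        # GB/s when only the oversized chip's group is used

regions = {
    "fast_8GB": {"size_GB": 8, "channel_groups": 4, "peak_GBps": FULL_BUS_GBPS},
    "slow_2GB": {"size_GB": 2, "channel_groups": 1, "peak_GBps": SLOW_GBPS},
}

# Note the peaks can't simply be added into one total: the slow region's
# channel group is one of the four that make up the 224 GB/s figure.
for name, r in regions.items():
    print(name, r["peak_GBps"], "GB/s over", r["channel_groups"], "channel group(s)")
```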
 
Aren't One games also running on a VM, or did I misunderstand something there?

Yes, but they're not distributed with an entire emulated Dashboard layer inside of it, like the OG-X and X360 games are. You still see the X360 Blades interface inside of any X360 game when run under BC. Moving forward, you won't see an emulated Dashboard layer inside the Xbox One games running on Xbox Series S|X hardware. So the inception point won't be there for Xbox One titles.
 
So you're saying it's 4 banks of 56GB/s each?

Channels, not banks, but yes. This, by the way, is how all modern memory controllers work, on every device.

To effectively use all 128 pins you want a single access to hit all 128 pins. The best way would be to store data across all chips evenly. When you do exactly that, you make the data striped across all four chips inaccessible whenever you allocate 1/4 of the bus to read from one chip.

You cannot have a single access hitting all 128 pins, because in a modern cached system, a single access is typically just 64B, and GDDR6 has a burst length of 16n. This means that if you do a single access out of a single 16bit GDDR6 channel, you get 32 bytes. AMD gangs up two such channels per memory controller, so a single access delivers a single 64B result. Beyond that, you hope to spread all your access evenly across the channels, but this is of course never 100%.
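The burst arithmetic in that last paragraph works out like this (channel width and burst length per the GDDR6 design described above):

```python
# A GDDR6 channel is 16 bits wide with a burst length of 16 transfers
# (16n prefetch), and AMD gangs two channels per controller, so a
# single access fills one 64-byte cache line.
CHANNEL_WIDTH_BITS = 16
BURST_LENGTH = 16          # transfers per burst

bytes_per_channel_burst = CHANNEL_WIDTH_BITS * BURST_LENGTH // 8
bytes_per_ganged_access = bytes_per_channel_burst * 2   # two channels ganged

print(bytes_per_channel_burst, bytes_per_ganged_access)  # 32 64
```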
 
I think there are a ton of people out there who still game on a 1080p TV, so this Series S option could be extremely attractive to people not interested in upgrading their TV. It's an inexpensive option that will play next-gen games pretty darn well at 1080p. Of course Microsoft's PR bulletin lists 1440p, but I see a lot of developers choosing to target 1080p native. Taking a game that targets native 4K on Series X to 1080p on Series S should be pretty straightforward. Other than the reduced pixel count, the quality of those pixels should remain high.

The CPU being clocked 200MHz less on Series S seems pretty weird. The thermals for an extra 200MHz on the CPU can't be much. Seems like this will be a nuisance for developers. Nothing that they can't deal with, but is it a hurdle that really needs to be there? Like others have said, seems like this might have been intentional to make sure Series X has superior performance in every way.
 
To effectively use all 128 pins you want a single access to hit all 128 pins. The best way would be to store data across all chips evenly. When you do exactly that, you make the data striped across all four chips inaccessible whenever you allocate 1/4 of the bus to read from one chip.

If every access hit all pins you wouldn't want multiple channels. And if every access blocked all other accesses then the PS5 would have that same problem.

But fortunately that doesn't happen, there are multiple channels, and there are hundreds of individual units that can directly or indirectly trigger accesses to keep multiple channels busy.

I love this little guy. As an engineer I have a thing for small, efficient devices. The idea of dramatically lowering resource and cooling costs by targeting a 1080p (native) resolution is incredibly appealing.

I do worry about whether its ray tracing prowess will be able to keep up with its older sibling, even with the reduced resolution. Ray/box and ray/triangle intersections are largely independent of target resolution, correct? If so, they will consume a much larger fraction of the available hardware resources (flops, bandwidth, and whatever fixed-function RT resources exist) on the XSS relative to the XSX. Thoughts?

We've got an interesting thread on scalability on this very sub forum! Here's a page where folks start discussing their thoughts on ray tracing:

https://forum.beyond3d.com/threads/the-scalability-and-evolution-of-game-engines-spawn.61872/page-8

Basically, while the number of intersections per ray won't change automatically with resolution (though you could probably build a resolution based lod adjust that would help), the number of rays cast tends to scale pretty directly with resolution.

In pure path tracing (the most expensive type of ray tracing afaik) the load scales almost directly with resolution. Casting rays into the world is expensive, and you normally do it with a close relationship to your rendering resolution. You don't want to do it any more than you have to for the image you want to create.
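A back-of-envelope sketch of that scaling, with samples-per-pixel as an illustrative knob (the spp value is mine, not from the thread):

```python
# Primary rays cast per frame are roughly pixels * samples-per-pixel,
# so dropping from native 4K to 1080p cuts the ray count by 4x.
def rays_per_frame(width: int, height: int, spp: int = 1) -> int:
    return width * height * spp

r_4k = rays_per_frame(3840, 2160)
r_1080 = rays_per_frame(1920, 1080)
print(r_4k / r_1080)   # 4.0
```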
 
What if I told you devs should place stuff that needs fast bandwidth in the fast pool and stuff that doesn't need fast bandwidth in the slower pool. I am hoping they'll be capable of doing that optimization.
 
We've got an interesting thread on scalability on this very sub forum! Here's a page where folks start discussing their thoughts on ray tracing:

https://forum.beyond3d.com/threads/the-scalability-and-evolution-of-game-engines-spawn.61872/page-8
Thanks!

Basically, while the number of intersections per ray won't change automatically with resolution (though you could probably build a resolution based lod adjust that would help), the number of rays cast tends to scale pretty directly with resolution.
Oh right, of course. Duh! Thanks again.
 
The CPU being clocked 200MHz less on Series S seems pretty weird. The thermals for an extra 200MHz on the CPU can't be much. Seems like this will be a nuisance for developers. Nothing that they can't deal with, but is it a hurdle that really needs to be there? Like others have said, seems like this might have been intentional to make sure Series X has superior performance in every way.
From a niche marketing point of view, I guess it slightly lessens the chance the XSS shows up the XSX in a framerate comparison?

Wouldn't the driver load on the CPU be slightly lower at a lower res? Plus, the PS5's lower CPU clocks also mean the multiplatform gap is <= 153MHz.
 