Xbox Series S [XBSS] [Release November 10 2020]

So unless Microsoft is actively forcing all game devs that launched a One X-patched game to patch it again for the Series S, I see no way the Series S will be running One X code.
They could maybe work around the bandwidth deficit with GPU optimizations, perhaps getting more GPU time and compensating with less CPU time, but not around having a whole 2GB less memory.
The Series S shouldn't have 4K textures at all; that's the whole point of having Smart Delivery. The RAM, memory bandwidth and SSD size are scaled down precisely for that reason. So of course it won't be running One X versions of games; they have 4K textures. But the automatic BC enhancements will make that mostly a moot point, never mind games with Series X|S optimizations, like Gears 5 having 120fps on both, targeting different resolutions.
 
There is 2GB that can only be accessed on its own; that's where you place data that is very small in nature and doesn't need to be interleaved over many chips.

This isn't the same as the GPU having its pool split, where you are spreading the buffer work over as many memory chips as possible and a remaining 0.5GB is holding things up, making the other chips useless because the GPU cannot proceed to the next step until the last step is complete.

You've already identified the issues and caveats of this configuration, and we know that this configuration is suboptimal.
It adds unnecessary work for the developers to pre-plan for this configuration, because it will result in some sort of impact on bandwidth efficiency.
 
You've already identified the issues and caveats of this configuration, and we know that this configuration is suboptimal.
It adds unnecessary work for the developers to pre-plan for this configuration, because it will result in some sort of impact on bandwidth efficiency.
It's not the same, because the GPU shouldn't be the one placing the data it needs into that 2GB slot.
I'm not sure if that part is coming across: the data it needs to do its job is not in there.

Yes, the memory setup is suboptimal compared to not having any sort of split pool. The price, however, is optimal. The question will come down to how much you lose, certainly more on the S than the X, considering how generous the X is with the split pool.

There is extra work for the developer though; that part is undeniable. It's not as easy as just dumping data in there and moving on. But it's going to be less work than managing something like ESRAM.

tl;dr: GPUs scale memory usage via resolution.
CPUs cannot really scale their memory usage by scaling resolution; you'd have to scale the number of mobs on screen or whatnot. If you do CPU culling, that could change with resolution, or with draw distance. But if you do these on the GPU, then once again the CPU is a non-issue.

If the CPU requires a given amount of memory per frame, that's not going to change based on resolution.

And yet we see a 120fps mode for Gears of War on Series S.
So if there is sufficient memory to run Gears 5 at 120fps, I think that proves the split memory pool is not yet an issue, if you're willing to put in the time to use the features.

There are options available for reducing the memory footprint. Some games will struggle with this more than others.
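
A back-of-the-envelope sketch of that scaling difference, as a simple hedged example: the per-pixel and per-entity sizes below are made up, purely to show which side of the budget actually moves when you drop the render resolution.
Code:
#include <cstdio>

int main() {
    const double bytes_per_pixel  = 16;    // a few render targets' worth, illustrative only
    const double entities         = 2000;  // CPU-side actors, made-up count
    const double bytes_per_entity = 4096;  // made-up per-entity simulation state

    struct Res { const char* name; double w, h; };
    const Res res[] = {{"4K", 3840, 2160}, {"1440p", 2560, 1440}};

    for (const Res& r : res) {
        double gpu_mib = r.w * r.h * bytes_per_pixel / (1024.0 * 1024.0);   // scales with resolution
        double cpu_mib = entities * bytes_per_entity / (1024.0 * 1024.0);   // does not
        std::printf("%-6s render targets ~%6.1f MiB | CPU sim data ~%4.1f MiB (unchanged)\n",
                    r.name, gpu_mib, cpu_mib);
    }
    return 0;
}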
 
You've already identified the issues and caveats of this configuration, and we know that this configuration is suboptimal.
It adds unnecessary work for the developers to pre-plan for this configuration, because it will result in some sort of impact on bandwidth efficiency.
The system reserved memory on the Series X, 2.5 gigs, is larger than the slow pool on the Series S. I very much doubt they managed to shave off more than half a gig of system reserved memory, so it's unlikely that developers will need to deal with the different pool speeds, because they probably won't be able to write to that slower pool at all.

Things are a bit different on the Series X, where the slower memory is much faster. And the system reserved memory is explicitly taken out of the slower pool there too.
 
The system reserved memory on the Series X, 2.5 gigs, is larger than the slow pool on the Series S. I very much doubt they managed to shave off more than half a gig of system reserved memory, so it's unlikely that developers will need to deal with the different pool speeds, because they probably won't be able to write to that slower pool at all.

Things are a bit different on the Series X, where the slower memory is much faster. And the system reserved memory is explicitly taken out of the slower pool there too.
Yeah, the slow pool on the XSS shouldn't pose any challenge, as the OS should be domiciled there entirely, meaning the game data should live exclusively in the 8GB.
 
You've already identified the issues and caveats of this configuration, and we know that this configuration is suboptimal.
It adds unnecessary work for the developers to pre-plan for this configuration, because it will result in some sort of impact on bandwidth efficiency.

Eh?

On PS5 there's memory contention when the CPU needs to access memory. On XBSS and XBSX there's memory contention when the CPU needs to access memory. There's literally no difference.

However, on XBSX and XBSS you're still reading/writing to the other memory channels while the CPU is accessing one memory channel, as opposed to the CPU interleaving all memory accesses across all channels.

And even then on both systems, the CPU access should be over with relatively quickly meaning the impact is going to be low on both systems regardless.

Regards,
SB
 
The Series S shouldn't have 4K textures at all; that's the whole point of having Smart Delivery. The RAM, memory bandwidth and SSD size are scaled down precisely for that reason. So of course it won't be running One X versions of games; they have 4K textures. But the automatic BC enhancements will make that mostly a moot point, never mind games with Series X|S optimizations, like Gears 5 having 120fps on both, targeting different resolutions.
What's "4K textures"? Do you mean the marketing term or the standard definition of a texture that is 4096*4096 pixels big? AFAIK those are pretty rare.
 
I feel like this is falling on deaf ears.
If there were no impact and doing this were fine, we'd see a lot more GTX 550-style situations, and Microsoft would see no reason to say this explicitly at all.
If this were fine, there would simply be four banks at 56GB/s each and we would just add them up; the fact that they didn't tells you something.

Hypothetical situation:

Chips A, B, C, D.
A is 4GB, the other three are 2GB.

Situation A: I left 1.25GB of data in each of A, B, C, and D.
That's 5GB of data I'm trying to read across 4 chips.
Time used would be 1.25 units, and my effective bandwidth is 4GB per unit of time.

Situation B: I left 2GB of data in A, and 1GB in each of B, C, and D.
That's also 5GB of data across 4 chips I'm trying to read.
Time used would be 2 units, and my effective bandwidth is 2.5GB per unit of time.

My effective bandwidth here is 62.5% of what I would normally have in situation A.

Of course you can say that I can find work for the other chips to do while I read A, but that's, again, explicit planning you'd have to do.

Of course the best solution is to avoid the 2GB pool at all costs.

Any utilization of the 2GB at the same time as the 8GB will lead to bandwidth contention, because physically the channels for the B, C and D chips won't be able to access chip A, and creating an unbalanced workload across the 4 chips adds extra inefficiency on top of the usual situation.
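
Here's the same arithmetic as a quick sketch; the per-chip rate of 1GB per time unit is just a placeholder so the two situations are comparable.
Code:
#include <algorithm>
#include <cstdio>
#include <initializer_list>

int main() {
    const double per_chip_rate = 1.0;  // GB moved per unit of time by each chip (assumed)

    auto effective_bw = [&](std::initializer_list<double> gb_per_chip) {
        double total = 0.0, finish = 0.0;
        for (double gb : gb_per_chip) {
            total += gb;
            finish = std::max(finish, gb / per_chip_rate);  // busiest chip sets the finish time
        }
        return total / finish;  // GB per unit of time across the whole setup
    };

    std::printf("Situation A: %.2f GB/unit\n", effective_bw({1.25, 1.25, 1.25, 1.25}));  // 4.00
    std::printf("Situation B: %.2f GB/unit\n", effective_bw({2.0, 1.0, 1.0, 1.0}));      // 2.50
    return 0;
}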
What you are explaining here is just how modern memory works. The exact same situation would happen on any other piece of hardware if your memory isn't striped across all of its buses in the most optimal way. The reason Microsoft is explicitly saying it is that it's true for their hardware: there is a portion of the memory that is read at a slower rate than the rest, and they would be lying if they advertised it at a faster speed than it is.

Why I think it will have little impact on games is that the main bandwidth-limited part of 3D rendering is writing and reading the render buffers, which need to be written at least once per frame. As long as those buffers are positioned correctly, there shouldn't be an issue there. Also, the system reserve is rumoured to be more than 2GB, and all information points to it occupying the slower parts of the memory first, so the slower memory shouldn't even come into play for games anyway.
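
A back-of-the-envelope way to see the scale of that traffic, with deliberately simplified and made-up assumptions (a single colour plus depth target touched twice per frame; real frames use several targets and many more touches, which is exactly why placement in the fast pool matters):
Code:
#include <cstdio>

int main() {
    const double width = 2560, height = 1440;       // hypothetical render resolution
    const double color_bytes = 4, depth_bytes = 4;  // RGBA8 colour + 32-bit depth, no compression
    const double fps = 60;
    const double touches = 2;                       // each buffer written and read back roughly once

    double bytes_per_frame = width * height * (color_bytes + depth_bytes) * touches;
    double gb_per_second   = bytes_per_frame * fps / 1e9;

    std::printf("~%.1f GB/s of render-target traffic under these assumptions\n", gb_per_second);  // ~3.5 GB/s
    return 0;
}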

You mean the One X, right?
You think game developers would leave free RAM completely unused in the One X? That would be a pretty bad use of resources.
If you have free memory, you'll at least load part of the assets needed for the next area / level so you can reduce loading times.

So unless Microsoft is actively forcing all game devs that launched a One X-patched game to patch it again for the Series S, I see no way the Series S will be running One X code.
They could maybe work around the bandwidth deficit with GPU optimizations, perhaps getting more GPU time and compensating with less CPU time, but not around having a whole 2GB less memory.
On PC, and I know it's a different world, there are tools and sometimes driver-level switches that allow you to alter the mip level or general quality of textures. RDNA also has better framebuffer compression, and the shaders can read and write the compressed formats, as can the display portion of the GPU. You can theoretically overcome a memory deficit if you compress what you can in lossless formats and shave some quality off the textures in a fairly coarse fashion, but I'm doubtful they would do this as a global policy.
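
As a rough sketch of how much such a coarse cut saves: dropping just the top mip of a texture roughly quarters its footprint. The format and sizes below are illustrative, not tied to any particular game.
Code:
#include <cstdio>

// Sum the bytes of a full mip chain for a square texture.
double mip_chain_bytes(double base_dim, double bytes_per_texel) {
    double total = 0.0;
    for (double dim = base_dim; dim >= 1.0; dim /= 2.0)
        total += dim * dim * bytes_per_texel;
    return total;
}

int main() {
    const double bpt = 1.0;                        // ~1 byte/texel, a block-compressed rate
    double full    = mip_chain_bytes(4096, bpt);   // ship the 4096^2 top mip
    double trimmed = mip_chain_bytes(2048, bpt);   // drop the top mip, start at 2048^2
    std::printf("4096 chain: %.1f MiB  2048 chain: %.1f MiB  (%.0f%% saved)\n",
                full / (1 << 20), trimmed / (1 << 20), 100.0 * (1.0 - trimmed / full));
    return 0;
}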
 
You've already identified the issues and caveats of this configuration, and we know that this configuration is suboptimal.
It adds unnecessary work for the developers to pre-plan for this configuration, because it will result in some sort of impact on bandwidth efficiency.
All developers have to do is specify GPU Bandwidth Optimized when requesting new memory page(s), which they are already doing if working with D3D12; see https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_heap_type

And I believe even on modern PlayStation systems (PS4 and on), developers have to specify the type of memory when requesting new page(s) from the OS, as there are different coherence behaviours. So really this is nothing new.
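
For reference, here's what picking a heap type looks like in plain desktop D3D12. The Xbox-side "GPU Bandwidth Optimized" classification lives in the GDK rather than the public API, so this only shows the analogous public enum, and the helper name is made up for the example.
Code:
#include <windows.h>
#include <d3d12.h>

// Hypothetical helper: create a GPU-local (default heap) resource.
HRESULT CreateGpuLocalResource(ID3D12Device* device,
                               const D3D12_RESOURCE_DESC& desc,
                               ID3D12Resource** out)
{
    D3D12_HEAP_PROPERTIES heap = {};
    heap.Type                 = D3D12_HEAP_TYPE_DEFAULT;        // GPU-local placement
    heap.CPUPageProperty      = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
    heap.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;

    return device->CreateCommittedResource(
        &heap, D3D12_HEAP_FLAG_NONE, &desc,
        D3D12_RESOURCE_STATE_COMMON, nullptr,
        IID_PPV_ARGS(out));
}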
 
On multichannel memory: memory is "striped" across channels with alignment to cachelines ("across cacheline boundaries"), and each channel is independent, since it has its own request queue in the memory controller and its own cmd/addr and data wires. Coupled with the high clocks granted by clean signalling, this is how GDDRn memories achieve high bandwidth versus (LP)DDRn.

So imagine a memory setup as pneumatic tubes that each deliver one parcel at a time: a multichannel GDDR6 setup is essentially many pipes delivering small parcels independently, instead of one thick pipe delivering huge parcels. The size of each chip is therefore the size of the warehouse at the end of its pipe.
 
On multichannel memory: memory is "striped" across channels with alignment to cachelines ("across cacheline boundaries"), and each channel is independent, since it has its own request queue in the memory controller and its own cmd/addr and data wires. Coupled with the high clocks granted by clean signalling, this is how GDDRn memories achieve high bandwidth versus (LP)DDRn.

So imagine a memory setup as pneumatic tubes that each deliver one parcel at a time: a multichannel GDDR6 setup is essentially many pipes delivering small parcels independently, instead of one thick pipe delivering huge parcels. The size of each chip is therefore the size of the warehouse at the end of its pipe.

I understand that; all of this works very well when all chips have the same workload, especially by striping data evenly across all channels.
When the workload becomes imbalanced because certain channels service more memory than the others (apparently it's now five 2GB chips, two of them in clamshell mode, as the Series S image apparently points to one chip on the back of the board), you're introducing workload imbalances, and this will result in performance issues: the other channels will idle while that channel has to finish its load.

Unless there are other bottlenecks that make this a non-issue, the access pattern doesn't even hit the theoretical limit, or you don't access that 2GB at all, I find it hard to believe that this can be completely mitigated.
 
It's pretty thicc. As somebody who really loves the aesthetic of the white One S, the black circle/grill of the Series S doesn't work for me. And now we have a bunch of stupid memes, as we did for the PS3.

I agree with this. The black circle just looks odd against the all-white body. It would probably look better if the black were a full rectangular panel stretching across the whole side facet. Doesn't really matter for me, as I'm getting the XSX anyway to play BC 1X-enhanced games at 4K. As for the memes, well, none of the next-gen console designs are immune to that. You've got a fridge, a router, and now a grill.
 
...the other channels will idle while that channel has to finish its load.
...
That's not true. The advantage of channel parallelism is complete independence: while the channel occupied by the 2GB is serving its request, the other channels don't have to idle and can serve whatever requests are lying in their queues. Your description fits a single wide channel, not multiple channels.
 
That's not true. The advantage of channel parallelism is complete independence: while the channel occupied by the 2GB is serving its request, the other channels don't have to idle and can serve whatever requests are lying in their queues. Your description fits a single wide channel, not multiple channels.

What happens when all requests are exhausted and we're waiting for one channel to finish?
When workloads are not symmetric, there will come a point where the pipeline has to wait for the channel with the heaviest workload to finish.
In that situation there IS no queue for the other channels to fulfil, thus they idle. They have to idle unless you give them meaningless work.


That certainly works if all channels have access to all banks, but they obviously don't.
The workload on one channel is much higher than on the other three, and I simply don't see a way for the other channels to access an address on another chip. How is the 32-bit bus for chip B going to read/write data on chip A?
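
A toy illustration of what I mean, with made-up queue depths: once the shallow queues drain, those channels sit idle until the deepest one finishes.
Code:
#include <cstdio>

int main() {
    // Made-up depths: the channel serving the lone heavily loaded chip has more
    // outstanding requests than the channels serving the evenly striped data.
    const int queue_depth[4] = {8, 2, 2, 2};   // requests pending per channel
    int longest = 0;
    for (int c = 0; c < 4; ++c)
        if (queue_depth[c] > longest) longest = queue_depth[c];

    // Channels run in parallel, one request per time step; the deepest queue
    // decides when the whole batch is done, and the rest idle once drained.
    for (int c = 0; c < 4; ++c)
        std::printf("channel %d: busy %d steps, idle %d steps\n",
                    c, queue_depth[c], longest - queue_depth[c]);
    return 0;
}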
 
How is the 32-bit bus for chip B going to read/write data on chip A?
They don't. The physical address is striped across cacheline boundaries. Say you have a 4KiB page allocated and the beginning physical address is [X]; then (in decimal):
Code:
[X:X+63]      is on channel A of chip 0 (channel 0)
[X+64:X+127]  is on channel B of chip 0 (channel 1)
[X+128:X+191] is on channel A of chip 1 (channel 2)
... and so on

This way one CPU request can be served by a single channel, while one GPU request can be served by 2 channels of a single chip.

It's favourable to have each channel do a burst transfer of 64 bytes rather than having many of them transmit 1 or 2 bytes each simultaneously, because DRAM does precharge and stuff.
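
Or as a tiny sketch of that mapping; the stripe size and channel count here are illustrative, not the actual console layout.
Code:
#include <cstdint>
#include <cstdio>

constexpr uint64_t kStripe   = 64;  // one 64-byte cacheline per channel before moving on
constexpr uint64_t kChannels = 8;   // e.g. 4 chips x 2 channels each (assumed for the example)

uint64_t channel_of(uint64_t physical_address) {
    return (physical_address / kStripe) % kChannels;
}

int main() {
    const uint64_t addrs[] = {0, 63, 64, 128, 192, 512};
    for (uint64_t a : addrs)
        std::printf("address %4llu -> channel %llu\n",
                    (unsigned long long)a, (unsigned long long)channel_of(a));
    return 0;
}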
 
Unpopular opinion: it's way prettier than either of the other two. I really like the big black circle for the ventilation; it seems to be the perfect console to carry to your friend's house. Apart from the Switch, we haven't had one of these since the PS2 Super Slim.
 
Yes, to me this is the best looking of the consoles this gen.

Except for the Atari VCS (now this looks good):

[Image: AtariVCS2019.jpg]
 
It's not the same, because the GPU shouldn't be the one placing the data it needs into that 2GB slot.
I'm not sure if that part is coming across: the data it needs to do its job is not in there.

Yes, the memory setup is suboptimal compared to not having any sort of split pool. The price, however, is optimal. The question will come down to how much you lose, certainly more on the S than the X, considering how generous the X is with the split pool.

There is extra work for the developer though; that part is undeniable. It's not as easy as just dumping data in there and moving on. But it's going to be less work than managing something like ESRAM.

tl;dr: GPUs scale memory usage via resolution.
CPUs cannot really scale their memory usage by scaling resolution; you'd have to scale the number of mobs on screen or whatnot. If you do CPU culling, that could change with resolution, or with draw distance. But if you do these on the GPU, then once again the CPU is a non-issue.

If the CPU requires a given amount of memory per frame, that's not going to change based on resolution.

And yet we see a 120fps mode for Gears of War on Series S.
So if there is sufficient memory to run Gears 5 at 120fps, I think that proves the split memory pool is not yet an issue, if you're willing to put in the time to use the features.

There are options available for reducing the memory footprint. Some games will struggle with this more than others.
???. Gears of War 5 is a game designed around 5GB of RAM, so it's going to be easy to make it run in 7.5GB and double the framerate. The memory problem won't be seen in XB1 games (or cross-gen games).
 
???. Gears of War 5 is a game designed around 5GB of RAM, so it's going to be easy to make it run in 7.5GB and double the framerate. The memory problem won't be seen in XB1 games (or cross-gen games).
Yes, a game mode also designed around a meagre amount of memory, but one that also needs to keep everything in memory rather than being offloaded.

You bolded: Is not yet an issue.

If you see a game that is struggling with memory capacity issues right now, please let me know.

For all intents and purposes, a 3070 has 8GB of memory too. And the cards that will basically sit in the same performance class (and above) all have about 8GB as well.
We're just going to have to hope that the SSD does its job in providing enough transfer speed to make up for the capacity deficit.
 
Yes, a game also designed around a meagre amount of memory, but one that also needs to keep everything in memory rather than being offloaded.

You bolded: Is not yet an issue.

If you see a game that is struggling with memory capacity issues right now, please let me know.

For all intents and purposes, a 3070 has 8GB of memory too. And the cards that will basically sit in the same performance class (and above) all have about 8GB as well.
We're just going to have to hope that the SSD does its job in providing enough transfer speed to make up for the capacity deficit.
You are missing one detail: it is a >$500 card. Even more expensive cards don't have that much more memory. This is all because of GDDR6 prices, which haven't come down.
 