Xbox Series X [XBSX] [Release November 10 2020]

Saying it's dropping the resolution is a bit misleading, unless that's changed.
It's running in One S mode, I believe, which is a lot more than just resolution.

But that's a lot of games being added in one go after the initial 4 or 6 that originally dropped.
Xbox seems to have something positive every week at the moment.
Xbox should release these with fps graphs and details; they take those sorts of things into account since they changed their QA process for BC, so it would save DF some work.
Then DF can cherry-pick the ones they're interested in for independent scrutiny.
 
What is the purpose of splitting the memory pool?
Cost reduction. To gain bandwidth you need either a faster clock speed, more chips, or more bits sent per clock.

If you lock down the clock speed and the bits sent down the bus, increasing the number of chips is the only way to increase bandwidth.

10 chips = 560 GB/s
8 chips = 448 GB/s

But you still require capacity; in this case 16GB is sufficient and 10GB is not.

So with 8 chips, you can do 8x2GB to get 16GB.
But with 10 chips, 10x2GB is much more than 8x2GB, technically 16.5% more expensive I guess.
So they go 6x2GB + 4x1GB to get 16GB, but still maintain 560 GB/s of bandwidth. This might be close to the cost of 8x2GB, I dunno.
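
To make that arithmetic concrete, here's a minimal sketch in Python, assuming 14 Gbps GDDR6 parts on a 32-bit-per-chip bus (the combination that reproduces the 560 and 448 GB/s figures above):

```python
# Back-of-the-envelope GDDR6 math, assuming 14 Gbps parts on a 32-bit-per-chip bus.
GBPS_PER_PIN = 14           # assumed per-pin data rate (Gbit/s)
BITS_PER_CHIP = 32          # bus width contributed by each GDDR6 chip

def bandwidth_gb_s(num_chips: int) -> float:
    """Aggregate bandwidth in GB/s when all chips are accessed together."""
    return num_chips * BITS_PER_CHIP * GBPS_PER_PIN / 8

print(bandwidth_gb_s(10))   # 560.0 GB/s with 10 chips
print(bandwidth_gb_s(8))    # 448.0 GB/s with 8 chips

# Capacity of the mixed population: six 2GB chips plus four 1GB chips.
print(6 * 2 + 4 * 1)        # 16 GB total, while keeping all 10 chips (560 GB/s) on the bus
```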

The idea is a compromise: get the bandwidth while keeping costs down. But it comes with some downsides in memory management on the part of developers.

It's not a split pool, because split pools can operate independently of each other.
e.g. the Xbox One with its ESRAM is a split pool of memory.
The ESRAM was capable of supporting up to 192 GB/s (simultaneous read/write),
and main memory was capable of supporting up to 68 GB/s (read or write; simultaneous r/w is going to be lower).

The GPU is capable of ingesting from main memory at 68 GB/s while simultaneously reading and writing to ESRAM, giving a theoretical combined bandwidth of over 270 GB/s. One way to look at this is the ROPs doing work (r/w) in ESRAM while new data to process is brought in. Or you're doing work in smaller chunks, r/w to ESRAM, while streaming in new textures from main memory.
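
As a rough sketch of how those peaks add up (using the commonly quoted post-up-clock ESRAM figure of ~204 GB/s, which is roughly where the "over 270 GB/s" total comes from):

```python
# Rough sketch of the Xbox One peak-bandwidth picture. These are theoretical
# peaks; real-world throughput was well below them.
ddr3_gb_s = 68            # main memory, read or write
esram_rw_gb_s = 204       # ESRAM simultaneous read/write (post-up-clock figure; ~192 before)

print(ddr3_gb_s + esram_rw_gb_s)   # ~272 GB/s if the GPU streams from DDR3 while the ROPs work in ESRAM
```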

Sounds good on paper, but with the ESRAM limited to 32MB, it was likely a headache for many developers to exploit this interaction consistently.
 
But wouldn't it be cheaper to keep a pure 16GB pool but at a lower speed? Like 500 GB/s or something.
 
But wouldn't it be cheaper to keep a pure 16GB pool but at a lower speed? Like 500 GB/s or something.
So then you'd need a faster clock speed on the memory, or to transfer more bits per clock.
The faster clock speed is doable, but it also costs a lot more than the standard-speed parts. I don't think increasing bits per clock is possible without changing the design of the memory itself.
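
As a quick sketch of what that faster clock would actually have to be, assuming the same 8-chip, 32-bit-per-chip layout as above and the 500 GB/s number from the question:

```python
# Per-pin data rate needed to hit ~500 GB/s on a plain 8-chip (256-bit) GDDR6 bus.
target_gb_s = 500
bus_bits = 8 * 32                       # eight chips, 32 bits each
print(target_gb_s * 8 / bus_bits)       # ~15.6 Gbps per pin, a faster (pricier) bin than 14 Gbps parts
```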
 
But wouldn't it be cheaper to keep a pure 16GB pool but at a lower speed? Like 500 GB/s or something.

If you go with a pure X amount, then choose 2:
  • High capacity
  • High speed
  • Low cost
You can have 2 of those but not all of them. MS required a bit of all of them. So they kept High Capacity (16 GB).

They compromised on low cost: 10 chips, with 6x 2 GB chips and 4x 1 GB chips. That was instead of 16x lower-speed 1 GB chips or 8x higher-speed 2 GB chips. Lower-speed chips will be cheaper for a given capacity, but wiring and signal integrity become a significant problem with that number of chips at the total aggregate speed they wanted. 8 chips make wiring and signal integrity much easier, but that comes at a greatly increased monetary cost for high-speed (higher than what they used), high-capacity chips.

That means they also compromised on high speed. Instead of having their target speed for the entire RAM pool, the target speed is only achievable when accessing 10 GB of memory. Accessing the remaining 6 GB of memory comes at a lower speed. However, they determined that a game doesn't need maximum speed for all of its memory accesses. Some things just do not benefit from higher memory speeds; basically, some memory accesses aren't as time sensitive as others. Keeping track of the game state (AI, location of interactable items in the world, etc.) doesn't need even as much bandwidth as the speed-limited portion of memory provides.
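
A small sketch of why the two regions end up at different speeds, reusing the ~56 GB/s-per-chip figure implied earlier in the thread:

```python
# Why 10 GB of the pool is "fast" and the other 6 GB is slower, assuming each
# GDDR6 chip contributes ~56 GB/s (as in the 10-chip = 560 GB/s figure above).
PER_CHIP_GB_S = 56

fast_region = 10 * PER_CHIP_GB_S   # first GB of every chip, striped across all 10 chips
slow_region = 6 * PER_CHIP_GB_S    # only the six 2 GB chips have a second GB to offer

print(fast_region, slow_region)    # 560 GB/s vs 336 GB/s
```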

The big question going forward is whether 10 GB of really fast memory is enough for all rendering purposes. I think a big part of this is going to be how easy and how robust SFS is on XBS-X (and XBS-S as well). SFS serves dual purposes in that it effectively increases memory access speeds by loading less data (only fragments of MIP levels), which in turn also reduces the memory required. If SFS works as advertised once developers use it, that 10 GB is unlikely to ever be a limit for any cross-platform game this generation. Even without SFS, it's possible that the 10 GB wouldn't be a limit, as again not all memory accesses by games will saturate even the low-speed memory addresses.
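
As a toy illustration of the "load less data" point (the texture size, format, and residency fraction below are made-up numbers for illustration, not anything from Microsoft's material):

```python
# Toy example: memory for a full mip chain versus only the tiles actually sampled.
width = height = 4096
bytes_per_texel = 1                       # e.g. a BC7-compressed texture is ~1 byte/texel

full_chain_bytes = sum((width >> m) * (height >> m) * bytes_per_texel
                       for m in range(13))   # mip 0 (4096x4096) down to 1x1
resident_fraction = 0.2                      # pretend only ~20% of tiles ever get sampled

print(full_chain_bytes / 2**20)                      # ~21.3 MB for the whole chain
print(full_chain_bytes * resident_fraction / 2**20)  # ~4.3 MB if only the needed tiles are resident
```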

Regards,
SB
 
Discussed a few times already; the TL;DR summary is: it's not a split memory pool, the different speeds are there to maximize performance and minimize cost.

The electrical engineer that actually came up with this split design explicitly explained that this was due to signal integrity issues on a wide bus in one of the Gamestack videos.
 
Which Xbox has the most straightforward architecture? Cell aside, it feels like PlayStation usually has the simpler architecture each gen.
What metric is being used to measure complexity? Amount of hardware in general, or what was exposed to the programmer?
Also who is being compared in each generation?
The original Playstation had a CPU and some dedicated processors on-die, with a separate graphics chip.
The PS2 had a CPU with dedicated processors, though it included a non-standard on-die bus between the CPU and vector units as well as scratchpad memory.
In these cases, there was an element of CPU die silicon that went towards something to be programmed for geometry processing. The PS2's graphics chip had EDRAM, although the graphics elements were primarily related to pixel processing.
The PS2 also included a PS1 processing element that served in an IO capacity if not being tasked with backwards compatibility.
Much of this was exposed at a lower level and without the level of hardware management and protection common today.

The original Xbox had a variant of a commodity x86 processor, which was straightforward to program despite having a comparatively large amount of internal complexity. The GPU was a variant of a PC architecture GPU with hardware T&L.

The PS3 had a similar CPU+processing element concept, although the SPEs were tasked with more than geometry (they did rather well with the geometry tasks they were given). There was one general purpose core that could be programmed in a relatively straightforward manner, and the SPEs were architecturally distinct programming targets with an explicit and non-standard memory organization. This was paired with an unusually standard GPU, for Sony. The apparent story there is that Sony's original plan for a more exotic solution fell through.
The XBox 360 had a custom CPU, but it was a uniform set of 3 general purpose cores. The GPU was a unified architecture with an EDRAM pool.

The PS4 design is an APU that is mostly standard. The Xbox One had the ESRAM, which was a memory pool that introduced complexity, although in terms of how it was integrated into the system it was intended to be even easier to use than what was considered acceptable with the Xbox 360's EDRAM.
The current gen consoles are APUs and it's down to secondary hardware blocks and ancillary elements like IO or variations in IP or bus width to distinguish them.



Maybe that's the issue we have been seeing in some games.
Is this the claim that was corrected a few posts ago? This seems like a misstatement and a mislabelling. The OS's primary footprint is in the 6GB region, but it's only a fraction of it.

The electrical engineer that actually came up with this split design explicitly explained that this was due to signal integrity issues on a wide bus in one of the Gamestack videos.
Was this the choice of 320 bits versus wider? The differently handled address ranges wouldn't seem to matter electrically. The split is a matter of the capacity of the chips on the bus. The bus is not affected by the chip's capacity.
 
Was this the choice of 320 bits versus wider? The differently handled address ranges wouldn't seem to matter electrically. The split is a matter of the capacity of the chips on the bus. The bus is not affected by the chip's capacity.


@ 4:23 mins
 

@ 4:23 mins

This may need more specificity about what you are referring to. There's over a minute of discussion there. 4:23 occurs after a change in topic from the memory space division to physical considerations. There's a brief return to discussing the split around 5:20, but my interpretation is that this concerned the devices being physically different, on top of all the other unexpected challenges from GDDR6. The address space split is a consequence of Microsoft's choice of how the different devices would interface with the memory subsystem, but it isn't the only way to do so. If Microsoft had chosen a different interleaving strategy or virtual-to-physical mapping, the device considerations would remain.
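
To make the interleaving point concrete, here is a deliberately simplified model of one way a 6x2GB + 4x1GB population could be striped. The stripe size and mapping are invented for illustration, not Microsoft's actual scheme:

```python
# Simplified, illustrative striping model for a 6x2GB + 4x1GB chip population.
# The stripe size and mapping are invented; they are not Microsoft's actual scheme.
GB = 1 << 30
STRIPE = 1 << 10              # pretend 1 KB stripes for readability

def chip_for_address(addr: int) -> int:
    """Map a physical address to the index of the chip that backs it."""
    if addr < 10 * GB:
        # Lower 10 GB: the first GB of every chip, striped across all 10 chips.
        return (addr // STRIPE) % 10
    # Upper 6 GB: only the six 2 GB chips have a second GB to contribute.
    return ((addr - 10 * GB) // STRIPE) % 6

# Addresses in the lower range fan out over 10 chips (560 GB/s aggregate);
# addresses in the upper range fan out over only 6 (336 GB/s aggregate).
print({chip_for_address(i * STRIPE) for i in range(12)})            # chips 0-9
print({chip_for_address(10 * GB + i * STRIPE) for i in range(12)})  # chips 0-5
```

The point being: the "split" falls out of whichever mapping is chosen, while the physical chip population stays the same.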
 
Was this the choice of 320 bits versus wider? The differently handled address ranges wouldn't seem to matter electrically. The split is a matter of the capacity of the chips on the bus. The bus is not affected by the chip's capacity.

That's the only conclusion which makes sense to me here.
 
Regarding the SFS presentation:
I found a comment in the presentation questionable.
It said that the reason PRT wasn't used much in gen 8 titles was due to speed.
The fact that studios rolled their own texture streaming and never used hardware functionality that was common across platforms and PC suggests there were other inherent issues with the hardware implementation. I think @sebbi may have mentioned flexibility?

The big question going forward is whether 10 GB of really fast memory is enough for all rendering purposes. I think a big part of this is going to be how easy and how robust SFS is on XBS-X (and XBS-S as well). SFS serves dual purposes in that it effectively increases memory access speeds by loading less data (only fragments of MIP levels), which in turn also reduces the memory required. If SFS works as advertised once developers use it, that 10 GB is unlikely to ever be a limit for any cross-platform game this generation. Even without SFS, it's possible that the 10 GB wouldn't be a limit, as again not all memory accesses by games will saturate even the low-speed memory addresses.
I'm still unsure what exactly PS5 doesn't have, apart from the minor customization Xbox made, like the texture filter format.
The presentation said that other platforms don't have it, but it was said in such a general way that I have no idea what it was in regard to. Sampler feedback isn't new; it's just now exposed?

And since it's coming to PC as well, it could mean it's a level playing field, so memory usage will just get used up across the board, leaving XSS with the same amount of work to manage it anyway.
What it should mean, though, is that texture quality should see a decent rise in comparison to even the 1X.
 
It said that the reason PRT wasn't used much in gen 8 titles was due to speed.
The fact that studios rolled their own texture streaming and never used hardware functionality that was common across platforms and PC suggests there were other inherent issues with the hardware implementation. I think @sebbi may have mentioned flexibility?
Tile sizes are fixed in hardware PRT. I forget the exact amount, but a 64K tile size seems to come to mind. IIRC it was too large for them. If you want a custom tile size, PRT+ won't do it for you. The gains from the hardware portion of it were lost elsewhere down the chain in order to support the size of the tile, if that's sort of what he was getting at.
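
For reference, a 64KB tile works out to a fairly coarse chunk of texels depending on the format; the tile shapes noted in the comments are the standard D3D tiled-resource dimensions, quoted from memory, so treat them as approximate:

```python
# Texels covered by one fixed 64 KB tile for a few common formats.
TILE_BYTES = 64 * 1024

formats = {
    "RGBA8": 4.0,   # 128x128 tile for 32-bit formats
    "BC7":   1.0,   # 256x256 tile (one byte per texel after block compression)
    "BC1":   0.5,   # 512x256 tile (half a byte per texel)
}

for name, bytes_per_texel in formats.items():
    print(f"{name}: {int(TILE_BYTES / bytes_per_texel)} texels per 64 KB tile")
```

Which gives a sense of why a fixed 64KB granularity could be too coarse for a streaming system that wants finer-grained control.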
 