Xbox Series X [XBSX] [Release November 10, 2020]

There was a rumor that the S is not a tower.
Also the X devkit looks like the Scorpio devkit (dunno if it has a single board).
There is no reason to make a not-very-power-hungry device with 2 boards.

We don't know how power hungry it is, and if its power draw is dramatically lower than the XBSX's then we're probably looking at an equally dramatic performance discount. The reasons not to do this are all about manufacturing complexity: two distinct board layouts introduce complexity and uncertainty into your stock plans.


MS have been managing the One S/X just fine. They're arguably more divergent than the Series S/X. Same for PS4/Pro.

They never chose to find themselves where they are, and they killed the One X as soon as they could. They had to make the One X to rebuild credibility so their successor platform had a shot. Starting from a blank sheet I would expect either much greater parts commonality, or barely any commonality and radically lower cost.

Sony had market dominance and were margin harvesting; the €399 price point had slipped away, and a mild refresh of their platform let them reclaim it. It doesn't cause these stock issues because your core demographic has already bought into your platform and is far less likely to buy your competitor's box just because you have no stock. Console launches are the traditional moment when that all falls apart, so execution is much more important than during a mid-life refresh.

Honestly, the choices MS is making between the XBSS and XBSX are as interesting to me as the PS5 versus XBSX comparison. I mean, they still haven't officially confirmed it's a thing and they're allegedly planning to start selling it in <90 days.
 
Back on topic, anyone want to take a guess at how much MS can shave off the XSX die for Lockhart? As we have a die shot for XSX, and we know roughly the TFLOPS and memory configuration, we can probably start making some rough guesses now. Time to crack open MS Paint and load up this:

The GPU section takes roughly 170mm^2 (green parts not including the Fabric Coherency). I think each DCU is roughly 4mm^2, so I've pegged the 56 CUs at roughly 112mm^2. The front-end of the GPU would then be roughly 58mm^2.

If we just chop the GPU section and fabric in half, that's already about 96-98mm^2 (14 DCUs, half front-end, single fabric area). Of course, it wouldn't end up being that many CUs, as the required frequency would be pretty low even by RDNA1 standards.

Let's say the sweet spot for RDNA1/2 happens to be something closer to the base clock of Navi 10, which is around 1.5-1.6GHz. Working backwards, that gives us a potential candidate of 20 active CUs. If we further assume half the front-end (single shader engine, 2 arrays), then we've either got 10 or 12 DCUs (5 or 6 DCUs per array):
  1. 20 / 20 CUs, no redundancy = 1.56GHz
  2. 18 / 20 CUs, disable 1 DCU = 1.736GHz (would be preferable for the front-end, but maybe not for keeping things low heat & yields for the low cost box)
  3. 22 / 24 CUs, disable 1 DCU = 1.42GHz
  4. etc.
So going by the above, about 40-48mm^2 for the DCUs. So maybe 29mm^2 (half the Anaconda front-end) + 40-48mm^2 -> 70-80mm^2 for the GPU instead of 170mm^2.
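Just to sanity-check those clock numbers, here's the arithmetic in a few lines of Python. The flat 4.0 TFLOPS target is my assumption (the rumoured Lockhart figure, roughly a 1/3rd GPU profile of the 12.15 TF XSX); everything else is the usual FLOPS formula.

```python
# Quick sanity check of the CU/clock candidates above.
# Assumes a flat 4.0 TFLOPS Lockhart target (rumoured figure, not official)
# and FLOPS = CUs * 64 lanes * 2 ops (FMA) * clock.

TARGET_TFLOPS = 4.0

def clock_for(active_cus, tflops=TARGET_TFLOPS):
    """GHz needed to hit the target with a given number of active CUs."""
    return tflops * 1e12 / (active_cus * 64 * 2) / 1e9

for total, active in [(20, 20), (20, 18), (24, 22)]:
    print(f"{active}/{total} CUs -> {clock_for(active):.3f} GHz")
# 20/20 CUs -> 1.562 GHz
# 18/20 CUs -> 1.736 GHz
# 22/24 CUs -> 1.420 GHz
```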

Of course, there would also be die savings from removing GDDR6 slices. I'm still of the opinion it'll be a 128-bit bus (40% of the bandwidth for a 1/3rd GPU profile), so that'd be 3 x 13.6mm^2 of savings, ~41mm^2.

----

So I dunno. 90-100 + 41 = 131-141mm^2 smaller, which would put the die at just around 220-230mm^2.
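And the same tally in one place, purely re-using the rough per-block guesses above (only the ~360mm^2 XSX die size is an official figure; everything else is eyeballed off the die shot):

```python
# Re-running the rough area tally above; all per-block figures are the
# eyeballed estimates from this post, only the ~360mm^2 total is official.

XSX_DIE   = 360.4                  # mm^2, full Series X die
GPU_TOTAL = 170.0                  # mm^2, GPU section (excl. fabric/coherency)
DCU_AREA  = 4.0                    # mm^2 per dual-CU
FRONT_END = GPU_TOTAL - 28 * DCU_AREA        # ~58 mm^2 front-end for 56 CUs

# Lockhart guess: half the front-end plus 10-12 DCUs (20-24 CUs)
lockhart_gpu = [FRONT_END / 2 + n * DCU_AREA for n in (10, 12)]   # ~69-77 mm^2

gddr6_savings = 3 * 13.6           # dropping 192 bits of GDDR6 PHY (320 -> 128-bit)

savings = [GPU_TOTAL - g + gddr6_savings for g in reversed(lockhart_gpu)]
print(f"saved ~{savings[0]:.0f}-{savings[1]:.0f} mm^2, "
      f"die ~{XSX_DIE - savings[1]:.0f}-{XSX_DIE - savings[0]:.0f} mm^2")
# saved ~134-142 mm^2, die ~219-227 mm^2
```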

Roughest of ballparks disclaimer, etc.
 
Splinter Cell on the Xbox was leagues above the PS2/GCN versions.
Splinter Cell was a holiday 2002 release. That's a year after the launch of Xbox, and well outside the launch window. Splinter Cell is indeed a looker on Xbox, but it wasn't a title available at launch so it can't really be held up as a title that showed the power of a new generation.

And Riddick: Escape from Butcher Bay... That looked so much better on the OG Xbox than on anything else at the time.
It also came out in 2004. See my comment above. The context was that the last console that released with titles that really showed off an increase in power was the Dreamcast, and I'd agree with that. If you compare something like NFL 2K to anything on PS1 or N64, you get something like 4x the pixels and maybe even 4x the polygons (not really sure about this) at twice the framerate, with better lighting, texture filtering, and higher quality artwork. It was a big step up.

There was a rumor that the S is not a tower.
Also the X devkit looks like the Scorpio devkit (dunno if it has a single board).
There is no reason to make a not-very-power-hungry device with 2 boards.
If you have all the IO/storage on one board and the APU/RAM on another, you can use the IO board for every console and have the expensive APU part swapped out per SKU. And that wouldn't prevent you from changing the shape of the box: in the S you could sit the boards side by side and connect them with a flexible cable of some sort if the connectors aren't oriented for a direct connection. This could lower the BOM if the IO board is cheaper in bulk and making the APU board costs no more than making a single discrete board, once you take the IO board into account.
 
From the Hot Chips Q&A
https://www.anandtech.com/show/1599...ft-xbox-series-x-system-architecture-600pm-pt

Q: Can you stream into the GPU cache?
A: Lots of programmable cache modes. Streaming modes, bypass modes, coherence modes.
Q: Coherency CPU and GPU?
A: GPU can snoop CPU, reverse requires software
Q: Why do you need so much math for audio processing?
A: 3D positional audio, spatial audio, and real-world spaces. If you have 300-400 audio sources positioned in 3D and want to start doing other effects on all the samples, it gets very compute heavy. Imagine 20 people fighting in a cave, and reflections with all sorts of noises.
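To put a very rough number on that, here's a back-of-envelope sketch of just the direct-path HRTF work for that many sources; the filter length and the per-source model are my assumptions for illustration, not Microsoft's numbers.

```python
# Back-of-envelope: per-source HRTF convolution cost at 48 kHz.
# Filter length and the "one FIR per ear" model are illustrative assumptions.

SAMPLE_RATE = 48_000      # samples per second per source
HRTF_TAPS   = 256         # assumed FIR length per ear
EARS        = 2
OPS_PER_TAP = 2           # one multiply + one add

def hrtf_gflops(sources):
    return sources * SAMPLE_RATE * HRTF_TAPS * EARS * OPS_PER_TAP / 1e9

for n in (100, 300, 400):
    print(f"{n} sources: ~{hrtf_gflops(n):.0f} GFLOPS")
# 100 sources: ~5 GFLOPS, 300: ~15, 400: ~20
# ...and that's before any reverb/reflection passes ("20 people in a cave").
```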
 
The GPU section takes roughly 170mm^2 (green parts not including the Fabric Coherency). I think each DCU is roughly 4mm^2, so I've pegged the 56 CUs at roughly 112mm^2. The front-end of the GPU would then be roughly 58mm^2.

If we just chop the GPU section and fabric in half, that's already about 96-98mm^2 (14 DCUs, half front-end, single fabric area). Of course, it wouldn't end up being that many CUs, as the required frequency would be pretty low even by RDNA1 standards.

Let's say the sweet spot for RDNA1/2 happens to be something closer to the base clock of Navi 10, which is around 1.5-1.6GHz. Working backwards, that gives us a potential candidate of 20 active CUs. If we further assume half the front-end (single shader engine, 2 arrays), then we've either got 10 or 12 DCUs (5 or 6 DCUs per array):
  1. 20 / 20 CUs, no redundancy = 1.56GHz
  2. 18 / 20 CUs, disable 1 DCU = 1.736GHz (would be preferable for the front-end, but maybe not for keeping things low heat & yields for the low cost box)
  3. 22 / 24 CUs, disable 1 DCU = 1.42GHz
  4. etc.
So going by the above, about 40-48mm^2 for the DCUs. So maybe 29mm^2 (half the Anaconda front-end) + 40-48mm^2 -> 70-80mm^2 for the GPU instead of 170mm^2.

Of course, there would also be die savings from removing GDDR6 slices. I'm still of the opinion it'll be a 128-bit bus (40% of the bandwidth for a 1/3rd GPU profile), so that'd be 3 x 13.6mm^2 of savings, ~41mm^2.

----

So I dunno. 90-100 + 41 = 131-141mm^2 smaller, which would put the die at just around 220-230mm^2.

Roughest of ballparks disclaimer, etc.

I’m still sticking with it being repurposed XSX chips with bad cores.
 
Hey guys, thanks for posting the HotChips Xbox stuff here, I wasn't easily able to do so myself, as I had spilled coffee with sugar all over my desktop keyboard yesterday and it's trashed.
 
There are now two distinctions with RDNA2's RT acceleration:

1- It can't accelerate BVH traversal, only ray intersections; traversal is performed by the shader cores.
2- Ray intersection hardware is shared with the texture units.

In comparison, Turing RT cores:
1- Accelerate BVH traversal on their own
2- Ray intersection hardware is independent and not shared with anything else

So in a sense RDNA2's solution is a hybrid, shared between the texture units and the shaders, compared to Turing's fully dedicated solution.
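In rough terms (written here as a toy Python sketch, not either vendor's actual scheme), the difference is basically where the traversal loop lives; the box-test function below just stands in for the fixed-function intersection step.

```python
# Toy illustration of the "hybrid" approach: the traversal loop runs as
# ordinary (shader) code, and only the per-node box/triangle test is the
# step RDNA2 hands to fixed-function hardware shared with the texture units.
# On Turing, the whole while-loop below would live inside the RT core.
# The BVH layout and names here are made up for illustration.

class Node:
    def __init__(self, lo, hi, children=None, tri=None):
        self.lo, self.hi = lo, hi            # AABB corners
        self.children = children or []
        self.tri = tri                       # leaf payload: one triangle id

def intersect_box(origin, direction, node):
    """Slab test: does the ray hit the node's AABB? (the 'hardware' step)"""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        d = direction[axis] if direction[axis] != 0 else 1e-12
        t1 = (node.lo[axis] - origin[axis]) / d
        t2 = (node.hi[axis] - origin[axis]) / d
        tmin, tmax = max(tmin, min(t1, t2)), min(tmax, max(t1, t2))
    return tmin <= tmax

def trace(origin, direction, root):
    """Traversal loop in 'shader code'; only intersect_box() is accelerated."""
    stack, hit_tris = [root], []
    while stack:
        node = stack.pop()
        if not intersect_box(origin, direction, node):   # fixed-function step
            continue
        if node.tri is not None:
            hit_tris.append(node.tri)   # real code would ray/tri test here too
        else:
            stack.extend(node.children)
    return hit_tris

# Two leaf boxes: one on the ray's path, one off to the side.
leaf_a = Node([-1, -1, 4], [1, 1, 6], tri="tri_a")
leaf_b = Node([5, 5, 4], [7, 7, 6], tri="tri_b")
root = Node([-1, -1, 4], [7, 7, 6], children=[leaf_a, leaf_b])
print(trace([0, 0, 0], [0, 0, 1], root))   # ['tri_a']
```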

(also posted in the RDNA 2 PC thread)

I have an idea of what accelerating BVH traversal does, but what is ray intersection?
 
Why not package the salvaged die to the Lockhart pinout?
The number of dies that fit the criteria to do this is probably too low to make it worthwhile. A die has to have enough defects that disabling 4 CUs isn't enough, while still being functional enough to even act as a cut-down chip. My guess is that there won't be that many of these chips even if you were trying to do this. I mean, maybe they can find a use for another 0.01% of chips if they do this, but it seems like a lot of work for very little gain.
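Just to put toy numbers on how thin that band is, here's a quick Poisson defect sketch; the defect density and the "all defects must land in CU area" salvage rule are invented for illustration, not real yield data.

```python
# Toy defect model for "how many dies even land in that band?".
# Defect density, areas and the salvage rule are rough guesses, not real data.
from math import exp, factorial

D0      = 0.10          # assumed defects per cm^2 on an N7-class process
DIE_CM2 = 3.60          # ~360 mm^2 Series X die
CU_FRAC = 112 / 360     # rough fraction of the die that is CU area

lam = D0 * DIE_CM2      # expected defects per die

def poisson(k):
    return exp(-lam) * lam**k / factorial(k)

# "XSX-good": zero defects, or 1-2 defects that all land in CU area
# (absorbed by the 2 spare DCUs).
p_good = poisson(0) + sum(poisson(k) * CU_FRAC**k for k in (1, 2))

# Salvage band: 3+ defects, yet still all confined to CU area.
p_salvage = sum(poisson(k) * CU_FRAC**k for k in range(3, 8))

print(f"XSX-good: {p_good:.1%}, salvageable-for-Lockhart band: {p_salvage:.3%}")
# ~78% XSX-good and only ~0.02% in the salvage band with these made-up inputs,
# which is right in line with the "another 0.01% of chips" gut feel above.
```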
 
The GPU section takes roughly 170mm^2 (green parts not including the Fabric Coherency). I think each DCU is roughly 4mm^2, so I've pegged the 56 CUs at roughly 112mm^2. The front-end of the GPU would then be roughly 58mm^2.

If we just chop the GPU section and fabric in half, that's already about 96-98mm^2 (14 DCUs, half front-end, single fabric area). Of course, it wouldn't end up being that many CUs, as the required frequency would be pretty low even by RDNA1 standards.

Let's say the sweet spot for RDNA1/2 happens to be something closer to the base clock of Navi 10, which is around 1.5-1.6GHz. Working backwards, that gives us a potential candidate of 20 active CUs. If we further assume half the front-end (single shader engine, 2 arrays), then we've either got 10 or 12 DCUs (5 or 6 DCUs per array):
  1. 20 / 20 CUs, no redundancy = 1.56GHz
  2. 18 / 20 CUs, disable 1 DCU = 1.736GHz (would be preferable for the front-end, but maybe not for keeping things low heat & yields for the low cost box)
  3. 22 / 24 CUs, disable 1 DCU = 1.42GHz
  4. etc.
So going by the above, about 40-48mm^2 for the DCUs. So maybe 29mm^2 (half the Anaconda front-end) + 40-48mm^2 -> 70-80mm^2 for the GPU instead of 170mm^2.

Of course, there would also be die savings from removing GDDR6 slices. I'm still of the opinion it'll be a 128-bit bus (40% of the bandwidth for a 1/3rd GPU profile), so that'd be 3 x 13.6mm^2 of savings, ~41mm^2.

----

So I dunno. 90-100 + 41 = 131-141mm^2 smaller, which would put the die at just around 220-230mm^2.

Roughest of ballparks disclaimer, etc.

Interesting!

I spent a while in Paint today, drawing lines around what I thought were the 56 CUs, which mirror down the middle. Above these 28 DCUs, but still within the GPU section, were two further mirrored blocks, which I took to be "stuff" related to each of the two shader engines. There was also a non-symmetrical block top and middle in the GPU section, with the CPUs and media engine stuff directly above. I assumed for safety's sake that this is stuff you probably can't just divide in two.

Like you, I came to the conclusion that if die area reduction is critical, they're going to have to go with a single shader engine. And I still agree with you on the 128-bit bus; those things take up a fair amount of space. Knowing that MS have opted for the full four channels per controller (for a total of 20 on XSX) means that even with a 128-bit bus they've still got 8 channels, which is a lot more than Renoir!

Anyway, assuming one shader engine, the non-symmetrical GPU block staying as it is, removing 6 memory controllers, removing one of the two Fabric Coherency blocks (I have no idea whether that would fly), and going with 20 CUs from 22, my hi-tech methods *cough* led me to exactly ... roughly ... 146 mm^2 smaller, for a guesstimated area of 214 mm^2.

Which means I'm being a little more optimistic than you, and as we know, optimism is wrong.

My scientific gut feeling is that LH will be 35~40% smaller than Anaconda.
 
The number of dies that fit the criteria to do this is probably too low to make it worthwhile. A die has to have enough defects that disabling 4 CUs isn't enough, while still being functional enough to even act as a cut-down chip. My guess is that there won't be that many of these chips even if you were trying to do this. I mean, maybe they can find a use for another 0.01% of chips if they do this, but it seems like a lot of work for very little gain.
Given how much of the die area is GPU, I think this is a little narrow-minded. Keep in mind it’s not just about defect density. It could also fail speed and/or power limits. This is how CPU and GPU guys have been doing it for ages.
 
https://www.anandtech.com/show/1599...ft-xbox-series-x-system-architecture-600pm-pt

09:35PM EDT - Q: Are you happy with DX12 as a low-level hardware API?
A: DX12 is very versatile - we have some Xbox-specific enhancements that power developers can use. But we try to have consistency between Xbox and PC. Divergence isn't that good. But we work with developers when designing these chips so that their needs are met. Not heard many complaints so far (as a silicon person!). We have a SMASH driver model. The game binaries implement the hardware laid-out data that the GPU eats directly - it's not a HAL layer abstraction. MS also re-writes the driver and smashes it together; we replace that and the firmware in the GPU. It's significantly more efficient than the PC.

Interesting to see how the GPU implementation differs from PC. No need for a HAL, for instance. Also custom firmware for the GPU as well as a console-specific driver.

Regards,
SB
 
Do "disabled" circuits on a die still leak?
If they are disabled by software, maybe; via laser cut, no.
But they can still transfer heat. Deactivated parts are like a really good heat-spreader ;)

I'm willing to bet they are not going to disable half the die to repurpose them. It's just way cheaper to use another die if you are going to ship a substantial number of units.
Depends on how good/bad the yields are. If they can repurpose many chips and get them cheaper, it might be a good thing until the yields are better. Better than just wasting the resources and throwing them into a trash can (which also costs the manufacturer money). With so many deactivated chip parts it could happen that almost all chips can be used in some way (if there isn't an error in another region of the chip).
 
Which metric makes more sense in your opinion?
Wouldn't ray-tri be better than rays shot in terms of understanding how much performance is available? You can shoot rays into nothingness and never get a hit-return on your intersection.

Neither is all that useful. AMD's is more revealing, but actual real-world performance will depend very heavily on caching, which makes the raw performance number not mean much. The only real way to compare is to measure performance against the same scene.

I have an idea of what accelerating BVH traversal does, but what is ray intersection?

You get a ray (origin + direction) and answer the question: "does this ray intersect with this triangle, and if yes, at what point?"
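For a concrete picture of what that single test looks like, here's a minimal Möller-Trumbore ray/triangle intersection in Python; this is just the textbook algorithm, not a claim about how AMD implements it in hardware.

```python
# Minimal sketch of a single ray/triangle intersection test (Moller-Trumbore).
# This is the kind of question the fixed-function intersection hardware
# answers; pure-Python here purely for illustration.

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-7):
    """Return distance t along the ray to the hit point, or None if no hit."""
    def sub(a, b): return [a[i] - b[i] for i in range(3)]
    def cross(a, b): return [a[1]*b[2] - a[2]*b[1],
                             a[2]*b[0] - a[0]*b[2],
                             a[0]*b[1] - a[1]*b[0]]
    def dot(a, b): return sum(a[i]*b[i] for i in range(3))

    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:              # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:          # outside the triangle along edge1
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:      # outside the triangle along edge2
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None   # hit point is origin + t * direction

# Example: ray down the z-axis hitting a triangle in the z=5 plane
print(ray_triangle_intersect([0, 0, 0], [0, 0, 1],
                             [-1, -1, 5], [1, -1, 5], [0, 1, 5]))  # 5.0
```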
 
Do "disabled" circuits on a die still leak?

That's a good question! From what I've been able to gather from readin' things I think .... it depends.

For example, there used to be graphics cards with disabled CUs that you could flash to be the fully enabled version of the card. Whether it worked was probably another matter - if it was disabled to maintain market segmentation or to stay within a power limit, it might. If it was genuinely a defective chip then oops, better hope you can flash it back. I think you could attempt this as late as some of the RX 5xx parts.

As I don't think things like CUs can be individually gated, my guess is that the disabled elements in products like these were just constantly leaking as if they were idle.

In the case of parts with more complex or more targeted power gating, like a CPU, you're probably leaking a lot less with firmware-disabled parts. But probably still some. Again, there used to be some AMD CPUs that allowed you to attempt a BIOS reactivation of disabled cores. I think you could try this up to around the AMD Phenom II. Not seen any talk of this for a long time though, so I think that's long in the past.

Fusing off or lasering off disabled parts seems to be common now, and in theory this should allow complete isolation of elements. Given how power-conscious computing has become, hopefully this is what's done now. But I suppose you could just fuse or laser off whatever's needed to stop something being reactivated and let the rest leak. Would seem silly though.

Sorry for the lack of a real answer. My speciality is vague, general answers (with errors).
 