Xbox One (Durango) Technical hardware investigation

Betanumerical · Nov 27, 2013

liquidboy said:
In relation to the sRAM sitting between the 2 jaguar modules ...

At the past Hotchips we saw IBM's Power8 use an L3 cache that sits between the cores..

Also in Kaveri we see L3 sRAM cache making an appearance ...

It would appear at least to me that the SRAM is connected to the GPU and not the CPU.

Cjail · Nov 27, 2013

Betanumerical said:
It would appear at least to me that the SRAM is connected to the GPU and not the CPU.

Well, the VGLeaks diagrams said the ESRAM was not accessible by the CPU.

Betanumerical · Nov 27, 2013

Cjail said:
Well, the VGLeaks diagrams said the ESRAM was not accessible by the CPU.

The small block is what I'm talking about.

liquidboy · Nov 27, 2013

Betanumerical said:
It would appear at least to me that the SRAM is connected to the GPU and not the CPU.

I'm talking about the SRAM sitting between the Jaguars.

Betanumerical · Nov 27, 2013

liquidboy said:
I'm talking about the SRAM sitting between the Jaguars.

As am I. To me it seems to be going towards the GPU more than anything else, but then again I don't know a lot about this kinda stuff.

fellix · Nov 27, 2013

How about the "move engines"? Those co-processors are supposed to deal with data flow management, so why not slap some scratchpad memory on them?

Cyan · Nov 27, 2013

liquidboy said:
In relation to the sRAM sitting between the 2 jaguar modules ...

At the past Hotchips we saw IBM's Power8 use an L3 cache that sits between the cores..

Also in Kaveri we see L3 sRAM cache making an appearance ...

That is my theory too, that it is some kind of L3 cache for the CPU.

A fellow forumer hinted that the Xbox One CPU has 50% faster access to its cache, but if that small amount of eSRAM is an L3 cache, perhaps he meant another cache type.

Just a guess...

Cyan · Nov 27, 2013

Anandtech wrote a small article on the SoC design, mostly focused on the GPU and its two redundant CUs.

http://www.anandtech.com/show/7546/chipworks-confirms-xbox-one-soc-has-14-cus

Microsoft claims it weighed the benefits of running 12 CUs (768 cores) at 853MHz vs. 14 CUs (896 cores) at 800MHz and decided on the former. Given that the Xbox One APU only features 16 ROPs and ROP performance scales with clock speed, Microsoft likely made the right decision. Thermal and yield limits likely kept Microsoft from doing both - enabling all CUs and running them at a higher frequency. Chances are that over time Microsoft will phase out the extra CUs, although it may take a while to get there. I'm not sure if we'll see either company move to 20nm, they may wait until 14/16nm in order to realize real area/cost savings which would mean at least another year of shipping 14/20 CU parts at 28nm
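
For anyone who wants to check the trade-off in that quote, here is a rough sketch of the peak-rate arithmetic. It assumes 2 FLOPs per core per clock (one fused multiply-add) and 1 pixel per ROP per clock; neither figure is stated in the article, and the function names are made up for illustration.

```python
# Back-of-the-envelope peak rates for the two configurations in the quote.
# Assumptions (not from the article): 2 FLOPs per shader core per clock
# and 1 pixel per ROP per clock.

def peak_gflops(cores, mhz, flops_per_clock=2):
    """Theoretical peak ALU throughput in GFLOPS."""
    return cores * flops_per_clock * mhz / 1000.0

def peak_gpixels(rops, mhz):
    """Theoretical peak fill rate in Gpixels/s."""
    return rops * mhz / 1000.0

ROPS = 16  # from the quote

# (shader cores, clock in MHz) for each option, taken from the quote
configs = {
    "12 CUs @ 853 MHz": (768, 853),
    "14 CUs @ 800 MHz": (896, 800),
}

for name, (cores, mhz) in configs.items():
    print(f"{name}: {peak_gflops(cores, mhz):.1f} GFLOPS, "
          f"{peak_gpixels(ROPS, mhz):.2f} Gpixels/s")
```

Under those assumptions the 14 CU option has about 9% more peak ALU throughput (1433.6 vs 1310.2 GFLOPS), while the 853MHz option has about 7% more peak fill rate (13.65 vs 12.80 Gpixels/s). These are theoretical peaks, not measured game performance.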

Betanumerical · Nov 27, 2013

To people who think this is something which will have large performance ramifications: why haven't Microsoft mentioned it yet? They have been pimping every part of the SoC that they can, and yet they have made no mention of this. Why leave it out? If they had made changes to the CPU which gave them a further performance advantage in a part of the system which is largely believed to be underpowered, I think it would have been mentioned.

dobwal · Nov 27, 2013

Betanumerical said:
To people who think this is something which will have large performance ramifications: why haven't Microsoft mentioned it yet? They have been pimping every part of the SoC that they can, and yet they have made no mention of this. Why leave it out? If they had made changes to the CPU which gave them a further performance advantage in a part of the system which is largely believed to be underpowered, I think it would have been mentioned.

Given that literally no one here has made such an assertion, what's your point?

MS hasn't really described their memory system in depth. Outside of esram, cpu/gpu caches and some small pools here and there, MS has never bothered to explain its 47 MB figure.

It seems like this image has triggered pure curiosity mode for most of us.

If anything your statement is likely to act as a catalyst to drag this moment into the dirt.

Betanumerical · Nov 27, 2013

dobwal said:
Given that literally no one here has made such an assertion, what's your point?

MS hasn't really described their memory system in depth. Outside of esram, cpu/gpu caches and some small pools here and there, MS has never bothered to explain its 47 MB figure.

It seems like this image has triggered pure curiosity mode for most of us.

If anything your statement is likely to act as a catalyst to drag this moment into the dirt.

We already had claims of it being secret sauce and an L3 cache for the CPU, so I think it's been mentioned more than once. Microsoft have been touting everything they have as an advantage. The fact that it is not mentioned makes me think it has no large performance benefit and is instead an implementation detail of something (DMEs, SHAPE, etc).

McHuj · Nov 27, 2013

fellix said:
How about the "move engines"? Those co-processors are supposed to deal with data flow management, so why not slap some scratchpad memory on them?

I would expect those to sit between the memory controllers and the two big SRAMs. For all the decode and encode (JPEG and LZ77), you'd want some scratch space.

From slide #2 of the Hot Chips presentation, it does look like the AV in comes straight into the SoC and then out of the SoC. In slide #3, the AV In and AV Out interface with the Audio DMA, so that implies some sort of additional memory other than the coherent access.

I could be completely wrong, but it seems to me that this could be a scratchpad for the scaling of the frame buffer, merging of display planes and the TV input, along with mixing of the audio in from the TV. In theory, this could be done at a lower power state as well, with most of the GPU and the other SRAMs turned off.

fellix · Nov 27, 2013

Well, I haven't heard of Jaguar being designed with an L3 in mind. This is, after all, an ultra-mobile architecture that already sports a big shared last-level cache (the L2) for each module. The whole thing is probably licensed as a monolithic hard macro for direct SoC-type implementation without much to fiddle with.

Gipsel · Nov 27, 2013

On top of that, the two Jaguar modules include 2 MB of L2 each (4MB total). What sense could it make to add an additional 2.5MB L3 cache shared between the two modules? That's probably something else, not an L3. Would it make sense to have a faster storage for page tables? Appears to be a bit excessive compared to small TLB caches to me, but who knows?

dobwal · Nov 27, 2013

Betanumerical said:
We already had claims of it being secret sauce and an L3 cache for the CPU, so I think it's been mentioned more than once. Microsoft have been touting everything they have as an advantage. The fact that it is not mentioned makes me think it has no large performance benefit and is instead an implementation detail of something (DMEs, SHAPE, etc).

I took it as a secret sauce joke, and a 2.5 MB L3 doesn't equal a large performance ramification. MS and Sony barely mention each other when touting their consoles and do so only when someone else provides the context.

MS has made a concerted effort to remain vague. And the SRAM has been mentioned in the 47 MB of SRAM given by MS. Outside of the core logic, what we know is built from tidbits of info that still leave us with a rather gaping hole as to what functions the modification provides.

Ceger · Nov 27, 2013

dobwal said:
I took it as a secret sauce joke, and a 2.5 MB L3 doesn't equal a large performance ramification. MS and Sony barely mention each other when touting their consoles and do so only when someone else provides the context.

MS has made a concerted effort to remain vague. And the SRAM has been mentioned in the 47 MB of SRAM given by MS. Outside of the core logic, what we know is built from tidbits of info that still leave us with a rather gaping hole as to what functions the modification provides.

Is it possible it is extra SRAM for tiling memory mapping and other things to avoid using either DDR or the 32MB SRAM set for GPU operations? Something undisclosed for HSA purposes?

3dilettante · Nov 27, 2013

Gipsel said:
On top of that, the two Jaguar modules include 2 MB of L2 each (4MB total). What sense could it make to add an additional 2.5MB L3 cache shared between the two modules? That's probably something else, not an L3. Would it make sense to have a faster storage for page tables? Appears to be a bit excessive compared to small TLB caches to me, but who knows?

Possibly a buffer for data coming from Kinect? It beats routing it through the PCIe interface, then to RAM, then back again.

Another idea is that it's sized large enough to move pages in and out of the eSRAM in large chunks without going off-chip. If it's over 2MB, then it covers x86 pages that aren't too big to fit on-die, and the size of PRT tiles.
The various on-chip processing blocks could pass around pages through a northbridge scratch space instead of modifying multiple memory pipelines that aren't designed to talk to each other.
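
As a quick sanity check on those sizes, here is a small sketch. The 2.5MB buffer size is Gipsel's estimate from earlier in the thread, the x86 page sizes are the standard 4KB, 2MB and 1GB, and the 64KB PRT tile size is an assumption based on AMD's partially resident texture implementation. All names in the code are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Transfer units mentioned in the post.
# The 64 KiB PRT tile size is an assumption, not something stated in the thread.
UNIT_SIZES = {
    "x86 4 KiB page": 4 * KIB,
    "PRT tile (assumed 64 KiB)": 64 * KIB,
    "x86 2 MiB page": 2 * MIB,
    "x86 1 GiB page": 1 * GIB,
}

def units_that_fit(buffer_bytes):
    """Return how many whole units of each kind fit in the buffer,
    leaving out any unit that is larger than the buffer."""
    return {
        name: buffer_bytes // size
        for name, size in UNIT_SIZES.items()
        if size <= buffer_bytes
    }

# Hypothetical on-die scratch buffer of 2.5 MiB (estimate from the thread).
SCRATCH_BYTES = 5 * MIB // 2

for name, count in units_that_fit(SCRATCH_BYTES).items():
    print(f"{name}: {count} fit")
```

With these numbers a single 2MB page fits with room to spare, as do 40 of the assumed 64KB tiles, while a 1GB page is obviously out of the question.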

Cyan · Nov 27, 2013

3dilettante said:
Possibly a buffer for data coming from Kinect? It beats routing it through the PCIe interface, then to RAM, then back again.

Another idea is that it's sized large enough to move pages in and out of the eSRAM in large chunks without going off-chip. If it's over 2MB, then it covers x86 pages that aren't too big to fit on-die, and the size of PRT tiles.
The various on-chip processing blocks could pass around pages through a northbridge scratch space instead of modifying multiple memory pipelines that aren't designed to talk to each other.

ExtremeTech wrote what seems to be the most complete article on the matter. I don't know what you or Gipsel think, but it is quite interesting nonetheless.

There are some very curious findings in there.

http://www.extremetech.com/gaming/1...ered-reveals-sram-as-the-reason-for-small-gpu

3dilettante · Nov 27, 2013

Cyan said:
ExtremeTech wrote what seems to be the most complete article on the matter. I don't know what you or Gipsel think, but it is quite interesting nonetheless.

There are some very curious findings in there.

http://www.extremetech.com/gaming/1...ered-reveals-sram-as-the-reason-for-small-gpu

I'm not sure, but the 47MB figure didn't say it was only for caches. The author seems to have forgotten the nearly 5MB of register file and LDS in the GPU, and some ancillary caches.
The mystery SRAM block doesn't need to be 10 MB.
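
A rough tally shows why. The sketch below uses figures from this thread (47MB total, 32MB eSRAM, 2MB of L2 per Jaguar module) and assumes, which is not something Microsoft has stated, that each GCN CU carries 256KB of vector registers and 64KB of LDS and that all 14 physical CUs count towards the total.

```python
KB_PER_MB = 1024

# Figures quoted in this thread
TOTAL_SRAM_KB = 47 * KB_PER_MB      # Microsoft's overall SRAM figure
ESRAM_KB = 32 * KB_PER_MB           # the big GPU-side eSRAM
CPU_L2_KB = 2 * 2 * KB_PER_MB       # two Jaguar modules, 2 MB of L2 each

# Assumptions (not confirmed for this chip): per-CU storage and CU count
PHYSICAL_CUS = 14
VGPR_KB_PER_CU = 256                # assumed vector register file per CU
LDS_KB_PER_CU = 64                  # assumed local data share per CU

def gpu_regs_and_lds_kb(cus=PHYSICAL_CUS):
    """SRAM taken by vector registers plus LDS across all CUs, in KB."""
    return cus * (VGPR_KB_PER_CU + LDS_KB_PER_CU)

def unaccounted_kb():
    """What is left of the total after eSRAM, CPU L2, and GPU registers/LDS."""
    return TOTAL_SRAM_KB - ESRAM_KB - CPU_L2_KB - gpu_regs_and_lds_kb()

print(f"GPU registers + LDS: {gpu_regs_and_lds_kb()} KB")
print(f"Left for everything else: {unaccounted_kb() / KB_PER_MB:.3f} MB")
```

With those assumptions, registers and LDS come to 4480KB (about 4.4MB), and roughly 6.6MB of the 47MB is left for everything else: CPU L1s, GPU caches, audio and move engine buffers, and the mystery block.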

I'm again not sure that the Durango picture's labelling is accurate. The lines are not drawn consistently between the two APU pictures.
The author's mention of the possibility of photoshopping the picture of a chip their business revolves around revealing seems weird.

I'm also not sure there are no analogous structures in the PS4.
On Orbis, look at the very far left next to the C in "Controller", and the very far right about a quarter of the way from the top.
There are at least two blocks that look a bit like the mystery x86 blocks in the Durango shot.
This, coupled with the possibility that the Xbox diagrammer incorrectly included a swath of silicon from the uncore, makes me think these might not be changes to the x86 blocks specifically.
There are potentially more in Durango, but they might not be absent in Orbis.

For certain functions related to cross-die communication, blocks can be placed in different places to optimize for different constraints, like space utilization.

Cyan · Nov 27, 2013

3dilettante said:
I'm not sure, but the 47MB figure didn't say it was only for caches. The author seems to have forgotten the nearly 5MB of register file and LDS in the GPU, and some ancillary caches.
The mystery SRAM block doesn't need to be 10 MB.

I'm again not sure that the Durango picture's labelling is accurate. The lines are not drawn consistently between the two APU pictures.
The author's mention of the possibility of photoshopping the picture of a chip their business revolves around revealing seems weird.

I'm also not sure there are no analogous structures in the PS4.
On Orbis, look at the very far left next to the C in "Controller", and the very far right about a quarter of the way from the top.
There are at least two blocks that look a bit like the mystery x86 blocks in the Durango shot.
This, coupled with the possibility that the Xbox diagrammer incorrectly included a swath of silicon from the uncore, makes me think these might not be changes to the x86 blocks specifically.
There are potentially more in Durango, but they might not be absent in Orbis.

For certain functions related to cross-die communication, blocks can be placed in different places to optimize for different constraints, like space utilization.

Thanks for the explanation, 3dilettante. That might explain their theory about the cross-die communication, which to me was one of the most interesting things they mentioned in the article. I guess they could be mistaken there... it sounded somewhat plausible, but still.

They also managed to work out the size of the CUs. I had thought the CUs were the same size in both consoles, just longer and thinner in one and wider but shorter in the other. It turns out that's not the case.

Fascinating stuff, that's more than I can make out of those pictures.
