Xbox One (Durango) Technical hardware investigation

There's definitely some difference in the CU layout -- mostly the space between the inner SIMD block and the shared instruction and constant caches, while the rest seems to be in order. But the shots' resolution is too low for any detailed analysis.
 
The question in my mind is whether the SRAM is part of a low-latency data path between the CPU and the special processors that offload tasks from the CPU.

VGLeaks stated that the GPU's bandwidth is 10-15 GB/s for cache hits and 30 GB/s for cache misses. But that seems odd, as I haven't read that the GPUs in AMD's APUs have the capability to avoid Onion and directly address the cacheable portion of system RAM.

Is it possible that the 30 GB/s of coherent bandwidth refers to this small pool of SRAM? The CPU can't read from local or uncacheable main memory worth a damn, so all data going over the NB from the I/O side would be limited to 10-15 GB/s over Onion/Onion+ and destined to migrate into the CPU caches, or through the CPU caches into the cacheable portion of RAM.

It would seem rather backwards for MS to design Durango's memory system to push 30 GB/s worth of data generated by the CPU out to the GPU and the additional accelerators, then hamper the modified data with 10-15 GB/s on its return to the CPU.
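To put rough numbers on that asymmetry, here is a quick back-of-the-envelope sketch in Python. The working-set size is a purely illustrative assumption, and the bandwidth figures are just the leaked numbers above taken at face value:

# Round-trip cost of offloading CPU-generated data, using the leaked
# coherent-bandwidth figures. All inputs are assumptions for illustration.
data_mb = 256          # hypothetical working set produced by the CPU, in MB
out_bw_gbs = 30.0      # assumed CPU -> GPU/accelerator path (GB/s)
back_bw_gbs = 12.5     # assumed return path, midpoint of 10-15 GB/s

out_ms = data_mb / 1024 / out_bw_gbs * 1000
back_ms = data_mb / 1024 / back_bw_gbs * 1000
print(f"outbound: {out_ms:.1f} ms, return: {back_ms:.1f} ms")
# outbound: ~8.3 ms, return: ~20.0 ms -- the return leg dominates,
# which is what makes the split look backwards for CPU offload.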

Furthermore, VGLeaks states that the CPUs can't access the GPU caches or eSRAM, which makes me doubtful that they can read and write the memory of the other accelerators. So what's the point of offloading CPU tasks if you're going to force the accelerators to pull that data from off-chip memory? It seems to me Durango would be better off with more CPU cores versus stacking a bunch of latency onto those tasks from data migrating to and from DDR3.

It seems to me that this small amount of SRAM may be useful in providing a low-latency way for the CPU and I/O devices to share data.
 
There's definitely some difference in the CU layout -- mostly the space between the inner SIMD block and the shared instruction and constant caches, while the rest seems to be in order. But the shots' resolution is too low for any detailed analysis.
The SRAM banks of the register file and LDS got rotated by 90 degrees. A Durango CU is indeed slightly thinner and taller than an Orbis CU. But the area is about the same. May have been an optimization to fill the die more efficiently.
 
Looks like this particular part of the CU is the most diverse across all the GCN implementations out there -- from the original Tahiti, through Kabini, and now between Durango and Orbis.
 
There's a fair amount of logic attached to this SRAM pool, with its very own clock generator:

[Die shot crop: the SRAM pool with its attached logic and clock generator]

HDMI-In frame buffer?
 
Why does it have to be some secret feature they haven't talked about? Why can't it just be redundant memory?

Tommy McClain
 
Why does it have to be some secret feature they haven't talked about? Why can't it just be redundant memory?

Tommy McClain

Not saying it must be secret, but it is very unlikely to be redundant memory because it is nowhere near the other two SRAM blocks. A redundant block would be in the same location, and usually it is not a "visible" (separate) block. Usually the block is "uniform" looking and the redundant portions are inside that one block. Chip layouts are not done like that.
 
Is it possible it is extra SRAM for tiling memory mapping and other things, to avoid using either the DDR or the 32 MB SRAM set aside for GPU operations?

It looks like it's pretty far from the GPUs.

Why does it have to be some secret feature they haven't talked about? Why can't it just be redundant memory?

For the kind of memory pools that the chip uses, latency, and thus distance, matters very much. Because of how it's placed, it cannot be used as a redundant pool to replace any other part of the chip.

Also, it appears to have quite a bit of logic and a clock generator attached to it. So it does something on its own.

My bet is that it's the SHAPE audio block. Another possibility would be the memory pool for a security coprocessor -- putting all the security functionality, including the memory, inside the silicon guarantees that you can't mod it out.
 
How about the "move engines"? Those co-processors are supposed to deal with data flow management, so why not slap on some scratchpad memory?

That would be my first guess too. To really help "move" data they would need some additional cache.
 
On top of that, the two Jaguar modules include 2 MB of L2 each (4 MB total). What sense would it make to add an additional 2.5 MB L3 cache shared between the two modules? That's probably something else, not an L3. Would it make sense to have faster storage for page tables? That appears a bit excessive compared to small TLB caches to me, but who knows?

It would be an advantage when working with data accessed by both modules at once.
It would be unusual though to have less L3 cache than L2 cache.
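Picking up the page-table idea, a quick sizing sketch (in Python) shows roughly how much memory ~2.5 MB of leaf page-table entries would actually cover. The 8-byte PTE and 4 KB page size are generic x86-64 assumptions, not confirmed Durango details:

# How much address space could ~2.5 MB of SRAM map if it held leaf PTEs?
sram_bytes = 2.5 * 1024 * 1024   # the ~2.5 MB block discussed above
pte_bytes = 8                    # assumed 64-bit page-table entry
page_bytes = 4 * 1024            # assumed 4 KB page size

entries = sram_bytes / pte_bytes
mapped_gb = entries * page_bytes / 1024**3
print(f"{entries:.0f} PTEs mapping about {mapped_gb:.2f} GB")
# ~327,680 PTEs covering roughly 1.25 GB -- far more than any TLB needs,
# yet well short of covering a multi-gigabyte DDR3 pool.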
 
Not saying it must be secret, but it is very unlikely to be redundant memory because it is nowhere near the other two SRAM blocks. A redundant block would be in the same location, and usually it is not a "visible" (separate) block. Usually the block is "uniform" looking and the redundant portions are inside that one block. Chip layouts are not done like that.


It looks like it's pretty far from the GPUs.



For the kind of memory pools that the chip uses, latency, and thus distance, matters very much. Because of how it's placed, it cannot be used as a redundant pool to replace any other part of the chip.

Also, it appears to have quite a bit of logic and a clock generator attached to it. So it does something on its own.

My bet is that it's the SHAPE audio block. Another possibility would be the memory pool for a security coprocessor -- putting all the security functionality, including the memory, inside the silicon guarantees that you can't mod it out.

Awww. I seeee. Ok. Carry on...

Tommy McClain
 
Also, it appears to have quite a bit of logic and a clock generator attached to it. So it does something on its own.

That clock generator would have been present with or without that SRAM block. It's in the northbridge for Orbis, and there's also one in the northbridge of Kabini.


I'm wondering if the mystery blocks that were drawn as if they were included with the Durango Jaguar L2s are even more separate than their physical proximity would suggest.
They might be part of the uncore, and possibly work more with the GPU memory subsystem.

They might be logic blocks for the on-die crossbar to the memory controllers. There are two in Orbis that try to stick at the midline between the two memory interfaces, and Kabini has one stuck up in the corner next to its memory interface.

Per the Microsoft designers, Durango sports two semi-independent memory subsystems, DDR3 and the eSRAM, which sit as endpoints of a crossbar. This means Durango has double the clients to hook into its crossbar, and the two pairs of blocks try to maintain symmetry with both the DDR3 and the eSRAM.
Since the GPU is diagrammed as the one consumer able to draw fully from both bandwidth sources, and it is more tightly integrated with the eSRAM, the four blocks at the least spend most of their time servicing the GPU section. If two of them are for the eSRAM, they may be associated primarily with the GPU.
 
That small bit of SRAM would be strange as an L3; it's on the completely opposite side from the L2s and the memory controller. It does seem to connect to the GPU and the memory controller. There is also no way the DSP could be that big and need that much memory.

I have no idea really, but maybe it's the move engines, though you would assume those would be closer to the larger eSRAM blocks.
 
If the small SRAM pool is indeed attached to what appears to be the system interconnect crossbar, one of the ways it could be tied to the CPU clusters is to act as a directory cache. That way it wouldn't require any modification to the host architecture, unlike an L3 cache. But then again, the thing seems too large for such a role if it's indexing only two puny L2 caches. It certainly involves other agents -- GPU, co-processors, etc.
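A quick estimate (in Python) of the directory state two 2 MB L2s would actually need backs up that size objection. The 64-byte line and 8-byte per-entry figures are assumptions, not known Durango details:

# How much directory state would tracking two 2 MB L2 caches require?
l2_total_bytes = 2 * 2 * 1024 * 1024   # two Jaguar modules, 2 MB L2 each
line_bytes = 64                        # assumed cache-line size
entry_bytes = 8                        # generous assumed entry (tag + state + sharer bits)

lines = l2_total_bytes // line_bytes
dir_kb = lines * entry_bytes / 1024
print(f"{lines} lines -> about {dir_kb:.0f} KB of directory state")
# 65536 lines -> about 512 KB, well under a ~2.5 MB block, so a CPU-only
# directory would leave most of the pool unexplained.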
 
Kinect 2 is by far the most advanced piece of technology in either system. It is state of the art. It is cutting edge.

But tech discussion has a habit of degenerating into comparisons of the relatively low-end CPU and GPU of the Xbone compared to the relatively low-end CPU and GPU of the PS4.
 
Kinect 2 is by far the most advanced piece of technology in either system. It is state of the art. It is cutting edge.

But tech discussion has a habit of degenerating into comparisons of the relatively low-end CPU and GPU of the Xbone compared to the relatively low-end CPU and GPU of the PS4.

Its functionality is not valued the same by everyone, and neither is the experience consistent for everyone. People buy consoles to play games, so that's what they are going to talk about.
This thread is mostly about performance, not about hardware that provides extra functions. Its implementation in games is very limited and questionable at this point, too.
 
Perhaps Ahead Of The Times

Kinect 2 is by far the most advanced piece of technology in either system. It is state of the art. It is cutting edge.

But tech discussion has a habit of degenerating into comparisons of the relatively low-end CPU and GPU of the Xbone compared to the relatively low-end CPU and GPU of the PS4.

It is fascinating technology and I am sure I'll enjoy digging through the teardown. It does feel and look like the future, or at least one idea of it.

That said, I never bought the original Kinect and don't think I would be caught dead in front of one (at least for games). I could see the voice control for non-gaming, but only later (just a personal choice; I'm sure many enjoy it now).

I think MS should have played the Kinect 2.0 card the way they played the original Kinect: introduce it 3-4 years later as an optional device, as a later-generation sales booster. The sales-booster effect worked well with the original Kinect, which was bought by, guess what, people who either actually wanted it or were willing to pick it up in an optional bundle.



I think MS is doing neat stuff, but I don't think they played their hand quite right. Yet it still might work out nicely.

If they had introduced the Xbox One this year with 20 CUs, no Kinect and no TV, I think many more would be happier. Then a year later, quietly roll out the TV features in OS updates (with the bugs worked out and providers on board). [I think the 360's evolution/OS updates over the years were quite successful.] Then another couple of years later, introduce parallel bundles with an optional Kinect 2.0. By that point both the console and the Kinect would be cost-reduced.



In the end I think they would actually achieve more of their TV and Kinect goals. I think the argument that you need to include the Kinect to avoid fragmentation is overly simplistic. That is true as far as it goes, but there is far more to the picture than that. I think they are trying to stubbornly achieve their goals as opposed to being more patient and clever about it. I think they can hurt their goals more through alienation and by backing off on the core-oriented hardware choices.

Shipping every box with Kinect 2 and then shipping almost every launch game without "real" Kinect support/need/integration is a misstep IMO. The two just don't go together, yet the second part was entirely predictable. I really wonder who at MS is the big champion of Kinect. Is it the favorite tech/project of one or more higher-ups, or what?



But who knows; I don't think engineers looking at teardowns, or brand fans, or even people in the industry are very good at predicting where something will go. With terrible hardware and mistakes it is still possible to do very well through the fan base, the brand and patient effort. And the opposite can be true as well. And in this generation none of the hardware choices look anything like the terrible category. Questionable and/or unfortunate choices perhaps, but overall the worst that can be said is that they went the fairly low-power/low-end route out of the gate.
 
As far as tech goes, I think it would be a huge mistake to remove Kinect from the bundle. I'm not sure why other people struggle with the voice commands. My guess is calibration. Navigating by voice command is fantastic, and it would be a huge loss to the overall system to lose that.
 