Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
If they had introduced the Xbox One this year with 20 CUs, no Kinect, and no TV focus, I think many more people would be happier.

Well, according to highly skilled technical Fellows (who assisted with/provided quite a few articles for DF), more than 12 CUs would be unbalanced. The extra CUs would not lead to a noticeable performance increase and would, so to speak, be wasted.

While I am no technical Fellow (rather far from it), I'd also say 20 CUs would be wasted silicon. But that has more to do with the limited DDR3 memory bandwidth, because I think most people around here (who aren't technical Fellows) know that graphics workloads are highly parallelizable. That is, if you have the bandwidth and the ROPs to spare ;)
 
Their comment about the system being more balanced with 12 CUs takes into account the available DDR3 bandwidth. If they were going with 20 CUs they'd obviously have to go with more than their current DDR3 bandwidth, as well as more ROPs.
 
Because otherwise it would just take regular CPU and memory resources for moving data. But they talked so much about those "move engines" that would alleviate CPU tasks.
What?
The move engines basically do a DMA copy from one memory location to another. What CPU resources is that supposed to need? They autonomously read x bytes at location A and write the same x bytes at location B. That doesn't need 2.5 MB of SRAM. Even the optional de-/compression is done on the fly; it only changes that x bytes are read and y bytes are written. As there is no reuse of data in a copy operation, a cache would be wasted, and some working memory/scratchpad storage is simply not needed for the purpose.
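To make that concrete, a move-engine-style copy is conceptually nothing more than the streaming loop below. This is a minimal sketch of the idea, not the actual Durango hardware interface; `dma_copy` and the flat `mem` bytearray are made up for the illustration.

```python
# Sketch of a move-engine-style DMA copy: stream x bytes from location A
# to location B in burst-sized chunks. Each byte is touched exactly once,
# so there is nothing for a cache to reuse.
# (dma_copy and the flat `mem` model are illustrative, not real hardware.)

def dma_copy(mem: bytearray, src: int, dst: int, x: int, burst: int = 64) -> None:
    """Copy x bytes from src to dst in bursts of `burst` bytes."""
    for off in range(0, x, burst):
        n = min(burst, x - off)
        chunk = bytes(mem[src + off : src + off + n])  # read burst at A
        mem[dst + off : dst + off + n] = chunk         # write burst at B

mem = bytearray(256)
mem[0:4] = b"ABCD"
dma_copy(mem, 0, 128, 4)
assert mem[128:132] == b"ABCD"
```

Note there is no working set at all: each chunk is in flight only for the duration of one read/write pair, which is the point being made about the 2.5 MB array.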
 
XB1 -is- Kinect.

Some people either refuse or can't accept that.

For those people, there is another console exactly matching their expectations. You think MS should remain a copycat of Sony; MS thinks differently. Time will tell.
 
What?
The move engines basically do a DMA copy from one memory location to another. What CPU resources is that supposed to need? They autonomously read x bytes at location A and write the same x bytes at location B. That doesn't need 2.5 MB of SRAM. Even the optional de-/compression is done on the fly; it only changes that x bytes are read and y bytes are written. As there is no reuse of data in a copy operation, a cache would be wasted, and some working memory/scratchpad storage is simply not needed for the purpose.

It would require some working area for both JPEG decompression and for tiling. But probably not 2 MB.
 
What?
The move engines basically do a DMA copy from one memory location to another. What CPU resources is that supposed to need? They autonomously read x bytes at location A and write the same x bytes at location B. That doesn't need 2.5 MB of SRAM. Even the optional de-/compression is done on the fly; it only changes that x bytes are read and y bytes are written. As there is no reuse of data in a copy operation, a cache would be wasted, and some working memory/scratchpad storage is simply not needed for the purpose.
Gipsel, I wonder if people can tell the exact type of eSRAM (is it 6T? 8T?) from the pictures.

5 billion transistors is more than anyone can count, but that would explain the amount of transistors and the size of the eSRAM in the APU.
 
It would require some working area for both JPEG decompression and for tiling. But probably not 2 MB.
Tiling is basically just a swizzle of the target addresses (or the source addresses when reading from a texture); that needs no additional storage.
And the window size for the de-/compression was pretty small, iirc. I would consider this small buffer an integral part of the move engines themselves, and probably so tiny one could hardly see it in the available pictures (even if one knew where to look). You are completely right that it doesn't explain the 2.5 MB array.
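To show that the swizzle really is just address arithmetic, here is a Morton-order (Z-order) mapping, one common tiling scheme. Actual GPU tile formats differ in the details; this only illustrates the principle that the tiled address is a pure, stateless function of the linear coordinates.

```python
# Tiling as pure address arithmetic: a Morton (Z-order) swizzle maps a
# texel coordinate (x, y) to a tiled address by interleaving the bits of
# the two coordinates. Deterministic, stateless, no extra storage needed.
# (Real GPU tile layouts differ in detail; this is just the principle.)

def morton_swizzle(x: int, y: int, bits: int = 16) -> int:
    addr = 0
    for i in range(bits):
        addr |= ((x >> i) & 1) << (2 * i)      # x bits -> even bit positions
        addr |= ((y >> i) & 1) << (2 * i + 1)  # y bits -> odd bit positions
    return addr

# Neighbouring texels of a 2x2 quad land in 4 consecutive addresses:
assert [morton_swizzle(x, y) for y in (0, 1) for x in (0, 1)] == [0, 1, 2, 3]
```

The payoff is locality: texels that are close in 2D end up close in the tiled address space, which is exactly what the texture cache wants.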
Gipsel, I wonder if people can tell the exact type of eSRAM (is it 6T? 8T?) from the pictures.

5 billion transistors is more than anyone can count, but that would explain the amount of transistors and the size of the eSRAM in the APU.
From the published pictures? No. To tell that, one would need to look at the actual SRAM cells with an electron microscope. One would struggle to really see anything at the level of individual transistors even with the best optical microscopes, and the published pictures are several orders of magnitude away from that. An indirect possibility would be to compare the area with known implementations on the same process, but there are also some uncertainties there.
 
Tiling is basically just a swizzle of the target addresses (or the source addresses when reading from a texture); that needs no additional storage.

While that's true, if you do it without some sort of cache or destination buffer, it's going to be extremely inefficient in the way it hits the memory bus.

I'm not clear how ATI is doing tiling these days, but the buffer probably only has to be a few KB at most; it certainly doesn't account for the 2.5 MB.
 
While that's true, if you do it without some sort of cache or destination buffer, it's going to be extremely inefficient in the way it hits the memory bus.

I'm not clear how ATI is doing tiling these days, but the buffer probably only has to be a few KB at most; it certainly doesn't account for the 2.5 MB.
I would go for 64 bytes, just what is in flight during a memory operation anyway, i.e. in some sense zero additional storage. ;)
Why should it become inefficient without a cache? The address swizzling is completely deterministic (one knows beforehand which memory location to fetch next) and each address is only read once. It's not as if swizzled access needs a smaller access granularity than the length of a burst memory transaction.
 
Because non-linear reads or writes from memory without a cache are more expensive on DDR-like memories.

In fact, the reason tiling is done is to allow the texture cache's reads to become more linear requests to the underlying memory.
 
Because non-linear reads or writes from memory without a cache are more expensive on DDR-like memories.

In fact, the reason tiling is done is to allow the texture cache's reads to become more linear requests to the underlying memory.
You can still read/write it linearly from/to the DDR3 while copying it to/from eSRAM. But as the access granularity is 64 bytes anyway, the effect of a series of non-linear bursts (each transferring 64 bytes) shouldn't be that bad as long as they hit an open page, even when doing the tile/untile operation on a DDR3-to-DDR3 copy. The possible optimization of reordering the accesses would require a small buffer (what would be appropriate? 16 chunks of 64 bytes each would be almost generous, I think), that is right. But no idea if the DMA engines contain such a buffer or if they just rely on the reordering capabilities of the memory controller itself.
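The reorder buffer being discussed would look something like the sketch below: collect up to 16 bursts of 64 bytes, then issue them sorted by address so consecutive accesses are more likely to hit an already-open DRAM page. Whether the actual DMA engines do this, or leave it to the memory controller, is exactly the open question; the code only illustrates the idea, and all names are made up.

```python
# Sketch of a tiny burst reorder buffer: 16 chunks * 64 B = 1 KiB of
# buffering. Bursts are batched and issued in address order, so runs of
# accesses tend to stay within an open DRAM page.
# (Illustrative only -- not a claim about the real Durango DMA engines.)

BURST = 64
DEPTH = 16  # number of buffered 64-byte bursts

def reorder_bursts(addresses):
    """Yield burst addresses in page-friendlier (sorted) batches."""
    buf = []
    for addr in addresses:
        buf.append(addr)
        if len(buf) == DEPTH:
            yield from sorted(buf)  # issue one batch in address order
            buf.clear()
    yield from sorted(buf)          # flush the remainder

# A scattered (swizzled) access pattern comes out locally ordered:
scattered = [5 * BURST, 1 * BURST, 3 * BURST, 0]
assert list(reorder_bursts(scattered)) == [0, 64, 192, 320]
```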
 
Gipsel, I wonder if people can tell the exact type of eSRAM (is it 6T? 8T?) from the pictures.

5 billion transistors is more than anyone can count, but that would explain the amount of transistors and the size of the eSRAM in the APU.

If you look at this link you can see the SEM images (poly level) for an AMD 6T-SRAM made on the TSMC HP 28nm process:

http://www.chipworks.com/en/technic...og/a-review-of-tsmc-28-nm-process-technology/

It is the 4th SEM image. That is the sort of image you would need to answer your question.

That image is from a de-layered RADEON 7970. The cell is 0.16 square microns in area. There is more in an SRAM block than just the cells, but it is a starting metric. You can't just take the area of the block and divide by the cell area, but you can start there, especially if you have a picture of a similarly large SRAM of known size on another TSMC 28nm chip to compare ratios against. You would want to compare with an SRAM of a similar type/purpose if at all possible. You can look at images of processors (or the Jaguar cores on the PS4 and Xbox One SoCs, since they are known sizes) to get some rough density ideas, but others on the forum can likely provide more information on how SRAM type and density vary relative to a general-purpose memory. I don't know how much larger (or lower-density) the L1 or L2 would be.

With enough effort one could verify that it indeed looks like 2x8 MB (plus spares/redundancy?) in both the top and bottom GPU SRAM blocks. If it looks like much more, then that could get interesting. (I am not saying that it is, just suggesting, for fun, that the rumor people should do such math and comparisons if they are interested.)
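The back-of-the-envelope version of that math, assuming the 0.16 µm² 6T cell from the linked Chipworks article and the known 32 MB of eSRAM, comes out like this. It counts raw cell area only, with no sense amps, decoders, redundancy or routing, which is why the real blocks are considerably larger.

```python
# Rough cell-area estimate for 32 MB of 6T eSRAM on TSMC 28nm HP,
# using the 0.16 um^2 cell size from the Chipworks SEM image.
# Raw cells only: peripheral circuitry and redundancy are not counted.

CELL_AREA_UM2 = 0.16               # 6T-SRAM cell, TSMC 28nm HP
ESRAM_BITS = 32 * 1024 * 1024 * 8  # 32 MB of eSRAM

cell_area_mm2 = ESRAM_BITS * CELL_AREA_UM2 / 1e6
transistors = ESRAM_BITS * 6       # 6 transistors per bit cell

print(f"raw cell area: {cell_area_mm2:.1f} mm^2")              # ~42.9 mm^2
print(f"transistors in cells alone: {transistors/1e9:.2f}B")   # ~1.61 billion
```

So the eSRAM cells alone account for roughly 1.6 of the quoted 5 billion transistors, which is consistent with the point above that the eSRAM explains a big chunk of the transistor count.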
 
Their comment about the system being more balanced with 12 CUs takes into account the available DDR3 bandwidth. If they were going with 20 CUs they'd obviously have to go with more than their current DDR3 bandwidth, as well as more ROPs.

I certainly meant a balanced system, i.e. the other parts would need to be adjusted to match.
 
As far as tech goes, I think it would be a huge mistake to remove Kinect from the bundle. I'm not sure why other people struggle with the voice commands. My guess is calibration. Navigating by voice command is fantastic, and it would be a huge loss to the overall system to lose that.

I am curious if you felt the same way about Kinect back at the reveal? Or did you warm up to the Kinect after using the Xbox One console?

I do like the idea of voice commands; however, I think 3 or 5 mics built into the front panel of the Xbox One could achieve that (with the directionality and noise rejection of a DSP beam-steered/processed array) without 95% of the cost of the Kinect.

If they did that and added the camera parts later (and optionally) I would buy into it more easily.



It isn't that I don't like the technology. What I am concerned about is how moving such a large percentage of the BOM cost into the Kinect affected the rest of the design. If the cost had not been so high, I would not suggest making it an optional accessory.

I think the Kinect 2.0 also had a second large cost beyond the dollars it takes to build. Specifically, the IR scans show it running hot, and the power supply circuitry inside it suggests that it consumes a lot of power.

That might be another reason for a smaller (and lower-power) GPU, and then the lower-power DDR3, and then the eSRAM. The Kinect gobbled up the money and the power budget.
 
Regarding the mystical block: is the DDR3 multi-ported? Otherwise, 1920 * 1080 = ?
 