Xbox One (Durango) Technical hardware investigation

http://www.vgleaks.com/durango-memory-system-overview/

[Image: durango_memory.jpg (Durango memory system diagram)]
Much more at the link.

Hmmm... what do the vertex indices and commands mean?
 
VGleaks has also published this.

Interesting:

This diagram shows our prediction of the typical bandwidth for the north bridge clients and the typical available bandwidth for the GPU clients (which are shown in blue).

Let’s start by describing the CPU. Although each CPU module can request up to 20.8 GB/s of bandwidth for read and for write, the typical bandwidth you should expect for the CPU is 4 GB/s per CPU module per direction, about 16 GB/s altogether for the two modules.
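To make that arithmetic explicit, a quick back-of-envelope (the two-module count is the known Jaguar configuration from the same leak series; the per-module figures are from the paragraph above):

```python
# Typical CPU bandwidth per the figures above: 4 GB/s per CPU module
# per direction, with two CPU modules in the leaked configuration.
modules = 2
typical_per_direction = 4.0  # GB/s per module, each way
peak_per_direction = 20.8    # GB/s per module, each way (requestable peak)

typical_total = modules * typical_per_direction * 2  # read + write
print(typical_total)  # 16.0 GB/s altogether, well below the requestable peak
```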

You can expect typical bandwidth to be around 3 GB/s per direction for audio, HDD, camera, and USB combined.

The Kinect sensor is the main consumer of that bandwidth. Peak bandwidth to and from the HDD, for example, is only about 50 MB/s, so the HDD cannot be seen as a major bandwidth consumer.

Because the GPU is usually pushed to the maximum, you can expect typical coherent bandwidth to be about 25 GB/s. However, this amount depends on how many resources are made snoopable.

Currently, we are not able to tell exactly how much of that access will hit the CPU’s caches and how much will go to DRAM. So, as we said above, this figure is highly speculative at the moment.

The estimated 25 GB/s of bandwidth for coherent memory access does not account for the non-coherent memory access of the GPU.

The coherent bandwidth that can flow through the north bridge is limited to 30 GB/s. Under typical conditions, this limit shouldn’t cause you problems, but under a high load of coherent memory traffic the north bridge might become saturated. Once the north bridge becomes saturated, you may notice increased latencies for memory access.

CPU memory access that is Write Combined does not fall under this limitation, nor does GPU memory access that is non-coherent.
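A minimal sketch of the budgeting rule being described; the 30 GB/s cap comes from the text, while the traffic mix in the example is purely illustrative:

```python
# North-bridge coherent bandwidth is capped at 30 GB/s per the text.
# Write-combined CPU traffic and non-coherent GPU traffic bypass this
# path, so they are deliberately excluded from the check.
COHERENT_CAP_GBPS = 30.0

def coherent_path_saturated(cpu_gbps: float, gpu_gbps: float, io_gbps: float) -> bool:
    """True if the coherent traffic mix exceeds the north-bridge cap."""
    return cpu_gbps + gpu_gbps + io_gbps > COHERENT_CAP_GBPS

# Illustrative mix: 25 GB/s of snoopable GPU traffic plus modest CPU and
# I/O coherent traffic already pushes past the cap, which is when the
# increased memory latencies mentioned above would show up.
print(coherent_path_saturated(cpu_gbps=4.0, gpu_gbps=25.0, io_gbps=3.0))  # True
```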

Finally, let’s compute how much bandwidth is left for non-coherent GPU access to consume. Let’s assume that:
• The sum of bandwidth from the north bridge to DRAM is 25 GB/s.
• Some portion of the GPU coherent bandwidth misses the L2 caches.
• Non-coherent CPU bandwidth is 3 GB/s.

This leaves 42 GB/s of DRAM bandwidth available to the GPU clients.
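To make the subtraction explicit, a back-of-envelope reconstruction; the ~68 GB/s DDR3 peak comes from the same leak series (256-bit DDR3-2133), not this excerpt, and lumping the assumed consumers into roughly 26 GB/s is my reading of how the article lands on 42 GB/s:

```python
# Reconstructing the article's leftover-bandwidth figure.
ddr3_peak = 2133e6 * 256 / 8 / 1e9   # ~68.3 GB/s (256-bit DDR3-2133)
other_dram_traffic = 26.0            # GB/s: coherent GPU misses + CPU + I/O (assumed lump sum)

gpu_noncoherent_budget = ddr3_peak - other_dram_traffic
print(round(gpu_noncoherent_budget))  # ~42 GB/s left for non-coherent GPU access
```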
 
I thought you can't add bandwidth like that.
Bandwidth is complicated. It depends on what you are trying to measure. According to the diagram, the maximum amount of BW available to the GPU is 170 GB/s. In terms of peak usable BW, there are obviously factors at play, like components consuming some of that peak BW. But that's true for every box, and it's why we compare peak metrics in like-for-like comparisons.
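For what it's worth, the 170 GB/s figure only appears when the two pools are read in parallel; a quick sketch using the leaked peak numbers (102.4 GB/s ESRAM, 68.3 GB/s DDR3, neither quoted in this excerpt):

```python
# Peak GPU read bandwidth is the sum of the two pools, and is only
# reachable when both are streaming reads at the same time.
esram_peak = 102.4  # GB/s, leaked figure for the 32 MB ESRAM
ddr3_peak = 68.3    # GB/s, leaked figure for the 256-bit DDR3-2133 bus

print(round(esram_peak + ddr3_peak))  # ~171, quoted as "170 GB/s" in the diagram
```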
 
Interesting. Where are they getting their estimates from? These sorts of breakdowns, although good for educating readers about how BW consumption is spread throughout the system, kinda gloss over the complete flexibility the developers have. If a developer chooses to consume all available CPU BW, they can, leaving less for the GPU; or they can have the CPU doing barely anything, freeing up more for the GPU. That's why we list peak BW speeds, so devs know what resources they have and can choose how to use them.
 
Well, it has HDMI in; the vgleaks system diagram shows it. It will probably record games like the PS4.

The inclusion of HDMI in could just be for the ability to overlay the OS, game, or app content on another source, like your cable or satellite TV. Why does it have to be about recording? I just think it's a bit of a reach. Microsoft is about IPTV now. Recording content from other devices just doesn't seem to fit their focus.

Tommy McClain
 
Hmmm... what do the vertex indices and commands mean? The article states: "Read bandwidth of the command buffer and index buffer is 4 GB/s." I guess what I am asking is: are the command and index buffers located in the DMEs? Aren't they supposed to be in the GPU itself?

I suppose those are the command queues from the CPU to the GPU via the ACEs, i.e. data from CPU kernels to be computed on the GPU. The same as in the PS4, but we do not know the number of ACEs. The standard in Southern Islands is two.
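For anyone unfamiliar with ACEs, here is a toy sketch of the idea, with purely hypothetical names (none of this is Durango's actual interface): the CPU writes command packets into ring buffers in shared memory, and each Asynchronous Compute Engine independently drains its queues to dispatch compute work.

```python
from collections import deque

class ComputeQueue:
    """Toy model of one ACE-owned command queue: the CPU enqueues
    command packets into a ring in shared memory; the ACE pops them
    and dispatches the work to the shader arrays."""

    def __init__(self) -> None:
        self.ring: deque = deque()

    def submit(self, packet: dict) -> None:
        # CPU side: append a command packet for the GPU to consume.
        self.ring.append(packet)

    def dispatch_next(self):
        # ACE side: drain the queue independently of the graphics ring.
        return self.ring.popleft() if self.ring else None

# Southern Islands parts expose two ACEs; each can feed compute work
# without going through the graphics command processor.
aces = [ComputeQueue(), ComputeQueue()]
aces[0].submit({"kernel": "cull_lights", "groups": (16, 16, 1)})
print(aces[0].dispatch_next())
```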
 
But why is it being read from the DMEs section? Shouldn't it be in the GPU itself?

I think it's a way to indicate that it doesn't travel through the ESRAM bandwidth and has its own path. Speaking of which, where is the HSA memory management unit? Is it part of the GPU memory system? Or do the move engines act like one?
 
Interestingly enough, the max read bandwidth in that diagram is 170 GB/s, while the max write bandwidth for the GPU is 102 GB/s.
 
Specific numbers aside, it is obvious the CPU and other non-GPU subsystems are sharing the DDR bandwidth with the GPU, and that the 32 MB eSRAM is left dedicated to the GPU. Doesn't this point to the eSRAM being used like the eDRAM in the PS2/360? What was the point of all the low latency talk if the CPU cannot see or use the eSRAM?
 
Low latency could help the GPU too. Look up ERP's posts on the subject.
 
Low latency never hurts. My point is that the eSRAM integration looks designed around bandwidth. GPUs have L2 for latency issues.

Wouldn't that pretty much require the framebuffer to reside there, then?

And I think ERP's posts suggested the documentation has the frame buffer residing in DDR, thus treating the ESRAM as a GPU cache: a totally different paradigm than the 360's eDRAM.

But I could be totally off base.
 
MS recommends that the framebuffer in Durango lie in DDR3. The ESRAM and L2 act like GPU scratchpads, keeping computed results close by so they can be reused when necessary.
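A minimal sketch of that placement policy, with hypothetical names (the real allocation interface isn't in the leak): bulk, streamed targets such as the framebuffer go to DDR3, while small intermediates that get re-read often are staged in the 32 MB ESRAM.

```python
ESRAM_SIZE = 32 * 1024 * 1024  # bytes

def choose_pool(size_bytes: int, reread_often: bool, esram_free: int) -> str:
    """Hypothetical placement heuristic matching the usage described
    above: keep hot, frequently re-read scratch data in ESRAM, and put
    big streamed resources (like the framebuffer) in DDR3."""
    if reread_often and size_bytes <= esram_free:
        return "ESRAM"
    return "DDR3"

# A 1080p RGBA8 framebuffer (~8 MB) would physically fit in the 32 MB,
# but per the recommendation above it is streamed rather than re-read,
# so it lands in DDR3.
print(choose_pool(1920 * 1080 * 4, reread_often=False, esram_free=ESRAM_SIZE))  # DDR3
```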
 