Xbox One (Durango) Technical hardware investigation

I must say I like the Xbox architecture design. It is not simple, but it seems elegant; I suppose it scratches my exotic-hardware itch. I would like its performance to be good (and I'd like to believe those who claim its efficiency will reach the 3 TFLOPS of other GPUs, since I think we need heavy players in the graphics world to keep this hobby the way we like it), but I'm afraid that this time Sony could be left a little alone in the raw-graphics department...
 
ERP - would it make any sense to have the ESRAM be a large LLC, perhaps with extensions to make parts of it usable as a scratchpad if desired, and with the CPU/GPU communicating through it? In other words, something along the lines of how Intel interfaces its GPU and CPUs... or as a dev would you prefer manual management?
 
With a 32 MB pool and 64B lines, that would require 512K tag entries.
Depending on the arrangement, it's easy to require over a MB in cache tags alone.
That would be atypical for a non-server chip (a high-end one at that), but this is potentially an unusual situation.
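
To put a rough number on that, here's a back-of-the-envelope sketch in C; the associativity, physical address width, and state bits are assumptions picked only to show the order of magnitude, not anything from the leak:

```c
#include <stdio.h>

int main(void)
{
    /* 32 MB LLC with 64 B lines. Associativity, physical address width
     * and state bits below are assumptions, chosen only to show the
     * order of magnitude of the tag store. */
    const long long cache_bytes = 32LL * 1024 * 1024;
    const int line_bytes = 64;
    const int ways       = 16;   /* assumed */
    const int paddr_bits = 40;   /* assumed */
    const int state_bits = 4;    /* assumed: coherence state etc. */

    long long lines = cache_bytes / line_bytes;   /* 524,288 = 512K entries */
    long long sets  = lines / ways;               /* 32,768 sets            */

    const int offset_bits = 6;                                      /* log2(64)     */
    const int index_bits  = 15;                                     /* log2(32,768) */
    const int tag_bits    = paddr_bits - offset_bits - index_bits;  /* 19 bits      */
    const int entry_bits  = tag_bits + state_bits;                  /* ~23 bits     */

    long long tag_bytes = lines * entry_bits / 8;
    printf("%lldK entries across %lld sets, ~%.2f MB of tags\n",
           lines / 1024, sets, tag_bytes / (1024.0 * 1024.0));
    /* Prints: 512K entries across 32768 sets, ~1.44 MB of tags */
    return 0;
}
```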
 
Do we expect the Kinect port to be a custom thing, or could it be a simple dedicated USB 3.0 port modified to provide enough current? I was thinking maybe they decided they could do a much better job with something custom, so they could enable longer cable lengths and maybe even help lower the latency. USB is a messy protocol :p
 
That's what I would imagine as well. Except in this case it would likely be USB 3.0 combined with increased power delivery using a different physical port so you don't accidentally plug in something else. :)

Regards,
SB
 
With a 32 MB pool and 64B lines, that would require 512K tag entries.
Depending on the arrangement, it's easy to require over a MB in cache tags alone.
That would be atypical for a non-server chip (a high-end one at that), but this is potentially an unusual situation.

MRTs and texture data should have reasonable spatial coherence, so you could probably use a longer cache line without losing too much efficiency.

CPU code would have to be optimized for this cache line size: data structures at, or just below, 512 bytes, aligned on 512-byte boundaries.
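
Something along these lines in C11 terms, with the field layout invented purely for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdio.h>

/* One hot data structure mapped to exactly one 512-byte line.
 * The fields are made up purely for illustration. */
struct particle_block {
    alignas(512) float pos_x[32];   /* 128 bytes */
    float pos_y[32];                /* 128 bytes */
    float pos_z[32];                /* 128 bytes */
    float mass[32];                 /* 128 bytes -> 512 bytes total */
};

static_assert(sizeof(struct particle_block) == 512,
              "one block should span exactly one 512-byte line");

int main(void)
{
    static struct particle_block block = {0};
    printf("sizeof = %zu, alignment = %zu\n",
           sizeof block, alignof(struct particle_block));
    return 0;
}
```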

32MB LLC with 512 byte lines with 8 sectors would use 512KB for tags.
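
The 512 KB figure falls out directly if you budget one 64-bit entry per line (address tag, a little state, and a couple of bits per sector); the entry width is an assumption on my part:

```c
#include <stdio.h>

int main(void)
{
    /* 32 MB LLC, 512 B lines split into eight 64 B sectors.
     * The 64-bit entry width (address tag + state + valid/dirty
     * bits per sector) is an assumption. */
    const long long cache_bytes = 32LL * 1024 * 1024;
    const int line_bytes = 512;
    const int sectors    = 8;
    const int entry_bits = 64;   /* assumed */

    long long lines     = cache_bytes / line_bytes;   /* 65,536 lines */
    long long tag_bytes = lines * entry_bits / 8;     /* 524,288 B    */

    printf("%lld lines of %d x %d B sectors -> %lld KB of tags\n",
           lines, sectors, line_bytes / sectors, tag_bytes / 1024);
    /* Prints: 65536 lines of 8 x 64 B sectors -> 512 KB of tags */
    return 0;
}
```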

Edit: I could imagine controls for locking down fractions of the LLC for ROP, texture or CPU use.

Cheers
 
That most likely results in higher power and heat.

This is surely taken into account by AMD and Microsoft; the main goal is a balanced machine.
I think the heat is overestimated in comparison to the CPU, the GPU, and the overall dissipation capacity.
 
Well, Arthur Gies comments on NeoGAF that the improvement in efficiency doesn't come from the ESRAM, so if that's true, forget about the low latency. He says it comes from the way the GPU SIMDs are managed and from real-time asset compression/decompression(?). Wasn't the GCN architecture supposed to greatly increase the efficiency of the vector units? How could this be increased even more?

Supposing any of this makes sense:

Would it be possible to make the GPU out-of-order, capable of executing wavefront instructions out of program order? (If so, there would be a block inside the GPU that is not shown in the leaked diagram.)

Could the asset compression/decompression refer, in practice, to increasing the effective bandwidth to the memory pools?
 
This is surely taken into account by AMD and Microsoft; the main goal is a balanced machine.
I think the heat is overestimated in comparison to the CPU, the GPU, and the overall dissipation capacity.

Sure... maybe it's thyristor RAM (T-RAM). That's what GlobalFoundries has been working on.
 
Could the listed GPU specs be the 'exposed' GPU rather than the actual GPU? I was thinking that if they supposedly have 3 GB set aside for various purposes, and all or part of two CPU cores as well, they could also have partitioned off part of the GPU for other processing. Perhaps the GPU really has 14 or 16 CUs, but they are reserving some for Kinect?
 
Well, Arthur Gies comments on NeoGAF that the improvement in efficiency doesn't come from the ESRAM, so if that's true, forget about the low latency. He says it comes from the way the GPU SIMDs are managed and from real-time asset compression/decompression(?). Wasn't the GCN architecture supposed to greatly increase the efficiency of the vector units? How could this be increased even more?
Vector length is still quite long, and that can make it a poor fit for problems with naturally smaller granularity. The longer the SIMD vector, the more likely it is that branch divergence affects performance, and more complex code can increase the number of paths a single wavefront will need to loop over.
That's an area that could stand improvement, although a number of the possible fixes, like varying the vector length or coalescing divergent threads, are significant modifications to the hardware.
At the very least, there's more storage nearby to play with, possibly.
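
To make the divergence cost concrete, a toy model: with execution masking, a diverged wavefront pays for both sides of a branch, so a single stray lane out of 64 drags everyone through the slow path. The path lengths here are made-up numbers, not measurements:

```c
#include <stdbool.h>
#include <stdio.h>

#define WAVE_WIDTH 64   /* GCN wavefront width */

/* Toy cost model: with execution masking, a diverged wavefront executes
 * both sides of a branch back to back, each under a mask, so total time
 * is roughly the sum of the two path lengths. */
static int wavefront_cycles(const bool take_fast[WAVE_WIDTH],
                            int fast_len, int slow_len)
{
    int fast_lanes = 0, slow_lanes = 0;
    for (int i = 0; i < WAVE_WIDTH; ++i) {
        if (take_fast[i])
            ++fast_lanes;
        else
            ++slow_lanes;
    }

    int cycles = 0;
    if (fast_lanes > 0) cycles += fast_len;  /* fast path, masked       */
    if (slow_lanes > 0) cycles += slow_len;  /* slow path, inverse mask */
    return cycles;
}

int main(void)
{
    bool mask[WAVE_WIDTH];
    for (int i = 0; i < WAVE_WIDTH; ++i)
        mask[i] = true;

    /* Made-up path lengths: 10 cycles fast, 100 cycles slow. */
    printf("uniform wavefront:  %d cycles\n", wavefront_cycles(mask, 10, 100));

    mask[63] = false;   /* a single divergent lane... */
    printf("one lane diverges: %d cycles\n", wavefront_cycles(mask, 10, 100));
    return 0;
}
```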

At a higher level, there may be changes in how wavefronts are scheduled and sent to the CUs. The handoff from the front end to the CU arrays seems to be one area that could improve, as well as conflicts over common global resources like the GDS.

Would it be possible to make the GPU out-of-order, capable of executing wavefront instructions out of program order? (If so, there would be a block inside the GPU that is not shown in the leaked diagram.)
At least with GCN, this is going to run into measures already defined to provide some parallelism.
By default, the vector memory instructions and the throughput design of the CU are already very good at generating memory traffic (one of the primary benefits of OoO), and the ISA provides a limited form of software-guided runahead for exports and memory operations. For example, up to 16 memory instructions can be fired off before the wavefront has to stall.
These counters may be less effective or broken if the CU goes out of order, since the compiler's statically determined wait counts are based on sequential issue and completion of the instruction stream.
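
As a rough illustration of that software-guided runahead, a toy model of the counter mechanism; the names and numbers are illustrative, not actual ISA semantics:

```c
#include <stdio.h>

/* Toy model of software-guided runahead: vector memory instructions are
 * issued without stalling, and a compiler-placed wait later blocks the
 * wavefront only until at most `max_outstanding` of them remain in flight.
 * This is a sketch of the idea, not real ISA behaviour. */

static int outstanding;   /* vector memory operations still in flight */

static void issue_load(int id)
{
    ++outstanding;        /* no stall at issue; the wavefront keeps going */
    printf("issue load %d, outstanding = %d\n", id, outstanding);
}

static void wait_cnt(int max_outstanding)
{
    printf("wait_cnt(%d): stall while more than %d loads are in flight\n",
           max_outstanding, max_outstanding);
    while (outstanding > max_outstanding)
        --outstanding;    /* stand-in for memory results returning */
}

int main(void)
{
    for (int i = 0; i < 8; ++i)
        issue_load(i);    /* several loads in flight, still no stall */

    wait_cnt(0);          /* the consumer needs all results: drain to zero */
    printf("results available, dependent ALU work can proceed\n");
    return 0;
}
```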
 
Could the listed GPU specs be the 'exposed' GPU rather than the actual GPU? I was thinking that if they supposedly have 3 GB set aside for various purposes, and all or part of two CPU cores as well, they could also have partitioned off part of the GPU for other processing. Perhaps the GPU really has 14 or 16 CUs, but they are reserving some for Kinect?
Even if this is true, it would be mostly irrelevant for purposes of game performance.
 