Xbox One (Durango) Technical hardware investigation

Maybe VGleaks simply didn't fully understand the information they were given?

Edit: Or what this fine chap below me suggests.
 
Yeah, that write speed of 102 GB/s is interesting.

Probably a ROP limitation:
16 ROPs × 8 B (color + Z) × 800 MHz ≈ 102.4 GB/s,
which is basically the bandwidth to the ESRAM.
That would mean the high 16× Z-fill rate and 16-bit color are relying on compression to reach their peaks, which is probably normal.
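A quick back-of-envelope in C for anyone who wants to check it; the ROP count, export width, and clock are all from the rumor, so treat them as assumptions:

[code]
#include <stdio.h>

int main(void) {
    /* All figures are rumored Durango specs, not confirmed. */
    const double rops            = 16;      /* rumored ROP count             */
    const double bytes_per_pixel = 8;       /* 4 B color + 4 B depth/stencil */
    const double clock_hz        = 800e6;   /* rumored GPU clock             */

    /* peak ROP export rate = ROPs * bytes per pixel * clock */
    double gbps = rops * bytes_per_pixel * clock_hz / 1e9;
    printf("Peak ROP bandwidth: %.1f GB/s\n", gbps);  /* 102.4 GB/s */
    return 0;
}
[/code]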
 

You guys must crunch those 32 rops like doritos ;)
 
I'm not sure I'd put much stock into that. Some things look a bit questionable.

Take the bandwidth between the GPU and GPU Memory system for example. It can read at 170 GB/s but only write at 102 GB/s?

I'd suggest taking this with a big pile of salt. :)

Regards,
SB

Well it comes from vgleaks (I know since I rehosted it on imgur). The original image is here: http://www.vgleaks.com/wp-content/uploads/2013/02/memory_system.jpg

If we are going to believe the rest of their info, it seems silly to throw this one out as inaccurate.

Edit: I thought the speed difference might be due to an inability to write to both pools at once (102 would then be the ceiling), but I'm sure ERP's answer makes more sense.
 

What do you make of the entire setup? Is there something that they could change / add to increase performance without needing to scale up the rest of the system and run into diminishing returns?

EDIT: I wonder where the Kinect MEC processor and SHAPE Audio block are.

http://www.vgleaks.com/wp-content/uploads/2013/01/durango_arq1.jpg

The old diagram was vague.

I guess they're off the southbridge if the newer diagram is accurate?
 
If the latency can make such a difference, why didn't we see it on the 360? After all, the Xenos GPU should be worse at hiding latency than the rumored parts, and yet it still didn't storm ahead of its desktop counterparts, did it? (I can't remember.)

Furthermore, why didn't it slaughter the PS3 for the entire generation? A better architecture plus low-latency RAM should have made for massive performance increases.

Because the EDRAM in Xenos is not general purpose, and because it's a DX9 part, there's no GPGPU to speak of.

Gah... if we are talking about GPGPU-type work this gen, then it's more a comparison between the VMX and Cell. The LocalStores have small latency (a constant 6 cycles), which indeed contributes to the SPUs' speed. But more importantly, that speed is achieved via high data locality.

As long as the developers structure their data and code properly, we should be hitting the CPU and GPU caches (or LocalStore) most of the time. GCN is more efficient than traditional VLIW units. The HSA architecture also minimizes/avoids unnecessary copying between the CPU and GPU, so most of the time should be spent on the jobs themselves.
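As a toy illustration of why data layout matters for hitting the caches (generic C, nothing Durango-specific): walking a matrix in its storage order keeps loads within cache lines, while striding across rows touches a new line on almost every access.

[code]
#include <stddef.h>

#define N 1024

static double m[N][N];  /* 8 MB matrix, larger than typical caches */

/* Cache-friendly: unit stride, consecutive elements share cache lines. */
static double sum_row_major(void) {
    double s = 0.0;
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            s += m[i][j];
    return s;
}

/* Cache-hostile: stride of N * sizeof(double) bytes per access,
 * so nearly every load misses once the matrix exceeds the cache. */
static double sum_col_major(void) {
    double s = 0.0;
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            s += m[i][j];
    return s;
}

int main(void) {
    /* Same result either way; wall-clock time differs substantially. */
    return (sum_row_major() == sum_col_major()) ? 0 : 1;
}
[/code]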

For more random-access-type jobs, it will be more problematic. Nonetheless, people have done tree walks and CABAC calculations on Cell's measly 256K LocalStore + DMA.

Durango and Orbis both have more lenient memory hierarchies. They should behave just fine, with reasonable efficiency, as long as the developers are diligent.

[size=-2]Please pay the developers and artists... They are the ones who will make the most differences.[/size]
 

I would assume that they are drawing these diagrams themselves from the info they have. Their document on the Durango GPU has it accessing both memory pools at full speed.
 
I'm sure everyone has done the math by now, but 170GB/s means it can read both the ESRAM (102) and DDR3 (68) at their maximum speeds simultaneously.
So, only once the 68GB/s DDR3 fills the ESRAM/eDRAM can a burst of 170GB/s be achieved, right? Once that burst is done, it's back to 68GB/s, until the 32MB ESRAM/eDRAM is full, right?
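A quick sanity check of those figures (the 102 and 68 are the rumored peaks from the leak):

[code]
#include <stdio.h>

int main(void) {
    /* Rumored peak bandwidths from the vgleaks diagram */
    const double esram_gbps = 102.0;  /* ESRAM             */
    const double ddr3_gbps  = 68.0;   /* DDR3 main memory  */

    /* 170 GB/s is only reachable reading both pools at once */
    printf("Combined read peak: %.0f GB/s\n", esram_gbps + ddr3_gbps);

    /* Data that has to come in from DDR3 first is still capped at 68 */
    printf("DDR3-sourced data:  %.0f GB/s\n", ddr3_gbps);
    return 0;
}
[/code]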
 
I wouldn't assume that at all; I'd say these diagrams are most likely from a developer presentation. Probably several developer presentations.
When you draw diagrams like this you are trying to give an overview; it's not intended to be a physically accurate representation of the system.
 
If it's DDR3, what other choices are there? Unless it's stacked for greater than 68 GB/s, that's the main memory bandwidth (the only game in town), right? ESRAM/eDRAM can't make DDR3 give data faster than its limit, can it?
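For what it's worth, 68 GB/s is what a 256-bit DDR3-2133 interface gives you; the bus width and speed here are my assumptions, picked because they reproduce the leaked figure:

[code]
#include <stdio.h>

int main(void) {
    /* Assumed configuration: DDR3-2133 on a 256-bit bus */
    const double transfers_per_sec = 2133e6;       /* 2133 MT/s       */
    const double bus_bytes         = 256.0 / 8.0;  /* 256 bits = 32 B */

    double gbps = transfers_per_sec * bus_bytes / 1e9;
    printf("DDR3 peak: %.1f GB/s\n", gbps);  /* ~68.3 GB/s */
    return 0;
}
[/code]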
 

Wasn't responding to your post; I was responding to the one above. I should have quoted it.
 
What about supercomputers? If SRAM can make a 7770 run like a GTX 680 (the original point Love_In_Rio is defending), why isn't nVidia adding it to its Tesla range? Or is only MS able to see the benefits, and nVidia is going to be kicking itself when it sees the incredible performance Durango gets with such cheap silicon?

It's not cheap silicon. SRAM isn't inconsequential from a cost standpoint.

There wasn't a unified-shader part in the PC space for a full year after the release of the 360, so MS doing something nVidia hasn't done is nothing new.

Neither nVidia nor AMD is really pushing for a high-performance, low-bandwidth part, especially one built around adding more silicon in an effort to alleviate the cost of memory. There is a reason the 360's daughter die has never been replicated on PC GPUs. For the most part the cost of memory is absorbed by their clients, as AMD's and nVidia's business revolves around selling chips, not cards.

If you want a low-bandwidth part from nVidia or AMD, it's going to come with rather subpar performance, because for the most part no one expects high performance from cheap PC products. AMD's APUs perform pretty well for low-end parts, but that seems more a consequence of AMD's desire to make its APUs really relevant in the GPGPU space, and of small APUs being better for its bottom line than high-end parts.

I think people underestimate MS's focus on its gaming division. A look at MS's patents with gaming implications will show you how much R&D it is putting into this area. Furthermore, its efforts are broader than either nVidia's or AMD's: their focus is hardware, while MS covers a broader spectrum of gaming. Its R&D department dwarfs nVidia's and AMD's combined.
 
Supercomputing apps may be more concerned with solving large problems. They may be able to hide latency well.

They may want higher "density" too (more performance per watt, and more precision).

Efficiency is also important (e.g., least overhead).


Game devs need the 360 to focus on graphics work. MS only used it for Kinect computation.

The VMXes are the main compute units.
 
Perhaps this is a silly question. Quick preface: something doesn't wash here. Just a gut feeling on my part; hard to explain. I just don't quite buy the assumptions that are being passed around to fill in the missing parts / lack of detail in the VGleaks docs. Specifically, I just don't buy that it's a fairly straightforward mod of a 7xxx.

Assuming that the 32 MB of ESRAM is correct, how would anyone here change the GPU to take better advantage of it? MS seems to have spent a fair amount on that addition, and the comments here seem to say it is not enough to make up for a lack of main RAM bandwidth.
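For a sense of scale (standard render-target arithmetic, nothing from the leak): a 32-bit 1080p target is about 8 MB, so roughly four of them fit in 32 MB before you have to start juggling.

[code]
#include <stdio.h>

int main(void) {
    const double width  = 1920.0, height = 1080.0;
    const double bpp    = 4.0;                   /* 32-bit color or depth */
    const double esram  = 32.0 * 1024 * 1024;    /* rumored 32 MB ESRAM   */

    double target = width * height * bpp;        /* ~7.9 MiB per target   */
    printf("One 1080p 32-bit target: %.1f MiB\n", target / (1024 * 1024));
    printf("Targets that fit in 32 MiB: %.1f\n", esram / target);
    return 0;
}
[/code]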

Maybe I am just in a state of shock. MS didn't miss a trick last time out. Something just doesn't feel right.
 