Xbox One (Durango) Technical hardware investigation

bkilian · Oct 6, 2013

Betanumerical said:
They never went with Yukon; we never got an early leak about the roadmap for Durango, which is what they are commenting on.

Actually, they did. Yukon and Durango are the same project. Even when I left, I was still checking code into the Yukon tree. In the early days of the 360, the vision doc was called "The book of Xenon". The equivalent doc for the X1 was called "The book of Yukon".

Betanumerical · Oct 6, 2013

bkilian said:
Actually, they did. Yukon and Durango are the same project. Even when I left, I was still checking code into the Yukon tree. In the early days of the 360, the vision doc was called "The book of Xenon". The equivalent doc for the X1 was called "The book of Yukon".

Interesting, I always assumed they were separate projects. It's strange that it changed so much over time.

bkilian · Oct 6, 2013

Betanumerical said:
Interesting, I always assumed they were separate projects. It's strange that it changed so much over time.

Why is that strange? For one, the leaked doc was an early vision doc, not a technical specs doc, and for another, the design changed a number of times as things firmed up in talks with suppliers and the software team did exploratory proof-of-concepts.

Betanumerical · Oct 6, 2013

bkilian said:
Why is that strange? For one, the leaked doc was an early vision doc, not a technical specs doc, and for another, the design changed a number of times as things firmed up in talks with suppliers and the software team did exploratory proof-of-concepts.

Well, the fact that they ditched hardware B/C and a bunch of other stuff is what I found strange.

bkilian · Oct 6, 2013

Betanumerical said:
Well, the fact that they ditched hardware B/C and a bunch of other stuff is what I found strange.

Standard cost/benefit analysis on that one, I'm afraid. The added cost over the life of the console for a feature that is only really useful or popular in the first year is a bit of a no-brainer. Look at what Sony did for the PS3, for instance. The feature got removed pretty quickly.

taisui · Oct 6, 2013

3dilettante said:
I think the implementation is more that the GPU sends a write over the Onion bus, and the interface logic itself or in conjunction with the request queue handles inserting it into the order of requests in the coherent hierarchy.
A broadcast of invalidations would happen once the write gets to this point, and any future coherent traffic is going to get on the queue as well. All other attempts to use that cache line are going to hit the queue, until it writes to memory and main memory becomes the final arbiter.

I'm curious how it would affect games in practice.

Take physics/particles, for example: the CPU preps and batches the data structs,
the GPU takes the coherent memory address and runs the simulation on the data,
then the GPU takes the prior result and renders the output.

Even if the CPU needs to read back the result from coherent memory, what sort of realistic impact is there from the GPU having to flush?

Betanumerical · Oct 6, 2013

taisui said:
I'm curious how it would affect games in practice.

Take physics/particles, for example: the CPU preps and batches the data structs,
the GPU takes the coherent memory address and runs the simulation on the data,
then the GPU takes the prior result and renders the output.

Even if the CPU needs to read back the result from coherent memory, what sort of realistic impact is there from the GPU having to flush?

If I am remembering correctly, a cache flush on it flushes out everything: graphics and GPGPU cache lines/work, read-only work, texture caches, etc.

You have to read everything back in from memory. If you're flushing a full 512KB entirely to the eSRAM, this will take ~4096 cycles.
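
A rough sanity check of that cycle count, as a sketch only. The 128 bytes per clock and the 853 MHz clock are assumptions that are not stated in this post; 128 bytes per clock is simply what a ~109 GB/s one-way eSRAM figure would work out to at that clock. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate of a full cache flush to eSRAM.
# ASSUMPTIONS (not from the post): 128 bytes written per GPU clock,
# and a GPU clock of 853 MHz.
CACHE_BYTES = 512 * 1024          # "a full 512KB"
ESRAM_BYTES_PER_CYCLE = 128       # assumed write width per clock
GPU_CLOCK_HZ = 853_000_000        # assumed GPU clock


def flush_cycles(cache_bytes: int, bytes_per_cycle: int) -> int:
    """Cycles needed to stream cache_bytes out at bytes_per_cycle."""
    return cache_bytes // bytes_per_cycle


cycles = flush_cycles(CACHE_BYTES, ESRAM_BYTES_PER_CYCLE)
flush_microseconds = cycles / GPU_CLOCK_HZ * 1e6

print(cycles)                        # 4096
print(round(flush_microseconds, 1))  # 4.8
```

Under those assumptions the write-out itself is on the order of a few microseconds; the cost of re-reading everything afterwards is not modelled here.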

adev · Oct 6, 2013

taisui said:
I'm curious how it would affect games in practice.

Take physics/particles, for example: the CPU preps and batches the data structs,
the GPU takes the coherent memory address and runs the simulation on the data,
then the GPU takes the prior result and renders the output.

Even if the CPU needs to read back the result from coherent memory, what sort of realistic impact is there from the GPU having to flush?

I think it's going to be generally bad to use the same memory on the CPU and GPU at the same time on Xbox One.

taisui · Oct 6, 2013

adev said:
I think it's going to be generally bad to use the same memory on the CPU and GPU at the same time on Xbox One.

Do elaborate?

adev · Oct 6, 2013

taisui said:
Do elaborate?

You're going to want to minimise cache contention as much as possible for pure performance reasons.

The coherency is good in that you can read and write to the same memory on both the CPU and GPU and be guaranteed that it will be "correct", but accessing it at exactly the same time on both would still be expensive. Interleaving access would likely be faster.

In Beta's terms, the GPU technically can't "write" coherently; it just gets told to flush when the CPU wants access to data it has in its cache.

I may have been wrong earlier when I said the GPU can invalidate CPU cache. I'll update when I know for sure. That was my understanding, but it could be wrong.

taisui · Oct 6, 2013

adev said:
You're going to want to minimise cache contention as much as possible for pure performance reasons.

The coherency is good in that you can read and write to the same memory on both the CPU and GPU and be guaranteed that it will be "correct", but accessing it at exactly the same time on both would still be expensive. Interleaving access would likely be faster.

In Beta's terms, the GPU technically can't "write" coherently; it just gets told to flush when the CPU wants access to data it has in its cache.

I may have been wrong earlier when I said the GPU can invalidate CPU cache. I'll update when I know for sure. That was my understanding, but it could be wrong.

IMO the right way to use it is to interleave the access pattern. The advantage should be that you don't need to copy/stage the data for the GPU because of the coherent memory. I'm not sure how the sync happens, but I imagine jobs being done in batches at high latency.
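
To make the interleaving idea concrete, here is a toy sketch in plain Python. Nothing in it is a real console API: `cpu_prepare`, `gpu_simulate` and `run_frames` are hypothetical stand-ins, and two list slots play the role of two shared buffers. The only point is the access pattern: in any given step the "CPU" writes one buffer while the "GPU" reads the other, so they never touch the same memory at the same time and nothing is copied between them.

```python
# Toy model of interleaved (double-buffered) CPU/GPU access.
# All names here are hypothetical; this is not a real graphics API.

def cpu_prepare(frame: int) -> list:
    """Stand-in for the CPU batching particle data for one frame."""
    return [float(frame)] * 4


def gpu_simulate(batch: list) -> list:
    """Stand-in for the GPU running a simulation step on a batch."""
    return [x + 0.5 for x in batch]


def run_frames(num_frames: int) -> list:
    buffers = [None, None]   # two shared buffers, used alternately
    results = []
    for step in range(num_frames + 1):
        cpu_idx = step % 2           # buffer the CPU writes this step
        gpu_idx = (step - 1) % 2     # buffer the GPU reads this step
        assert cpu_idx != gpu_idx    # never the same buffer in one step
        if step > 0:
            # GPU consumes the batch the CPU wrote in the previous step.
            results.append(gpu_simulate(buffers[gpu_idx]))
        if step < num_frames:
            # CPU fills the other buffer for the next step.
            buffers[cpu_idx] = cpu_prepare(step)
    return results


print(run_frames(3))
# [[0.5, 0.5, 0.5, 0.5], [1.5, 1.5, 1.5, 1.5], [2.5, 2.5, 2.5, 2.5]]
```

The price of this pattern is the one-step delay between the CPU producing a batch and the GPU consuming it, which matches the "batches at high latency" guess.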

pjbliverpool · Oct 6, 2013

astrograd said:
No. You can see in the Yukon leak (which predates the commentary of Baker in the article btw, Yukon was from mid-2010) that they wanted either 32MB of eDRAM or eSRAM back when they had 4GB of DDR4 RAM as the plan.

OK, change the 8GB to 'a large pool of main memory' if you wish. It doesn't change the point of my post, though. They wanted DDR as opposed to GDDR for the main memory for power and cost reasons, and so ESRAM was also included to supplement the bandwidth.

Low latency does not appear to have been the driving advantage behind its inclusion, and from the statements in the article there's at least reasonable cause to doubt whether it will have any major latency-based advantages at all. Otherwise this surely would have been touched upon, given that the question presented the perfect opportunity to do so.

Shifty Geezer · Oct 6, 2013

I agree with that assessment, pjbliverpool. The RAM choice was entirely for cost and power draw reasons, and the upshot of ESRAM only regards those aspects. There's no performance advantage from latency that the design gets from this choice. So the trade-off appears to be price+heat gains for added development complications.

Airon · Oct 6, 2013

Shifty Geezer said:
I agree with that assessment, pjbliverpool. The RAM choice was entirely for cost and power draw reasons, and the upshot of ESRAM only regards those aspects. There's no performance advantage from latency that the design gets from this choice. So the trade-off appears to be price+heat gains for added development complications.

I see them more as different solutions for the same need: a large memory pool + high bandwidth.

Sony has made a choice and a bet.

MS just followed their X360 experience with ESRAM.
It is quite telling that we know ESRAM/eDRAM was there from the beginning. Would it have been possible or reasonable to have ESRAM + GDDR5? I suspect they never considered GDDR5 as a solution.

In the end it is a different solution that allows the X1 to have high bandwidth (with a higher peak than the competitor), with an ESRAM that this time around allows more creative uses for developers, and a large memory pool of DDR3 that is better suited to the CPU. Because MS (like Sony) has a high concern for the CPU. Without forgetting power and cost.

Are you really sure that, considering each platform as a whole, one solution is incredibly better than the other?
To me, for example, it seems that the two platforms could be bandwidth-bottlenecked in a very similar way.

Shifty Geezer · Oct 6, 2013

Airon said:
Are you really sure that, considering each platform as a whole, one solution is incredibly better than the other?

I never said anything about anyone having a better solution. I never even compared solutions. I never even said it was a bad solution. Every engineering decision has to prioritise, and MS prioritised cost and power draw, and perhaps a little extra peak BW over what they could realistically target with GDDR5, over ease of use. It's simply an observation without comparison, and an acknowledgement that a previous theory, that the choice of ESRAM included a performance interest due to low latency, is pretty much proven invalid.

This isn't a versus thread and Sony's choices are immaterial, save as proof that there was another option available (not that such proof is necessary).

Cranky · Oct 6, 2013

Shifty Geezer said:
I agree with that assessment, pjbliverpool. The RAM choice was entirely for cost and power draw reasons, and the upshot of ESRAM only regards those aspects. There's no performance advantage from latency that the design gets from this choice. So the trade-off appears to be price+heat gains for added development complications.

Well, they also ended up with, according to the DF article, a 45% bandwidth advantage over the alternative memory configuration (200 GB/s / (172 GB/s * 0.8)) in typical use cases. It may have been a case of having their cake and eating it as well.
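
Spelled out as a quick sketch, using only the three numbers in the parentheses above. The variable and function names are made up for illustration, and the calculation assumes that 200 GB/s is already a "typical use" figure while 172 GB/s is a peak that still needs the 0.8 factor applied.

```python
# Ratio of a typical-use bandwidth figure to a derated peak figure.
# ASSUMPTION: 200 GB/s is already a real-world number, 172 GB/s is a
# peak number, and 0.8 is the fraction of peak treated as attainable.
TYPICAL_GBPS = 200.0
ALTERNATIVE_PEAK_GBPS = 172.0
ATTAINABLE_FRACTION = 0.8


def relative_advantage(typical: float, peak: float, fraction: float) -> float:
    """How much larger 'typical' is than 'peak * fraction', as a fraction."""
    return typical / (peak * fraction) - 1.0


advantage = relative_advantage(
    TYPICAL_GBPS, ALTERNATIVE_PEAK_GBPS, ATTAINABLE_FRACTION
)
print(f"{advantage:.1%}")  # 45.3%
```

If the 0.8 factor is dropped and 200 is compared against the full 172, the same function gives about 16%, so the whole argument hinges on whether the derating is applied to both sides consistently.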

Strange · Oct 6, 2013

Cranky said:
Well, they also ended up with, according to the DF article, a 45% bandwidth advantage over the alternative memory configuration (200 GB/s / (172 GB/s * 0.8)) in typical use cases. It may have been a case of having their cake and eating it as well.

What????

How did you come up with these numbers?

Why would you put a 0.8 coefficient before the 172 GB/s figure?

french toast · Oct 6, 2013

Cranky said:
Well, they also ended up with, according to the DF article, a 45% bandwidth advantage over the alternative memory configuration (200 GB/s / (172 GB/s * 0.8)) in typical use cases. It may have been a case of having their cake and eating it as well.

Although they don't make a direct case for latency benefits, they do at least mention it a couple of times in the interview. Could it be that they didn't prioritise latency, as bandwidth is their main concern, but an upshot of that decision is the much lower latency? Could the reason for them not going into detail about those benefits be that they have simply not looked into it?

Ceger · Oct 6, 2013

Strange said:
What????

How did you come up with these numbers?

Why would you put a 0.8 coefficient before the 172 GB/s figure?

Because the premise is that 80% of peak bandwidth is what is attainable in the real world, on average, in most cases (as MS asserts, and which makes sense until proven otherwise). The numbers they gave for the ESRAM and DDR3 were on that basis (80% of bandwidth), which they say has been proven with actual real code, not tests.

So Cranky was using that to make comparative points. If you want to use the full GDDR5 bandwidth, then make sure to apply the full ESRAM and DDR3 bandwidth as well. One party does not get an exemption while the other doesn't.

And please do not bring up the indie dev that said they got the high end of bandwidth on the other console as no information has been provided as to what exact code or such was used to leverage that. I am sure any dev can write code to maximize bandwidth, but it doesn't represent real world application.

zupallinere · Oct 6, 2013

Whereas we've said that we find it very important to have bandwidth for the GPGPU workload and so this is one of the reasons why we've made the big bet on very high coherent read bandwidth that we have on our system.

I actually don't know how it's going to play out of our competition having more CUs than us for these workloads versus us having the better performing coherent memory.

expletive said:
Is this the ESRAM 109GB/s figure or the 30GB/s figure or something else?

Something I'd like to know too. Other presentations discuss 47 MB of coherent memory IIRC, but that doesn't say anything about bandwidth, just size (which is the biggest for a console or the like). The MS guy makes the comparison and states the big bet on coherent memory speed, but what exactly is that metric?
