Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
I never said anything about anyone having a better solution. I never even compared solutions. I never even said it was a bad solution. Every engineering decision has to prioritise, and MS prioritised cost and power draw, and perhaps a little extra peak BW over what they could realistically target with GDDR5, over ease of use. Simply an observation without comparison, and an identification that the previous theory that the choice of ESRAM included a performance interest due to low latency is pretty much proven invalid.

This isn't a versus thread and Sony's choices are immaterial, save as proof that there was another option available (not that such proof is necessary).

Shifty Gear, you just concentrated on the last 2 lines of my post.

To me, they went with ESRAM in order to capitalise on the experience gained with the X360 (their own experience and developers' experience). And, again, ESRAM was there from the beginning.
With this assumption, do you see ESRAM/DRAM + GDDR5 as reasonable? It is an honest question I ask you.

Regarding the latency topic, to me it has more to do with DDR3, and it will be more relevant for the CPU.
 
There's one thing I'm confused on: is the 10% GPU reserve just for the OS, or is Kinect a part of that too? I know they said they use GPGPU for Kinect, but does Kinect have its own resources that it pulls from? Has MS ever given a figure as to how much those resources are? Kinect 2.0 has its own processor now, correct? So is it Kinect's CPU + the XB1 GPU's GPGPU/compute shaders, or is it using the XB1's processor as well as its own, plus the XB1's GPU?
 
Solarus,

There's one thing I'm confused on: is the 10% GPU reserve just for the OS, or is Kinect a part of that too? I know they said they use GPGPU for Kinect, but does Kinect have its own resources that it pulls from? Has MS ever given a figure as to how much those resources are? Kinect 2.0 has its own processor now, correct? So is it Kinect's CPU + the XB1 GPU's GPGPU/compute shaders, or is it using the XB1's processor as well as its own, plus the XB1's GPU?

The 10% figure is a conservative, estimated reserve on MS's part which includes both the OS functions and some Kinect stuff. Here's some more info on the Kinect aspect specifically:

Durango dev docs said:
System Allocations

On Durango, from the POV of allocations, the NUI architecture is split into two parts.


Core Kinect functionality that is frequently used by titles and the system itself are part of the allocation system, including color, depth, active IR, ST, identity, and speech. Using these features or not costs a game title the same memory, CPU time, and GPU time. These features also provide advantages. For example, the identity system will run across application switches because it is handled by the system, not individual applications, and avoids having to re-engage and sign-in repeatedly.


Functionality used less often has its allocation managed in a pay-per-play model. For example, registering color to depth and active IR (or the other way around) as an infrequently used operation will cost the title some small amount of CPU time.

http://www.vgleaks.com/durango-next-generation-kinect-sensor/

There is an MEC chip in the audio block for Kinect's voice recognition, but some of the other stuff is using some GPU cycles. Exactly what that breakdown is nobody (outside MS) knows.
 
Well yes, the other DF number strangely gives us more information :LOL: The other one never mentions what it was from, how long a time step, or what operations it was over. It's just "150 GB/s is what we have achieved." When? How? For how long? What operations?
This has already been discussed in detail in some thread or other, so let's put an end to it. MS measured data across their bus. Over one second it was 100 GB of data. Over another it was 200 GB. Over another it was 150. And they came up with an average attained BW of 150 GB/s. How can we be sure of this, you ask? How do we know it isn't 150 GB/s for one millisecond, with the rest of the second only managing 100 GB? Because 1) that figure makes sod-all sense when trying to communicate to your developers what resources they have at their disposal, and 2) the peak BW is 200 GB/s, so if MS just wanted the biggest possible measurement, they could have contrived a scenario like that and reported, "we've measured 200 GB/s average use."

So the BW for data in ESRAM, available to developers, is around about 150 GB/s real-world. One can choose to disbelieve if one wants, but there's no more point arguing it. The cards are on the table and it's down to individual interpretation.
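The averaging argument above can be sketched numerically. A minimal illustration (the per-second figures are the example numbers from the post, not real measured data):

```python
# Illustrative only: deriving an average attained bandwidth from
# per-interval bus measurements, using the example figures quoted
# in the post above (100, 200 and 150 GB over successive seconds).
samples_gb = [100, 200, 150]  # GB moved across the bus per 1 s window

avg_bw = sum(samples_gb) / len(samples_gb)  # GB/s
print(avg_bw)  # 150.0
```

However the per-interval numbers bounce around, the figure that is useful to a developer budgeting an engine is this long-run average, not the instantaneous peak.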


Because, to be perfectly honest, without these kinds of metrics, measuring either bandwidth is a total waste of time and you will just end up with a number that represents nothing in the real world.
The measurements are for developers. Real-world measurements are far more useful for targeting an engine and assets than paper metrics.

So, to be clear on this, the discussion of ESRAM's bandwidth is now off limits. Either one believes it's 150 GB/s, as the engineers tell us, or not, but no-one needs to engage in a discussion about how much BW there is.
 
I think people are seeing beyond what Beta is asking. I do not take his questions to be trying to discredit them at all. He's merely curious as to what cases do they happen in.

It would be an extra bonus to know more about the situations in which these arise. That is what we all would like to know, right?

There's been a whole discussion or three on that. The situations in which they arise are tangential to the question of how much BW is available to developers. Betanumerical is saying the 150 GB/s cannot be trusted (off the back of a comment about system bandwidth). At this point, it can be taken as fact that devs have typically 150 GB/s in their usual workloads. The intricacies of how to extract more BW are well worth discussing, but I'd say not in this thread as that discussion requires a low-level software debate, whereas this thread is establishing XB1's hardware, including upper and average bus BWs.

http://forum.beyond3d.com/showthread.php?t=64291

Understanding XB1's internal memory bandwidth *spawn
 

So, to be clear on this, the discussion of ESRAM's bandwidth is now off limits. Either one believes it's 150 GB/s, as the engineers tell us, or not, but no-one needs to engage in a discussion about how much BW there is.

Sweet. Can we also have a similar statement for bandwidths being able to be added together... and overall average system bandwidth having been measured at 200 GB/s? Because there are ramifications to that which we should be discussing... like how is the "limited" X1 GPU capable of chewing through so much data?
 
Sweet. Can we also have a similar statement for bandwidths being able to be added together... and overall average system bandwidth having been measured at 200 GB/s? Because there are ramifications to that which we should be discussing... like how is the "limited" X1 GPU capable of chewing through so much data?

What do you mean by X1 GPU? Is there another number besides X1 that you have in mind?
 
Because the equation is that 80% of peak bandwidth is what is attainable in the real world as the average in most cases (as MS asserts, and it makes sense until proven otherwise). The numbers they gave were from such measures between the ESRAM and DDR3 (80% of bandwidth), which they say have been proven with actual real code, not tests.
So how do you run a test without real code? Simulate it? :LOL:

Anyway, MS just threw out some arbitrary efficiency numbers in that interview without backing them up at all. Numbers from fillrate tests by a respected hardware site (hardware.fr) have already been posted in this forum (of course using real code running on real graphics cards, basically the same way MS will very likely have measured their bandwidth numbers) which prove that alpha blending (also mentioned by MS as a scenario for getting more than 109GB/s out of their eSRAM) is good enough to realise 91+% of the theoretical bandwidth of GDDR5. Pure reads (or writes) attain 93% in the tests using a Pitcairn GPU (which is reasonably close to the bandwidth and ROP configuration of the PS4).

So could you please stop using the arbitrary multipliers. The most straightforward assumption is that this "efficiency" applies to both architectures, XB1 and PS4, in roughly the same way. For this kind of stuff the GPU is very likely able to use the available bandwidth with about the same efficiency, until proven otherwise. The picture is likely more complicated if you dive into the different characteristics of DRAM and SRAM (which we don't know in detail) and parallel use scenarios. But your grossly simplified approach leads nowhere other than fuelling some crappy fanboy advantage arguments.
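The point about applying one efficiency fraction consistently can be put as back-of-the-envelope arithmetic. A sketch (the peak figures and fractions are the ones quoted in this thread; this is an illustration, not a measurement):

```python
# Illustrative only: applying the same quoted efficiency fractions
# to the peak figures cited in this thread for both consoles.
PS4_GDDR5_PEAK = 176.0   # GB/s, theoretical
XB1_DDR3_PEAK = 68.0     # GB/s, theoretical
XB1_ESRAM_PEAK = 200.0   # GB/s, read+write combined, as quoted earlier

ALPHA_BLEND_EFF = 0.91   # hardware.fr fill-rate tests (alpha blending)
MS_CLAIMED_EFF = 0.80    # MS's stated real-world fraction

# Applying the same fraction to both architectures, as argued above,
# rather than discounting only one of them:
print(round(PS4_GDDR5_PEAK * ALPHA_BLEND_EFF, 1))  # 160.2
print(round(XB1_ESRAM_PEAK * MS_CLAIMED_EFF, 1))   # 160.0
```

The takeaway is that any efficiency multiplier has to be applied to both machines before it says anything about a relative advantage.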
 
Sweet. Can we also have a similar statement for bandwidths being able to be added together.
That's discussed in the other thread. One can add bandwidths together but it's a pretty meaningless value.
..and overall average system bandwidth has been measured at 200GB/s?
Yes. 150 GB/s for ESRAM and 60-odd for DDR3 means ~200 GB/s. How devs make use of that bandwidth is a complicated issue, as it's not directly comparable to 200 GB/s of a unified RAM pool. For further discussion of how devs use the hardware, as opposed to what the HW is, use the existing RAM discussion thread.
 
Something I'd like to know too. Other presentations discuss 47 MB of coherent memory, IIRC, but that doesn't say anything about bandwidth, just size (which is the biggest for a console or the like). The MS guy makes the comparison and states the big bet on coherent memory speed, but what exactly is that metric?

I think the coherent memory is virtual memory that's part of the 8GB DDR3 & the 30GB/s is part of the 68GB/s bandwidth. I have a feeling that this was something mostly done for Kinect.


The 47MB is the ESRAM + all the Cache.

  • 32MB of ESRAM
  • 4MB of CPU L2
  • 512KB of CPU L1
  • 232 KB of Audio Chip Cache/SRAM
  • 512 KB GPU L2
  • 192 KB GPU L1
  • 768 KB GPU LSM
  • 64 KB GPU GSM
_____________

Well I was able to find 38.2MB of the 47MB using the Hot Chips document & the leaked GPU info from VGLeaks, which leaves 8 - 9 MB hidden somewhere on the SoC.
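The tally above can be checked mechanically. A quick sketch, summing the sizes exactly as listed in the post:

```python
# Summing the on-chip memories listed above (sizes in KiB) to see
# how close they come to the 47 MB "coherent memory" figure.
sram_kib = {
    "ESRAM": 32 * 1024,
    "CPU L2": 4 * 1024,
    "CPU L1": 512,
    "Audio chip cache/SRAM": 232,
    "GPU L2": 512,
    "GPU L1": 192,
    "GPU LSM": 768,
    "GPU GSM": 64,
}

total_mib = sum(sram_kib.values()) / 1024
print(round(total_mib, 1))  # 38.2
```

That reproduces the 38.2 MB figure, so the remaining 8-9 MB has to be on-die SRAM that isn't in this list.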
 
So how do you run a test without real code? Simulate it? :LOL:

Anyway, MS just threw out some arbitrary efficiency numbers in that interview without backing them up at all. Numbers from fillrate tests by a respected hardware site (hardware.fr) have already been posted in this forum (of course using real code running on real graphics cards, basically the same way MS will very likely have measured their bandwidth numbers) which prove that alpha blending (also mentioned by MS as a scenario for getting more than 109GB/s out of their eSRAM) is good enough to realise 91+% of the theoretical bandwidth of GDDR5. Pure reads (or writes) attain 93% in the tests using a Pitcairn GPU (which is reasonably close to the bandwidth and ROP configuration of the PS4).

So could you please stop using the arbitrary multipliers. The most straightforward assumption is that this "efficiency" applies to both architectures, XB1 and PS4, in roughly the same way. For this kind of stuff the GPU is very likely able to use the available bandwidth with about the same efficiency, until proven otherwise. The picture is likely more complicated if you dive into the different characteristics of DRAM and SRAM (which we don't know in detail) and parallel use scenarios. But your grossly simplified approach leads nowhere other than fuelling some crappy fanboy advantage arguments.


Again, the specific 91+% scenario is for specific functions; has there been a percentage estimation of full bandwidth utilisation from actual titles running? This is a question.

As for the stated tests/real apps, I would assume that the actual titles they have would be the source of those measures: Forza, Ryse, etc.

So my point is about what seems to be honest talk of a real average, not specific tests into bandwidth measurement, for which I am sure someone at MS could engineer near-peak situations as well. No fanboy argument; I'm actually pushing to look at this past fanboy interpretation.
 
I think the coherent memory is virtual memory that's part of the 8GB DDR3 & the 30GB/s is part of the 68GB/s bandwidth. I have a feeling that this was something mostly done for Kinect.


The 47MB is the ESRAM + all the Cache.

  • 32MB of ESRAM
  • 4MB of CPU L2
  • 512KB of CPU L1
  • 232 KB of Audio Chip Cache/SRAM
  • 512 KB GPU L2
  • 192 KB GPU L1
  • 768 KB GPU LSM
  • 64 KB GPU GSM
_____________

Well I was able to find 38.2MB of the 47MB using the Hot Chips document & the leaked GPU info from VGLeaks, which leaves 8 - 9 MB hidden somewhere on the SoC.

Thanks for the response. Since the MS guy was using "coherent read bandwidth" and framing it as the BET against other competing systems, it stands to reason that it was quite important. The big difference is the ESRAM and the audio blocks, since most of the other stuff is shared by both systems. As such the ESRAM would be the biggest contributor to the "coherent read bandwidth" difference. Seems to be a lot riding on that particular piece of real estate ;-)

Having it all laid out there, when they state the coherent read bandwidth "bet", I think they are probably averaging up collective bandwidths and then parsing that out over the 47 MB... say 47 MB / 140 GB/s (just making that last number up) for some ratio of memory to bandwidth, or maybe just averaging the bandwidth. Just a supposition.

So going forward, in terms of the BET, the MS engineer seems to be suggesting that their coherent bandwidth advantage (depending on how that is defined) will give the XB1 longer legs compared to some other not-to-be-named system and its GPGPU bet.

Ah another thread for that discussion.;)
 
The 47MB is the ESRAM + all the Cache.

  • 32MB of ESRAM
  • 4MB of CPU L2
  • 512KB of CPU L1
  • 232 KB of Audio Chip Cache/SRAM
  • 512 KB GPU L2
  • 192 KB GPU L1
  • 768 KB GPU LSM
  • 64 KB GPU GSM
_____________

Well I was able to find 38.2MB of the 47MB using the Hot Chips document & the leaked GPU info from VGLeaks, that leaves 8 - 9 MB hidden somewhere on the SoC.
We had this topic already; the conclusion was that they counted all the SRAM, including the redundant stuff. That changes a few of the GPU numbers:
14 x 16 kB vector L1 in the GPU = 224 kB (instead of 192kB)
14 x 64 kB LDS = 896 kB (instead of 768 kB)

And you forgot a few things:
14 x 256 kB vector registers in the GPU = 3584 kB
14 x 8 kB scalar registers in the GPU = 112 kB
4 x 16 kB scalar (constant) L1 = 64 kB
4 x 32 kB instruction cache = 128 kB
[strike] GDS = 64 kB[/strike]
4 x (16kB+4kB) ROP tile caches = 80 kB

That together adds [strike]4192[/strike] 4128 kB to your number, leaving a bit over 4 MB unaccounted for. And if you consider that MS has said the eSRAM actually is ECC protected, you can do some creative counting and come to the conclusion that the 32 MB is actually 36 MB, closing that gap. If there should be a few hundred kB still missing, there are also a lot of small buffers all over the die made up of small SRAMs.
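The recount can be followed step by step. A sketch of the arithmetic, using only the figures quoted in the posts above (including the speculative ECC adjustment):

```python
# Start from the earlier 38.2 MB tally (in KiB), apply the corrected
# GPU vector L1 and LDS totals, then add the extra register files
# and caches listed in the post above.
kib = 39144                          # the 38.2 MB list, in KiB
kib += (224 - 192) + (896 - 768)     # corrected vector L1 and LDS
kib += 3584 + 112 + 64 + 128 + 80    # vregs, sregs, scalar L1, I-cache, ROP caches
print(round(kib / 1024, 1))          # 42.3 -> a bit over 4 MB short of 47

kib += 4 * 1024                      # if ECC makes the 32 MB ESRAM 36 MB physical
print(round(kib / 1024, 1))          # 46.3 -> close to the 47 MB figure
```

With the ECC guess included, the count lands within a few hundred kB of 47 MB, which is what the post attributes to miscellaneous small buffers.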
 
Thanks for the response. Since the MS guy was using "coherent read bandwidth" and framing it as the BET against other competing systems, it stands to reason that it was quite important. The big difference is the ESRAM and the audio blocks, since most of the other stuff is shared by both systems. As such the ESRAM would be the biggest contributor to the "coherent read bandwidth" difference. Seems to be a lot riding on that particular piece of real estate ;-)

Having it all laid out there, when they state the coherent read bandwidth "bet", I think they are probably averaging up collective bandwidths and then parsing that out over the 47 MB... say 47 MB / 140 GB/s (just making that last number up) for some ratio of memory to bandwidth, or maybe just averaging the bandwidth. Just a supposition.

So going forward, in terms of the BET, the MS engineer seems to be suggesting that their coherent bandwidth advantage (depending on how that is defined) will give the XB1 longer legs compared to some other not-to-be-named system and its GPGPU bet.

Ah another thread for that discussion.;)

Honestly I have no idea what you're talking about right now lol.


The Coherent Bus isn't connected to the ESRAM, and it says that "Any DRAM data can be coherent with the CPU caches"; it's 30GB/s, but it's part of the 68GB/s DDR3.

Xbox_neu_3-5145d885291b2272.png
 
Honestly I have no idea what you're talking about right now lol.


The Coherent Bus isn't connected to the ESRAM, and it says that "Any DRAM data can be coherent with the CPU caches"; it's 30GB/s, but it's part of the 68GB/s DDR3.

Xbox_neu_3-5145d885291b2272.png

Thanks again. So I was wrong in remembering the 47 MB as being coherent, then :oops:
 
Could deferred renderers be a problem for the XB1, considering the relatively small amount of ESRAM, or is it a non-issue?
 