Xbox One (Durango) Technical hardware investigation

Well I"m not the one painting the XB1 as a "200GB/s" system. The system has 99% of the game data in 68GB/s memory with the eSRAM to assist as a separate, independent, small high speed pool. The aggregate system bandwidth is going to be difficult to characterize, but there must be a meaningful and honest way to do it other than "we measured..".

I seriously wonder what you think about the PS2 architecture. You know, lots of data in slow RAM and some in fast EDRAM...
 
The 176GB/s is a theoretical figure, not real world. Real world is less. An interesting article is here: http://archive.arstechnica.com/paedia/b/bandwidth-latency/bandwidth-latency-1.html
It explains a lot, though despite the author's best efforts, it's not for the faint-hearted!

MS engineers have stated that, whilst monitoring real games running, they measured a real-world 150GB/s from eSRAM and 50GB/s from DDR3. These can be added together, so in real-world terms the X1 has been measured running real code at 200GB/s. The relevant quotes were posted here a page or so ago.

The rest of your points are fair enough, we don't know all the ins and outs, but there would have to be a proper curve ball to throw these figures right out.

I am interested in your comment about thousands of GPU threads vying for the bandwidth with the CPU, because to my mind the more that happens, the more contention you get, which will not be good for overall bandwidth utilisation or for avoiding CPU stalls.

Unless GDDR5 can't deliver 256 bits per access in burst mode, 176 GB/s is a real-world number. Like I said, you don't need to travel 60 miles over a full hour to reach a speed of 60 mph. 176 GB/s is a rate. If you can sustain max bandwidth over a full second you will have moved 176 GB worth of data, but if you only deliver 44 GB over a quarter of a second you will still have achieved a rate of 176 GB per second.

Sustained bandwidth is a different figure altogether and is dictated by a whole host of other variables.
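To make the distinction concrete, here's a trivial sketch (my own illustrative numbers, nothing measured) of the difference between a rate and the amount of data actually moved:

# Bandwidth is a rate: data moved divided by the measurement window.
# Moving 44 GB in a quarter of a second is the same rate as 176 GB in a full second.
def rate_gb_per_s(gigabytes_moved, seconds):
    return gigabytes_moved / seconds

print(rate_gb_per_s(176, 1.0))   # 176.0 GB/s, sustained for a whole second
print(rate_gb_per_s(44, 0.25))   # 176.0 GB/s, only held for a quarter of a second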
 
Why is 'we measured' not good enough? That's actually the best way to ascertain real-world performance: actually use the design in the real world and see how it performs. 150 GB/s should mean they shifted 150 GB within the span of one second. That may have been a sustained 150 GB/s for the whole second, or a mix of 30 GB/s and 200 GB/s over that second.

I don't understand why people are arguing so much over these BW figures. We have an ESRAM bus structure, an explanation of why it isn't read+write simultaneous 100% of the time, clockspeeds and bus widths, and finally some measured performance from real-world use that seems eminently plausible, favouring the middle of the road between high and low. We can't get any more info than that unless we get hold of game performance profiles from the dev tools. Maybe with the XB1 doubling as an SDK, someone will manage to do that somehow. Until then, this topic seems kinda done to death.
 
I don't think you can readily compare the two systems, since:

1. We know relatively little of the eSRAM in terms of timing or configuration.

2. We know relatively little about how either system deals with memory contention introduced by having off-chip DRAM that services a CPU, a GPU and a number of other processors that may or may not exist on either chip. The memory system backed by GDDR5 on a discrete GPU doesn't have to worry about servicing the needs of a CPU in a timely fashion, and CPUs aren't used to having to share off-chip memory with thousands upon thousands of GPU threads vying for access.

3. One system has a unified memory system while the other has a split memory system where two memory pools play different roles.

4. THE MODS WILL BAN YOUR SORRY HIDE
We are not yet comparing platforms. That conversation can start when the machines are out in 2 months.
 
Here's another hint, using more BW is not better, it's actually worse.

Really? So a memory system with 200GB/s peak that can only achieve 10% efficiency is better than a different memory system with 200GB/s peak that can achieve 80% efficiency when both are attached to the same GPU?

If you're going to offer hints, please try to make them relevant to the argument at hand.

By suggesting that the 150GB/s is somehow a manufactured stat,

Please go back and re-read my posts more carefully; I've said nothing of the sort. The closest I've come to saying that is that one possible interpretation, amongst many, of Microsoft's statement is that 150GB/s is the occasionally-reached real-world peak as opposed to the average utilization.

what you are saying is that in real life, the average BW utilized is actually less, meaning that there is even more bandwidth available to be used. ;)

No, that's not what I'm saying. That's merely your interpretation of it. See how there can be different interpretations of a statement? If average bandwidth consumed is less than 150GB/s it does not necessarily mean there is "spare" bandwidth up to 150GB/s just waiting to be used. It can easily mean that a particular operation uses memory in such a way that there is simply no way to get more than (say) 130GB/s out of it. And if operations of that sort make up the bulk of your game then 130GB/s is going to be closer to your average than 150GB/s.

That doesn't mean a different kind of memory configuration couldn't achieve higher utilization in the same type of operation with the same GPU. Thus your bandwidth usage goes up, your GPU bandwidth bottleneck is eased and overall performance increases.

This can apply both to eSRAM and GDDR5, so please let's leave platform bias out of this and focus on the facts. And the facts are that no-one here knows the true meaning of Microsoft's statement.
 
Exactly, to think that is the mean bandwidth over the course of some macroscopic time frame is ridiculous. 99% of the data is in the DDR3 pool, but magically the eSRAM is going to be read/writing 150GB/s full time? Remember the original DF article about the eSRAM with the "holes"? They actually gave an example of how they achieved that near max bandwidth number, some FP16 blend operation I think. Are we to expect the eSRAM is always doing some operation that maxes its bandwidth?

In simplistic form, using the eSRAM as a frame buffer, a 1080p frame buffer (color+z+stencil) @ 60fps needs about 6Gbps.
Each pass with blending, so add 12Gbps for both the read/write.
So yeah, I think the BW on the eSRAM will be chewed up pretty quickly.

Deferred shading probably uses even more bandwidth.
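For what it's worth, a rough back-of-envelope of that kind of estimate (my own assumptions: 32-bit colour plus 32-bit depth/stencil per pixel, a single full-screen pass, no overdraw or MSAA, so treat the exact figures loosely):

# Very rough 1080p frame buffer bandwidth sketch (assumptions above are mine).
WIDTH, HEIGHT, FPS = 1920, 1080, 60
BYTES_PER_PIXEL = 4 + 4                    # 32-bit colour + 32-bit depth/stencil

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
write_gb_per_s = frame_bytes * FPS / 1e9   # writing each pixel once per frame
blend_gb_per_s = 2 * write_gb_per_s        # a blending pass reads and writes the buffer

print(f"write-only: {write_gb_per_s:.2f} GB/s")
print(f"read+write blend pass: {blend_gb_per_s:.2f} GB/s")

The numbers scale up quickly with overdraw, multiple render targets and extra passes, which is the point about the eSRAM bandwidth getting chewed up.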
 
As a summary of the past year.

Microsoft stated:
"We don't want to target high specs, we believe we have excellent value with the family friendly Kinect 2.0 which is even more accurate than the previous one, the great TV functions which are voice controlled as well, and also: no more switching HDMI ports, ever. All while running skype, a game and a browser, at the same time."

Maybe not literally, but this has been their focus until E3 at least.

The only things that have changed are that MS overclocked the CPU by 10% and the GPU by 5 or maybe 6%. No other changes have been made.
They also discovered that the ESRAM was (roughly) twice as fast as previously imagined.

Aside from this, nothing has really changed, hardware wise.
So either their initial reveal was incorrect, or there is some serious back-pedalling going on.
 
Really? So a memory system with 200GB/s peak that can only achieve 10% efficiency is better than a different memory system with 200GB/s peak that can achieve 80% efficiency when both are attached to the same GPU?

You are confused, again, on the arbitrary utilization percentage.

If you're going to offer hints, please try to make them relevant to the argument at hand.
Sure: given an algorithm A that produces the exact same result as algorithm B, with everything else being the same except the difference mentioned:

1) If A does it in less time, wouldn't it make A better?
2) If A does it by using less memory, is A better?
3) If A does it by moving less data, isn't A better?

No, that's not what I'm saying. That's merely your interpretation of it. See how there can be different interpretations of a statement? If average bandwidth consumed is less than 150GB/s it does not necessarily mean there is "spare" bandwidth up to 150GB/s just waiting to be used. It can easily mean that a particular operation uses memory in such a way that there is simply no way to get more than (say) 130GB/s out of it. And if operations of that sort make up the bulk of your game then 130GB/s is going to be closer to your average than 150GB/s.
Right, so that's exactly the case, so what's the problem then? The algorithm will not use the full potential, so the bandwidth is not the bottleneck and the code is running as fast as it can. Isn't that the point?

This can apply both to eSRAM and GDDR5, so please let's leave platform bias out of this and focus on the facts. And the facts are that no-one here knows the true meaning of Microsoft's statement.
The irony is unbearable.
 
We have all read all that, I'm not sure why it is gospel though. MS has measured something, great. How often, and under what particular circumstances? Do you think they are reading/writing to the eSRAM 100% of the time at 150GB/s?

The read-to-write ratio is something like 2-4:1 for normal graphics workloads. If we assume we use all of the ESRAM read bandwidth (i.e. >95%) and 60% of the DDR3 bandwidth for GPU reads, we get around 45-50GB/s of write traffic to ESRAM with a 3:1 ratio.

That's 150GB/s (100 read, 50 write) of ESRAM bandwidth with aggregate system bandwidth being well over 200GB/s. Lower average latency too.
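As a sanity check of that arithmetic, here's the same estimate worked through with the commonly quoted peak figures (roughly 109 GB/s per direction on the ESRAM, 68 GB/s on the DDR3 bus); the utilisation percentages are the ones assumed above:

# Rough reconstruction of the aggregate-bandwidth estimate above.
ESRAM_PEAK_PER_DIR = 109.0   # GB/s, read or write (commonly quoted figure)
DDR3_PEAK = 68.0             # GB/s, shared between reads and writes

esram_read    = 0.95 * ESRAM_PEAK_PER_DIR        # ~104 GB/s of ESRAM reads
ddr3_gpu_read = 0.60 * DDR3_PEAK                 # ~41 GB/s of GPU reads from DDR3
total_reads   = esram_read + ddr3_gpu_read       # ~145 GB/s of GPU read traffic

read_to_write_ratio = 3.0
esram_write = total_reads / read_to_write_ratio  # ~48 GB/s of writes, assumed to land in ESRAM

esram_total  = esram_read + esram_write          # ~150 GB/s of ESRAM traffic
system_total = esram_total + ddr3_gpu_read       # >190 GB/s before CPU and other DDR3 traffic

print(f"ESRAM: ~{esram_total:.0f} GB/s, system aggregate: >{system_total:.0f} GB/s")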

Cheers
 
That # 4 came from somebody other than me.
Yes, me. ;)

As for the conversation, I'll post a summary that'll hopefully bring it to a natural close instead of everyone delving into cyclic arguments to nowhere. Bandwidth is a measure of data capacity over time. Whenever measured, it's an average (even a peak BW measurement is an average over time). That MS measured 150 GB/s means they achieved 150 GB/s, with considerable probability. There's no reason whatsoever for anyone to believe that was 150 GB/s for a fraction of a second and that the average BW for ESRAM access is notably less than that; that defies the purpose of the measurement as an average and as a metric you'd use to inform your development partners. That sort of argument is basically looking for an opportunity to doubt the evidence in front of us, and is unnecessary.
 
This was a very nice interview with some cool information. We'll know over the coming years which system performs better than the other, but I'm pretty satisfied with Ryse, Forza 5, and DR 3 as launch titles. They look great and this gives me hope that MS has done a much better job with the hardware than MS haters on the Internet think.
 
They discovered that it was designed with a full bidirectional interface. Someone somehow forgot to tell anyone.

Some pointy hair bosses, marketing drones, and a bunch of internet tards suddenly discovered the aggregate bandwidth numbers.

The engineers who designed the thing didn't do it based on assumptions and luck. They examined various workloads and existing bottlenecks, designed the thing to have separate read and write buses to/from the ESRAM knowing all along what typical aggregate bandwidth could be expected for various graphics engine implementations.

As did the ones working on PS4.

MS ended up with more aggregate bandwidth and lower average latency. Sony ended up with a lot more computational grunt. That's because they looked at different workloads and designed for different price points.

Cheers
 
You are confused, again, on the arbitrary utilization percentage.

You're right, I am confused, but not about what you think.

Sure: given an algorithm A that produces the exact same result as algorithm B, with everything else being the same except the difference mentioned:

1) If A does it in less time, wouldn't it make A better?
2) If A does it by using less memory, is A better?
3) If A does it by moving less data, isn't A better?

You've missed the point again, but this is so far off the initial argument that this is the last thing I'll say on it. What if there is no algorithm B? What if you simply have algorithm A running on GPU A that demands 200GB/s to max out the GPU's computational resources? And what if you have 2 memory systems (A & B) that both peak at 200GB/s theoretical, but A can attain 90% utilization in algorithm A and B only 50% in that same algorithm? Clearly bandwidth utilization is higher with memory system A, which is a good thing since it's closer to what the GPU needs to saturate its computational resources.
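To spell out the arithmetic in that hypothetical (all numbers are the ones from the paragraph above, nothing measured):

# Effective bandwidth is peak * achievable utilisation for the workload at hand.
GPU_DEMAND = 200.0   # GB/s the hypothetical algorithm needs to keep the GPU busy
PEAK = 200.0         # GB/s theoretical peak of both memory systems

for name, utilisation in [("memory system A", 0.90), ("memory system B", 0.50)]:
    effective = PEAK * utilisation
    print(f"{name}: {effective:.0f} GB/s effective, "
          f"{effective / GPU_DEMAND:.0%} of what the GPU could consume")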

The irony is unbearable.

Indeed. So let me put this question to you. Which of these statements do you actually think Microsoft is making and why:

1. We measured 150GB/s average utilization over the course of a 30 minute gaming session
2. We measured 150GB/s peak utilization over the course of a 30 minute gaming session with the average being somewhat lower

Both statements clearly have very different implications but, as far as I'm aware, the only part of those statements that Microsoft have actually made is the 150GB/s figure itself. So please go ahead and explain which of these you hold to be true and what evidence you have to support that belief.
 
1. We measured 150GB/s average utilization over the course of a 30 minute gaming session
2. We measured 150GB/s peak utilization over the course of a 30 minute gaming session with the average being somewhat lower

The first one. There'd be no point in having a 1024-bit write bus if it were utilized significantly less than 50% most of the time. They'd be better off with a 1280-bit read bus and a 512/768-bit write bus.
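For context, here's the per-direction peak arithmetic behind that bus-width argument (the 853 MHz clock is the publicly stated figure; the alternative widths are just the hypotheticals from the post):

# Peak bandwidth of a bus: width in bits / 8 * clock, giving bytes per second.
CLOCK_HZ = 853e6   # publicly stated ESRAM/GPU clock

def peak_gb_per_s(width_bits):
    return width_bits / 8 * CLOCK_HZ / 1e9

for width in (1024, 1280, 768, 512):
    print(f"{width:>4}-bit bus: {peak_gb_per_s(width):6.1f} GB/s peak per direction")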

Cheers
 
In simplistic form, using the eSRAM as a frame buffer, a 1080p frame buffer (color+z+stencil) @ 60fps needs about 6Gbps.
Each pass with blending, so add 12Gbps for both the read/write.
So yeah, I think the BW on the eSRAM will be chewed up pretty quickly.

Deferred shading probably uses even more bandwidth.

6Gbps+12Gbps = 2.25GB/s?
 
They discovered that it was designed with a full bidirectional interface. Someone somehow forgot to tell anyone.

This happens more often than you think: people work on different specs and they communicate through email. Imagine if an MS engineer sent a message, but it went like this:

" Dear Xbox One Durango Oban Team,
I added a full bidirectional interface because it will help with the bandwidth and look good on paper as well. Also are we going to see Pacific Rim tonight with the team? I heard it is pretty amazing, also I inherited a 250 dollar from a distant uncle, so I am buying!
Greetings, Colonel Nelson, Technical Fellow."

It's possible that the MS Hotmail servers picked the bolded part up, and marked it as spam. So nobody ever got the email, so that is why they didn't know.

That's not too farfetched. MS is a really big company, and MS Hotmail is not so good at recognising spam.
 
You've missed the point again, but this is so far off the initial argument that this is the last thing I'll say on it. What if there is no algorithm B?
Eh, logically if you are going to put more ifs on my ifs, it's already an illogical argument.

What if you simply have algorithm A running on GPU A that demands 200GB/s to max out the GPU's computational resources?
What does this even mean? You'll just finish the computation slower?

And what if you have 2 memory systems (A & B) that both peak at 200GB/s theoretical, but A can attain 90% utilization in algorithm A and B only 50% in that same algorithm? Clearly bandwidth utilization is higher with memory system A, which is a good thing since it's closer to what the GPU needs to saturate its computational resources.

Again, if A and B can produce the same result in the same amount of time, clearly B is far superior.

I'll save the rest since I simply can't respond to conspiracy theories that essentially look away from the evidence being presented.
 