Understanding XB1's internal memory bandwidth *spawn

But didn't they say they measured the 150GB/s in only one case, not all the time? If that's the case then it's not the same kind of number.

They only gave one example, but didn't say there's only one case where it can happen.

Situations involving compute or depth read/modify/write could also be large consumers of simultaneous read/write bandwidth. There's no doubt that there are situations where the esram/DDR3 config can offer significantly more BW than a 256-bit GDDR5 setup could, but there are probably also situations where effective BW will be significantly lower.
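As a rough sanity check on the peaks being compared here, a minimal Python sketch using the publicly quoted bus widths and transfer rates (the GDDR5 rate is assumed to be the usual 5.5GT/s on a 256-bit bus; all of these are paper peaks, not sustained rates):

```python
def bus_bw(bits, mt_per_s):
    """Peak GB/s for a bus: width in bits times transfer rate in MT/s."""
    return bits / 8 * mt_per_s / 1000

ddr3 = bus_bw(256, 2133)            # XB1 DDR3-2133, 256-bit   -> ~68.3 GB/s
gddr5 = bus_bw(256, 5500)           # 256-bit GDDR5 @ 5.5GT/s  -> ~176 GB/s
esram_one_way = 128 * 853e6 / 1e9   # 128 bytes/cycle @ 853MHz -> ~109.2 GB/s

print(f"DDR3 peak:            {ddr3:6.1f} GB/s")
print(f"GDDR5 peak:           {gddr5:6.1f} GB/s")
print(f"ESRAM peak (one way): {esram_one_way:6.1f} GB/s")
# Simultaneous ESRAM read+write plus DDR3 traffic can exceed the GDDR5
# peak on paper; a workload that can't split that way will see less.
```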

The devil will be in the detail. A single pool will certainly be easier to optimise for and be closer to the PC platforms that many launch games seem to have been built on.
 
Wasn't that around 135GB/s?

That was before the up-clock iirc.

Edit: The whole GPU got the upclock, so ROPs and esram included. Measured BW for ROP or BW limited scenarios should scale linearly with the clock.
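A quick sketch of that linear scaling, assuming the widely reported 800MHz to 853MHz upclock applies 1:1 to the ESRAM:

```python
measured_at_800mhz = 133            # GB/s, the pre-upclock DF figure
scale = 853 / 800                   # ~1.066, the GPU upclock ratio
print(f"{measured_at_800mhz * scale:.0f} GB/s")   # -> ~142 GB/s
```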
 
No, the 208GB/s is not achievable all the time. The 150GB/s is the regularly achievable amount. This is just the ESRAM as well.
I guess that is how much bandwidth the scratchpad can deliver in a bandwidth-bound scenario.
It is quite a high measurement.
I'm not sure why (or worse, I suspect why) so many people are wary of MSFT's claims and can't take what they said at face value, while in the meantime... well...

Anyway they are pretty honest, too much so in my opinion: they said that achievable bandwidth from DDR3 is ~55GB/s (out of 67GB/s), which is not exactly PR friendly when they go out of their way to explain the choice they made in light of Sony's choice of lots of really fast memory.
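A quick utilization check of the figures quoted in this post (just the post's own numbers, nothing assumed):

```python
# Measured vs peak for both pools, per the figures above.
for name, measured, peak in (("ESRAM", 150, 208), ("DDR3", 55, 67)):
    print(f"{name}: {measured}/{peak} GB/s = {measured / peak:.0%} of peak")
# -> ESRAM ~72% of peak, DDR3 ~82% of peak
```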
 
Someone like Nick Baker (from Xbox One hardware architecture team)? :smile:

Someone telling us how it was achieved and with what ops, because the DF quote I quoted paints a different picture imo. It's a moot point anyway as no one else seems interested in exploring this.
 
No, they said that it is actual real code, not tests and such. I think you read it backwards.

They've said they measure it as being that high, but we have no sense of whether that rate is something that can be sustained. Maybe it briefly peaks at that rate during specific operations, or maybe it is an overall average usage across a significant time scale. They've never said one way or another, but the example scenarios that have been given seem to suggest achieving that rate requires optimal access patterns.
 

Because it mentions they achieved the rate of 133GB/s (150GB/s now) using alpha blending. Others are assuming that the 150GB/s is over the entire timestep and not at a specific point in time, which seems to be contrary to what the quote I quoted is saying.
 
What I think is intriguing in this is that if you accept MS saying they managed to record 200GB/s real-world utilisation over an entire second (30/60 frames of gameplay), what is the GPU doing to be able to chew through so much data with only 12 CUs etc., compared to a 7870 or something?
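To put rough numbers on that comparison, a back-of-envelope bytes-per-FLOP sketch (the 7870 reference-card specs here, 1280 shaders at 1GHz with 153.6GB/s, are my assumption; the 200GB/s is the claim from the post above):

```python
def tflops(shaders, ghz):
    # 2 FLOPs per shader per clock (fused multiply-add)
    return shaders * 2 * ghz / 1000

gpus = {
    "XB1 (12 CUs)": (200.0, tflops(768, 0.853)),   # 200 GB/s claim from the post
    "HD 7870":      (153.6, tflops(1280, 1.0)),
}
for name, (bw, tf) in gpus.items():
    print(f"{name}: {bw:.0f} GB/s / {tf:.2f} TF = {bw / tf:.0f} GB/s per TF")
# At that rate the XB1 GPU would be fed roughly 2.5x more bytes per FLOP.
```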
 
Because it mentions they achieved the rate of 133GB/s (150GB/s now) using alpha blending. Others are assuming that the 150GB/s is over the entire timestep and not at a specific point in time, which seems to be contrary to what the quote I quoted is saying.

But that 133 GB/s is from before the upclock. 133 x 1.066 = 142 GB/s. And that may not even be the peak BW, just the BW they measured during 64 bpp alpha blending.
 
But that 133 GB/s is from before the upclock. 133 x 1.066 = 142 GB/s. And that may not even be the peak BW, just the BW they measured during 64 bpp alpha blending.

That's exactly my point. The upclock is not the point here; the point is that the timestep of the measured 150GB/s is unknown. If the 150GB/s only happens during alpha blending (for example) then the 150GB/s is not a good 'average' figure, is it?
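To illustrate the concern with made-up numbers (purely hypothetical, just to show why the timestep matters):

```python
# If the headline rate only holds for a slice of the frame, the
# frame-average lands far below it. Numbers here are invented.
phases = [(0.10, 150), (0.90, 55)]   # (fraction of frame time, GB/s)
frame_avg = sum(frac * bw for frac, bw in phases)
print(f"Frame-average BW: {frame_avg:.1f} GB/s")   # -> 64.5 GB/s
```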
 
Actually, I was wrong, I'd forgotten that there is more than one BW example. Here is another example that uses only 32bpp and does not use blending.

DF article said:
For example, consider a typical game scenario where the render target is 32bpp [bits per pixel] and blending is disabled, and the depth/stencil surface is 32bpp with Z enabled. That amounts to 12 bytes of bandwidth needed per pixel drawn (eight bytes write, four bytes read). At our peak fill-rate of 13.65GPixels/s that adds up to 164GB/s of real bandwidth that is needed which pretty much saturates our ESRAM bandwidth.

This doesn't make it sound like a sustained 140 ~ 150 GB/s is some theoretical, pie in the sky, bullshit figure.
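The arithmetic in that quote checks out, for what it's worth:

```python
bytes_per_pixel = 4 + 4 + 4   # colour write + Z write + Z read = 12 bytes
fill_rate = 16 * 853e6        # 16 ROPs x 853MHz -> ~13.65 GPixels/s
print(f"{bytes_per_pixel * fill_rate / 1e9:.0f} GB/s")   # -> ~164 GB/s
```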
 
Actually, I was wrong, I'd forgotten that there is more than one BW example. Here is another example that uses only 32bpp and does not use blending.



This doesn't make it sound like a sustained 140 ~ 150 GB/s is some theoretical, pie in the sky, bullshit figure.

For one, that more than saturates their eSRAM bandwidth at those figures (albeit by a tiny amount), and yet once again the argument is not over whether or not these figures are real but how often they occur; if they occur for 1% of the frame (for example) they are useless. If we applied the 80% rule here I think we would come out at a good number.

And once again he is talking peak numbers; how often do you actually achieve your peak fillrate?
 
That's exactly my point. The upclock is not the point here; the point is that the timestep of the measured 150GB/s is unknown. If the 150GB/s only happens during alpha blending (for example) then the 150GB/s is not a good 'average' figure, is it?

The interview says "we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth."

The upclock is the point, as it would get you into the 140~150GB/s figure that MS give. You're taking the extreme edge of the range that they give and using a small gap with a figure that fits comfortably within that range to sow FUD. Because that's all you're doing: trying to attach uncertainty and doubt to the claims that they're making.

Also, why would you want an "average" figure? The BW used will depend on the workload, and so trying to get an out-of-context "average" figure is meaningless. The 140~150 "real world measured" figure (and they give one for the DDR3 too) is already infinitely closer to reality than the 176GB/s figure that people are comparing it to, yet there isn't some FUD campaign against that.

They're giving you actual examples of real world bandwidth from specific use-cases. No-one else is doing that. What is actually going on in this thread?
 
For one, that more than saturates their eSRAM bandwidth at those figures (albeit by a tiny amount), and yet once again the argument is not over whether or not these figures are real but how often they occur; if they occur for 1% of the frame (for example) they are useless. If we applied the 80% rule here I think we would come out at a good number.

And once again he is talking peak numbers; how often do you actually achieve your peak fillrate?

That rate can't even be achieved with just ESRAM, because he's talking about 164GB/s of write only, so it outstrips the 109GB/s max for writing to ESRAM. It is not an example of a usage scenario that results in 150GB/s combined read+write for the ESRAM alone.
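For context, splitting the DF example's 12 bytes per pixel into its read and write components (per the quote itself: eight bytes write, four bytes read) shows where the ~109GB/s per-direction figure sits:

```python
fill_rate = 13.65e9                  # pixels/s, from the DF quote
write_bw = 8 * fill_rate / 1e9       # colour + Z writes -> ~109.2 GB/s
read_bw = 4 * fill_rate / 1e9        # Z reads           -> ~54.6 GB/s
print(f"write {write_bw:.1f} + read {read_bw:.1f} = {write_bw + read_bw:.1f} GB/s")
# The write half alone sits right at the ESRAM's per-direction ceiling.
```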
 
The interview says "we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth."

The upclock is the point, as it would get you into the 140~150GB/s figure that MS give. You're taking the extreme edge of the range that they give and using a small gap with a figure that fits comfortably within that range to sow FUD. Because that's all you're doing: trying to attach uncertainty and doubt to the claims that they're making.

Also, why would you want an "average" figure? The BW used will depend on the workload, and so trying to get an out-of-context "average" figure is meaningless. The 140~150 "real world measured" figure (and they give one for the DDR3 too) is already infinitely closer to reality than the 176GB/s figure that people are comparing it to, yet there isn't some FUD campaign against that.

They're giving you actual examples of real world bandwidth from specific use-cases. No-one else is doing that. What is actually going on in this thread?

I'm just trying to work out what operations the number is from and over how long; if that is FUD then I honestly think you have its meaning mixed up with something else. The upclock is not relevant because I'm aware of it and using it as I always have. I'm just finding it strange that the prior DF article said they were getting the equivalent of 150GB/s prior to the upclock in ONE case, which was alpha blending.
 
The interview says "we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth."

The upclock is the point, as it would get you into the 140~150GB/s figure that MS give. You're taking the extreme edge of the range that they give and using a small gap with a figure that fits comfortably within that range to sow FUD. Because that's all you're doing: trying to attach uncertainty and doubt to the claims that they're making.

Also, why would you want an "average" figure? The BW used will depend on the workload, and so trying to get an out-of-context "average" figure is meaningless. The 140~150 "real world measured" figure (and they give one for the DDR3 too) is already infinitely closer to reality than the 176GB/s figure that people are comparing it to, yet there isn't some FUD campaign against that.

They're giving you actual examples of real world bandwidth from specific use-cases. No-one else is doing that. What is actually going on in this thread?

And that would be with what they have and know now. I'd expect average results to go up as they learn better ways to utilize the ESRAM as time goes on.
 
For one, that more than saturates their eSRAM bandwidth at those figures (albeit by a tiny amount), and yet once again the argument is not over whether or not these figures are real but how often they occur; if they occur for 1% of the frame (for example) they are useless. If we applied the 80% rule here I think we would come out at a good number.

And once again he is talking peak numbers; how often do you actually achieve your peak fillrate?

So you want to apply the 80% number to their 80% number?

How is a BW figure that's actually been measured in a real application 'useless', in a world where people like their competitors use peak and utterly, utterly unattainable bus x clock figures in their marketing?

And what would be the point in giving a BW range for regular workloads if they couldn't sustain that BW and those workloads over a meaningful period of time? Why would they be giving those figures to developers in NDA'd docs? It's not like they couldn't check to see.
 