Understanding XB1's internal memory bandwidth *spawn

They've said they measure it as being that high, but we have no sense of whether that rate is something that can be sustained. Maybe it briefly peaks at that rate during specific operations, or maybe it is an overall average usage across a significant time scale. They've never said one way or another, but the example scenarios that have been given seem to suggest achieving that rate requires optimal access patterns.

They said actual real code, running apps. Why would they take even a sliver of a hit to their numbers just to report what is actually going on, unless it was accurate? If 140-150 was the high, they would have said the average was lower, not that 140-150 was the actual measured average.

Why are these assumptions only ever made in regard to the X1? It really comes across as people searching for ways to discredit it, as these types of discussion hardly ever crop up outside the scope of the X1, whether for competing platforms, PCs, etc.
 
Ok change the 8GB to 'a large pool of main memory' if you wish. It doesn't change the point of my post though. They wanted DDR as opposed to GDDR for the main memory for power and cost reasons and so esram was also included to supplement the bandwidth.

You're perpetuating a myth. We don't have to guess here. They just TOLD US that it was there by virtue of being an evolution of the eDRAM on 360. Does it have a slew of other benefits? Sure! It's a silver bullet so to speak for a variety of design challenges. That doesn't change the motivating purpose for its inclusion. As I said to Strange, the context of discussing its purpose for inclusion can utterly ruin the interpretation of the design, unfortunately.

Low latency does not appear to have been the driving advantage behind its inclusion, and from the statements in the article there's at least reasonable cause to doubt whether it will have any major latency-based advantages at all. Otherwise this surely would have been touched upon, given that the question presented the perfect opportunity to do so.
1) It's in the dev docs.

2) I said it was a benefit, not that it was the motivating factor for its inclusion.

3) They HAVE talked about the benefits of low latency, specifically in terms of GPGPU functions. Kinect's processing was the example given.
 
Are we seriously contending that the figures given by ms are anything other than what they are described as? I.e. recorded, real world, average bandwidth utilisation figures. They go out of their way to explain what the peak figures are, and why the average is lower. If they say they see an average of 140-150GB/s measured, this has to be a measure taken across a whole second...during which 30 or 60 frames were generated. If this figure was over some fraction of a second, doing a certain operation, it wouldn't be an average/sec.
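To put that in per-frame terms, here's a rough sketch of my own; the only assumptions beyond the quoted figures are a 60fps target and the 32MB eSRAM size:

Code:
# Rough sketch: what a whole-second average of 140-150 GB/s implies per frame.
# Assumptions (mine, not MS's): 60 fps and the publicly stated 32 MB of eSRAM.
ESRAM_MB = 32
FPS = 60
for avg_gb_per_s in (140, 150):
    mb_per_frame = avg_gb_per_s * 1000 / FPS   # GB/s -> MB moved per frame
    rewrites = mb_per_frame / ESRAM_MB         # times the eSRAM's capacity
    print(f"{avg_gb_per_s} GB/s at {FPS} fps ~= {mb_per_frame:.0f} MB per frame "
          f"(~{rewrites:.0f}x the eSRAM capacity)")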
 
So you want to apply the 80% number to their 80% number?

How is a BW figure that's actually been measured in a real application 'useless', in a world where people like their competitors use peak and utterly, utterly unattainable bus x clock figures in their marketing?

And what would be the point in giving a BW range for regular workloads if they couldn't sustain that BW under those workloads over a meaningful period of time? Why would they be giving those figures to developers in NDA'd docs? It's not like they couldn't check to see.

I'm just trying to point out that unless we know the details of how they got those numbers (i.e. how long a timestep it was measured over, and whether it was intermixed with other ops or just plain blending), it's as useless as the peak number from any of the consoles.

I'll accept it and drop it all if ANYONE can explain this.

Theoretical peak performance is one thing, but in real-life scenarios it's believed that 133GB/s throughput has been achieved with alpha transparency blending operations (FP16 x4).
 
You place an unsourced "believed to be" statement from DF over a direct "what is" statement from MS?

Strange.

Well yes, the other DF number strangely gives us more information :LOL: The 150GB/s one never mentions what it was from, how long a timestep, or what operations it covered; it's just '150GB/s is what we have achieved'. When? How? For how long? What operations?

Because, to be perfectly honest, without these kinds of metrics, measuring either bandwidth is a total waste of time and you just end up with a number that represents nothing in the real world.
 
Well yes, the other DF number strangely gives us more information :LOL: The 150GB/s one never mentions what it was from, how long a timestep, or what operations it covered; it's just '150GB/s is what we have achieved'. When? How? For how long? What operations?

I wouldn't take that first unsourced statement as being worth more, since it is at least third-hand info and predates the upclock. I can't be certain it is exactly what it is portrayed as. It might be mixing two different statements, or not. We will never know. There's more potential for it suffering transformations as it passes through the relays.

Imagine a situation where someone was presenting this information to their audience.

Presenter: "It's possible to get higher bandwidth with the esram than 109 GB/s. We have some average measurements at 133 GB/s."

Audience Member: "Is that possible when alpha-blending?"

Presenter: "Yes."

Later on, Audience Member relays to DigitalFoundry: "They measured 133 GB/s with AlphaBlends."
 
Because it mentions they achieved the rate of 133GB/s (150GB/s now) using alpha blending.

You made this up. Nowhere in the DF article(s) is it suggested that you can only do alpha blending to get 150GB/s. They even specifically state it was real game code running and not some diagnostic. Note also that even with 109GB/s they still have a higher peak BW than the GDDR5 alternative the competition went with (177GB/s vs 176GB/s) so anything between 109 and 150 is a performance advantage in that regard no matter how you try slicing it. Hence, they deserve plenty of credit for their engineering decisions there.
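For what it's worth, here's a little sketch of my own showing why blending is the obvious example rather than the only route above 109GB/s. The 128-byte-per-direction eSRAM path and the 800MHz pre-upclock figure are assumptions taken from the public disclosures, not from the DF articles:

Code:
# Sketch: why an FP16x4 alpha blend can exceed the one-direction eSRAM cap.
# Blending reads the destination and writes the result, so an RGBA16F target
# (4 x FP16 = 8 bytes) moves 8 bytes in AND 8 bytes out per pixel.
one_direction_cap = 128 * 800e6 / 1e9   # ~102 GB/s read-only or write-only (pre-upclock)
blend_total = 133.0                     # GB/s, the earlier DF figure
reads = writes = blend_total / 2        # blending splits traffic roughly 50/50
print(f"one-direction cap  : {one_direction_cap:.1f} GB/s")
print(f"blend read + write : {reads:.1f} + {writes:.1f} = {blend_total:.0f} GB/s")
# 66.5 GB/s of reads plus 66.5 GB/s of writes each sit well under the per-direction
# cap, so 133 GB/s total needs nothing more exotic than reading and writing at the
# same time; any read/write mix (not just blending) gets you the same effect.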
 
Well yes, the other DF number strangely gives us more information :LOL: The 150GB/s one never mentions what it was from, how long a timestep, or what operations it covered; it's just '150GB/s is what we have achieved'. When? How? For how long? What operations?

This is just an average number. They could have said "optimal code = 204GB/s", but they know that is not realistic. So common code, code that exists for Xbox One right now, reaches an average of 150GB/s. Some operations get more bandwidth, but others get less (because of "non-optimal" code).
The minimal bandwidth of DRAM (DDR3 or GDDR5) is much, much worse (68GB/s and 176GB/s are just peak bandwidth) ;), but normally you get something in between the minimal and the optimal.
 
You made this up. Nowhere in the DF article(s) is it suggested that you can only do alpha blending to get 150GB/s. They even specifically state it was real game code running and not some diagnostic. Note also that even with 109GB/s they still have a higher peak BW than the GDDR5 alternative the competition went with (177GB/s vs 176GB/s) so anything between 109 and 150 is a performance advantage in that regard no matter how you try slicing it. Hence, they deserve plenty of credit for their engineering decisions there.



Solarus,



The 10% figure is a conservative, estimated reserve on MS's part which includes both the OS functions and some Kinect stuff. Here's some more info on the Kinect aspect specifically:



http://www.vgleaks.com/durango-next-generation-kinect-sensor/

There is an MEC chip in the audio block for Kinect's voice recognition, but some of the other stuff is using some GPU cycles. Exactly what that breakdown is nobody (outside MS) knows.

It was from a prior DF article that talked about the eSRAM boost; they even broke the story, iirc.

Theoretical peak performance is one thing, but in real-life scenarios it's believed that 133GB/s throughput has been achieved with alpha transparency blending operations (FP16 x4).

http://www.eurogamer.net/articles/digitalfoundry-xbox-one-memory-better-in-production-hardware

This gives us far more information than the other article, but it predates the upclock, hence the number adjustment.


This is just an average number. They could have said "optimal code = 204GB/s", but they know that is not realistic. So common code, code that exists for Xbox One right now, reaches an average of 150GB/s. Some operations get more bandwidth, but others get less (because of "non-optimal" code).
The minimal bandwidth of DRAM (DDR3 or GDDR5) is much, much worse (68GB/s and 176GB/s are just peak bandwidth) ;), but normally you get something in between the minimal and the optimal.

Can you show me where they said it was the average bandwidth over any given timestep?
 
Here:
With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.

The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes is known to insert a bubble [a dead cycle] occasionally... One out of every eight cycles is a bubble, so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.

That bolded bit. He then goes on to say that the figures he gives are over a period of time (that's how they differ from peak)... which means they must include all the operations needed to generate 30/60 frames. The measurements given are the real deal. I'd like to get on to discussing the implications... as that is a huge amount of data...
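For anyone who wants to sanity-check the arithmetic in that quote, a quick sketch; the 256-bit DDR3-2133 interface and the 128-byte-per-direction eSRAM paths at 853MHz are assumptions taken from the public disclosures rather than from this article:

Code:
# Back-of-the-envelope check of the figures Baker quotes above.
DDR3_BITS, DDR3_MT_PER_S = 256, 2133e6      # bus width, transfers per second
ESRAM_BYTES_PER_CYCLE, GPU_HZ = 128, 853e6  # per direction, post-upclock

ddr3_peak  = DDR3_BITS / 8 * DDR3_MT_PER_S / 1e9   # ~68 GB/s
esram_dir  = ESRAM_BYTES_PER_CYCLE * GPU_HZ / 1e9  # ~109 GB/s per direction
esram_raw  = 2 * esram_dir                         # ~218 GB/s raw equivalent
esram_peak = esram_dir + esram_dir * 7 / 8         # ~204 GB/s with a 1-in-8 write bubble

print(f"DDR3 peak            : {ddr3_peak:6.1f} GB/s")
print(f"eSRAM, one direction : {esram_dir:6.1f} GB/s")
print(f"eSRAM, raw combined  : {esram_raw:6.1f} GB/s")
print(f"eSRAM, peak w/bubble : {esram_peak:6.1f} GB/s")
# Measured (quoted above): 140-150 GB/s eSRAM + 50-55 GB/s DDR3 ~= 200 GB/s combined.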
 
It was from a prior DF article that talked about the eSRAM boost; they even broke the story, iirc.

No. The 133GB/s figure was from their story specifically citing alpha blending. Not the 150GB/s figure cited by Baker in the recent article(s). You conflated the two as an excuse to sow doubt about the validity of a figure that you don't like.

http://www.eurogamer.net/articles/digitalfoundry-xbox-one-memory-better-in-production-hardware

This gives us far more information than the other article, but it predates the upclock, hence the number adjustment.
The article you are citing here has been almost universally dismissed because it painted X1 in too pleasant a light relative to the competition. It's what opened the door for so much FUD and misunderstanding about the eSRAM in the first place. You are trying to spin the 150GB/s figure as if it were a diagnostic type test to reach a peak. It's not. No need to guess or speculate on that.

I've been told personally that actual game code runs at 150GB/s from eSRAM alone. Not some catered diagnostic operation; actual game code.

As others noted, the DF article here specifically frames Baker's point about the 150GB/s figure in the context of being sustainable over long periods of time. Baker goes out of his way to establish that context in the analogy with the DRAM, and then makes sure to carry it over to the eSRAM discussion. It doesn't get much clearer than that.
 
I think people are reading more into what Beta is asking than is there. I do not take his questions to be trying to discredit the figures at all. He's merely curious about which cases they happen in and would like to investigate those further. I know this through active real-time discussions at this very moment.

We both agree that these measures are interesting in that perhaps for the first time ever we have some real measured numbers before the hardware launches. These numbers are more meaningful than theoretical peak numbers.

It would be an extra bonus to know more about the situations in which these arise. That is what we all would like to know, right?
 
You're perpetuating a myth. We don't have to guess here.

No, I am not perpetuating a myth. I'm re-stating the evidence as it was presented to us. Did you read the article? They very clearly stated the reasoning behind the inclusion of the esram was to allow for a large amount of main memory, while maintaining high bandwidth at a low cost and low power draw. Let me re-post the relevant part of the article in case you missed it:

Digital Foundry said:
Perhaps the most misunderstood area of the processor is the ESRAM and what it means for game developers. Its inclusion sort of suggests that you ruled out GDDR5 pretty early on in favour of ESRAM in combination with DDR3. Is that a fair assumption?

Nick Baker: Yeah, I think that's right. In terms of getting the best possible combination of performance, memory size, power, the GDDR5 takes you into a little bit of an uncomfortable place. Having ESRAM costs very little power and has the opportunity to give you very high bandwidth. You can reduce the bandwidth on external memory - that saves a lot of power consumption as well and the commodity memory is cheaper as well so you can afford more. That's really a driving force behind that. You're right, if you want a high memory capacity, relatively low power and a lot of bandwidth there are not too many ways of solving that.

Can it really get any plainer than that?

They just TOLD US that it was there by virtue of being an evolution of the eDRAM on 360. Does it have a slew of other benefits? Sure! It's a silver bullet so to speak for a variety of design challenges. That doesn't change the motivating purpose for its inclusion. As I said to Strange, the context of discussing its purpose for inclusion can utterly ruin the interpretation of the design, unfortunately.

Are you suggesting that the main reason for including esram in the XB1 had nothing to do with design goals and was purely driven by the fact they had edram in the XB360 and thus wanted to evolve that design into the XB1? Clearly that's not the case. They had a set of design goals and challenges with the XB1's memory system, these are already stated in the quote above. They decided that esram was the answer to those challenges and obviously leveraged their experience with the XB360's edram to implement the XB1's esram in a more flexible and useful fashion.

Obviously they are going to leverage their previous experience with a similar memory setup but that doesn't change the reasons why they went with a similar memory setup in the first place rather than something completely different. They didn't go into the XB1 design stage saying "we must have embedded memory and screw everything else" they'll have made the decision to include embedded memory because it was what they saw as the best answer to their design goals.

1) It's in the dev docs.

Links, quotes and numbers showing this to be a genuine advantage over alternate memory configuration options?

2) I said it was a benefit, not that it was the motivating factor for its inclusion.

Yes, you say it's a benefit. And yet when specifically asked if it will have an impact on GPU performance Nick Baker says this:

Digital Foundry said:
There's been some discussion online about low-latency memory access on ESRAM. My understanding of graphics technology is that you forego latency and you go wide, you parallelise over however many compute units are available. Does low latency here materially affect GPU performance?

Nick Baker: You're right. GPUs are less latency sensitive. We've not really made any statements about latency.

Do you have an explanation as to why he wouldn't extol the benefits of the low latency esram at this seemingly perfect moment to do so if they were as real as you claim?

3) They HAVE talked about the benefits of low latency, specifically in terms of GPGPU functions. Kinect's processing was the example given.

I do recall seeing something along these lines, but again a link would be good so that we can be sure the statement isn't being taken out of context. I.e. are they talking about a specific performance advantage afforded by the esram's comparatively low latency relative to a different form of graphics memory, and are they citing the Kinect example as something very specific or as one general example of a much wider range of benefits?

GPGPU has generally been accepted as one of the areas that could benefit from low latency memory access if that's what the esram affords but it seems strange that the article doesn't mention it at all while focusing quite heavily on other XB1 advantages for GPGPU.
 
Just because someone gives one example does not mean it is the _only_ way of achieving that. The reason they used alpha blending is because it uses a combination of reads and writes. The ESRAM _needs_ a combination of reads and writes to get more than 109GB/s. The amount more than 109GB/s depends on your balance of reads and writes. If you had some perfect algorithm that perfectly balanced reads and writes, you could get the 204GB/s peak. No real-life code does that.

Remember, 204GB/s is the peak. Mentioning that they measured 150GB/s in a normal use case does not make that a _new_ peak.

DDR3 and GDDR5 don't have to worry about the mix of reads and writes for achieving their peak; they only have a single port, so it doesn't matter if they're doing pure reads, alpha blends, or pure writes, you're going to get about the same amount of bandwidth, which will be some amount less than the peak.
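To illustrate that balance point, here's a toy model of my own; the only inputs are the ~109GB/s per-direction figure and the one-in-eight write bubble from the interview:

Code:
# Toy model: effective eSRAM bandwidth vs. the read share of the traffic.
# Whichever port saturates first limits the total. Not MS data, just arithmetic.
READ_PORT  = 109.2            # GB/s
WRITE_PORT = 109.2 * 7 / 8    # ~95.5 GB/s after the one-in-eight write bubble

def effective_bw(read_fraction):
    """Max total GB/s for traffic that is `read_fraction` reads by volume."""
    if read_fraction == 0.0:
        return WRITE_PORT
    if read_fraction == 1.0:
        return READ_PORT
    return min(READ_PORT / read_fraction, WRITE_PORT / (1.0 - read_fraction))

for r in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"{r:4.0%} reads -> {effective_bw(r):6.1f} GB/s")
# Pure reads or pure writes sit near 109 / 95 GB/s; a blend-like 50/50 mix lands
# around 190 GB/s; only a near-perfectly balanced workload approaches the 204 GB/s peak.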
 
Also, bandwidth is there to cover edge cases that demand more bandwidth. It isn't as if the current generation sits near peak the majority of the time.
 
Just because someone gives one example does not mean it is the _only_ way of achieving that. The reason they used alpha blending is because it uses a combination of reads and writes. The ESRAM _needs_ a combination of reads and writes to get more than 109GB/s. The amount more than 109GB/s depends on your balance of reads and writes. If you had some perfect algorithm that perfectly balanced reads and writes, you could get the 204GB/s peak. No real-life code does that.

Remember, 204GB/s is the peak. Mentioning that they measured 150GB/s in a normal use case does not make that a _new_ peak.

DDR3 and GDDR5 don't have to worry about the mix of reads and writes for achieving their peak; they only have a single port, so it doesn't matter if they're doing pure reads, alpha blends, or pure writes, you're going to get about the same amount of bandwidth, which will be some amount less than the peak.
DRAM loses bandwidth when you change from read to write; it needs longer to get the address and so on. You lose cycles on every new read or write (address change).
Then the bursts come into play: if the bursts don't work out, you may only get 1/8th of the possible bandwidth. And yes, this shouldn't happen with average code.
So no, you can't get the peak bandwidth with DRAM, unless you're making one really big read or write.
The DRAM absolute worst-case scenario for the 68GB/s (DDR3/GDDR5) would be 8.5GB/s (absolute worst case if no burst works out). 68GB/s is the absolute peak. In practice you've got several reads and writes per second, so you can maybe get to ~50GB/s. But it can be worse.

The eSRAM is limited to 109GB/s in the worst case of pure reads or pure writes. So even if you have code that does something really bad with your bandwidth, you will still reach 109GB/s; there are no tricks needed to reach this bandwidth. Do some reads and writes at the same time (e.g. from the move engines) and you are over 109GB/s.
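And a crude sketch of that DRAM worst case, so the numbers aren't just hanging in the air. The burst length of 8 is the standard DDR3/GDDR5 prefetch and the 70-80 per cent band is the efficiency Baker mentions; this is toy arithmetic, not a real DRAM timing model:

Code:
# Crude sketch of the DRAM worst/typical cases described above.
BURST_LENGTH = 8   # beats per burst; wasting 7 of every 8 gives ~1/8 of peak
for name, peak in (("DDR3", 68.0), ("GDDR5", 176.0)):
    worst = peak / BURST_LENGTH
    lo, hi = peak * 0.70, peak * 0.80  # the 70-80% efficiency band from the interview
    print(f"{name:5}: peak {peak:5.1f} GB/s, burst-wasting worst case ~{worst:4.1f} GB/s, "
          f"typical ~{lo:.0f}-{hi:.0f} GB/s")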
 
No, I am not perpetuating a myth. I'm re-stating the evidence as it was presented to us. Did you read the article? They very clearly stated the reasoning behind the inclusion of the esram was to allow for a large amount of main memory, while maintaining high bandwidth at a low cost and low power draw. Let me re-post the relevant part of the article in case you missed it:

No, they cited a whole bunch of benefits (Baker did) while Goossen later on suggested that it was in place by virtue of the successes of the 360's eDRAM. There's a difference between Baker saying "it solves a whole bunch of our design goals" and Goossen saying "we view it as an extension of the eDRAM in 360". There's an implied timeline there that seems to run counter to the one you've described.

Can it really get any plainer than that?
Keep reading. Goossen echoes the point I made about context. When you view the eSRAM's inclusion as the result of 8GB of DDR3, you do so incorrectly, and the context of that improper perspective colors the interpretation of its inclusion. Read what Baker said again, this time in the context of the questioning...

Digital Foundry: Perhaps the most misunderstood area of the processor is the ESRAM and what it means for game developers. Its inclusion sort of suggests that you ruled out GDDR5 pretty early on in favour of ESRAM in combination with DDR3. Is that a fair assumption?


Nick Baker: Yeah, I think that's right. In terms of getting the best possible combination of performance, memory size, power, the GDDR5 takes you into a little bit of an uncomfortable place.
Note the context. DF asked specifically about the eSRAM's role in deciding to rule out GDDR5 here. It was NOT a question premised on why the eSRAM was included to begin with. Baker is simply saying that there are lots of engineering benefits to eSRAM + DRAM. Goossen notes that their thinking on the eSRAM was very deliberately as an evolution of the 360's eDRAM.

Andrew Goossen: I just wanted to jump in from a software perspective. This controversy is rather surprising to me, especially when you view ESRAM as the evolution of eDRAM from the Xbox 360. No-one questions on the Xbox 360 whether we can get the eDRAM bandwidth concurrent with the bandwidth coming out of system memory. In fact, the system design required it. We had to pull over all of our vertex buffers and all of our textures out of system memory concurrent with going on with render targets, colour, depth, stencil buffers that were in eDRAM.


Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM.
Are you suggesting that the main reason for including esram in the XB1 had nothing to do with design goals and was purely driven by the fact they had edram in the XB360 and thus wanted to evolve that design into the XB1?
The bolded claim (that the inclusion had nothing to do with design goals) is a strawman. Their design goals consisted of a variety of things, but the specific inclusion, according to them, was as much if not more so motivated by wanting to expand on the eDRAM's successes and make it more flexible.

They had a set of design goals and challenges with the XB1's memory system, these are already stated in the quote above.
You are confusing benefits with a priori design goals imho. We don't disagree that it solves tons of important challenges and helps achieve their design goals. What we disagree on is the motivating priority for its inclusion. My view is that they decided, immediately upon a post-mortem of the 360, that they wanted to amplify the eDRAM's successes and it happened to be a natural, perfect fit for accommodating other design goals drafted thereafter.

They decided that esram was the answer to those challenges and obviously leveraged their experience with the XB360's edram to implement the XB1's esram in a more flexible and useful fashion.
I disagree on the implied ordering of the decisions you've outlined here. This all would have been born out of a post-mortem on 360. Before they would even bother thinking seriously about design goals they would have looked very hard at what worked and what didn't on 360. My contention is that they likely knew they wanted eSRAM as a direct result of that before targeting bandwidth or power consumption metrics.

Obviously they are going to leverage their previous experience with a similar memory setup but that doesn't change the reasons why they went with a similar memory setup in the first place rather than something completely different. They didn't go into the XB1 design stage saying "we must have embedded memory and screw everything else" they'll have made the decision to include embedded memory because it was what they saw as the best answer to their design goals.
They went in only after doing a detailed post-mortem of the previous console and deciding they'd like the opportunity to expand on its architectural successes and erase some of its weaknesses. Do you really disagree with this?

Links, quotes and numbers showing this to be a genuine advantage over alternate memory configuration options?
You seem to be moving the goalposts, as I never claimed to have information that detailed. I said they mentioned the low latency in the dev docs. Specifically, it is cited as a boon to performance in the CBs/DBs.

Durango dev docs said:
The advantages of ESRAM are lower latency and lack of contention from other memory clients—for instance the CPU, I/O, and display output. Low latency is particularly important for sustaining peak performance of the color blocks (CBs) and depth blocks (DBs).

http://www.vgleaks.com/durango-gpu-2/2/

Yes, you say it's a benefit. And yet when specifically asked if it will have an impact on GPU performance Nick Baker says this:
He doesn't actually say much of anything there. He doesn't deny it nor confirm it. Just says they haven't talked about it. Dev docs (above) evidently make note of it. How important is it? No idea. We have no details on it really at all afaik.

Do you have an explanation as to why he wouldn't extol the benefits of the low latency esram at this seemingly perfect moment to do so if they were as real as you claim?
Do you have an explanation for why the dev docs cited by VGLeaks evidently specifically call latency out as a direct benefit of the eSRAM? I seem to recall sebbbi detailing some reasons a while back about why it could be very useful. It's also cited in the Kinect GPGPU comment as being important there.

FWIW, I agree that it seems odd for Baker to not mention it at the ideal opportunity, but on the other hand we have the info from dev docs and ERP/sebbbi have both talked it up as a potential benefit iirc. There's also a patent from way back in March that seems to utilize the eSRAM's low latency very specifically along with 'multiple image planes' to do a different methodology for rendering tiled assets, but I don't have the patent link anymore and MS hasn't mentioned anything about it at all outside the patent itself.

I do recall seeing something along these lines, but again a link would be good so that we can be sure the statement isn't being taken out of context. I.e. are they talking about a specific performance advantage afforded by the esram's comparatively low latency relative to a different form of graphics memory, and are they citing the Kinect example as something very specific or as one general example of a much wider range of benefits?
Sure thing...here ya go, from the article we are discussing:

Andrew Goossen: I will say that we do have quite a lot of experience in terms of GPGPU - the Xbox 360 Kinect, we're doing all the Exemplar processing on the GPU so GPGPU is very much a key part of our design for Xbox One. Building on that and knowing what titles want to do in the future. Something like Exemplar... Exemplar ironically doesn't need much ALU. It's much more about the latency you have in terms of memory fetch [latency hiding of the GPU], so this is kind of a natural evolution for us. It's like, OK, it's the memory system which is more important for some particular GPGPU workloads.
 
Note the context. DF asked specifically about the eSRAM's role in deciding to rule out GDDR5 here. It was NOT a question premised on why the eSRAM was included to begin with. Baker is simply saying that there are lots of engineering benefits to eSRAM + DRAM. Goossen notes that their thinking on the eSRAM was very deliberately as an evolution of the 360's eDRAM.
I'm not sure why opinions on this matter are so polarised between you two. It strikes me that the solution addressed both issues pretty much simultaneously. When MS decided on lots of RAM, GDDR5 was pretty much out of the question. Meanwhile, they had a working eDRAM solution that they liked and already had ideas to build on. Ergo, pick the ESRAM solution. We can't say MS chose ESRAM without ever considering GDDR5, because they explain why they didn't want GDDR5. And we can't say the reason for ESRAM was simply that GDDR5 was too hot and expensive, because MS said otherwise. Basically, MS said two different things, so both should be taken together. ;)

FWIW, I agree that it seems odd for Baker to not mention it at the ideal opportunity, but on the other hand we have the info from dev docs and ERP/sebbbi have both talked it up as a potential benefit iirc.
There was open speculation on what ESRAM would provide as regards low latency, but I take this interview as putting those ideas to rest. We've never had a metric for that 'low latency', and we don't have MS saying it is of benefit to their system. It would definitely have come up as part of the engineers' design decision if it had been part of it. The interview even goes so far as to say...

Digital Foundry: There's been some discussion online about low-latency memory access on ESRAM. My understanding of graphics technology is that you forego latency and you go wide, you parallelise over however many compute units are available. Does low latency here materially affect GPU performance?

Nick Baker: You're right. GPUs are less latency sensitive. We've not really made any statements about latency.
That reply will be open to interpretation, but in my opinion it reads as, "yeah, GPUs are good at hiding latency. We never really said anything about GPU latency." The benefits of their memory system's latency will be negligible such that they don't bother mentioning it. There could possibly be GPGPU benefits, but it'll be guesswork at this point (again, we've no idea what constitutes 'low latency'). There's certainly no hard evidence that the ESRAM provides a latency benefit to any activities.
 