Understanding XB1's internal memory bandwidth *spawn

No, they cited a whole bunch of benefits (Baker did) while Goossen later on suggested that it was in place by virtue of the successes of the 360's eDRAM. There's a difference between Baker saying "it solves a whole bunch of our design goals" and Goossen saying "we view it as an extension of the eDRAM in 360". There's an implied timeline there that seems to run counter to the one you've described.

Keep reading. Goossen echoes the point I made about context. When you view the eSRAM's inclusion as the result of 8GB of DDR3, you do so incorrectly, and the context of that improper perspective colors the interpretation of its inclusion. Read what Baker said again, this time in the context of the questioning...

Note the context. DF asked specifically about the eSRAM's role in deciding to rule out GDDR5 here. It was NOT a question premised on why the eSRAM was included to begin with. Baker is simply saying that there are lots of engineering benefits to eSRAM + DRAM. Goossen notes that their thinking on the eSRAM was very deliberately as an evolution of the 360's eDRAM.

The bolded is a strawman. Their design goals consisted of a variety of things, but the specific inclusion, according to them, was as much if not more so motivated by wanting to expand on the eDRAM's successes and make it more flexible.

You are confusing benefits with a priori design goals imho. We don't disagree that it solves tons of important challenges and helps achieve their design goals. What we disagree on is the motivating priority for its inclusion. My view is that they decided, immediately upon a post-mortem of the 360, that they wanted to amplify the eDRAM's successes and it happened to be a natural, perfect fit for accommodating other design goals drafted thereafter.

I disagree on the implied ordering of the decisions you've outlined here. This all would have been born out of a post-mortem on 360. Before they would even bother thinking seriously about design goals they would have looked very hard at what worked and what didn't on 360. My contention is that they likely knew they wanted eSRAM as a direct result of that before targeting bandwidth or power consumption metrics.

They went in only after doing a detailed post-mortem of the previous console and deciding they'd like the opportunity to expand on its architectural successes and erase some of its weaknesses. Do you really disagree with this?

As Shifty said, both would have influenced their decision to use esram. The fact that Microsoft already has a lot of experience with embedded memory and no doubt had lots of ideas on how to expand on and improve that solution would have influenced them to go with it. However I still maintain that ultimately the deciding factor would have been the requirement to have a large main memory pool. If the XB1 was only ever going to target 2GB of main memory then I think they'd have been far more likely to just go with a high speed GDDR5 interface and be done with it. After all, all the improvements that the esram brings in usability over the edram are only aimed at reducing the usability deficit of such a memory configuration compared to a single pool of high speed memory.

The other 2 potential benefits are bandwidth (which according to the initial spec was only going to be on par with a fast pool of GDDR5 anyway) and latency - which the engineers practically went out of their way to avoid talking about.

Your argument hinges on the fact that they would have designed the XB1 by looking at the XB360 design and evolving it (which no doubt would have been part of the equation). But technology is different today than it was in 2005. In 2005 if you wanted high bandwidth from a single memory pool without embedded memory you'd have had to use a prohibitively expensive 256bit memory interface. Today, that's a viable option and so the correct high level choices for 2005 would not necessarily be the correct high level choices for today and thus some element of 'going back to the drawing board' would certainly have been required.

You seem to be moving the goalposts as I never claimed to have that detailed of information. I said they mentioned the low latency in the dev docs. Specifically, they are cited as a boon to performance in the CB/DB's.

http://www.vgleaks.com/durango-gpu-2/2/

My point in asking for that level of detail was to specifically prevent a vague reference being held up as proof of some kind of notable esram latency advantage.

The vgleaks article does reference lower latency. But do we know that's come directly from Microsoft and isn't just a VGleaks assumption about embedded memory? And even if it does come direct from Microsoft, without specific numbers and in light of the downplaying of any latency advantages in the recent interview, how do we know that the stated advantage is significant?

He doesn't actually say much of anything there. He neither denies it nor confirms it.

Which in an interview specifically aimed at extolling the advantages of the XB1 architecture is extremely telling IMO.

I seem to recall sebbbi detailing some reasons a while back about why it could be very useful. It's also cited in the Kinect GPGPU comment as being important there.

FWIW, I agree that it seems odd for Baker to not mention it at the ideal opportunity, but on the other hand we have the info from dev docs and ERP/sebbbi have both talked it up as a potential benefit iirc.

I think the key word there is potential. Those guys obviously know what they are talking about so if they say there's a specific advantage of a low latency embedded memory solution then you can take that to the bank. However I don't think anyone has said there is an advantage so far. Merely that if the memory is truly low latency enough then it would have these advantages. The key is whether that low latency is real and it's now starting to look as though it may not be. At least not to a significant enough extent to be notable when questioned about its performance-adding potential.

Sure thing...here ya go, from the article we are discussing:

The highlighted part of that quote (below) has me a little confused. Are they talking specifically about low latency memory or are they talking generally about the latency-hiding abilities of GPUs (large numbers of threads, etc.) being the key performance driver for this particular GPGPU application?

Andrew Goossen said:
: I will say that we do have quite a lot of experience in terms of GPGPU - the Xbox 360 Kinect, we're doing all the Exemplar processing on the GPU so GPGPU is very much a key part of our design for Xbox One. Building on that and knowing what titles want to do in the future. Something like Exemplar... Exemplar ironically doesn't need much ALU. It's much more about the latency you have in terms of memory fetch [latency hiding of the GPU], so this is kind of a natural evolution for us. It's like, OK, it's the memory system which is more important for some particular GPGPU workloads.
 
I'm not sure why opinions on this matter are so polarised between you two. It strikes me that the solution addressed both issues pretty much simultaneously. When MS decided on lots of RAM, GDDR5 was pretty much out of the question. Meanwhile, they had a working eDRAM solution that they liked and already had ideas to build on. Ergo, pick the ESRAM solution.

I agree entirely with this part. I am likewise arguing that the context of how this decision to include eSRAM was made radically colors the interpretation many online have of the design considerations MS made. I want to apply pressure to those whose posts reflect a narrative that gets spun into something it isn't, since that may improperly color the discussion's context going forward.

There was open speculation on what ESRAM would provide regarding low latency, but I take this interview as knocking those ideas on the head.

It's not cut and dried. He just says they haven't talked about it one way or the other. If that was all the info we had, I'd agree. But we also have the info from the leaked dev docs and it would seem some programmers on the forum here were easily able to think up uses for exploiting it. All of this taken together doesn't explain why Baker didn't talk about it, but it DOES suggest that we shouldn't take his response as ending that discussion. If he had come out against it I would err on the side of caution and lean towards the DF interview. But he didn't do that either, so to me it's still up in the air.

It could also be something the software guys would look to exploit whereas Baker is a hardware guy. The MS patent I referenced earlier was software-based, ERP/sebbbi are software guys obviously, so perhaps it's a potential benefit that played no role in Baker's views on its inclusion. Goossen (another software guy) noted its utility with Kinect, for instance.

We've never had a metric for that 'low latency', and we don't have MS saying it is of benefit to their system.

I quoted the dev docs. So yes, we certainly do have MS saying as much. It's also cited by Goossen vis a vis Exemplar/Kinect. I think you guys should pay more attention to the entire interview and not focus only on that singular quote and Baker's short response.

There's certainly no hard evidence that the ESRAM provides a latency benefit to any activities.

For Kinect it does. It is explicitly cited by Goossen in the article.
 
That rate can't even be achieved with just ESRAM because he's talking about 164GB/s of write only, so it outstrips the 109GB/s max for writing to ESRAM. It is not an example of a usage scenario that results in 150GB/s of combined read+write for the ESRAM alone.

You're wrong. Read their Example (typical game scenario) again:

The relationship between fill-rate and memory bandwidth is a good example of where balance is necessary. A high fill-rate won't help if the memory system can't sustain the bandwidth required to run at that fill rate. For example, consider a typical game scenario where the render target is 32bpp [bits per pixel] and blending is disabled, and the depth/stencil surface is 32bpp with Z enabled. That amounts to 12 bytes of bandwidth needed per pixel drawn (eight bytes write, four bytes read). At our peak fill-rate of 13.65GPixels/s that adds up to 164GB/s of real bandwidth that is needed which pretty much saturates our ESRAM bandwidth. In this case, even if we had doubled the number of ROPs, the effective fill-rate would not have changed because we would be bottlenecked on bandwidth. In other words, we balanced our ROPs to our bandwidth for our target scenarios. Keep in mind that bandwidth is also needed for vertex and texture data as well, which in our case typically comes from DDR3.

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
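As a sanity check on the arithmetic in that quote, here's a minimal sketch in Python. The 13.65GPixels/s fill-rate and the 8-byte-write/4-byte-read split come straight from the quote above; the 109GB/s write ceiling is the figure cited earlier in the thread, and the colour/depth breakdown in the comments is my reading of the scenario, not something Goossen spells out.

```python
# Back-of-envelope check of the fill-rate vs. bandwidth example quoted above.
# "GB/s" here means 10^9 bytes per second, matching the article's usage.
PEAK_FILL_RATE = 13.65e9       # pixels per second (from the quote)
WRITE_BYTES_PER_PIXEL = 8      # presumably 4B colour write + 4B depth/stencil write
READ_BYTES_PER_PIXEL = 4       # presumably the 4B depth read for the Z test

write_bw = PEAK_FILL_RATE * WRITE_BYTES_PER_PIXEL  # ~109.2 GB/s
read_bw = PEAK_FILL_RATE * READ_BYTES_PER_PIXEL    # ~54.6 GB/s
total_bw = write_bw + read_bw                      # ~163.8 GB/s, the quoted "164GB/s"

print(f"write {write_bw / 1e9:.1f} GB/s, read {read_bw / 1e9:.1f} GB/s, total {total_bw / 1e9:.1f} GB/s")
```

Read that way, the 164GB/s is a combined read+write requirement, with the write portion sitting right at the 109GB/s write ceiling rather than beyond it.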
 
As Shifty said, both would have influenced their decision to use esram. The fact that Microsoft already has a lot of experience with embedded memory and no doubt had lots of ideas on how to expand on and improve that solution would no doubt have influenced them to go with it. However I still maintain that ultimately the deciding factor would have been the requirement to have a large main memory pool.

They were "originally" considering something with 4GB of DDR4 RAM, as per Yukon. I'm not saying that's something they really dug into in a serious fashion, but it was under some sort of consideration prior to 8GB of DDR3. If the agenda at that point was tons of cheap RAM, 4GB of DDR4 isn't really fitting the bill imho. It doesn't fit in well with either of those goals. Yet, even at that point in mid-2010 they were already settled on embedded RAM.

They wanted AT LEAST 32MB of either eDRAM or eSRAM, as per the Yukon leak. So I can't agree with you here. Which is fine. We can agree to disagree. Just noting that we actually do have a specific data point to work with that shows they were already committed to embedded RAM prior to "lots" of "cheap" RAM becoming decisive for their goals.

The vgleaks article does reference lower latency. But do we know that's come directly from Microsoft and isn't just a VGleaks assumption about embedded memory?
We could ask bkilian about that perhaps, but I do recall him saying in the past that it was pretty much verbatim. Their recent PS4/Orbis articles would bear that out to a tee as well: they showed that the info in their older articles came verbatim from Sony's dev docs, with no real paraphrasing on their part at all, as best anyone can tell.

And even if it does come direct from Microsoft, without specific numbers and in light of the downplaying of any latency advantages in the recent interview, how do we know that the stated advantage is significant?
See my comment to Shifty on that. It could well be related to the question being posed to a hardware guy and latency not being a major aspect of the eSRAM's inclusion from his pov, yet for software guys it may be obvious how to exploit it in meaningful ways. The patent I noted was software-based, Goossen noted a Kinect software application that you seem to ignore for whatever reason, ERP/sebbbi offered their own ideas as software guys too iirc. I'd be more willing to let that go if Baker had outright killed it in his response but he doesn't even address it at all, one way or the other. So are we to ignore Goossen and the dev docs in light of Baker's response? I say no. Instead I'd like to see DF press a software guy there on that specifically.

Which in an interview specifically aimed at extolling the advantages of the XB1 architecture is extremely telling IMO.
Like I told Shifty, I'd agree if there weren't other reasons to cautiously avoid that conclusion already out there.

The highlighted part of that quote (below) has me a little confused. Are they talking specifically about low latency memory or are they talking generally about the latency-hiding abilities of GPUs (large numbers of threads, etc.) being the key performance driver for this particular GPGPU application?
Seems to say Exemplar needs quick memory fetches to work efficiently. Bear in mind the part you bolded was DF's commentary, not Goossen's.
 
It could also be something the software guys would look to exploit whereas Baker is a hardware guy. The MS patent I referenced earlier was software-based, ERP/sebbbi are software guys obviously, so perhaps it's a potential benefit that played no role in Baker's views on its inclusion.
A fair argument, but I'll add that the discussion has been mostly hypothetical. The likes of ERP and Sebbbi are theorising benefits (well, Sebbbi could be talking from experience!), and that was predicated on an understanding that 'low latency' really meant low latency - and, I repeat, we still don't know what the ESRAM's latency actually is. It's an unqualified term. For some people, it means an order of magnitude lower access latency. For others, it could be 20% lower latency. Without any idea how many ns/clock cycles we're actually talking about here, it's hard to discuss the impact of ESRAM's latency.

I quoted the dev docs. So yes, we certainly do have MS saying as much. It's also cited by Goossen vis a vis Exemplar/Kinect. I think you guys should pay more attention to the entire interview and not focus only on that singular quote and Baker's short response.
Well I haven't read the article yet (:shock:) so I may have missed something. But I'm talking about the 'whole picture'. We have...

The advantages of ESRAM are lower latency and lack of contention from other memory clients—for instance the CPU, I/O, and display output. Low latency is particularly important for sustaining peak performance of the color blocks (CBs) and depth blocks (DBs).
...from a long time ago. Now we have no further discussion on that in this remarkably open and in-depth interview. Why no word on the advantages of the low latency ESRAM in CB and DB performance if they really benefit? It doesn't make sense to miss out an advantage like that in this interview. Surely the response to Leadbetter's question would be more like...

Digital Foundry: There's been some discussion online about low-latency memory access on ESRAM. My understanding of graphics technology is that you forego latency and you go wide, you parallelise over however many compute units are available. Does low latency here materially affect GPU performance?
Nick Baker: You're right. GPUs are less latency sensitive. However, there are some specific workloads, such as colour and depth workloads on their respective blocks, that can benefit from ESRAM's lower latency and increase efficiency within our target low-power profile.
I cannot reconcile the response with any graphics advantage.
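For what it's worth, here's a rough sketch of the 'forego latency and go wide' reasoning in Leadbetter's question. The simple ratio model and every number in it are illustrative assumptions of mine, not XB1 figures.

```python
# Rough latency-hiding model: a GPU keeps enough wavefronts in flight that,
# while one waits on a memory fetch, the others still have ALU work to issue.
def wavefronts_to_hide_latency(mem_latency_cycles: float, alu_cycles_per_fetch: float) -> float:
    """Approximate number of wavefronts needed so the ALUs never stall on memory."""
    return mem_latency_cycles / alu_cycles_per_fetch

# Illustrative numbers only: halving memory latency halves the parallelism
# required, but a workload that already has plenty of threads in flight
# may not run any faster because of it.
print(wavefronts_to_hide_latency(300, 20))  # 15.0 wavefronts at a 300-cycle latency
print(wavefronts_to_hide_latency(150, 20))  # 7.5 wavefronts if latency were halved
```

On that toy model, a lower-latency store mainly helps workloads that can't find enough parallel work to go wide - which is roughly the shape of the disagreement here.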


For Kinect it does. It is explicitly cited by Goossen in the article.
I'm reading that differently to you...

Goossen said:
Exemplar ironically doesn't need much ALU. It's much more about the latency you have in terms of memory fetch [latency hiding of the GPU], so this is kind of a natural evolution for us.
...means, "using GPGPU for Kinect required more managing the GPU's innate latency-hiding memory systems than using massive parallel FPU throughput." I don't see anything there saying a low latency memory store is beneficial to GPGPU above and beyond what GPUs are used to working with.
 
See my comment to Shifty on that. It could well be related to the question being posed to a hardware guy and latency not being a major aspect of the eSRAM's inclusion from his pov, yet for software guys it may be obvious how to exploit it in meaningful ways.

...

So are we to ignore Goossen and the dev docs in light of Baker's response? I say no. Instead I'd like to see DF press a software guy there on that specifically.

I think we've both put our arguments forward sufficiently at this point and there's no point beating this one over the head until further evidence presents itself. I did however just want to point out one thing from your post above, and that's that Goossen was present throughout the interview. He had as much opportunity to respond to that question as Baker did (more than once they both gave their own perspective on the same question) yet he chose not to. That should count for something.
 
I'm reading that differently to you...

...means, "using GPGPU for Kinect required more managing the GPU's innate latency-hiding memory systems than using massive parallel FPU throughput." I don't see anything there saying a low latency memory store is beneficial to GPGPU above and beyond what GPUs are used to working with.

The part about GPU hiding latencies is DF's commentary, not Goossen's.
 
I'd like to make a point guys that DF only had the one phone call interview with Baker & Goossen & they have decided to release it piecemeal. MS are beholden to DF & Richard to release the details of that interview & so far we can't be sure how much he's released. MS has no control of what or when they release it. This is one reason why I wished they had recorded a video of the interview. As it stands there is doubt as to what came out or didn't come out of that interview.

Tommy McClain
 
I did however just want to point out one thing from your post above, and that's that Goossen was present throughout the interview. He had as much opportunity to respond to that question as Baker did (more than once they both gave their own perspective on the same question) yet he chose not to. That should count for something.

So....you are willing to ignore the dev docs and Goossen's explicit note about Kinect utilizing the low latency, two reliable facts presented to you, in favor of your own interpretation of what someone else could have said in a given opportunity?

You and Shifty are both on the wrong side of the logic here guys. I understand skepticism and share it in terms of wanting more details, but you can't rationalize swapping out reliable facts from the horse's mouth with your own circumstantial efforts to read between the lines. Come on now. :???:

Simply noting how odd it was that they didn't go into detail on the matter doesn't magically erase the dev docs nor Exemplar's utilization.

I also found this quote that we both missed; it's not necessarily referring to eSRAM at all, but it's worth noting:

Baker said:
We wanted to have a single chip from the start and get everything as close to memory as possible. Both the CPU and GPU - give everything low latency and high bandwidth - that was the key mantra.

And another from Goossen:

Goossen said:
There's also quite a number of other design aspects and requirements that we put in around things like latency, steady frame-rates and that the titles aren't interrupted by the system and other things like that. You'll see this very much as a pervasive ongoing theme in our system design.

Ideas on what they are referring to in those quotes?

The more I re-read Baker's other comment the more it sounds like he is referring more to GPU utilization and not eSRAM. DF opens their statement noting discussions online about eSRAM latency, but the other sentences are about the GPU handling latency, which is what Baker directly responds to. So I'm growing more sceptical of the assumption that he was directly addressing the eSRAM latency. Seems more like he merely confirms how most GPUs are designed, including X1's.
 
I'd like to make a point guys that DF only had the one phone call interview with Baker & Goossen & they have decided to release it piecemeal. MS are beholden to DF & Richard to release the details of that interview & so far we can't be sure how much he's released. MS has no control of what or when they release it. This is one reason why I wished they had recorded a video of the interview. As it stands there is doubt as to what came out or didn't come out of that interview.

Tommy McClain

DF claims the latest article was the full transcript.
 
Again, the specific 91%+ scenario applies to specific functions; has there been a percentage estimate of full bandwidth utilization from actual titles running? This is a question.
As for the stated tests/real apps, I would assume that the actual titles they have would be the source of those measures: Forza, Ryse, etc.
What do you mean by running an actual title?
An actual title will never continuously use 150+GB/s of eSRAM bandwidth, or it means it is severely bandwidth limited (as the requirements usually spike a lot over the course of a frame).
Furthermore, it gets really hard to quantify the used bandwidth exactly outside of slightly more synthetic tests, as a lot of factors kick in, like the often quite efficient Z compression (which also works without MSAA) and caching in the ROPs, for instance. And for a real game you also likely have overdraw from the same batch of geometry at some points during your rendering, where the ROP caches can help a bit to reduce the external bandwidth requirements. Or an early Z test kicks the fragments out of the pipeline even before the pixel shader! For texture or buffer accesses it is even more important, as the relevant caches are larger. You basically need synthetic test cases to minimize the effect of the caches and other "efficiency helpers" in the GPU to get reliable bandwidth numbers. Otherwise you measure a blend of cache bandwidth and external bandwidth. You can compare that with the latency measurements on CPUs, where the prefetchers got smarter and basically broke a lot of the latency benchmarks at some point. While this is of course good for performance (as they also work in real software), it may not tell you everything about latency or bandwidth in this case (and the PS4 has more/larger caches).
So my point is about what seems to be honest talk of a real-world average, not specific bandwidth-measurement tests, for which I am sure someone at MS could engineer near-peak situations as well. No fanboy argument; I'm actually pushing to look at this past the fanboy interpretation.
The problem is that everything is more complicated in the real world. ;)
One simply can't give a simple average as it varies a lot with the use case. And if one makes up some conditions to derive some kind of "average", it won't be a very meaningful number, as under different conditions it will look different. And taking one game isn't going to help much, as another one using different rendering techniques will have different needs. And as devs will tune their engines to use what is there, this will probably lead to a shift in the observed "average".
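To put a toy model behind that point, here's a sketch assuming made-up cache hit rates and early-Z cull fractions (none of these are measured XB1 numbers): the external bandwidth you would "measure" from a real title depends heavily on how much traffic the ROP/texture caches and early-Z absorb before it ever leaves the GPU.

```python
# Toy model: how much of the nominally requested bandwidth actually reaches
# external memory (ESRAM/DDR3) once caches and early-Z have absorbed traffic.
def external_bandwidth(requested_gb_s: float, cache_hit_rate: float, early_z_cull: float = 0.0) -> float:
    surviving = requested_gb_s * (1.0 - early_z_cull)  # fragments culled before they cost bandwidth
    return surviving * (1.0 - cache_hit_rate)          # cache hits never leave the GPU

# The same nominal 164 GB/s workload gives very different "measured" numbers:
print(external_bandwidth(164, cache_hit_rate=0.0))                    # synthetic-style case: 164.0
print(external_bandwidth(164, cache_hit_rate=0.3, early_z_cull=0.2))  # hypothetical game-style case: ~91.8
```

Which is why a single "average bandwidth" figure from a real title says as much about that title's caching and rendering style as it does about the memory system itself.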

Edit:
Sorry Shifty. I guess it should have gone to the other thread?
 
So....you are willing to ignore the dev docs and Goossen's explicit note about Kinect utilizing the low latency, two reliable facts presented to you, in favor of your own interpretation of what someone else could have said in a given opportunity?

No, I already addressed both of those in a previous post. And it seems to me that it's you that's applying a specific interpretation to both of those sources to suit your world view. An interpretation which you state as fact but is at best debatable.

You and Shifty are both on the wrong side of the logic here guys. I understand skepticism and share it in terms of wanting more details, but you can't rationalize swapping out reliable facts from the horse's mouth with your own circumstantial efforts to read between the lines. Come on now. :???:

You've made your view quite clear Astro. It clearly isn't shared by everyone and is certainly not as factual as you try to imply it is. So why don't we just agree to disagree and move on.

Simply noting how odd it was that they didn't go into detail on the matter doesn't magically erase the dev docs nor Exemplar's utilization.

It's a little more than odd and while not in itself proof that there are no major latency advantages, it certainly raises serious doubts about that argument.

Regarding your other two sources, as mentioned in earlier posts, one is a generic statement with no specific metrics given to indicate significance, which may or may not have been VGL's own speculation, and the other could quite easily have been a reference to the GPU's natural latency hiding qualities (your factually stated opinion to the contrary notwithstanding).

I also found this quote that we both missed, though not necessarily referring to eSRAM at all, but worth noting:

It seems to be quite clearly talking about the advantages of an APU with shared memory and not specifically low latency esram.

And another from Goossen:

Saying that latency was one of the design aspects they considered is hardly proof or even particularly useful evidence of the esram having game changing low latency. Latency is a problem that must be addressed in almost every aspect of the system and it's no surprise that they would have worked to keep it to a minimum in every area. It doesn't specifically mean that they implemented a very low latency embedded memory pool.

And in fact, if that is what it meant, then why didn't Goossen jump in and say that when that specific question was asked?
 
What do you mean by running an actual title?
An actual title will never continuously use 150+GB/s of eSRAM bandwidth, or it means it is severely bandwidth limited (as the requirements usually spike a lot over the course of a frame).
Furthermore, it gets really hard to quantify the used bandwidth exactly outside of slightly more synthetic tests, as a lot of factors kick in, like the often quite efficient Z compression (which also works without MSAA) and caching in the ROPs, for instance. And for a real game you also likely have overdraw from the same batch of geometry at some points during your rendering, where the ROP caches can help a bit to reduce the external bandwidth requirements. Or an early Z test kicks the fragments out of the pipeline even before the pixel shader! For texture or buffer accesses it is even more important, as the relevant caches are larger. You basically need synthetic test cases to minimize the effect of the caches and other "efficiency helpers" in the GPU to get reliable bandwidth numbers. Otherwise you measure a blend of cache bandwidth and external bandwidth. You can compare that with the latency measurements on CPUs, where the prefetchers got smarter and basically broke a lot of the latency benchmarks at some point. While this is of course good for performance (as they also work in real software), it may not tell you everything about latency or bandwidth in this case (and the PS4 has more/larger caches).
The problem is that everything is more complicated in the real world. ;)
One simply can't give a simple average as it varies a lot with the use case. And if one makes up some conditions to derive some kind of "average", it won't be a very meaningful number, as under different conditions it will look different. And taking one game isn't going to help much, as devs will tune their engines to use what is there, which will probably lead to a shift in the observed "average".

Thanks for the deep dive. I see where you are coming from and we do agree about dev-to-dev variation and further tuning.
 
And another from Goossen:
Ideas on what they are referring to in those quotes?
An enumeration that includes frame rate, user interface interruptions, and latency... it looks like that latency is in milliseconds - human-perceivable latency. Perhaps they are trying to correct the huge latency issues of Kinect 1?
 
If bandwidth is measured in actual games it's invalid. (and ms pr & fud)
If bandwidth is measured using synthetic tests it's invalid. (and ms pr & fud)
So any takers for a third way also? (and ms pr & fud)
 
DF claims the latest article was the full transcript.

Awww. My bad. Seems like doing the first couple of articles before the full transcript didn't help MS out so much (first impressions and all), but I guess it helped DF get some much-wanted traffic.

Tommy McClain
 
If bandwidth is measured in actual games it's invalid. (and ms pr & fud)
If bandwidth is measured using synthetic tests it's invalid. (and ms pr & fud)
So any takers for a third way also? (and ms pr & fud)

How about numbers from MS when on a PR junket are not gospel? They are playing up their strengths and talking down their weaknesses, as they should. Don't expect to get any specific information from them, it is not in their interest. The same is true for Sony. Wait for developers to talk off the record.
 
An enumeration that includes frame rate, user interface interruptions, and latency... it looks like that latency is in milliseconds - human-perceivable latency. Perhaps they are trying to correct the huge latency issues of Kinect 1?

Maybe? Hard to glean anything about what kind of latency is being referenced there but that could well be it for Goossen's quote. They did mention it was lower (VGLeaks says down to 60ms for Kinect, HotChips suggests 20ms but unsure what that figure is actually referencing).
 
As Shifty once said, you don't have to respond to people all the time. So I'll leave it at that :)
 
How about numbers from MS when on a PR junket are not gospel? They are playing up their strengths and talking down their weaknesses, as they should. Don't expect to get any specific information from them, it is not in their interest. The same is true for Sony. Wait for developers to talk off the record.

Applying this logic to the PS4 leads us to abandon absolutely all known info about these machines without prejudice. Congrats, you've taken us back to the stone age of 2012.
 