The pros and cons of eDRAM/ESRAM in next-gen

One thing that often gets "twisted" by some people: even though eSRAM+DDR3 doesn't bring any advantages over the setup the PS4 has, we need to keep in mind that without eSRAM the XB1 would probably perform a lot worse. eSRAM, in a sense, is designed to reduce the gap between DDR3 and GDDR5, and that's exactly what it does.
 
According to the latest DF interview, the Xbox One has major contention issues between the GPU and CPU, seemingly just as the PS4 does. Unlike the PS4, however, it doesn't have the main memory BW to comfortably accommodate this. So it's a problem.

Without esram the X1 would be completely boned.
 
One thing that often gets "twisted" by some people: even though eSRAM+DDR3 doesn't bring any advantages over the setup the PS4 has, we need to keep in mind that without eSRAM the XB1 would probably perform a lot worse. eSRAM, in a sense, is designed to reduce the gap between DDR3 and GDDR5, and that's exactly what it does.
Although true, the subject is 'pros and cons of eDRAM/ESRAM' and not 'compare and contrast XB1 and PS4 architectures'. The pros and cons for XB1's implementation are pretty much nil. Actually, there's a con in having to faff about with ESRAM size. There are no immediate, significant pros. There may be a price pro down the line.
 
Shifty, I'm not sure about the complete lack of difference, although there's no factual data about ESRAM latencies and such to decide on the matter. I think we should review sebbbi's comments on these subjects.
 
Shifty, I'm not sure about the complete lack of difference, although there's no factual data about ESRAM latencies and such to decide on the matter.
That's precisely the point. There's a hypothetical difference, but it's completely unprovable and undiscussable. We can't really talk about XB1's ESRAM latency advantage when we have no details about it. With the evidence at hand, we have a pool of RAM that brings no advantages. There's a possibility, I repeat, that it improves some areas of performance, but we can't actually state that as a benefit because we can't determine its actual advantage, especially given no-one's ever mentioned the latency figure.

Whether one believes XB1's ESRAM has a practical latency performance advantage in some areas or not basically comes down to faith at this point.

Now for the wider topic, there's a discussion to be had about ESRAM and low latencies if we give hypothetical numbers, if anyone wants to look into that rather than XB1's particular implementation.
 
sebbbi has been intentionally guarded to protect himself and Red Lynx from the internets, but it does seem that they're able to extract good performance in BW bound situations using the esram.

It seems difficult to gauge the pros/cons of the esram in the XB1, as there's a lot going on beneath the surface that seems to be affecting performance in other areas of the system. One pro that we can be sure of is that it allows for high BW without being tied to boutique memory on a phat bus - how much of a pro that actually is seems up for debate though.

It should also allow for higher bandwidth operations at relatively lower power consumption. How much of a benefit that actually is in the marketplace is up for debate too, I guess.

64 MB of esram would have allowed for much phatter g-buffers and for texturing to be done from esram. I suspect that it would have made a simple evaluation of esram seem far more positive, though the economics behind the scenes might have been less positive.

On 20nm, with little in the way of power saving at the same clocks over 28nm but much higher density, esram might have seemed like a far more compelling option. No-one would want to stick a 512-bit GDDR5 bus in a console, surely...?
 
Not Xbox 1.5; a hypothetical 20nm console

Just wondering out loud, but on 20nm, 32 MB of esram would be about 40mm^2 (roughly scaling down from 28nm TSMC). Now I'm going to propose doubling the shader count of the Xbox 1, so minus 2 CUs for redundancy, there would be 1664 shaders.

Continuing on with this naive line of thinking, assuming we were going to shoot for a 200W console and so could keep clocks roughly where they are now, you'd probably need more BW than a 256-bit GDDR bus could reasonably give you, especially given the CPU contention issues which seem to be affecting this generation of consoles.

Now, if you were to design the esram to be used for only partial residency of buffers (as Xbox esram can be used), you possibly wouldn't even need any more than 32 MB so long as your main memory was also pretty fast. You could stick with 32 MB, with the knowledge that half (or so) of your buffer bandwidth could be swallowed up by 40 mm^2 of esram, and stay with 256-bit GDDR5 instead of going up to 384- or 512-bit.

Under these conditions of partial residency, and being at the limit of reasonable GDDR5 bus size, the esram continues to make sense. At least, according to me, here it could.
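To make the hand-waving a bit more concrete, here's a back-of-envelope sketch in Python. The 28nm esram area, the 28-to-20nm scaling factor, the GDDR5 data rate and the usable esram bandwidth are all assumptions of mine rather than known figures:

```python
# Rough back-of-envelope for the hypothetical 20nm console above.
# Every input here is an assumption or rough public figure, not a known spec.

ESRAM_28NM_MM2 = 80.0       # assumed area of the 32 MB esram arrays on 28nm
AREA_SCALE_28_TO_20 = 0.5   # assumed SRAM density scaling from 28nm to 20nm
esram_20nm_mm2 = ESRAM_28NM_MM2 * AREA_SCALE_28_TO_20
print(f"32 MB esram on 20nm: ~{esram_20nm_mm2:.0f} mm^2")

# Doubling Xbox One's CU count (2 * 14 = 28), keeping 2 CUs for redundancy.
cus_active = 2 * 14 - 2
shaders = cus_active * 64   # 64 ALUs per GCN CU
print(f"Active CUs: {cus_active}, shaders: {shaders}")

# Main memory: 256-bit GDDR5 at an assumed 6.0 Gbps per pin.
main_bw = 256 / 8 * 6.0
print(f"256-bit GDDR5 @ 6.0 Gbps: {main_bw:.0f} GB/s")

# If the esram soaks up roughly half of the buffer traffic (assumption),
# the total budget looks a lot healthier than main memory alone.
esram_bw = 200.0            # assumed usable esram bandwidth, GB/s
print(f"Rough combined budget: ~{main_bw + esram_bw:.0f} GB/s")
```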
 
One pro that we can be sure of is that it allows for high BW without being tied to boutique memory on a phat bus - how much of a pro that actually is seems up for debate though.

It should also allow for higher bandwidth operations at relatively lower power consumption. How much of a benefit that actually is in the marketplace is up for debate too, I guess.
In terms of 'pros', I think it's relative to alternative options. E.g. ESRAM provides contention-free bandwidth for the GPU as the CPU doesn't really read it, but that could be achieved by split RAM pools a la PC. Whether eDRAM/ESRAM can achieve the same results as another memory architecture at reduced cost, or can achieve superior performance (crazy bandwidth) over another architecture, is what I understand the question to be.
 
Just wondering out loud, but on 20nm, 32 MB of esram would be about 40mm^2 (roughly scaling down from 28nm TSMC). Now I'm going to propose doubling the shader count of the Xbox 1, so minus 2 CUs for redundancy, there would be 1664 shaders.

Continuing on with this naive line of thinking, assuming we were going to shoot for a 200W console and so could keep clocks roughly where they are now, you'd probably need more BW than a 256-bit GDDR bus could reasonably give you, especially given the CPU contention issues which seem to be affecting this generation of consoles.

Now, if you were to design the esram to be used for only partial residency of buffers (as Xbox esram can be used), you possibly wouldn't even need any more than 32 MB so long as your main memory was also pretty fast. You could stick with 32 MB, with the knowledge that half (or so) of your buffer bandwidth could be swallowed up by 40 mm^2 of esram, and stay with 256-bit GDDR5 instead of going up to 384- or 512-bit.

Under these conditions of partial residency, and being at the limit of reasonable GDDR5 bus size, the esram continues to make sense. At least, according to me, here it could.

If your main memory is "pretty fast", let's say in the realm of 130GB/s or so for the current Xbox 1, I think they could have just scrapped the eSRAM altogether.
 
As I explained in the post, I wasn't talking about Xbox 1 (or PS4). ;)

Where 130 GB/s (or slightly more for higher clocked RAM) wouldn't be enough, you either have to increase bus width or put some (or all) of your buffers in embedded memory.
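The bus-width arithmetic is easy enough to sanity-check. A minimal sketch; the particular speed bins are assumptions of mine, not quotes of any real part:

```python
def gddr5_peak_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak GDDR5 bandwidth in GB/s: bus width in bits / 8 * per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin

# A few reference points (assumed, typical-looking speed bins):
for bits, rate in [(256, 5.5), (256, 7.0), (384, 5.5), (512, 5.5)]:
    print(f"{bits}-bit @ {rate} Gbps -> {gddr5_peak_gbs(bits, rate):.0f} GB/s")
```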

For a console, I think anything significantly faster than the PS4Bone would absolutely require embedded memory. I was a little surprised (and impressed) that Sony went with 256-bit, tbh.

Until HBM comes online embedded memory is the only option.
 
If your main memory is "pretty fast", let's say in the realm of 130GB/s or so for the current Xbox 1, I think they could have just scrapped the eSRAM altogether.

With 20/20 hindsight of Kinect's flop, going with a 192-bit bus for 6GB of GDDR5 and the same CU count would have given them a considerably cheaper design. Yes, it would still be substantially weaker, but if it had been priced at $299 at launch (without Kinect), I think the X1 would be in a much better position today.
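For what it's worth, the numbers for that configuration work out roughly like this, assuming PS4-style 5.5 Gbps parts and 4 Gb devices in clamshell mode (both assumptions of mine):

```python
# Hypothetical 192-bit GDDR5 Xbox One, assuming 5.5 Gbps parts (as in the launch PS4)
# and 4 Gb devices run in 16-bit clamshell mode, two per 32-bit channel.
bus_bits = 192
gbps_per_pin = 5.5
chip_density_gbit = 4

bandwidth = bus_bits / 8 * gbps_per_pin
chips = 2 * (bus_bits // 32)                 # clamshell: two devices per 32-bit slice
capacity_gb = chips * chip_density_gbit / 8  # Gb -> GB

print(f"{bus_bits}-bit @ {gbps_per_pin} Gbps: {bandwidth:.0f} GB/s")
print(f"{chips} x {chip_density_gbit} Gb chips (clamshell): {capacity_gb:.0f} GB")
```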
 
If your main memory is "pretty fast", let's say in the realm of 130GB/s or so for the current Xbox 1, I think they could have just scrapped the eSRAM altogether.

And you are always forgetting cross access to the RAM. Esram is really fast and exclusive. And yes, latencies are much better than DRAM; that is something that gets important when more GPGPU stuff gets to the GPU. Also, if you want to quickly swap the memory, lower latencies are better, so you lose fewer cycles.

Esram now combines the best of both worlds: exclusive access and massive bandwidth. And it is small enough that developers care about what they stuff into it. Well, a little bit bigger would have been better, but it is much better than nothing (than no exclusive memory pool).

With 20/20 hindsight of Kinect's flop, going with a 192-bit bus for 6GB of GDDR5 and the same CU count would have given them a considerably cheaper design. Yes, it would still be substantially weaker, but if it had been priced at $299 at launch (without Kinect), I think the X1 would be in a much better position today.

There is a reason why AMD no longer supports GDDR5 RAM with its APUs. It is just not ideal for CPUs.
 
There is a reason why AMD no longer supports GDDR5 RAM with its APUs. It is just not ideal for CPUs.

I thought that was because Intel had no interest in LP GDDR5, and so everyone dropped it, leaving AMD with Kaveri ready and waiting for memory that would never come?

Kaveri would be a beast of an APU with GDDR5.
 
With 20/20 hindsight of Kinect's flop, going with a 192-bit bus for 6GB of GDDR5 and the same CU count would have given them a considerably cheaper design. Yes, it would still be substantially weaker, but if it had been priced at $299 at launch (without Kinect), I think the X1 would be in a much better position today.

Pretty debatable imo. 6GB of GDDR5 should be massively more expensive than 8GB of DDR3. Even half the amount of GDDR5 should be well more expensive; in other words, even 16GB of DDR3 should be well cheaper than 8GB of GDDR5. Have you ever noticed GDDR5 is treated like gold on graphics cards? Hell, Nvidia is still shorting people on their $600+ cards (my brother's GeForce 780, a $650+ card at the time, has only 3GB of VRAM). And Nvidia has been shorting people on RAM for years (ask all those 570/580 owners now crippled with <=1.5GB).

Then you would save the ESRAM off the die, which may not be that big a deal. The die is the thing that will cost reduce best anyway, and the ESRAM is likely highly redundant for easy fabbing, more so than logic. You would probably end up imo with something like a 275-300mm^2 die anyway. With 6 more CUs, the PS4's APU is ~350mm^2.

Plus then you have 2GB less RAM, which might not be ideal from a PR standpoint or otherwise. MS wants a lot of RAM for their non-gaming shenanigans too. At LEAST 2GB I'm sure, which would have left <=4GB for games.

If MS wanted to salvage this (ESRAM) design, they should have just enabled the two redundant CUs and upclocked as far as they could imo. That could have easily got them to 1.5 TF+ (14 CUs at 853) and probably erased much of the competition's compute lead, enough to blur the lines (more) sufficiently anyway.
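The flops arithmetic there checks out, at least; 64 ALUs per CU and 2 ops per clock (FMA) are the standard GCN figures:

```python
def gcn_gflops(cus: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS for a GCN GPU: CUs * 64 ALUs * 2 ops/clock * clock."""
    return cus * 64 * 2 * clock_mhz / 1000

print(f"12 CUs @ 853 MHz: {gcn_gflops(12, 853):.0f} GFLOPS")  # shipping Xbox One
print(f"14 CUs @ 853 MHz: {gcn_gflops(14, 853):.0f} GFLOPS")  # redundant CUs enabled
print(f"18 CUs @ 800 MHz: {gcn_gflops(18, 800):.0f} GFLOPS")  # PS4, for comparison
```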

I'm guessing this discussion is probably going to be deemed too business-y though :devilish:

Shifty always wants specifics; well, imo we need more specific knowledge of the cost of GDDR5 vs DDR3. Because that's the main advantage of ESRAM really, or in theory. And that seems like something that could be more easily investigated (though still difficult) than, say, ESRAM latency. Unless this is only about technical advantages/disadvantages.

There's also the fact, though, that the trace routing is supposed to be more complex for DDR3, leading to a possibly more expensive motherboard, but I suspect that's overblown.
 
And you are always forgetting cross access to the RAM. Esram is really fast and exclusive. And yes, latencies are much better than DRAM; that is something that gets important when more GPGPU stuff gets to the GPU.
Any details on this beyond the noncommittal answers from Microsoft?

And it is small enough that developers care about what they stuff into it.
That's one way to look at it.
I prefer that my devs be carefree.

There is a reason why AMD no longer supports GDDR5 RAM with its APUs. It is just not ideal for CPUs.
It never did outside of Orbis, although it is alleged that if Elpida hadn't gone under, the GDDR5M standard could have made it possible.
Standard GDDR5 doesn't support being put on a DIMM, and there are cost and capacity issues for any reasonable bus width.
The devices themselves are not appreciably different in terms of latencies relative to DDR3, and the data sheets for devices equivalent to the ones used for main memory for Durango and Orbis have been compared in this forum.
 
What about the APU cost savings from zero SDRAM? Much smaller die, higher yields, less heat. Then there are the development costs associated with a less straightforward architecture.
 
What about the APU cost savings from zero SDRAM? Much smaller die, higher yields, less heat.

I think you mean ESRAM.
The smaller die would be true as long as nothing else changed about the chip.
Higher yields is a qualified yes, if absolutely nothing else is put in its place. In general, SRAM is very fault tolerant, so not all additional area is created equal in terms of yields.
Less heat is only true if no attempt is made to increase main memory bandwidth.

An equivalent external DRAM bus is going to burn significantly more power than one that is purely on die.
 
What about the APU cost savings from zero SDRAM? Much smaller die, higher yields, less heat. Then there are the development costs associated with a less straightforward architecture.

If the iSuppli teardown is remotely accurate, then the die area used by the esram might be costing $20~$30?

It shouldn't impact yields much as it's fault tolerant, but it will increase chip prices by virtue of fewer chips per wafer.
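To put a rough number on the "fewer chips per wafer" part, here's the standard gross-die-per-wafer approximation. The die sizes are assumptions of mine (the real APU is around 363mm^2; the esram-free size is a guess), and this ignores yield entirely:

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Standard gross die-per-wafer estimate; ignores defects and scribe lines."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

die_with_esram = 363.0     # roughly the Xbox One APU
die_without_esram = 280.0  # my guess at the same chip minus the esram blocks

with_count = gross_dies_per_wafer(die_with_esram)
without_count = gross_dies_per_wafer(die_without_esram)
print(f"~{with_count} vs ~{without_count} gross dies per 300mm wafer "
      f"({without_count / with_count - 1:.0%} more without the esram)")
```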

I don't think esram is the Xbox 1's issue though. It's there to take a chunk of the high bandwidth work and it's clearly very suited to this. The problem seems to be the weakness of the main memory access, where contention has clearly been hurting.

And what do you do if your CPU and GPU are choking each other, and you lack the tools to know exactly what's happening and where? You reduce the GPU load by cutting the resolution. 1080p am cry ...
 
If MS wanted to salvage this (ESRAM) design, they should have just enabled the two redundant CUs and upclocked as far as they could imo. That could have easily got them to 1.5 TF+ (14 CUs at 853) and probably erased much of the competition's compute lead, enough to blur the lines (more) sufficiently anyway.


And yet they didn't. That's telling in its own right. Whatever the case is, MS clearly did not care about being down on power compared to its competitors. If it was purely a calculated cost call on 14 CUs vs 12 CUs in terms of yield, that must have been some really long-term calculation for them not to go for it. The same goes for the amount of additional clock they may have been able to reach if they were okay with shortening the lifespan of their product.

So far we've got:
Lower power usage, centralized cooling, physically centralized data, lower latencies, and clock speed synced with the APU - all fairly good pros for embedded RAM.
 