The pros and cons of eDRAM/ESRAM in next-gen

ESRAM is pretty wasteful from a transistor budget standpoint. I'm sure Devs would have been a lot happier with 192-256MB of EDRAM. Perhaps that would have posed problems in terms of what process and foundry you could use. Challenges like that may have contributed to Sony's decision to go with unified GDDR5 even though they also explored an embedded memory design.
A (the?) chief reason to use unified RAM was simplicity for development. If Sony had known MS were going with scratchpad RAM and devs would still have to design for it, perhaps they could have said, "sod it. Poor devs are stuck with annoying scratchpad RAM. Let's just go for performance." Then they could have put in 128 MB of 400+ GB/s eDRAM coupled with 4 GB of GDDR5 or 8 GB of DDR3, whatever was needed to make a better overall device than the current PS4, and provided a very different experience. It'd be interesting to see a next-gen console with zero bandwidth limitations, like the PS2.

But then the SOC would be very different: either larger and more expensive, or with fewer CUs. Still, HSA with really fast eDRAM seems quite compelling to me.
 
A (the?) chief reason to use unified RAM was simplicity for development. If Sony had known MS were going with scratchpad RAM and devs would still have to design for it, perhaps they could have said, "sod it. Poor devs are stuck with annoying scratchpad RAM. Let's just go for performance." Then they could have put in 128 MB of 400+ GB/s eDRAM coupled with 4 GB of GDDR5 or 8 GB of DDR3, whatever was needed to make a better overall device than the current PS4, and provided a very different experience. It'd be interesting to see a next-gen console with zero bandwidth limitations, like the PS2.

Unfortunately Cerny didn't say what size the EDRAM was; he just said "very small", but the bandwidth was up to 1,088 GB/s. It could have made for an interesting console generation.

[Slide: 2013-06-28 Mark Cerny at Gamelabs - Road to PS4 (2)]
 
Small eDRAM is what poses the development problems; large eDRAM would make them much less of a concern. On Iris Pro 5200 the eDRAM is transparent to devs AFAIK, working as a cache. In a console, that much eDRAM acting as a full cache with optional direct control would be extremely versatile yet still easy to use. XB1's problem in terms of developer difficulty is that the on-chip memory is too small to be independent and needs full management. I think the idea has considerable merit in principle, but I couldn't even hazard a guess at what the associated costs (sacrifices) elsewhere in the system might be. I just think that unified GDDR5 wasn't the best solution for price/performance, but Sony's goal this gen was entirely simplicity.
 
The cache present in Crystalwell is a real (as in chip-managed) L4 cache, but remember it is split between CPU and GPU with no hard divide.

You can't discern definitively from what Cerny said at Gamelabs ("the EDRAM would be very small and each game team would have to develop special techniques to manage it") whether the alternate PS4 design's EDRAM was a cache (like Crystalwell) or RAM (like Durango). "Manage it" may imply managed memory like Xbox, but either way you have the same conundrum: only a sliver of RAM is high bandwidth and you have to make your critical accesses fit into this setup. And you have to "manage it".

Of course "very small" could have been more than 32mb. Relative to 4Gb of RAM, 128mb is very small and devs are used to working like this anyway, if you want to optimise 80x86 code you need to be thinking about what will fit into L1 and L2 and how often you'll be accessing the data. If you worked with Cell's local store you had similar constraints. If there is a fault with ESRAM in Xbox One, I think it is just a little too small. Another 16mb could have made all the difference in the world.
 
The choice Cerny mentioned (which IIRC was actually that they evaluated several options with the extreme case being very high BW for very little capacity) isn't the only option discussed here. The question is asked: how much eDRAM would be 'enough'? I think 128 MB would be 'enough', but the reason we didn't see it is that Sony regarded any eDRAM as too developer unfriendly. For a pro and con comparison, I'm thinking 128 MB of 'fast' eDRAM would be more hassle than unified GDDR5 but possibly give better performance in key areas. What I'm certainly thinking is that PS4's design was not all about maximum performance. PS4 was completely focussed on good performance for an easily tapped device, within the price target. With a different set of priorities, I think an eDRAM-enabled PS4 may be the better console in terms of potential, though maybe devs would never make use of that enough to make it worthwhile. There are so many parameters in effect though, it's really impossible to say. Thankfully, when an engineer's employer is his armchair, consequences for being wrong aren't too severe. ;)
 
The choice Cerny mentioned (which IIRC was actually that they evaluated several options with the extreme case being very high BW for very little capacity) isn't the only option discussed here.

Not quite. Here's the full quote - blame missing words on Dragon.

Cerny @ Gamelabs 2013 said:
We didn’t want the hardware to be a puzzle that the developers would need to solve to make quality titles.

And just to give a specific example, this gets technical really quickly, please forgive me. The architecture we ended up with for PlayStation 4 used a 256-bit bus and a type of memory found in top of the line graphics cards called GDDR5. And the combination of this wide bus and this fast memory gives us 176 GB per second, which, and many of you will have to take my word for it, is quite a lot.

So with that much bandwidth, straightforward programming techniques usually result in some pretty impressive graphics. Now we knew there was an alternative architecture that would be a bit easier to manufacture. In this architecture we would use a narrow 128-bit bus, which would drop the bandwidth to 88 GB per second, which is not particularly good in next generation terms and would therefore really hurt the graphics performance.

So we would then use some very fast on-chip memory to bring the performance back up. If we used EDRAM for this on-chip memory, we knew that bandwidths of as much as one terabyte per second, that's 1,000 gigabytes a second, would be achievable. The catch though, is that the on-chip memory would need to be very small. And each game team would need to develop special techniques in order to manage it.

So to compare these two architectures, the one on the left [actual PS4 design shown] has 176 GB per second for any access, and the one on the right [alternate PS4 design shown] 88 GB per second if data is in system memory or 1,000 gigabytes per second if the data is in that tiny EDRAM. And at first glance the architecture on the right looks far superior to the one on the left. Sure, it takes a while to figure out how to use it, but once you figure out how to use that little cache of EDRAM you can unlock the full potential of the hardware. But to our new way of thinking, the straightforward approach on the left is definitely advantageous.
So they eschewed the faster, easier to manufacture (so probably cheaper) option in favour of the one that shipped for simplicity's sake.
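
As a quick sanity check on the two figures in that quote (a worked example assuming GDDR5 at 5.5 Gbit/s per pin, PS4's shipping speed, on the two bus widths Cerny mentions):

\[
256 \times 5.5\ \mathrm{Gbit/s} \div 8 = 176\ \mathrm{GB/s},
\qquad
128 \times 5.5\ \mathrm{Gbit/s} \div 8 = 88\ \mathrm{GB/s}
\]

So the 88 GB/s option really is just the same memory on half the bus.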

The question is asked: how much eDRAM would be 'enough'? I think 128 MB would be 'enough', but the reason we didn't see it is that Sony regarded any eDRAM as too developer unfriendly.

I think the answer to 'how much is enough' will change from now (where I think we can safely conclude that 32 MB is too little, because devs need time to develop optimisation techniques and several have mentioned ESRAM as a challenge), to 12-18 months down the line (where devs may have adapted), to 5 years out, when GPU compute could begin to be utilised a lot more and you're trying to fit graphics and other essential data into your super fast cache.

For a pro and con comparison, I'm thinking 128 MB of 'fast' eDRAM would be more hassle than unified GDDR5 but possibly give better performance in key areas. What I'm certainly thinking is that PS4's design was not all about maximum performance. PS4 was completely focussed on good performance for an easily tapped device, within the price target. With a different set of priorities, I think an eDRAM-enabled PS4 may be the better console in terms of potential, though maybe devs would never make use of that enough to make it worthwhile.
Agreed. The alternate design could have left the launched PS4 in the dust performance-wise for an awful lot of things. Equally, a few years down the road when (or if) GPU compute starts being used more, whatever size of EDRAM they chose, maybe suddenly it wouldn't be big enough. Hardware decisions are tough, and tougher still when your platform has to last for 5+ years and you have no idea where graphics technology will evolve to, or what software graphics techniques your hardware may need to support that don't exist yet.

Lots of people are still thinking of GPU as hardware for graphics but there's no doubt a fair amount of unused compute time sitting idle on both consoles. It's almost the reverse of the Cell situation. Using the CPU to lighten the GPU load on PS3 has switched to using the GPU to lighten the CPU load on PS4 and One. And no doubt it'll take a while before the approach is part of mainstream development.
 
So to compare these two architectures, the one on the left [actual PS4 design shown] has 176 GB per second for any access, and the one on the right [alternate PS4 design shown] 88 GB per second if data is in system memory or 1,000 gigabytes per second if the data is in that tiny EDRAM.
I like how he paints a totally accurate picture of the 2 systems, nice to know ESRAM has 1,000 GB/s bandwidth

So to compare these two architectures, the one on the left [actual PS4 design shown] has 176 GB per second for any access, and the one on the right [alternate PS4 design shown] 88 GB per second if data is in system memory or 204 gigabytes per second if the data is in that tiny EDRAM.
Not so impressive, still he only inflated the figure by 500%
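
For reference, taking the widely reported ESRAM figures as given (a 1024-bit internal path at the 853 MHz GPU clock, and Microsoft's quoted ~204 GB/s combined read/write peak), the armchair maths comes out at roughly:

\[
1024\ \mathrm{bits} \times 853\ \mathrm{MHz} \div 8 \approx 109\ \mathrm{GB/s}\ \text{(one direction)},
\qquad
1088 \div 204 \approx 5.3
\]

So "about 5x" is a fair way to put the gap between the hypothetical EDRAM and the shipped ESRAM peak.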
 
I like how he paints a totally accurate picture of the 2 systems, nice to know ESRAM has 1,000 GB/s bandwidth
Cerny was talking about an alternative PS4 design they considered, using 88 GB/s main RAM and roughly 1 TB/s EDRAM. They eschewed that design.
 
Microsoft Explains why the Xbox One’s ESRAM is a “Huge Win” and Allows Reaching 1080p/60 FPS



http://www.dualshockers.com/2014/04...is-a-huge-win-and-helps-reaching-1080p60-fps/

What exactly "DNA Engines" means?
So... we went from this:
[Image: xbox-one-esram-issue.jpg]


...to what they're now saying in the article in your post.

60 fps is what I am interested in the most.

The eSRAM is optimal for certain mathematical operations and will certainly help but it is not a fix all, alas.

Imo, 32MB is not enough to store all things involved with shadow mapping. :smile2:

On modern AMD graphics cards you can test this yourself using the Radeon Pro utility to monitor memory usage and then you can change the shadow map resolution manually. And compare... :smile2:
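
To put rough numbers on that (a worked example using a single 4096x4096 map, which is just an illustrative resolution, not any particular game's setup):

\[
4096 \times 4096 \times 4\ \mathrm{B} = 64\ \mathrm{MB},
\qquad
4096 \times 4096 \times 2\ \mathrm{B} = 32\ \mathrm{MB}
\]

So even a single high-resolution map at 16 bpp would consume the entire 32 MB before any render targets get a look in.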
 
Imo, 32MB is not enough to store all things involved with shadow mapping. :smile2:

Why would you want all that in the fast memory at once? You just need tiny fragments of the big shadow map while calculating.
If you just fill the whole ESRAM with this thing, most of the ESRAM would be filled for nothing, because most parts of it would not be processed. That is just inefficient.
Yes, the ESRAM is small, but if you use it efficiently, you can use it almost as if you had >109 GB/s of extra bandwidth.
Like an MS employee said, "it is all about having the right data, at the right time, in the right place". Yes, it is complicated, but most of the time efficiency is.
I know it is not the same as an L2 cache, but what does a CPU do if it has no L2 cache? It uses the main memory, and it is really slow without it. That should just be avoided in most cases.
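
A minimal CPU-side sketch of that tiling idea, purely illustrative (the sizes, buffer names and plain copies below are made-up stand-ins, not the Xbox One move engines or any real API): the big map stays in slow memory, and only the tile being worked on lives in the small fast buffer.

// Illustrative only: stream a big "shadow map" through a small scratch
// buffer one tile at a time, the way a 32 MB scratchpad would only ever
// hold the fragment of a large resource actually being worked on.
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int kMapSize  = 4096;   // full map: 4096 x 4096 texels, in slow memory
constexpr int kTileSize = 512;    // one tile: small enough for the fast buffer
using Texel = uint16_t;           // 16 bpp depth texel

int main() {
    std::vector<Texel> bigMap(kMapSize * kMapSize, 0);    // stands in for "DDR3"
    std::vector<Texel> scratch(kTileSize * kTileSize);    // stands in for "ESRAM"

    for (int ty = 0; ty < kMapSize; ty += kTileSize) {
        for (int tx = 0; tx < kMapSize; tx += kTileSize) {
            // Pull one tile into the small fast buffer...
            for (int y = 0; y < kTileSize; ++y)
                for (int x = 0; x < kTileSize; ++x)
                    scratch[y * kTileSize + x] =
                        bigMap[(ty + y) * kMapSize + (tx + x)];

            // ...do the heavy, bandwidth-hungry work against the fast copy...
            for (Texel& t : scratch)
                t = static_cast<Texel>(t + 1);

            // ...then write the finished tile back to slow memory.
            for (int y = 0; y < kTileSize; ++y)
                for (int x = 0; x < kTileSize; ++x)
                    bigMap[(ty + y) * kMapSize + (tx + x)] =
                        scratch[y * kTileSize + x];
        }
    }
    std::printf("processed %d tiles\n",
                (kMapSize / kTileSize) * (kMapSize / kTileSize));
    return 0;
}

The point is just that the instantaneous working set is a tile, not the whole resource, which is why 32 MB can behave like far more than 32 MB when the access pattern cooperates.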
 
What in that slide hints at the use of the eSRAM for shadow mapping?

Why would you want all that in the fast memory at once? You just need tiny fragments of the big shadow map while calculating.
If you just fill the whole ESRAM with this thing, most of the ESRAM would be filled for nothing, because most parts of it would not be processed. That is just inefficient.
Yes, the ESRAM is small, but if you use it efficiently, you can use it almost as if you had >109 GB/s of extra bandwidth.
Like an MS employee said, "it is all about having the right data, at the right time, in the right place". Yes, it is complicated, but most of the time efficiency is.
I know it is not the same as an L2 cache, but what does a CPU do if it has no L2 cache? It uses the main memory, and it is really slow without it. That should just be avoided in most cases.
You see, that's what haunts me on the matter, whether they are using the DDR3 memory block or the eSRAM for shadow mapping. I suppose it is the former.
 
That slide basically tells us that they don't need to render far away shadows at all, saving quite a bit of bandwidth there. Also they use 16 bpp shadow map format, halving the sampling bandwidth cost compared to the usual 32 bpp formats. Both these things combined mean, that their shadow map rendering bandwidth cost is very low. Thus they don't even need ESRAM for shadow maps (DDR3 bandwidth is more than enough for reading baked 16 bpp shadow maps). That's the way I understand it.
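
To put a number on the 16 bpp saving (a worked example using a 2048x2048 map as an illustrative size):

\[
2048 \times 2048 \times 4\ \mathrm{B} = 16\ \mathrm{MB},
\qquad
2048 \times 2048 \times 2\ \mathrm{B} = 8\ \mathrm{MB}
\]

Half the bytes per texel means half the footprint and half the bandwidth per sample, which is what makes reading the baked maps straight out of DDR3 workable.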
 
That slide basically tells us that they don't need to render far away shadows at all, saving quite a bit of bandwidth there. Also they use 16 bpp shadow map format, halving the sampling bandwidth cost compared to the usual 32 bpp formats. Both these things combined mean, that their shadow map rendering bandwidth cost is very low. Thus they don't even need ESRAM for shadow maps (DDR3 bandwidth is more than enough for reading baked 16 bpp shadow maps). That's the way I understand it.
Many thanks for the explanation Sebbi, :smile2: you are a forum bonus. It's always great to have people like you around.

Why does it avoid rendering far away static objects? Is it a programming trick like those you programmers use at times? If you run a game at 60 fps the console has to show 60 frames a second, and I'd have thought the shadows would have to be rendered at that rate too.

I figure, though, that might not be the case in certain games with prerendered backgrounds.
 
Why does it avoid rendering far away static objects?
It's basically just saying that they don't need to re-cast shadows for said static stuff every frame; they can calculate those shadows when you load a level or enter an area, and that's that.

It doesn't avoid rendering far-away static objects, it just removes the redundant task of constantly re-generating their shadows. Objects still need to be re-rendered because they look different from different angles, but shadows are relative to the scene, not the camera. Shadows only need to be recalculated when the scene changes, so far-off static stuff should have static shadows.
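
A minimal sketch of that caching logic, under the assumption of a simple dirty flag (Scene, ShadowCache and friends are hypothetical names, not any engine's API): the expensive static shadow render runs only when the scene changes; every other frame just samples the cached map.

// Minimal sketch of "only re-render static shadows when the scene changes".
#include <cstdio>

struct Scene {
    bool staticGeometryDirty = true;  // set when level geometry or lights change
};

struct ShadowCache {
    bool valid = false;
    void rebuild() {                  // expensive: full static shadow map render
        valid = true;
        std::puts("  re-rendered static shadow map");
    }
};

void renderFrame(Scene& scene, ShadowCache& cache) {
    if (scene.staticGeometryDirty || !cache.valid) {
        cache.rebuild();              // only on load / when the scene changes
        scene.staticGeometryDirty = false;
    }
    // Every frame: render dynamic shadow casters as usual and simply sample
    // the cached static map, which is the cheap part.
    std::puts("rendered frame (sampled cached static shadows)");
}

int main() {
    Scene scene;
    ShadowCache cache;
    for (int frame = 0; frame < 3; ++frame)
        renderFrame(scene, cache);    // rebuild happens on the first frame only
    return 0;
}

Dynamic casters would still get their per-frame pass on top of this, exactly as described above.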
 
It's basically just saying that they don't need to re-cast shadows for said static stuff every frame; they can calculate those shadows when you load a level or enter an area, and that's that.

It doesn't avoid rendering far-away static objects, it just removes the redundant task of constantly re-generating their shadows. Objects still need to be re-rendered because they look different from different angles, but shadows are relative to the scene, not the camera. Shadows only need to be recalculated when the scene changes, so far-off static stuff should have static shadows.
That's a superb detailed explanation. Many thanks HTupolev! :smile2: Are you a developer? I wonder... because you know your stuff.

In the end it always looks like developers have tight room for maneuver when they design their games; that doesn't change with each generation. Glad they use smart approaches rather than brute-forcing everything.
 