The pros and cons of eDRAM/ESRAM in next-gen

It's a nice story, but from bkilian's info the change from 4GB to 8GB came late. Late 2011.

The particular quantities are immaterial. bkilian's version of events doesn't really disagree with the thrust of Arwin's argument: that MS made what they thought was a safe bet on more, slower RAM. That is undeniable irrespective of when final decisions were made about the final amounts.
 
ESRAM is very good for bandwidth heavy and/or latency sensitive read&modify&write operations (such as alpha blending) where the working set doesn't fit inside the GPU L2 or ROP caches.

Last week I profiled a heavy snowstorm scene (so many particles that you can hardly see anything but snow). In this particular scene Xbox One is very fast indeed. So if your game is filled with HDR particles (using traditional pixel shader based particle blending), the inclusion of ESRAM was the perfect hardware decision for you.

For future particle rendering solutions based on compute shader gather / conservative rasterization, memory bandwidth is no longer an issue (since the blending occurs inside thread-shared memory). A particle renderer like this would run happily over DDR3 (even on integrated laptop GPUs); fast memory doesn't benefit it much at all.
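
Roughly the idea, as a simplified sketch (CUDA syntax purely for illustration, with made-up names like blend_tile; a real renderer would bin particles per screen tile and stage them through shared memory rather than have every thread loop over the full list): each thread block keeps its tile of the HDR target on-chip, blends everything there, and touches external memory only once on the way in and once on the way out.

#include <cuda_runtime.h>

// Hypothetical particle: screen-space centre, radius, premultiplied RGBA colour.
struct Particle { float x, y, radius, r, g, b, a; };

#define TILE_W 16
#define TILE_H 16

// One thread block owns one TILE_W x TILE_H screen tile. The HDR colour for the
// tile stays in on-chip shared memory while all particles are blended, so the
// only external-memory traffic is one load and one store per pixel.
// Assumes the framebuffer dimensions are multiples of the tile size, to keep
// the sketch short.
__global__ void blend_tile(float4* framebuffer, int fb_width,
                           const Particle* particles, int num_particles)
{
    __shared__ float4 tile[TILE_H][TILE_W];

    int px = blockIdx.x * TILE_W + threadIdx.x;
    int py = blockIdx.y * TILE_H + threadIdx.y;

    // Single read from external memory.
    tile[threadIdx.y][threadIdx.x] = framebuffer[py * fb_width + px];

    // Blend every particle covering this pixel, entirely on-chip.
    // (Naive: a real version culls/bins particles per tile first.)
    for (int i = 0; i < num_particles; ++i) {
        Particle p = particles[i];
        float dx = px - p.x, dy = py - p.y;
        if (dx * dx + dy * dy <= p.radius * p.radius) {
            float4 dst = tile[threadIdx.y][threadIdx.x];
            // "Over" blend with premultiplied alpha.
            dst.x = p.r + dst.x * (1.0f - p.a);
            dst.y = p.g + dst.y * (1.0f - p.a);
            dst.z = p.b + dst.z * (1.0f - p.a);
            dst.w = p.a + dst.w * (1.0f - p.a);
            tile[threadIdx.y][threadIdx.x] = dst;
        }
    }

    // Single write back to external memory.
    framebuffer[py * fb_width + px] = tile[threadIdx.y][threadIdx.x];
}

The point being that the per-particle blending traffic never leaves the chip, which is why DDR3 alone is enough for this style of renderer.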
 
What I'm wondering about is a scenario where you put 32MB (or some portion of that) of data in ESRAM, leave it there and use it over and over for a while, while also pulling from DDR3 simultaneously. Can you achieve a lot of bandwidth that way? And how relevant is this usage model? And of course there would be many "mix n match" scenarios where one could presumably see a benefit (some transferring in/out of ESRAM, but also reusing some data that stays in ESRAM).
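For a rough sense of scale (public peak figures, so treat this as back-of-envelope rather than anything measured): the DDR3 bus is good for about 68 GB/s, and the ESRAM for roughly 109 GB/s in one direction, or a theoretical ~204 GB/s when reads and writes can be overlapped. So a workload that keeps, say, its render targets resident in ESRAM while streaming textures out of DDR3 could in principle see well north of 150 GB/s combined, which is presumably the kind of win the "mix n match" case is after.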
This was exactly the kind of thing sebbbi used to love talking about, until platform warriors drove him off like they did to so many other devs who used to post much more frequently here. :rolleyes:

Edit:
LOL, I guess I should read all new posts before chiming in. Hey sebbbi, thanks for sharing!
 
As far as I'm concerned, this thread is spent. ES/DRAM provides a BW advantage at a given cost, with the compromise of software complexity and difficulty in using that BW. Given the way RAM is progressing, it looks like ES/DRAM are dead ends for future hardware, meaning the subject ends with this generation, where it's not really doing much. Plotting the importance of ES/DRAM over time, it has steadily declined: from great back on the PS2 and GC, where it provided high BW otherwise unobtainable, to irrelevant now and going forward.

Yea =( This was just one of my more enjoyable threads to read; this and the older DX11.2+ one were fun to follow. I just tried poking the embers to see if there is anything really left to explore here. Sebbbi added a nice touch there at the end, though future methods of said particle effects would run well on DDR3 as well.

Which, granted, is something for the X1 in particular: if one could keep particle rendering in DDR3 while letting the rest of the GPU do something with the esram.

As many have alluded to, the stacked 3D RAM coming in 2016 would make this architecture obsolete, as HBM obtains all the physical benefits of centralizing memory while increasing both bandwidth and size.
 
ESRAM is very good for bandwidth heavy and/or latency sensitive read&modify&write operations (such as alpha blending) where the working set doesn't fit inside the GPU L2 or ROP caches.

I'm unsure of the bounds on what you can share, but did you profile an alpha blending (or other latency-sensitive) case whose working set falls outside the ROP caches but isn't bandwidth-heavy enough to overload the DDR3 bus, and then compare it against the same workload retargeted to ESRAM?

I'm curious as to what latency results can be teased out of a benchmark, if wavefront launch and other factors can be controlled for.
I'd expect that there should be measurable numbers, but since the Xbox One architects interview, the official stance has moderated a little on the magnitude of improvement.
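
For what it's worth, the kind of thing I'd imagine for teasing a latency number out (a rough sketch in CUDA, since a discrete GPU is what I can actually poke at; the names here are just mine): a single thread walking a dependent-load chain, which sidesteps wavefront launch and occupancy effects entirely and gives you cycles per load for whichever level of the hierarchy the chain lands in.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Dependent-load pointer chase: each load's address comes from the previous
// load's result, so latency can't be hidden and cycles/hop approximates the
// latency of whatever memory level the chain lands in.
__global__ void pointer_chase(const unsigned int* chain, int hops,
                              long long* cycles_out, unsigned int* sink)
{
    unsigned int idx = 0;
    long long start = clock64();
    for (int i = 0; i < hops; ++i)
        idx = chain[idx];              // serialised: next address needs this result
    long long stop = clock64();
    *cycles_out = stop - start;
    *sink = idx;                       // keeps the chain from being optimised away
}

int main()
{
    const int n = 1 << 20;             // 4 MB chain, larger than typical GPU L2
    const int hops = 1 << 16;
    unsigned int* h = (unsigned int*)malloc(n * sizeof(unsigned int));
    for (int i = 0; i < n; ++i)        // odd stride => a single cycle over the buffer
        h[i] = (unsigned int)((i + 4099) % n);

    unsigned int *d_chain, *d_sink;
    long long* d_cycles;
    cudaMalloc(&d_chain, n * sizeof(unsigned int));
    cudaMalloc(&d_sink, sizeof(unsigned int));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMemcpy(d_chain, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    // One thread, one block: no wavefront launch or occupancy effects to control for.
    pointer_chase<<<1, 1>>>(d_chain, hops, d_cycles, d_sink);

    long long cycles = 0;
    cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("%.1f cycles per dependent load\n", (double)cycles / hops);

    free(h); cudaFree(d_chain); cudaFree(d_sink); cudaFree(d_cycles);
    return 0;
}

Shrink the chain so it fits in cache (or, on the console, steer the allocation to ESRAM vs DDR3) and the difference in cycles per hop is the latency gap you're after; whether the Xbox One tools expose enough to do that cleanly is another question.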
 
http://www.slideshare.net/DevCentra...4cfc-bd18-c78592b923e8&v=qf1&b=&from_search=1
 
As many have alluded to, the stacked 3D RAM coming in 2016 would make this architecture obsolete, as HBM obtains all the physical benefits of centralizing memory while increasing both bandwidth and size.
Perhaps shared/uniform 3D stacked memory offers the best opportunity if there is ever going to be a window for Nintendo to get back into the console race. This technology is sort of an inflection point (as they called the iPhone), or a 'disruptive innovation'.

Not a fan of main vram being shared, with current memory technologies.

Would have preferred the GPU had gotten its own dedicated memory, the CPU its own dedicated memory, and perhaps another 2GB of coherent DDR3 shared between the CPU/GPU, ideal for GPGPU.
 

Nice find!

So add the 10% reservation back on, and that gets you to the ~155 level that the architects were claiming to have seen.
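(For anyone checking the arithmetic: if the figure in the slide was around 140 GB/s - my reading, it isn't quoted here - then 140 / 0.9 ≈ 155 GB/s once the 10% reservation is added back.)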

That post-process filter was presumably BW limited, and wouldn't have run much (or any) faster on the PS4 despite its greater ALU count.

And it's early days yet, with the transition to compute barely even begun. BW may end up being a particularly strong point for the Xbone if developers can shape workloads to fit within the esram (and manage transfers to and from it accordingly).
 

heh, it's rather entertaining that they have record holders for this type of thing. I wonder how high that number will be in 5 yrs.

I guess the next number worth tracking for them is average bandwidth utilization of esram for the entire game, as opposed to just specific operations?
 

Yeah, peak is one thing, but overall utilisation of the resource would be interesting. Tied to that would be, I think, how effectively developers can combine operations to avoid intermediate reads and writes (effectively reducing the required BW).

esram gets a lot of negative attention from armchair types, but a number of developers (including King sebbbi) seem to see it as an opportunity to accelerate certain BW bound operations.

It's a pity that MS were so conservative with power draw. Cranking up the GPU and esram to 1 GHz or so (within the limits of GCN parts on TSMC 28 nm) would have led to a more directly competitive part.
 
Depends on how you use it and what you put in there, I guess.

32MB can do it, but depending on what you put in your buffers you might need to tile, put some buffers in main ram, or spill over into main ram (which Xbone buffers apparently can do).
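
As a rough example of what fits: at 1080p a 64-bit HDR colour target is 1920 × 1080 × 8 bytes ≈ 16.6 MB, and a 32-bit depth buffer is another ~8.3 MB, so that pair sits comfortably inside 32MB; add another full-size target or two and you're into tiling or spilling territory.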

Sort of OT, but does anyone know how efficient depth compression normally is (the kind of size ranges a compressed depth buffer can normally operate within)?
 
Even for a compressed buffer you need the full uncompressed size. It's lossless block compression so you'll have holes throughout the range, but need the entire range.
 
That's a silly response unbefitting of the Tech forum. If one considers the limitations of XB1 to be due to ESRAM, one can consider that a larger amount would have been better (although it'd need to be eDRAM) and solve those issues. Certainly to take XB1 as is and remove the ESRAM, you'd end up with a vastly inferior machine that knocks your point completely on its head.
 
Even for a compressed buffer you need the full uncompressed size. It's lossless block compression so you'll have holes throughout the range, but need the entire range.

Thanks.

I got it into my head that modern compression could get by with a smaller buffer, depending on how compressible a particular buffer turned out to be, and that while you'd always need the full uncompressed size available for the worst case, you could make do with less space most of the time. The idea here was that you could put the most commonly used portion in esram and put the "overspill" component in main ram.

Perhaps you could sort-of achieve this by doing a Z pre-pass, then (losslessly) compressing depth down using compute? This would reduce the memory footprint of the depth buffer in esram, and in worst cases that didn't compress well you could overspill into main ram, leaving more esram for other buffers?

... or perhaps that would be too expensive. *shrugs*
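
Something like the sketch below is what I have in mind, just to make the idea concrete (CUDA purely for illustration and the names are invented; real hardware depth compression is block-based and considerably smarter than this): survey the depth buffer in 8×8 tiles after the Z pre-pass and flag the tiles that could be stored as a single value, which tells you how much of the buffer could in principle stay compact in esram, with the incompressible tiles spilled to main ram.

#include <cuda_runtime.h>
#include <math.h>

#define TILE 8  // 8x8 depth tiles, just for illustration

// Launch as classify_depth_tiles<<<dim3(width/TILE, height/TILE), dim3(TILE, TILE)>>>(...).
// Assumes the depth buffer dimensions are multiples of TILE, to keep the sketch short.
// compressible[tileIndex] is set to 1 if the whole tile could be stored as one value.
// With epsilon = 0.0f the test is strictly lossless (all samples identical).
__global__ void classify_depth_tiles(const float* depth, int width, int tiles_x,
                                     unsigned char* compressible, float epsilon)
{
    __shared__ float tile[TILE][TILE];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = depth[y * width + x];
    __syncthreads();

    // Let one thread per block scan its 8x8 tile; cheap enough for a sketch
    // (a real version would use a parallel reduction).
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        float lo = tile[0][0], hi = tile[0][0];
        for (int j = 0; j < TILE; ++j)
            for (int i = 0; i < TILE; ++i) {
                lo = fminf(lo, tile[j][i]);
                hi = fmaxf(hi, tile[j][i]);
            }
        compressible[blockIdx.y * tiles_x + blockIdx.x] = (hi - lo) <= epsilon ? 1 : 0;
    }
}

Whether doing that every frame would pay for itself is exactly the "perhaps too expensive" part, of course.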
 
0, it seems to me, looking at this gen.

The esram may yet allow the Xbox one to operate more efficiently than the PS4, when all is said and done.

It's easy to look at the Xbox One and 'blame' everything on esram, when decisions relating to noise, power, and main memory complexity and cost are primarily responsible for shaping the platform.
 
Regarding the question about what the ideal size would be: I think 32MB is the ideal size. The problem is games that are designed for more capable hardware; they can't seem to fit inside it unless they drop resolution, framerate, or a combination of both.

Looking at a title like Forza 5, which many people on this board consider a technological marvel, it is apparent that if you design the game around the 32MB esram, then you can still achieve 1080p/60fps graphics. You'd have to drop realtime lighting or HDR lighting, but if the art style can hide those deficiencies, then you'd end up with a stunning, sharp-looking game that even runs at 60fps.
So if MS can regain marketshare and force developers to use the Xbox One as the lead platform, then most, if not all, of the cons of the esram will be gone.
 
Not sure if serious. Engineers develop for the hardware; they take into account its setup and architecture.

Going into this launch year, rendering engines for all consoles have mainly become deferred. Fitting multiple 32-bit G-buffers into 32MB is the problem, so you will need to use overspill or other methods to make this work.
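(At 1920×1080 a single 32-bit target is 1920 × 1080 × 4 bytes ≈ 8.3 MB, so three or four G-buffer targets plus a 32-bit depth buffer already lands in the roughly 33-41 MB range, i.e. over the 32MB before you've placed anything else.)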

Granted, as long as deferred, tiled deferred and other deferred methods are the most used, the X1 will be hamstrung to a degree. But I don't see why the industry would stop innovating. Forward+ already looks like an answer, offering other compromises and moving the bottleneck somewhere else for the X1. I do see an eventual solution happening that will benefit both.
 