The pros and cons of eDRAM/ESRAM in next-gen

When MS specced 16 ROPs, they also specced 102 GB/s of bandwidth from the ESRAM.

The fact that AMD delivered something vastly more capable is simply a happy turn of events.
My interpretation of the engineers' statements on this is that 102 GB/s was a preliminary minimum figure (with no qualifications) that they communicated to developers and outside groups before most of the design had been evaluated.

My strong suspicion is that those involved with the design knew enough about on-die memories and interconnects in general to expect more, but they had no good reason to say more as that would have relied on implementation details for an implementation that had not been fully specced.

I do also wonder how much more expensive it would have been if they had asked for double the I/O to the same scratch memory. It somewhat looks like they were never going to accommodate MSAA bandwidth requirements, or even FP16, which would themselves make 32 MB seem that much smaller to work with anyway.
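For what it's worth, here is the back-of-the-envelope arithmetic behind that reading, as a small sketch; the 800 MHz pre-upclock GPU clock and the per-pixel byte counts are my own assumptions, not anything from the interviews:

```python
# Rough check of how 16 ROPs line up with the original 102 GB/s ESRAM figure.
# Assumptions (mine): 800 MHz GPU clock, 32-bit colour targets, and alpha
# blending that both reads and writes each pixel.

GPU_CLOCK_HZ = 800e6
ROPS = 16
BYTES_PER_PIXEL_RGBA8 = 4

# One blended RGBA8 pixel per ROP per clock: 4 bytes read + 4 bytes written.
rgba8_blend_bw = ROPS * GPU_CLOCK_HZ * 2 * BYTES_PER_PIXEL_RGBA8
print(f"RGBA8 blend traffic: {rgba8_blend_bw / 1e9:.1f} GB/s")    # ~102.4 GB/s

# FP16 targets double the per-pixel bytes; 4x MSAA quadruples the samples.
print(f"FP16 blend traffic:  {rgba8_blend_bw * 2 / 1e9:.1f} GB/s")
print(f"RGBA8 + 4x MSAA:     {rgba8_blend_bw * 4 / 1e9:.1f} GB/s")
```

Read that way, 102 GB/s is exactly what 16 ROPs consume on plain RGBA8 blending, and FP16 or MSAA immediately outruns it.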

Decisions, decisions...

It's on-die memory, but per the DF interview it's a memory scratchpad that is accessed identically to main memory, post whatever page table setup and mapping to the hardware is done at allocation.
Accesses get routed through a crossbar setup with no additional special handling from the code, hence the ability to split portions of a target across both pools.
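To make that concrete, here is a toy sketch of what splitting a target across both pools amounts to once everything sits behind the same page tables; the 64 KB page granularity and the function here are hypothetical illustrations, not the actual XDK API:

```python
# Toy illustration (not the real API) of one allocation spanning both pools.
PAGE_SIZE = 64 * 1024                            # assumed page granularity
ESRAM_PAGES = (32 * 1024 * 1024) // PAGE_SIZE    # 512 such pages in 32 MB

def map_render_target(num_pages, esram_budget):
    """Place the first `esram_budget` pages in ESRAM and spill the rest to DRAM.

    After mapping, the GPU just issues ordinary virtual addresses; the
    crossbar routes each access to whichever pool the page table points at.
    """
    return [(page, "ESRAM" if page < esram_budget else "DRAM")
            for page in range(num_pages)]

# A 1080p RGBA8 target is ~8.3 MB, i.e. 127 of these pages.
target_pages = (1920 * 1080 * 4 + PAGE_SIZE - 1) // PAGE_SIZE
mapping = map_render_target(target_pages, esram_budget=96)
print(f"{ESRAM_PAGES} ESRAM pages total; this target uses {target_pages}, "
      f"{sum(pool == 'DRAM' for _, pool in mapping)} of them spilled to DRAM")
```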

I'm speculating at this point, but I wonder if the 16 ROPs and their peak bandwidth faced a design bottleneck with the eSRAM's crossbar requirements.
I think, from the die shots and interviews, that Durango has doubled up on crossbar blocks relative to Orbis in order to service this comparatively generic memory access capability for the GPU memory pipeline.
Having 32 ROPs, or expanding the general memory access bandwidth for 16, would require plugging even more into AMD's crufty uncore--of which I have been having an increasingly jaundiced view as of late. Since the eSRAM's bus is sized to match the ROPs so well, the play to keep things on-die in this manner raises complexity in a way that a relatively straight shot over a wider Garlic to juiced up off-die GDDR might not.

That level of on-die bandwidth is doable, but potentially not with the constraints that the accesses be as generic, done cheaply, and with AMD's bus setup and design capabilities as the basis.

The lack of mention of significant latency benefits from the eSRAM, and the lousy latency numbers for AMD's memory accesses in general (from all appearances it is uniformly and disconcertingly bad across all current APUs including Kaveri) makes me think that there may be benefits on-die memory could have brought, if it weren't shoehorned into tech as old as Llano.
 
What's interesting to me is that whatever bottlenecks exist in the Xbox One, they appear to get worse the higher the framerate being targeted. 30fps games seem able to reach 900p or even match the 1080p of PS4 versions, but 60fps games have had the largest resolution disparity. You'd expect the relative performance to be largely fixed, allowing devs to trade visual fidelity for framerate freely, but in the case of the Xbox One, if you want your game to run at 60fps it struggles to get above 720p. That is, frankly, alarming.
 
Because going from 720p to 1080p at 30fps is easier than going from 30fps to 60fps. Plenty of titles are only doing 1080p30, for that matter, citing "artistic design".
 
I'm speculating at this point, but I wonder if the 16 ROPs and their peak bandwidth faced a design bottleneck with the eSRAM's crossbar requirements.
I think, from the die shots and interviews, that Durango has doubled up on crossbar blocks relative to Orbis in order to service this comparatively generic memory access capability for the GPU memory pipeline.
Having 32 ROPs, or expanding the general memory access bandwidth for 16, would require plugging even more into AMD's crufty uncore--of which I have been having an increasingly jaundiced view as of late. Since the eSRAM's bus is sized to match the ROPs so well, the play to keep things on-die in this manner raises complexity in a way that a relatively straight shot over a wider Garlic to juiced up off-die GDDR might not.

That level of on-die bandwidth is doable, but potentially not with the constraints that the accesses be as generic, done cheaply, and with AMD's bus setup and design capabilities as the basis.

The lack of mention of significant latency benefits from the eSRAM, and the lousy latency numbers for AMD's memory accesses in general (from all appearances it is uniformly and disconcertingly bad across all current APUs including Kaveri) makes me think that there may be benefits on-die memory could have brought, if it weren't shoehorned into tech as old as Llano.
Actually I had the same thoughts when I posted earlier. I remember reading something not that long ago about Kaveri and the "various" buses connecting the GPU, the CPU, the memory controller, etc. It was a bit too much for me, but the whole thing looked "dirty", on the verge of a hack job.
AMD still has to implement a clean bus or crossbar that cleanly connects the whole thing, which is pretty off-putting when you consider they are the ones who invented the APU / Fusion concept.

So I was also wondering if something inside the design set a bottleneck on the on-die bandwidth between the GPU and the scratchpad memory.
MSFT's engineers obviously would have known about it, and they should have discarded that option.

I guess it sort of settles the score for this thread: a scratchpad has pros and cons, and with regard to Durango it seems something inside the design prevented the pros from materializing... :(

As a side note, AMD is lagging at fixing some parts of its APUs, as 3dilletante pointed out. It affects AMD's own products, and it seems it also affected MSFT's plans. I wonder if AMD actually had what it takes to develop that many products at the same time (part of Nintendo's Wii U, Liverpool, Durango, its own APUs); it must affect how much time and effort they can put into fixing and improving critical parts of their designs. Sony was wise to go with something overall simple.

Lots of people think it was a good thing for AMD to get all the console deals; I'm starting to wonder if it may actually spread them a bit thin and affect the effort they can put into their main line of products.
 
Given the heavy amount of reuse amongst all the APUs and AMD's spinning of the advantages of R&D being paid for in the semi-custom division, I would question whether AMD would be bringing as much to market period if Sony and Microsoft didn't pay for it.
 
The fact still remains that there are titles running at 1080p 60 fps on the X1.
A lot of them aren't really demanding, but Forza 5 and NBA 2K14 have strong graphics and they hit the mark. I think it's too early to say the Xbox One is incapable of hitting 1080p 60fps on a normal basis.
I think the gap in a lot of games so far is SDK-related.
 
Given the heavy amount of reuse amongst all the APUs and AMD's spinning of the advantages of R&D being paid for in the semi-custom division, I would question whether AMD would be bringing as much to market period if Sony and Microsoft didn't pay for it.
It keeps them afloat financially, but it may have trade-offs.
If AMD's plans were clearer and their communication better, maybe investors would be willing to give them some time and some credit.
At this point I don't know where AMD is going; their roadmap is constantly changing, their goals and business plan seem undefined, etc. I wouldn't bet a dime on them.
Anyway, I'll stop the OT here.
 
The fact still remains that there are titles running at 1080p 60 fps on the X1.
A lot of them aren't really demanding, but Forza 5 and NBA 2K14 have strong graphics and they hit the mark. I think it's too early to say the Xbox One is incapable of hitting 1080p 60fps on a normal basis.


There are titles running at full 1080p on PS360 as well.
 
What's still interesting about the ESRAM is its bandwidth. The 109-204 GB/s is just for the tiny 32 MB space. If you tried the same with GDDR5 memory, you wouldn't get that kind of bandwidth for one tiny fragment of the memory. So it is really, really fast: tiny, but fast. But what can be done with it? It should be more than enough for a 1080p render target, but the large things (textures etc.) must come from DDR3 memory.
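Some rough numbers on what "more than enough for a 1080p render target" looks like in practice; 1080p and common formats are assumed here, and the 68 GB/s DDR3 figure is the published one:

```python
# Rough sizing of what fits in the 32 MB at 1080p with common formats.
MB = 1024 * 1024
ESRAM = 32 * MB

colour  = 1920 * 1080 * 4      # RGBA8 colour target, ~7.9 MB
depth   = 1920 * 1080 * 4      # D24S8 depth/stencil, ~7.9 MB
gbuffer = 3 * colour           # a modest 3-target deferred G-buffer, ~23.7 MB

print(f"forward (colour + depth):    {(colour + depth) / MB:.1f} MB of {ESRAM / MB:.0f} MB")
print(f"deferred (G-buffer + depth): {(gbuffer + depth) / MB:.1f} MB of {ESRAM / MB:.0f} MB")

# Anything that doesn't fit (textures, vertex data, spilled targets) has to
# come over the ~68 GB/s DDR3 instead.
```

A simple forward setup fits comfortably; a deferred G-buffer already eats nearly the whole pool, which is why the big stuff lives in DDR3.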

Another thing we don't know much about is the other small SRAM on the die that is not used by the GPU. It is small, but it may be used to offload some tasks from main memory (lower latency).

Nvidia's Maxwell recently came out, and one of its primary changes is an increase from 256 KB to 2 MB of L2 cache.

My thought was that this validates MS's ESRAM (sort of a 32 MB cache). However, the clues suggest the ESRAM may be hampered by its interface and may never offer latency particularly better than the rest of the memory. As such, I'm thinking it may never be more useful than a bandwidth band-aid, and that's disappointing if so.

I've always thought MS hardware engineers were underrated (for their release times I think the 360 and the OG Xbox are both underappreciated hardware), but this Xbox One appears to lack any special magic so far.

I guess to bring it back on topic: the design can still be not bad, imo. It's just a tradeoff of ESRAM for DDR3, which should allow the One to undercut the competitor on price (while still being less powerful, but ideally not noticeably so; that'd be the ideal of the magic-less design, I guess). But then MS execs nullified all that by packing in a super expensive Kinect sensor.
 
Exactly, it is a transistor budget and die space problem. If EDRAM had been available they could have included 128-192MB in the same space, or doubled it to 64MB and still increased the size of the GPU logic. Those tradeoffs are why the Xbox One's APU design is unprecedented. Spending that many transistors on that little memory is not usually seen as worthwhile. MS was looking with an eye towards future cost reductions and available EDRAM processes and decided they could get away with ESRAM to simplify both current production and future shrinks. Given the current results it is certainly arguable they chose poorly.
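The "128-192MB in the same space" claim works out from nothing more than cell-count arithmetic; the density ratios below are ballpark assumptions, since real macros vary a lot by process:

```python
# 6T SRAM vs 1T1C eDRAM, counted per bit. Treat these as floors: neither
# number includes ECC, redundancy, sense amps or other array overhead.
MBIT = 8 * 1024 * 1024

sram_bits = 32 * MBIT
print(f"32 MB of 6T SRAM: ~{sram_bits * 6 / 1e9:.1f}B transistors")    # ~1.6B

# eDRAM cells are commonly quoted at roughly 3-6x the density of SRAM.
for density_ratio in (4, 6):
    print(f"at {density_ratio}x density: ~{32 * density_ratio} MB of eDRAM in the same area")
```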

We've discussed this to death, but the One APU is overall about the same size as the PS4 APU. So that is in theory a wash (maybe yields could even be a bit better on the Xbox chip, because I'd think memory is more defect-redundant than logic).

The tradeoff was really then DDR3 vs GDDR5. I'm quite sure Sony is paying a lot more for its RAM. Why did MS do this? Well, for one, they likely never expected Sony to go to 8 GB of GDDR5. It's possible they didn't even think it was technically feasible when the One was on the drawing board. The designs would look a lot different if it had ended up at 4 GB for the PS4 and 8 GB for the One. But of course it didn't, and that colors our view now.

Then again, the idea that MS decided DDR3/8 GB was a necessity early on doesn't square with word from bkilian that the One actually moved to 8 GB fairly late in the design.

I guess I'm saying the greater impetus from MS was cheap hardware (DDR3) so they could shoehorn Kinect in and still maintain some semblance of a reasonable price. I don't believe they somehow tried to match Sony's muscle for the same cost and failed, as your post implies. This was still arguably a failure on their part, but more of a strategic one from on high.

I still think the One hardware (the ESRAM/DDR3 combo) could have validity as a lower-cost, almost-as-good design. That's where its value would shine, if there is any; it's just clouded by Kinect currently. As bad as things are looking, I suspect MS is thinking more and more about pulling Kinect, though; they almost have to be.
 
You'd expect the relative performance to be largely fixed, allowing devs to trade visual fidelity for framerate freely, but in the case of the Xbox One, if you want your game to run at 60fps it struggles to get above 720p. That is, frankly, alarming.
When you go from 30 to 60fps it's not just the GPU that is working harder (unlike when you just bump the rez); you have to run your game logic at double the rate as well. Do you have enough CPU processing cycles for that...? (Or in a fantasy world, enough GPGPU cycles, cloud computing resources etc. *ahem*)
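A quick pixel-rate comparison of the two axes, with per-pixel and per-frame costs naively assumed constant (which real games obviously don't honour):

```python
# Why 30->60 fps is a harder jump than 720p->1080p at a fixed framerate.
def pixels_per_second(width, height, fps):
    return width * height * fps

print(f"720p60 : {pixels_per_second(1280, 720, 60) / 1e6:.0f} Mpix/s")    # ~55
print(f"1080p30: {pixels_per_second(1920, 1080, 30) / 1e6:.0f} Mpix/s")   # ~62
print(f"1080p60: {pixels_per_second(1920, 1080, 60) / 1e6:.0f} Mpix/s")   # ~124

# Raising the resolution only scales per-pixel GPU work (2.25x from 720p to
# 1080p); doubling the framerate also halves the CPU's per-frame budget.
print(f"frame budget: {1000 / 30:.1f} ms at 30 fps, {1000 / 60:.1f} ms at 60 fps")
```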
 
I wonder how the MS engineers feel now about their choices... game after game after game, all this criticism.
 
64 MB would be a reasonable amount of eDRAM. How much would that cost transistor-wise? The PS2 showed you could do quite a bit with eDRAM, and that had 4 MB, albeit with 32 MB of main memory. It was pretty epic for its time. That machine did some amazing things through its lifespan, what with awesome guys like Fafalada pushing its boundaries. I feel the 360's eDRAM should have been completely open for devs to do what they want with it, but it still proved its worth. Why 64 MB? Because it's a nice number, perhaps a bit large, but nice nonetheless. It would help alleviate bandwidth problems all that much more, and if open would allow devs to exploit the hardware all the better. 64 MB of eDRAM and what, like 1 to 2 GB of DDR3?
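To put a rough number on the transistor-cost question (cell counts only, capacitors and array overhead ignored; the ~5 billion figure is what MS quoted for the whole APU):

```python
# 1T1C eDRAM is roughly one transistor per bit, so even 64 MB is modest
# next to the rest of the chip.
MBIT = 8 * 1024 * 1024

edram_64mb = 64 * MBIT * 1    # ~0.54B transistors (plus capacitors)
sram_32mb  = 32 * MBIT * 6    # ~1.61B transistors, what the XB1 actually spends

print(f"64 MB eDRAM:   ~{edram_64mb / 1e9:.2f}B transistors")
print(f"32 MB 6T SRAM: ~{sram_32mb / 1e9:.2f}B transistors")
print("vs ~5B transistors quoted for the whole XB1 APU")
```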

The XB1 has a main RAM / video RAM memory architecture very similar to the PS2's. On the PS2, developers used the VRAM to store the framebuffer and, mainly, the textures of their games.

When I see Forza and Titanfall with their average textures/shaders, it reminds me of the first PS2 games, when textures had to be stored in the small 4 MB of VRAM to get optimal performance.

The X360 is another matter entirely, because its eDRAM can only be used as a framebuffer; it's not technically a main RAM / video RAM architecture. I see the 10 MB of the X360 as a GPU cache specialized for framebuffer operations, but the X360 has a real unified memory.

Where there is a problem with the XB1 is the ratio of VRAM to main RAM. Where the ratio was 1/8 for the PS2, it is 1/256 for the XB1. If PS2 devs had trouble fully storing the textures for each level in the 4 MB of VRAM, we can imagine how difficult it must be on the XB1 to store its many textures/shaders.
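The ratios in question, spelled out (8 GB total assumed for the XB1):

```python
# Fast-pool to main-memory ratio, PS2 vs XB1.
ps2_vram_mb,  ps2_main_mb  = 4, 32
xb1_esram_mb, xb1_main_mb  = 32, 8 * 1024

print(f"PS2: {ps2_vram_mb}/{ps2_main_mb} = 1/{ps2_main_mb // ps2_vram_mb}")       # 1/8
print(f"XB1: {xb1_esram_mb}/{xb1_main_mb} = 1/{xb1_main_mb // xb1_esram_mb}")     # 1/256
```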

32 MB is not enough for next-gen games, even with an old 3D engine. They have to limit their games to double buffering (screen tearing), use low-quality textures (Titanfall/Forza), or just accept bad performance if they can't store the textures/high-bandwidth assets in the fast VRAM (COD Ghosts, Battlefield).

I think after the XB1 no more hardware will ever try the main RAM / VRAM memory architecture. With current techniques (deferred rendering, temporal AA, triple buffering), developers increasingly need full bandwidth across the whole memory: a unified memory.
 
Forza 5 has low-quality textures? Did you ever play the game first-hand, or just watch YouTube videos?
 
I like the logic that, given that ESRAM is an evolution of EDRAM, it somehow became more useful yet worse than EDRAM... say what?!?
 
I wonder how the MS engineers feel now about their choices... game after game after game, all this criticism.

It's not the MS engineers who should be taking the heat, though. It's the higher-up business execs who made the strategic decision not to go very powerful (many bkilian posts illuminate this matter, such as that they expected to be less powerful and didn't care), and I'm actually not even sure who exactly makes those decisions at Microsoft.

Those higher-ups, I hope, are feeling the brunt of this criticism...

Being less/more powerful is more of a strategic business decision than an engineering one, arguably especially now, when nobody is targeting bleeding-edge performance at all. Being the most powerful console with ease would involve nothing more than a mid-range PC GPU; a 7870 (~2.6 teraflops, btw, for those who erroneously equate the PS4 GPU with a 7870) decimates the consoles.

But yet again I'd come back to the fact that the strength of the ESRAM/DDR3 design (to keep it on topic) should be lower cost and good-enough performance, and MS simply scuttled the cost advantage with Kinect.

I've posted it a million times, but imo the ideal for the Xbox One is a $299 no-Kinect price point. Then we could say "hey, this hardware is cheap and pretty fast". MS hasn't been anywhere near that aggressive in many years, though. If MS had wanted to be aggressive they could have taken those Live fees and subsidized X360 hardware to the tune of market domination last gen.
 
Then again, the idea that MS decided DDR3/8 GB was a necessity early on doesn't square with word from bkilian that the One actually moved to 8 GB fairly late in the design.

It was increased to 8 GB at the end of 2011, before the first third-party disclosures. Is that fairly late in the design?

I think we can also definitely blame the Wii for an underpowered Durango, as it showed, far more clearly than ever before, that making a box with cutting-edge performance and graphics could be completely irrelevant to sales success.

I wonder how the MS engineers feel now about their choices... game after game after game, all this criticism.

It's not their fault; management wanted a box built to a price (so they could pack in Kinect and sell at break-even/profit), with lots of memory for all their non-gaming features.
 
When you go from 30 to 60fps it's not just the GPU that is working harder (unlike when you just bump the rez); you have to run your game logic at double the rate as well. Do you have enough CPU processing cycles for that...? (Or in a fantasy world, enough GPGPU cycles, cloud computing resources etc. *ahem*)

Why would a CPU limitation result in lower rendering resolutions? The games are running at 60fps; the question is why the Xbox One struggles to get above 720p at that framerate. This includes MGSV, CoD: Ghosts, Battlefield 4 and Titanfall.
 