The pros and cons of eDRAM/ESRAM in next-gen

They didn't expect 8GB of fast RAM to be affordable for launch, so they went another way. Whether that's bad luck or incompetence or something in between, I don't know. Sony didn't seem to be confident about 8GB either, but in the end it worked out for them. Without knowing how that kind of industry projection is done, I would think Sony just took a bigger risk in their design and it paid off in the end. They were fully prepared to go with 4GB, which I think would have been a problem.

I think the bigger problem than the RAM was that Xbox One was designed with the knowledge that it would be technically weaker than the PS4. Which was foolish. Why even give the competition that inherent advantage? I think 32 ROPs, 1152 shaders, 18 CUs would still whip the Xbox One's ass even if the PS4 only had 4GB... and the PS4 would be EVEN cheaper... think about that.

Instead they built a box that required them to strip graphical prowess from the system, and added a "solution" that still doesn't deliver...
 
with the size of the PS4 OS, yea, good luck with that.

I think that's the cart before the horse. The reserved RAM is so big because they have so much of it. Plus, right now, they have parity with the weaker machine on the amount available, it's faster, and no special quirks are required to wring performance out of it.
 
I think that's the cart before the horse. The reserved RAM is so big because they have so much of it. Plus, right now, they have parity with the weaker machine on the amount available, it's faster, and no special quirks are required to wring performance out of it.

Plus, if they had settled on 4GB of GDDR5, they probably would have gone ahead with their plans to include 16GB of flash for OS use and app tombstoning, so it isn't even obvious that the PS4 would have been at a significant task-switching deficit. And we certainly had devs on this very forum arguing that, with all the streaming techniques developed for PS3/360, a 4GB PS4 would not have been at any significant disadvantage. We'd probably still be seeing the same resolution and framerate differences, although maybe Digital Foundry would be busy trying to discover where LOD transitions differ or assets have been reduced.
 
I think the bigger problem than the RAM was that Xbox One was designed with the knowledge that it would be technically weaker than the PS4. Which was foolish. Why even give the competition that inherent advantage? I think 32 ROPs, 1152 shaders, 18 CUs would still whip the Xbox One's ass even if the PS4 only had 4GB... and the PS4 would be EVEN cheaper... think about that.

Instead they built a box that required them to strip graphical prowess from the system, and added a "solution" that still doesn't deliver...

Correct me if I'm wrong.

The Xbox 360 used 8 ROPs (@500MHz) and had no problem outputting 720p, so the Xbox One, which uses 16 ROPs (@853MHz), shouldn't have many (or any) problems outputting 1080p (considering Microsoft's claim that 16 ROPs would be balanced at 164 GB/s !! :smile:).

On the other hand, 32 ROPs on PS4 are good for shadows or particle systems, and for being sure that the bandwidth can be used at its peak (no fillrate bottleneck) all the time. The PS4 would need 300+ GB/s of bandwidth to use all of its 25.6 Gpixels/s fillrate.

http://8pic.ir/images/64073665923387546641.png
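Back-of-envelope, assuming uncompressed 32-bit colour and depth and no cache savings (my assumptions, not official figures), here is roughly where a 300+ GB/s figure comes from:

```python
# Rough check of the fillrate-vs-bandwidth claim above.
rops, clock_ghz = 32, 0.8          # PS4: 32 ROPs @ 800 MHz
fillrate = rops * clock_ghz        # 25.6 Gpixels/s peak

scenarios = {
    "colour write only":             4,   # bytes touched per pixel
    "alpha blend (colour rd+wr)":    8,
    "colour write + Z test (rd+wr)": 12,
    "blend + Z test (rd+wr)":        16,
}

for name, bytes_per_pixel in scenarios.items():
    gb_per_s = fillrate * bytes_per_pixel      # Gpixels/s * B/pixel = GB/s
    print(f"{name:30s} -> {gb_per_s:6.1f} GB/s")

# colour write only              ->  102.4 GB/s
# alpha blend (colour rd+wr)     ->  204.8 GB/s
# colour write + Z test (rd+wr)  ->  307.2 GB/s
# blend + Z test (rd+wr)         ->  409.6 GB/s
```

The third line is roughly where the "300+" figure lands. In practice caches, compression and mixed workloads pull the real requirement down, but the worst case is well beyond 176 GB/s.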

I think the biggest problem is the size of the eSRAM. The difference between 720p/60fps and 1080p/60fps (+125% pixels) is about 3x bigger than the difference between the PS4 and X1 GPUs. Microsoft needs 3rd party developers to fully optimize their games for the X1 memory system. Fitting a 720p framebuffer in 32MB of eSRAM should be super easy, considering the difference between X360 and X1.

8 ROPs @500MHz (4 Gpixels/s) -----> 16 ROPs @853MHz (13.6 Gpixels/s) (+240%)
166-240 GFLOPS (60% efficiency according to Microsoft) -----> 1.28 TFLOPS (+700-400%)
512 MB GDDR3 / 22.4 GB/s -----> 5 GB DDR3 / 68 GB/s (+900% / +200%)
10MB eDRAM (32 GB/s) -----> 32MB eSRAM (109 GB/s in each direction) (+300%)

And in the end we are getting a game with about a 160% boost over last gen: 1.33x the resolution (960x720 --> 1280x720) and 2x the frame rate (2 x 1.3 ≈ 2.6x, i.e. +160%), plus of course better textures & lighting. So the difference between PS4 and X1 is considerable for the current gen (for MGSV), but the small difference between X1 and X360 is another story that I can't understand.
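A quick sanity check of those rounded percentages (a Python sketch, taking the numbers above as given and assuming a 30fps -> 60fps jump):

```python
def boost(old, new):
    """Return the increase as a percentage, e.g. 4 -> 13.6 is +240%."""
    return (new / old - 1) * 100

print(f"fillrate   : +{boost(4.0, 13.6):.0f}%")                                 # +240%
print(f"flops      : +{boost(0.166, 1.28):.0f}% to +{boost(0.24, 1.28):.0f}%")  # +671% to +433%
print(f"RAM amount : +{boost(0.512, 5.0):.0f}%")                                # +877%
print(f"RAM b/w    : +{boost(22.4, 68.0):.0f}%")                                # +204%

# Output side: 960x720 -> 1280x720 at double the frame rate.
pixels_per_second = (1280 * 720 * 60) / (960 * 720 * 30)
print(f"pixels/sec : +{boost(1, pixels_per_second):.0f}%")                      # +167%
```

So the rounded figures above hold up: the hardware jump is several times larger than the visible output jump.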

Maybe using eSRAM was a poor choice (long term), but not to this degree.
 
Question: the 32 MB of eSRAM has a bandwidth of 204 GB/s.
The MS guys said in the DF article that the average actual bandwidth using real game code is 150 GB/s.
Does that literally mean that the eSRAM can process 150 GB in one second's time?
 
Question: the 32 MB of eSRAM has a bandwidth of 204 GB/s.
The MS guys said in the DF article that the average actual bandwidth using real game code is 150 GB/s.
Does that literally mean that the eSRAM can process 150 GB in one second's time?
Wondered the same.
Also, the X360 eDRAM bandwidth was 256 GB/s for the ROPs, although the GPU's write bandwidth to the daughter die was 32 GB/s (basically for results from shaders/textures that the ROPs needed to write into the framebuffer).
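For what it's worth, this is how I understand those headline figures to be derived (the eSRAM cycle detail is from the DF interview, so treat it as reported rather than verified):

```python
# Xbox 360: bandwidth between the eDRAM array and the ROP logic on the
# daughter die, assuming 4xMSAA and read-modify-write of colour + Z.
rops, clock, samples = 8, 500e6, 4
bytes_per_sample = 4 + 4                    # 32-bit colour + 32-bit Z
xenos_internal = rops * clock * samples * bytes_per_sample * 2   # read + write
print(xenos_internal / 1e9)                 # 256.0 GB/s

# Xbox One: the eSRAM moves 128 bytes per cycle in one direction...
esram_one_way = 853e6 * 128
print(esram_one_way / 1e9)                  # ~109.2 GB/s
# ...and, per the DF interview, can read and write on roughly 7 of 8 cycles,
# which is where the ~204 GB/s peak figure comes from.
print(esram_one_way * (1 + 7/8) / 1e9)      # ~204.7 GB/s
```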
 
Well, if the eSRAM can actually process 150 GB a second on average, then it would not hinder the X1 from hitting 1080p and 60 fps. I read somewhere that one frame at 1080p in a deferred renderer is around 6 MB without any AA. At 150 GB/s you could do well over 200 fps at 1080p. Even after you used a double buffer and some form of AA you would still be comfortably over 60 fps. That's if the eSRAM's bandwidth can be taken literally.
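Putting rough numbers on that (the 6 MB figure is the one quoted above, and real renderers touch a target many times per frame, so this is an upper bound rather than a prediction):

```python
esram_bw   = 150e9        # bytes/s, the "real code" average quoted by MS
frame_size = 6e6          # bytes, one 1080p target per the post above

print(esram_bw / frame_size)     # 25000 "frames"/s if each byte is written once

# A more honest question is how many times per frame you can traverse a
# 1080p RGBA8 target (~8.3 MB) at 60 fps:
target = 1920 * 1080 * 4
print(esram_bw / (target * 60))  # ~301 full read-or-write passes per frame
```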
 
I think that's the cart before the horse. The reserved RAM is so big because they have so much of it. Plus, right now, they have parity with the weaker machine on the amount available, it's faster, and no special quirks are required to wring performance out of it.

Has anybody heard about 12GB PS4 devkits? No?

Because they probably don't exist yet, or appeared only recently (how could they, when the decision to go with 8GB was so late?), whereas XB1 devkits have probably had 12GB of memory from the beginning.

That explains the rather limited 5GB, then 5.5GB, then 6GB (with conditions) allocation for games. But I fully expect Sony to eventually release 12GB devkits (maybe they already have) and give developers at least 7.5GB for games.

The current 8GB devkits were probably designed to reserve 4GB for the debugging tools, 3.5GB for the game and 0.5GB for the OS. Now they have had to strictly reduce (hence the conditions) the space given to the debugging tools to allow more than 4GB for games.
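In numbers, the split being speculated about here (none of these allocations are confirmed, they just add up to the kit sizes):

```python
# Hypothetical devkit memory budgets, in GB.
current_kit = {"debug tools": 4.0, "game": 3.5, "OS": 0.5}   # 8 GB kit
future_kit  = {"debug tools": 4.0, "game": 7.5, "OS": 0.5}   # speculative 12 GB kit

print(sum(current_kit.values()))   # 8.0
print(sum(future_kit.values()))    # 12.0
```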
 
Correct me if I'm wrong.

The Xbox 360 used 8 ROPs (@500MHz) and had no problem outputting 720p, so the Xbox One, which uses 16 ROPs (@853MHz), shouldn't have many (or any) problems outputting 1080p (considering Microsoft's claim that 16 ROPs would be balanced at 164 GB/s !! :smile:).

This assumes that the workload on the ROPs per pixel stays the same. That is not guaranteed at all, as the various deferred rendering techniques really like fillrate.
 
Has anybody heard about 12GB PS4 devkits? No?

Because they probably don't exist yet,

GDDR5 is point-to-point; using the same silicon you simply cannot put more memory on it until someone makes bigger chips. So, as long as no one is selling memory chips larger than 4Gb, all PS4 development will be done on 8GB machines.
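The arithmetic behind that, assuming the PS4's 256-bit bus, 32-bit GDDR5 devices and clamshell mode (standard GDDR5 topology, not an official Sony statement):

```python
bus_width      = 256        # bits on the PS4 memory interface
chip_width     = 32         # bits per GDDR5 device
chips_per_rank = bus_width // chip_width        # 8
max_chips      = chips_per_rank * 2             # 16 in clamshell mode
chip_capacity  = 4 / 8      # 4 Gbit = 0.5 GByte, the largest part shipping

print(max_chips * chip_capacity)    # 8.0 GB -> bigger kits need bigger chips
```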
 
I think the bigger problem than the RAM was that Xbox One was designed with the knowledge that it would be technically weaker than the PS4. Which was foolish. Why even give the competition that inherent advantage? I think 32 ROPs, 1152 shaders, 18 CUs would still whip the Xbox One's ass even if the PS4 only had 4GB... and the PS4 would be EVEN cheaper... think about that.
I don't think MSFT knew what set-up Sony was going to use.
But they are somewhat lucky, as you point out: Sony could have shipped an even cheaper system and still been competitive. KZ:SF was clearly designed for a 4GB system. With regard to execution units and bus width they could also have made significant cuts and still been competitive. UMA grants the system a consistent advantage; it is obvious looking at how much RAM Guerrilla uses for its various render targets. I don't think they are doing it for the sake of using lots of RAM; it likely has performance benefits.
Instead they built a box that required them to strip graphical prowess from the system, and added a "solution" that still doesn't deliver...
I don't think that failing to match PS4 performance, or to meet any arbitrary performance figure, is the issue. To be fair, I think a vanilla UMA design like the PS4's is impossible to match as far as convenience is concerned.

A more serious issue may be how they balanced ROPs and ALUs in the design, as well as the bandwidth between the GPU and the scratchpad memory. That might be the biggest difference from the 360, imho. With its smart daughter die, the 360 was designed to draw things: pretty much everything that fit in the eDRAM worked like a charm. The big issue I see with Durango is not that it isn't a UMA design, but that working within the limited space granted by the scratchpad comes with no advantage versus working within the roomy PS4 RAM. You don't benefit from more bandwidth; in fact the PS4 has twice the ROPs, with the associated caches. Most of the time the PS4 will simply be a lot faster at drawing things.
In my eyes that is where the issue is. Cerny said that Sony hesitated between a system akin to a reworked 360, with lots of slow RAM and an ultra-fast scratchpad, and a UMA design with fast RAM. Sony chose the latter; MSFT did not really choose the former, because the scratchpad, while greatly limited in size, offers roughly no benefit versus the VRAM you find in most shipping gaming GPUs.

Looking at costs, the scratchpad did not save MSFT from using a 256-bit bus and a fast, costly form of DDR3. With regard to cost saving it also falls short.

Gotta wonder what the cost projections were if they had gone with an on-package, off-die approach with eDRAM again, especially considering how node shrinks are slowing down and becoming more expensive.
Also wonder what the trade-offs (cost/die-size/bandwidth) would have been with a true cache as opposed to a scratchpad...
I wondered about it, and I will take 3dilletante's (and others') word for it: pretty much only Intel could have provided such an elegant solution, and it might still have ended up at a performance deficit against the PS4's UMA backed by 176GB/s of bandwidth.

I think the issue with Durango is that while it carries a genuinely large amount of scratchpad, you get none of the usual advantage associated with scratchpads, i.e. really high bandwidth. In simple words, Durango is not as fast at drawing stuff as you would expect from a chip that has access to 32MB of on-die memory. The deal should be "once it fits in the scratchpad, performance should never be a concern"; that is the promise behind a scratchpad, and I think it is not delivered.
 
The Xbox 360 used 8 ROPs (@500MHz) and had no problem outputting 720p, so the Xbox One, which uses 16 ROPs (@853MHz), shouldn't have many (or any) problems outputting 1080p (considering Microsoft's claim that 16 ROPs would be balanced at 164 GB/s !! :smile:).

When MS specced 16 ROPs they also specced 102 GB/s of B/W from the esram.

The fact that AMD delivered something vastly more capable is simply a happy turn of events.

If you consider that, and also look at the PS4 and at the vast array of PC graphics cards from recent years, you see that 160~209 GB/s would everywhere else be 'balanced' by 24 or 32 ROPs. E.g. the 7790 (aka R260) is [16 ROPs, 896 shaders, 96 GB/s], whereas the 7870 (aka R270X) has 32 ROPs.

Especially when you consider the additional BW from the XB1's main memory, I think 16 ROPs now looks like underkill.
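A rough bandwidth-per-ROP comparison makes the point (board figures quoted from memory, so take them as approximate):

```python
# (ROPs, memory bandwidth in GB/s); XB1 line two naively adds eSRAM to DDR3.
configs = {
    "XB1 (DDR3 only)":    (16, 68),
    "XB1 (DDR3 + eSRAM)": (16, 68 + 109),
    "PS4":                (32, 176),
    "HD 7790":            (16, 96),
    "HD 7870":            (32, 154),
}

for name, (rops, bw) in configs.items():
    print(f"{name:20s} {bw / rops:5.1f} GB/s per ROP")
```

By that crude measure the XB1 has more bandwidth per ROP than any of its peers, which is another way of saying the 16 ROPs look like the limiter.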

Xbonk was supposed to be even less powerful than it ended up being. Blame late-gen Xbox 360 stats on media consumption and Kinect usage, combined with confirmation bias and Xbox being taken over by media-obsessed execs.
 
Looking at costs, the scratchpad did not save MSFT from using a 256-bit bus and a fast, costly form of DDR3. With regard to cost saving it also falls short.

The payoff for the eSRAM comes later in the machine's life, when MS can transition to 8 x 1GB DDR4 chips at low frequency and very low cost, as they'll be slow by DDR4 standards.

I'm assuming those 32-bit DDR4 chips are still coming of course ....

The Xbox One simply wasn't intended to be a high-end gaming console. It was the ultimate entertainment box, which MS assumed would simply inherit the Xbox's popularity and userbase.
 
The payoff for the eSRAM comes later in the machine's life, when MS can transition to 8 x 1GB DDR4 chips at low frequency and very low cost, as they'll be slow by DDR4 standards.

I'm assuming those 32-bit DDR4 chips are still coming of course ....
It is unknown whether the replacement of DDR3 will be that trivial. I would have wished for MSFT to launch a more open platform running Windows RT + Mantle, with something like forward compatibility being a given.
As it stands, we don't know to what extent the virtual machines are portable.
The Xbox One simply wasn't intended to be a high-end gaming console. It was the ultimate entertainment box, which MS assumed would simply inherit the Xbox's popularity and userbase.
Not my point: with the 360, the eDRAM was designed so that everything works in an optimal manner as long as your render targets fit in it. It seems that MSFT was not that ambitious with Durango; they seem not to have considered the scratchpad an advantage for the system (even an advantage with its own set of trade-offs, which it seems Sony did), but simply a workaround. Ultimately they spent a lot of silicon and resources on things that are a lot less relevant to a gaming machine, even a not-so-high-end one.
I think developers are presented with a tricky and unrewarding task: you have to fit your render targets within a constrained memory space, and you get no (performance) benefit from doing so.
 
What's still interesting about the eSRAM is its bandwidth. The 109-204 GB/s applies to just the tiny 32 MB space. If you tried the same with GDDR5 memory, you wouldn't get that much bandwidth over such a tiny fragment of the memory. So it is really, really fast: tiny, but fast. But what can be done with it? It should be more than enough for a 1080p render target, but the large things (textures etc.) must come from DDR3 memory.
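For a sense of scale, assuming common 32-bit formats (my example, not a statement about any particular engine):

```python
# One 1080p colour target plus a depth buffer versus the 32 MiB of eSRAM.
w, h = 1920, 1080
colour = w * h * 4          # RGBA8
depth  = w * h * 4          # D24S8
print((colour + depth) / 2**20, "MiB")   # ~15.8 MiB of the 32 MiB eSRAM
```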

Another thing we don't know much about is the other small SRAM on the die that is not used by the GPU. It is small, but it may be used to offload some tasks from main memory (lower latency).
 
When MS specced 16 ROPs they also specced 102 GB/s of B/W from the esram.

The fact that AMD delivered something vastly more capable is simply a happy turn of events.
I'm not sure they found out later; it sounds more like they communicated the hard figure to the devs, and then it got into the wild.
If you consider that, and also look at the PS4 and at the vast array of PC graphics cards from recent years, you see that 160~209 GB/s would everywhere else be 'balanced' by 24 or 32 ROPs. E.g. the 7790 (aka R260) is [16 ROPs, 896 shaders, 96 GB/s], whereas the 7870 (aka R270X) has 32 ROPs.
Actually the HD 7790 is different from both the R7 260 and R7 260X. The former (R7 260) is really close to the HD 7790 wrt memory and ALU speed but has only 12 CUs enabled; the latter (R7 260X) runs faster, with faster RAM.
Not that your point does not stand ;)
Especially when you consider the additional BW from the XB1's main memory, I think 16 ROPs now looks like underkill.
Indeed 67GB/s ain't too shabby, the older HD x7xx cards were not granted much more.
Xbonk was supposed to be even less powerful than it ended up being. Blame late-gen Xbox 360 stats on media consumption and Kinect usage, combined with confirmation bias and Xbox being taken over by media-obsessed execs.
I've no issue with that decision, BUT they priced themselves out of that market, for now at least, and they have not presented that audience with compelling software; and core gamers have even less incentive to go for the system when there is a cheaper alternative out there that performs significantly better.
 
I wondered about it, and I will take 3dilletante's (and others') word for it: pretty much only Intel could have provided such an elegant solution, and it might still have ended up at a performance deficit against the PS4's UMA backed by 176GB/s of bandwidth.

I think the issue with Durango is that while it carries a genuinely large amount of scratchpad, you get none of the usual advantage associated with scratchpads, i.e. really high bandwidth. In simple words, Durango is not as fast at drawing stuff as you would expect from a chip that has access to 32MB of on-die memory. The deal should be "once it fits in the scratchpad, performance should never be a concern"; that is the promise behind a scratchpad, and I think it is not delivered.

Fair enough.

I do also wonder how much more expensive it would have been if they had asked for double the I/O to the same scratch memory. It somewhat looks like they were never going to accommodate MSAA bandwidth requirements, or even FP16, which by themselves would make 32MB seem that much smaller to work with anyway.
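Rough footprint math for why MSAA and FP16 shrink the effective 32MB so quickly (simple uncompressed sizes, ignoring whatever compression or tiling tricks the hardware and devs can apply):

```python
w, h, MiB = 1920, 1080, 2**20

sizes = [
    ("RGBA8, no AA",   w * h * 4 / MiB),        # ~7.9 MiB
    ("FP16, no AA",    w * h * 8 / MiB),        # ~15.8 MiB
    ("FP16, 4xMSAA",   w * h * 8 * 4 / MiB),    # ~63.3 MiB, already over 32 MiB
    ("D24S8, 4xMSAA",  w * h * 4 * 4 / MiB),    # ~31.6 MiB just for depth
]

for label, size in sizes:
    print(f"{label:15s} {size:5.1f} MiB")
```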

Decisions, decisions...
 