The pros and cons of eDRAM/ESRAM in next-gen

What I meant to ask is whether they saw any benefit in using an allegedly low-latency RAM, compared to the main RAM, for compute shader work. (Which, I think, shouldn't be a problem at all for the DDR3, at least compared to GDDR5.)
I can see bandwidth being an advantage, and the ESRAM's concurrent read/write too... but has there been any leak about the ESRAM latency? So far it looks like it doesn't provide much of an advantage.

I believe DDR3 and GDDR5 have very similar latencies.
 
SRAM always has a latency advantage over DRAM; that's just how it is.

Whether that translates into any actual performance gains in the gaming space is hard to say, since graphics workloads are highly tolerant of latency.
 
Yeah, but are shaders usually bandwidth-bound? (Honest question, which is why I thought it would be nice for them to talk about it.)
Compute shaders are often memory (bandwidth and/or latency) bound. Most CUDA optimization guides talk extensively about memory optimizations, while ALU optimizations are not discussed as much (since ALU throughput isn't usually the main bottleneck for most algorithms on modern GPUs).
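To illustrate that point, here is a minimal CUDA sketch of my own (not from any poster in this thread) of a typical bandwidth-bound kernel: the arithmetic done per byte moved is so low that memory traffic, not ALU throughput, sets the speed limit.

```cuda
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    // Per element: read x[i] and y[i] (8 bytes), write y[i] (4 bytes),
    // but only 2 floating-point operations. That is roughly 0.17 FLOP
    // per byte, far too little work to hide memory traffic behind ALUs.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

Kernels like this run at whatever rate the memory system can feed them, which is why the optimization guides spend so many pages on coalescing, shared memory, and access patterns rather than on instruction selection.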
 
Hmmm, I hope they'll talk soon (GDC Europe?) about the benefits of that, if any.
Judging from sebbbi's words, the benefits are many. Compute shaders are the future, ROPs aren't as necessary as they once were, and they aren't the key to performance.
 
This discussion has come up before. But my worry is that the Xbox just won't have the CUs to spare for nice compute work.
 
SRAM always has a latency advantage over DRAM; that's just how it is.
But we don't know how it plays out in this particular implementation. It's odd that even months after launch, while we have many devs praising the ESRAM for many reasons, none of them has ever said it was because of latency. I think the chances are slim.
 
SRAM always has a latency advantage over DRAM; that's just how it is.

Whether that translates into any actual performance gains in the gaming space is hard to say, since graphics workloads are highly tolerant of latency.

For on-die SRAM versus external DRAM, this is virtually certain.
It's not so certain when both are on-die, particularly at large capacities, where latency starts to be dominated by the time it takes to cross the arrays; at that point density starts to win out. IBM's eDRAM analysis put the crossover point somewhere around the ESRAM's capacity.

Perhaps later disclosures can give a better handle on how much of the memory access process is shared between the DRAM and ESRAM paths.
The DDR interface and the devices themselves make a sizeable but fixed latency contribution, and AMD's CPU cache-miss latencies are such that we know the DRAM is no longer the biggest contributor (the latencies in nanoseconds are over twice what Intel can manage, and Intel's latency must already include the DRAM interface and device latencies), much less what the GPU does in its own pipeline.
 
Perhaps later disclosures can give a better handle on how much of the memory access process is shared between the DRAM and ESRAM paths.
The DDR interface and the devices themselves make a sizeable but fixed latency contribution, and AMD's CPU cache-miss latencies are such that we know the DRAM is no longer the biggest contributor (the latencies in nanoseconds are over twice what Intel can manage, and Intel's latency must already include the DRAM interface and device latencies), much less what the GPU does in its own pipeline.

Huh? Isn't that exactly the point of having a cache to begin with?
And what does the latency of the DRAM have to do with when a cache miss happens?
 
Huh? Isn't that exactly the point of having a cache to begin with?
And what does the latency of the DRAM have to do with when a cache miss happens?

If an access hits in cache, it doesn't incur main memory latency.
Main memory accesses aren't initiated until it has been determined that the data is not in cache.
Memory latency is the sum of all the misses incurred on-die, and then the cost of the memory controller, interface, and DRAM.

The remote cache access latencies are documented, and they are more than half of the documented memory latency. AMD spends more time on-chip trying to figure out whether it needs to hit main memory than it takes for said memory to actually be accessed.
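A back-of-envelope sketch of that decomposition, with made-up numbers purely for illustration (none of these are measured figures for any real chip): if the on-die portion dominates, swapping in a lower-latency memory array only shrinks the smaller term.

```cuda
#include <cstdio>

int main()
{
    // Hypothetical split of a CPU cache-miss-to-DRAM latency.
    float on_die_ns  = 60.0f;  // assumed: cache lookups, probes, memory controller queuing
    float off_die_ns = 45.0f;  // assumed: DDR PHY, command/address timing, the DRAM devices

    float total_ns = on_die_ns + off_die_ns;
    printf("total miss latency: %.1f ns, %.0f%% of it spent on-die\n",
           total_ns, 100.0f * on_die_ns / total_ns);

    // Even if the off-die term were halved by a faster memory array,
    // the total would only drop from 105 ns to 82.5 ns.
    return 0;
}
```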
 
If an access hits in cache, it doesn't incur main memory latency.
Main memory accesses aren't initiated until it has been determined that the data is not in cache.
Memory latency is the sum of all the misses incurred on-die, and then the cost of the memory controller, interface, and DRAM.

The remote cache access latencies are documented, and they are more than half of the documented memory latency. AMD spends more time on-chip trying to figure out whether it needs to hit main memory than it takes for said memory to actually be accessed.

Then why bother with caching? Just go to the RAM every time, no?
 
Then why bother with caching? Just go to the RAM every time, no?
3dilettante is referring to the case where a cache miss happens.

When you get a cache hit, it's much faster than an external DRAM access would be, even if that access were made without checking the cache first.
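The trade-off can be put in numbers with the usual average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. A quick sketch with assumed values (not figures for any particular chip):

```cuda
#include <cstdio>

int main()
{
    float hit_ns    = 4.0f;    // assumed on-die cache hit latency
    float miss_ns   = 100.0f;  // assumed full miss-to-DRAM latency
    float miss_rate = 0.05f;   // assumed: 5% of accesses miss

    float amat         = hit_ns + miss_rate * miss_ns;  // average with the cache
    float dram_only_ns = 90.0f;  // assumed DRAM latency with no cache lookup at all

    printf("with cache: %.1f ns average, DRAM-only: %.1f ns\n", amat, dram_only_ns);
    // 9.0 ns versus 90.0 ns: the occasional extra miss overhead is
    // easily paid for by the vast majority of accesses that hit.
    return 0;
}
```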
 