The ESRAM in Durango as a possible performance aid

I haven't heard the terms spatial and temporal latency before. Maybe you're thinking of latency vs bandwidth?

Propagation delay does play a role in adding to memory latency, but it's very small: around 0.1ns per cm at most. The memory on the PS4 shouldn't be more than a few cm away from the APU, so I don't think it'll make a big difference.
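As a rough sanity check, here's the arithmetic (just a sketch: the 0.1ns/cm figure is the upper bound from above, while the 5cm trace length and ~70ns total DRAM access time are my own assumed ballparks):

Code:
#include <stdio.h>

int main(void)
{
    double delay_per_cm_ns = 0.1;   /* upper bound quoted above           */
    double trace_length_cm = 5.0;   /* assumed APU-to-GDDR5 trace length  */
    double dram_latency_ns = 70.0;  /* assumed ballpark for a full access */

    double prop_ns = delay_per_cm_ns * trace_length_cm;   /* 0.5 ns */
    printf("propagation: %.1f ns, ~%.1f%% of a %.0f ns DRAM access\n",
           prop_ns, 100.0 * prop_ns / dram_latency_ns, dram_latency_ns);
    return 0;
}

So even with generous numbers, the wire itself is well under a nanosecond against tens of nanoseconds for the access as a whole.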

Thank you. I don't think the terms are used much when referring to ICs, but I have seen them on a few occasions.
 
Here's one thing bothering me about the idea of ESRAM helping performance:

-Devs haven't really talked about it (to be fair, all their comments seem super early)

-Microsoft themselves don't talk it up in the vgleaks docs. There is something about maximizing the color ROPs, but nothing much.

You'd think if it were a big deal it would be played up more in the MS docs.

OTOH, I think they typically talked about using the DDR3 for the framebuffer, which means they want the ESRAM for tricks, I guess.
 
You'd think if it were a big deal it would be played up more in the MS docs.

OTOH, I think they typically talked about using the DDR3 for the framebuffer, which means they want the ESRAM for tricks, I guess.

Why should it be a big deal... isn't it there to help the DDR3 so the system can breathe?
All the "latency" etc. stuff is grasping at straws. Either you have brute force or you don't.
 
Why should it be a big deal... isn't it there to help the DDR3 so the system can breathe?
All the "latency" etc. stuff is grasping at straws. Either you have brute force or you don't.

NVIDIA GPUs typically outperform AMD ones by 20% or more per FLOP, for example.
 
Is this purely from the cache, though, or are there other factors?

I don't know, just pointing out for kots that there are differences per FLOP.

Of course we can say both are GCN so the differences should be minimal, but then again 32MB of cache vs. none is a major architectural difference.
 
Why should it be a big deal... isn't it there to help the DDR3 so the system can breathe?
All the "latency" etc. stuff is grasping at straws. Either you have brute force or you don't.
The very low latency will make some difference; otherwise, why go for this eSRAM?
 
I don't know, just pointing out for kots that there are differences per FLOP.

Of course we can say both are GCN so the differences should be minimal, but then again 32MB of cache vs. none is a major architectural difference.

That is true, but I have yet to be convinced it will raise performance by more than an insignificant amount overall, tbh.
 
Seems cache is important, though. They doubled it going from Fermi to Kepler.

[Image: BNogfA2.png]

That PDF will not let me cut and paste, so I had to make a picture.

Ehh, you edited your post; you must have said something incorrect :p

AMD also has the special function units?
 
Seems cache is important, though. They doubled it going from Fermi to Kepler.

[Image: BNogfA2.png]

That PDF will not let me cut and paste, so I had to make a picture.

Ehh, you edited your post; you must have said something incorrect :p

AMD also has the special function units?

No it doesn't. I edited it because I couldn't discern the context of your post. It was hard to tell whether you were talking about PS4 and XB1 or GCN and GK110 when you said they are practically the same.
 
It looks like GK110 has 1.5MB of L2 cache. I wonder how much Tahiti and other AMD GPUs have; perhaps I will research later.

Kepler has 64KB of L1 cache per SMX (192 CUDA cores). That adds up to 896KB, almost another megabyte, on GK110. However, GPUs in the Xbone/PS4 class will obviously scale to less.

A 680 only has 512KB of L2 cache.
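For what it's worth, the 896KB figure works out if you assume a GK110 part with 14 of its 15 SMXes enabled (Titan-style); a quick check:

Code:
#include <stdio.h>

int main(void)
{
    int l1_per_smx_kb = 64;  /* Kepler: 64KB of L1/shared memory per SMX    */
    int smx_count     = 14;  /* assumed: GK110 with 14 of 15 SMXes enabled  */

    printf("total L1 across SMXes: %d KB\n", l1_per_smx_kb * smx_count);  /* 896 KB */
    return 0;
}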
 
It looks like GK110 has 1.5MB of L2 cache. I wonder how much Tahiti and other AMD GPUs have; perhaps I will research later.

Kepler has 64KB of L1 cache per SMX (192 CUDA cores). That adds up to 896KB, almost another megabyte, on GK110. However, GPUs in the Xbone/PS4 class will obviously scale to less.

A 680 only has 512KB of L2 cache.

I believe Tahiti has 768KB of L2 cache per GPU, with a bandwidth of 614GB/s.
That is what the AMD OpenCL programming guide says.

Link to picture containing 7xxx series specs.
http://abload.de/img/gpuspecs7xxxftkld.png
 
I don't know, just pointing out for kots that there are differences per FLOP.

Of course we can say both are GCN so the differences should be minimal, but then again 32MB of cache vs. none is a major architectural difference.

In the Durango slides there is a page dedicated to the eSRAM, explaining its benefits and how it differs from the eDRAM implementation in the 360. One thing is consistent, though: wherever it has been mentioned, they have always stated its low latency.
 
I was wrong, the Haswell NDA is up today. Reading some reviews.

Anand already had this interesting blurb: http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3

There’s only a single size of eDRAM offered this generation: 128MB. Since it’s a cache and not a buffer (and a giant one at that), Intel found that hit rate rarely dropped below 95%. It turns out that for current workloads, Intel didn’t see much benefit beyond a 32MB eDRAM however it wanted the design to be future proof. Intel doubled the size to deal with any increases in game complexity, and doubled it again just to be sure. I believe the exact wording Intel’s Tom Piazza used during his explanation of why 128MB was “go big or go home”. It’s very rare that we see Intel be so liberal with die area, which makes me think this 128MB design is going to stick around for a while.

The 32MB number is particularly interesting because it’s the same number Microsoft arrived at for the embedded SRAM on the Xbox One silicon. If you felt that I was hinting heavily at the Xbox One being ok if its eSRAM was indeed a cache, this is why. I’d also like to point out the difference in future proofing between the two designs.

Why is Anand so hung up on the cache thing? I don't think the ESRAM is a cache, so is Xbox doomed?

It also makes me wonder yet again why MS didn't go with simple eDRAM. I guess maybe because they don't own bleeding-edge foundries that need a use, like Intel does (Anand points this out).

The 128MB on Haswell is 87mm^2, I think, on 22nm. That would make 64MB only some ~80mm^2 on 28nm, maybe.
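A naive scaling estimate for that (just halving the macro and scaling by the square of the node ratio, assuming the same eDRAM cell; real layouts wouldn't scale this cleanly):

Code:
#include <stdio.h>

int main(void)
{
    double area_128mb_22nm = 87.0;          /* mm^2, figure from above          */
    double node_ratio      = 28.0 / 22.0;   /* linear scale factor 22nm -> 28nm */

    /* Half the capacity, area scaled by the node ratio squared. */
    double area_64mb_28nm = (area_128mb_22nm / 2.0) * node_ratio * node_ratio;
    printf("64MB at 28nm: ~%.0f mm^2\n", area_64mb_28nm);   /* ~70 mm^2 */
    return 0;
}

That lands around 70mm^2, in the same ballpark as the ~80mm^2 guess above.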
 
I believe the GCN L2 is tied to the number of memory controllers, so XB1/PS4 will have 512KB.

You are right, I believe. I just redownloaded the OpenCL manual, jumped to the memory section, and saw this:

Each memory channel on the GPU contains an L2 cache that can deliver up to 64 bytes/cycle. The AMD Radeon HD 7970 GPU has 12 memory channels; thus, it can deliver up to 768 bytes/cycle; divided among 2048 stream cores, this provides up to ~0.4 bytes/cycle for each stream core.
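Putting the guide's numbers together (the 0.8GHz clock here is just back-calculated from the 614GB/s figure in the earlier post, and the 8-channel assumption for a 256-bit console bus is mine):

Code:
#include <stdio.h>

int main(void)
{
    int    channels        = 12;    /* HD 7970 memory channels (from the excerpt)  */
    int    bytes_per_cycle = 64;    /* L2 bytes/cycle per channel                  */
    int    stream_cores    = 2048;
    int    l2_total_kb     = 768;   /* total L2 from the earlier post              */
    double clock_ghz       = 0.8;   /* assumed: implied by 614GB/s / 768 B/cycle   */

    int l2_bpc = channels * bytes_per_cycle;                     /* 768 B/cycle    */
    printf("%d B/cycle total, ~%.2f B/cycle per stream core\n",
           l2_bpc, (double)l2_bpc / stream_cores);               /* ~0.38          */
    printf("L2 bandwidth: ~%.0f GB/s\n", l2_bpc * clock_ghz);    /* ~614 GB/s      */

    /* Per-channel L2 size, and what a 256-bit (8 x 32-bit channel)
       console bus would imply -- matching the 512KB guess above. */
    printf("per channel: %d KB, 8 channels: %d KB\n",
           l2_total_kb / channels, 8 * (l2_total_kb / channels));
    return 0;
}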
 
I was wrong, the Haswell NDA is up today. Reading some reviews.

Anand already had this interesting blurb: http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3



Why is Anand so hung up on the cache thing? I don't think the ESRAM is a cache, so is Xbox doomed?

It also makes me wonder yet again why MS didn't go with simple eDRAM. I guess maybe because they don't own bleeding-edge foundries that need a use, like Intel does (Anand points this out).

The 128MB on Haswell is 87mm^2, I think, on 22nm. That would make 64MB only some ~80mm^2 on 28nm, maybe.

Sigh, the eSRAM IS a cache, in the sense that it can act as a source for the GPU; it's a scratchpad, so to speak, but it can also be used as a framebuffer. Stop worrying about the XOne; its memory system, while not as straightforward as the PS4's, has a set of benefits attached to it. If you read what ERP, bkillian, Gubbi, sebbbi etc. have been saying, you will understand that the system has its own merit. At this point I would advise you to take a break; E3 is just around the corner. We will see what games look like on the system there and over the coming months and years.
 
Sigh, the eSRAM IS a cache, in the sense that it can act as a source for the GPU; it's a scratchpad, so to speak, but it can also be used as a framebuffer.

That's not what cache means; cache and scratchpad are mutually exclusive. To be a cache it must transparently allocate and synchronize memory with the larger backing store (DDR3). From the software's point of view it wouldn't even exist except for instructions or memory pages set to explicitly bypass it and perhaps some cases where it's forced to be flushed.

A scratchpad is a chunk of memory that lives in a separate address space that the programmer has to manage manually.

The difference is pretty big in terms of the work required to get good utilization out of either.
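To make that concrete, here's a toy sketch of the two styles (the scratchpad array and tile size are invented stand-ins, not any real console API): with a cache you just touch normal DDR3 addresses and the hardware decides what's resident; with a scratchpad you stage data into a separate region yourself.

Code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_BYTES (64 * 1024)

/* Stand-in for the separate, program-managed fast memory (invented). */
static uint8_t scratchpad[TILE_BYTES];

/* Cache-style: just read the backing store; hardware transparently decides
   whether each line is served from the fast memory or from DDR3. */
uint32_t sum_cached(const uint8_t *ddr3_buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += ddr3_buf[i];               /* hit or miss, software doesn't care */
    return sum;
}

/* Scratchpad-style: the programmer explicitly stages tiles into the fast
   memory, works on them there, and manages the capacity budget by hand. */
uint32_t sum_scratchpad(const uint8_t *ddr3_buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t off = 0; off < n; off += TILE_BYTES) {
        size_t chunk = (n - off < TILE_BYTES) ? (n - off) : TILE_BYTES;
        memcpy(scratchpad, ddr3_buf + off, chunk);   /* manual copy in */
        for (size_t i = 0; i < chunk; i++)
            sum += scratchpad[i];
    }
    return sum;
}

The second version only pays off if you budget the 32MB and schedule the copies yourself, which is where the extra work comes from.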
 
There are hybrid forms, though. If I remember correctly, you could lock a bit of the 360's CPU cache and have full control over it...
 
There are hybrid forms, though. If I remember correctly, you could lock a bit of the 360's CPU cache and have full control over it...

Locked cache lines aren't really the same as a scratchpad, because they still have to be backed by something (that the CPU may still flush them to) and you still access them by addressing that memory.

There are CPUs that let you reconfigure part or all of the cache as a real scratchpad, though. In that case it's no longer a cache.
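Pseudo-C to illustrate the addressing point (the lock/unlock helpers are hypothetical stand-ins, not the 360's actual API): a locked line is still reached through its normal main-memory address and can be written back there, while a true scratchpad is its own region with no backing store.

Code:
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for platform cache control; real hardware would
   use its own instructions or SDK calls here. */
static void hypothetical_cache_lock(void *addr, size_t bytes)   { (void)addr; (void)bytes; }
static void hypothetical_cache_unlock(void *addr, size_t bytes) { (void)addr; (void)bytes; }

/* Stand-in for a real, separate scratchpad region (invented). */
static uint8_t scratchpad[64 * 1024];

void locked_line_example(uint8_t *main_mem_buf)
{
    /* Locked cache lines: still addressed through main memory. The lock is
       only a residency guarantee; the data remains backed by main_mem_buf
       and may still be flushed out to it. */
    hypothetical_cache_lock(main_mem_buf, 4096);
    main_mem_buf[0] = 42;                 /* same pointer, same address space */
    hypothetical_cache_unlock(main_mem_buf, 4096);
}

void scratchpad_example(void)
{
    /* True scratchpad: its own storage, no backing store to sync with,
       entirely the programmer's problem to manage. */
    scratchpad[0] = 42;
}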
 