SRAM...disadvantages and advantages

N3xtG3nGam3r

Newcomer
Was looking at the stat sheet for the PS3, and saw that each of the 7 SPEs has its own 256 KB of SRAM, as well as the 512 KB L2 cache for the PPU.

I read up on it, and saw that unlike DRAM, SRAM is static, so its contents do not ''trickle'' out.

Could someone elaborate on this for me, and maybe explain the differences, advantages, and disadvantages of SRAM in comparison to traditional style caches?

Also, if you could give examples of how the SRAM cache could be used, and the benefits derived from doing so, both in game and for developers, that would be cool.

Thanks.
 

Firstly, SRAM is used in cache memory because it is so fast (relative to DRAM) to access and can be accessed in a dual-ported manner. The difference between the two applications is purely the manner of accessing data: in a cache, you provide a main-memory address and (through a hashing process that maps it onto cache lines) get your data back, whilst in the SPU you provide a raw local-store address and nothing gets hashed in any way.

The main reason for doing this is that the main non-deterministic cost in computation is no longer the 'cost to compute' but the cost to access memory (you don't know whether a page/line will be in L1/L2/L3/DRAM/HD). In the target areas for Cell such latencies are intolerable, and generally avoidable, so for the price of having to handle DMA manually, it is easier to redefine what the cache is. In the multi-processor case, access time is even more likely to vary because it depends on the other processors' workloads. For performance (and simplicity), it then makes more sense to just ditch the entire concept of a shared cache and switch to a local SRAM store, minimising bus/access overhead; you'd probably saturate things with all SPUs going full tilt otherwise.
 
Sorry, I forgot to mention: I know enough not to be called a noob. However, that explanation was still somewhat out of my realm of understanding :).

A slightly simpler explanation, please?
 

The fundamental difference you need to understand about SRAM vs DRAM is how the data is physically stored. SRAM cells are made up of transistors, while DRAM cells are made of capacitors plus transistors. A capacitor, if you don't know much about it, is like a simple battery: it holds a charge for a short time.

Now the simplest SRAM cell (a single bit of memory) costs 6 transistors, IIRC, while the simplest DRAM cell can have as few as 1 transistor and 1 capacitor. So storage-density-wise DRAM has always been a win. Of course there are different variations for performance reasons, but generally a DRAM cell will be much, much smaller than an SRAM cell, and require less power.

You mentioned memory ''trickling out'' of DRAM cells. That's because, like a battery, DRAM cells lose their charge after a time interval and need to be refreshed. Refreshing takes time, and it adds complications to the way you access memory. Kryton addresses those in his post.
 
SRAM is just one way to construct memory cells. SRAM uses six transistors per bit of storage, and is really fast. The six transistors per bit make large amounts of SRAM storage large and expensive though, so it is only applied in areas where the speed is critical, such as caches or local stores. Most processor caches are built from SRAM (the only exception that crosses my mind right now is the L3 cache in some Itanium monsters). It's entirely normal. Current PC processors use SRAM caches, and it's the same for the 360, the Playstation 3 and the Wii.

DRAM uses just one transistor per bit of storage, isn't so fast and has certain overheads because the cells have a tendency to leak information and need periodic refreshes. Because of the much smaller footprint it's used for system memory, which needs to be very large.
 
Most processor caches are built from SRAM (the only exception that crosses my mind right now is the L3 cache in some Itanium monsters).

Those Itanium caches are all SRAM too, actually, and the dies are frigging huge: 596 mm^2 for the dual-core, 24 MB cache version.

The only caches that use DRAM that come to mind are the 36 MB level 3 caches in IBM's Power5 MCMs.

Cheers
 
Okay, now could someone elaborate on the differences between the same game running on a CPU with a normal cache vs. on a CPU with an SRAM cache?

Thanks.
 
Normal cache is SRAM-based.
Or are you thinking of SPE-like Local Store vs a traditional automatic cache? That's a completely different issue - and both use SRAM anyway.
 
I thought I read that in a normal cache, like the one in the 360's CPU, the data ''trickles out'', whereas the SPEs' SRAM caches don't lose it, because they're static?
 
Sorry, I forgot to mention: I know enough not to be called a noob.
I beg to differ :D

SRAM is the fast local RAM used on CPUs for cache or local stores or whatever because, at the moment, there's nothing comparable in speed to replace it with. And all SRAM works the same - it's all SRAM after all. If it didn't work the same, it wouldn't be SRAM ;)
 
Both the 360's CPU cache and the SPE local store are built from SRAM cells.
Their function is different though.

A traditional cache, as in the 360's CPU or in a PC/Mac CPU, is invisible to the application. An application accesses memory, and the cache, which sits conceptually between the CPU core and the memory, will automatically try to accelerate as many of these memory accesses as it can: by buffering frequently requested portions of memory, predicting access patterns and fetching large blocks ahead of time, combining write accesses, etc.
Taking advantage of such a transparent cache is fully automatic. The application doesn't really need to worry about the details. You could turn the cache off completely and the application would still run, probably much slower; you could build a processor with ten times the cache and the application would get the benefit automatically, without any changes to its code.

Local stores are different. They serve a similar purpose in allowing a small amount of data to be kept very close to the processor for very fast access, but they do not manage their contents themselves. The application needs to use the local store explicitly.
In the case of the Cell Broadband Engine, access to main memory is an indirect, asynchronous process through a DMA subsystem. The application can "place an order" for a chunk of main memory to be copied into (or out of) the local store, and the system logic will fill that order at some later time. A DMA model alone, with no directly addressable storage on the other end, would not be suitable for any real-world programming.
The processor can only directly access data that is currently in the local store. That means, contrary to a traditional cache, the local store is visible, it has addresses that the application can use to pinpoint exact positions and regions in the local store.


Again, both a traditional cache and a local store are built from SRAM cells. The local store concept is more complex to work with for application programmers, but as it is private to a core, it makes up for it by much more graceful scaling to massive multi-cores.
A traditional cache OTOH is a mirror, a local copy of main memory. When many cores hold copies of the same memory region and one of them updates its copy, that creates a difference between main memory and the copies that shouldn't exist. When other cores work on the same piece of memory they always need to see the most recent version of it to avoid errors.
These issues can be and are solved in practice with coherency protocols but they have practical limits for their ability to scale, as their implementation gets ever more complex with an increasing number of interdependent caches (=number of cores, in most current architectures).

For what the 360's CPU does, traditional caches are fine.
Cell is better off with local stores though IMO. It's the only practical way to slap so many SPEs on one die in the given size and power envelope.
 