I thought I read that in a normal cache, like in the 360's CPU, the data "trickles out", whereas the SPE's SRAM cache doesn't, because it's static?
Both the 360's CPU cache and the SPE local store are built from SRAM cells. Their function is different, though.
A traditional cache, as in the 360's CPU or in a PC/Mac CPU, is invisible to the application. An application accesses memory, and the cache, which sits conceptually between the CPU core and the memory, automatically tries to accelerate as many of these memory accesses as it can: by buffering frequently requested portions of memory, predicting access patterns and fetching large blocks ahead of time, combining write accesses, etc.
Taking advantage of such a transparent cache is fully automatic; the application doesn't really need to worry about the details. You can turn the cache off completely and the application still runs, probably much slower; you can build a processor with ten times the cache and the application gets the benefit automatically, without any changes to its code.
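As a trivial illustration of that transparency, here's a plain C loop with nothing cache-specific in it; the array size and function name are just made up for the example. Any cache (or none at all) serves it without the code changing:

```c
#include <stddef.h>

/* There is no cache-related code here at all. On a cached CPU the
 * hardware keeps recently touched lines close to the core and
 * prefetches ahead; on a cacheless one the exact same code still
 * runs, just with every access going out to main memory. */
long sum_array(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];   /* sequential access pattern that caches handle well */
    return sum;
}
```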
Local stores are different. They serve a similar purpose in allowing a small amount of data to be kept very close to the processor for very fast access, but they do not manage their contents themselves. The application needs to use the local store explicitly.
In the case of the Cell Broadband Engine, access to main memory is an indirect, asynchronous process through a DMA subsystem. The application can "place an order" for a chunk of main memory to be copied into (or out of) the local store, and the system logic will fill that order at some later time. Such a DMA model alone is not suitable for any real-world programming, which is where the local store comes in: the processor can only directly access data that is currently in the local store. That means, contrary to a traditional cache, the local store is visible; it has addresses that the application can use to pinpoint exact positions and regions in it.
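To make the "place an order" idea concrete, here's a minimal SPE-side sketch, assuming the Cell SDK's spu_mfcio.h interface (mfc_get plus the tag-status calls); the buffer size, tag number, and effective-address parameter are illustrative, not something from the post:

```c
#include <spu_mfcio.h>
#include <stdint.h>

/* Local store buffer: this address is directly visible to the SPU,
 * aligned to 128 bytes as DMA transfers prefer. */
static volatile char buffer[16384] __attribute__((aligned(128)));

void fetch_chunk(uint64_t main_memory_ea)
{
    unsigned int tag = 5;   /* arbitrary DMA tag group, 0..31 */

    /* "Place the order": ask the MFC to copy 16 KB from main memory
     * (effective address) into the local store buffer. This call
     * returns immediately; the copy happens later. */
    mfc_get(buffer, main_memory_ea, sizeof(buffer), tag, 0, 0);

    /* The SPU is free to do unrelated work here while the DMA runs. */

    /* Block until every transfer in our tag group has completed;
     * only then is it safe to read the buffer. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* buffer[] now holds a private copy the SPU can address directly. */
}
```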
Again, both a traditional cache and a local store are built from SRAM cells. The local store concept is more work for application programmers, but because it is private to a core, it makes up for that with much more graceful scaling to massively multi-core designs.
A traditional cache OTOH is a mirror, a local copy of main memory. When many cores have copies of the same memory region and one of them updates its copy, that creates a difference between main memory and the other copies that shouldn't exist: whenever other cores work on the same piece of memory, they need to see the most recent version of it to avoid errors.
These issues can be, and are, solved in practice with coherency protocols, but those have practical limits to how far they scale: the implementation gets ever more complex as the number of interdependent caches (which, in most current architectures, equals the number of cores) grows.
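To show why that sharing matters to software, here's a small hedged sketch in C using pthreads and C11 atomics (the flag and value names are invented for the example): one thread publishes a value, another reads it, and it is the coherency hardware that makes sure the second core sees the updated cache line rather than a stale copy.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Shared data lives in ordinary cacheable memory; both cores may
 * end up holding a copy of the same cache line. */
static int payload;
static atomic_int ready = 0;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                        /* update this core's cached copy */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;   /* the coherency protocol invalidates/updates the other copies */
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;          /* spin: each check may be served from this core's own cache */
    printf("%d\n", payload);   /* guaranteed to see 42, not a stale value */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, producer, NULL);
    pthread_create(&b, NULL, consumer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

With local stores there is nothing for the hardware to keep coherent: each SPE only sees its own private memory, and any sharing has to be done through explicit DMA transfers like the one sketched earlier.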
For what the 360's CPU does, traditional caches are fine.
Cell is better off with local stores though, IMO. It's the only practical way to slap so many SPEs on one die within the given size and power envelope.