Will L2 cache on X360's CPU be a major hurdle?

Korrupt

Newcomer
X360's CPU has to share 1mb of L2 cache between 3 cores and 6 threads while PS3 has 256kb L2 cache per SPE. Will that 1mb L2 cache be a limiting factor in comparison to the Cell's L2 rich CPU?
 
The 256KB of memory on each SPE is essentially a cache; however, it has characteristics that separate it from a conventional cache.
 
SPEs have 256KB of local store and no cache. CELL PPE has 512KB L2 cache and (I assume) 64KB L1.

Xenon has 1MB of shared L2 cache between 3 cores which each have 64KB L1 cache.

At least that is how I remember each design. (I am sure Shifty can correct me if I am wrong).

Will cache be an issue next gen? Of course, every limitation is an issue.

But as noted by others in the past, 3D games tend to use less cache (partly because there is a lot of streaming). A shared cache has some advantages, like easy sharing between processors. Another nice thing about 1 big cache is that a large chunk of code can fit into it. Numerous small caches could run into issues where you have 1 large program and 4 or 5 small ones. The small ones are no problem in either design, but if you have a large chunk of code, let's say 600KB, it won't fit into a smaller segmented cache. So in that case a large 1MB shared cache is better than 3 smaller caches.

In general cache will be an issue on the 360. It will be something developers keep in mind when designing their engines, and in some cases something they will just need to design around.

Ditto the SPEs. Design to its strengths while minimizing any limitations.

Another benefit mentioned in regard to separate caches is that they could minimize the effects of one core's thrashing on the other cores.
 
cobragt said:
The 256KB of memory on each SPE is essentially a cache; however, it has characteristics that separate it from a conventional cache.

If it's a cache then why not call it a cache? It's not a cache according to the engineer who designed it...
 
cobragt said:
The 256KB of memory on each SPE is essentially a cache; however, it has characteristics that separate it from a conventional cache.

It's not a cache, but, like a cache, it is 0-wait state memory. Cell has 256*7 + 64 = 1856 kB of 0-wait state memory, and 512kB of small-wait state memory (PPE's L2). Xenon has 64*3 = 192kB of 0-wait state memory, and 1MB of small-wait state memory. It's largely anyone's guess as to how these numbers actually affect real-world performance at this point.
 
Wait a minute, each SPE doesn't have 256kb? So it's 256kb cache for all the SPE's? That's pretty low then...
 
mckmas8808 said:
It's not a cache, but, like a cache, it is 0-wait state memory.

What's so good about 0 wait-state memory?

It allows the execution core to do a load or store or instruction fetch at its full clock speed. Otherwise, it would have to wait for memory with each instruction fetch or data access, reducing its performance to a mere fraction of its full clock speed.
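As a rough back-of-the-envelope illustration of that "mere fraction" claim, the usual effective-CPI estimate shows how quickly wait states eat into throughput. Every number in this sketch is an assumption picked for the example, not a measurement of Xenon or Cell:

```c
/* Illustrative only: the standard effective-CPI estimate.
 * All of the numbers below are assumptions for the example. */
#include <stdio.h>

int main(void)
{
    double base_cpi     = 1.0;   /* cycles per instruction with no memory stalls */
    double access_rate  = 0.3;   /* fraction of instructions touching memory (assumed) */
    double miss_rate    = 0.05;  /* fraction of those accesses that miss (assumed) */
    double miss_penalty = 500.0; /* wait-state cycles paid per miss (assumed) */

    double effective_cpi = base_cpi + access_rate * miss_rate * miss_penalty;
    printf("effective CPI: %.2f (about %.1fx slower than the no-stall case)\n",
           effective_cpi, effective_cpi / base_cpi);
    return 0;
}
```

With those assumed numbers the core spends most of its time waiting on memory, which is exactly why 0-wait-state storage (or a cache that hits most of the time) matters so much.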
 
Korrupt said:
Wait a minute, each SPE doesn't have 256kb? So it's 256kb cache for all the SPE's? That's pretty low then...

Cell has 256kB of L1 memory for each SPE. (L1 does not mean L1 cache, in case anybody wants to nitpick.)
 
Correct me if I'm wrong but doesn't the 512kb of cache in PPE in CELL have direct access to main memory while the LS of the SPEs have to go through the cache to get to main RAM?
 
PC-Engine said:
Correct me if I'm wrong but doesn't the 512kb of cache in PPE in CELL have direct access to main memory while the LS of the SPEs have to go through the cache to get to main RAM?

"through the cache" you say that as if that were a bad thing :)

And you can just lock the L2 cache if you want to actually just bypass it.
 
Korrupt said:
Wait a minute, each SPE doesn't have 256kb? So it's 256kb cache for all the SPE's? That's pretty low then...

No, there is 256kB per SPE, so 256kBx7.

What's the 64 in phat's Cell calculation?
 
creon100 said:
What's the 64 in phat's Cell calculation?

64kB of L1 cache on the PPE + 256kB x 7 for the L1-type SRAM in the SPEs. Then there is the 512kB of L2 cache on the PPE.
 
L1 and L2 cache are highly automated memory systems, which makes them simple to program for.

The main difference between L1 and L2 cache and the memory found on each SPE is that developers can dictate how they want the memory on each SPE to be used, increasing efficiency. It still operates in much the same manner as a cache would, but that explicit control is why the local memories aren't traditional caches.
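To make the "developers dictate the usage" point concrete, here is a minimal sketch of how an SPE program might pull a block of main memory into its local store by hand, assuming the IBM Cell SDK's spu_mfcio.h DMA interface. The chunk size, buffer, and function are illustrative, not from this thread:

```c
/* Hypothetical sketch, assuming the IBM Cell SDK's spu_mfcio.h interface.
 * A conventional cache would fetch this data automatically on a miss;
 * here the SPE program issues and waits for the DMA itself. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 4096  /* bytes per transfer (illustrative, well under one SPE's 256KB) */

static char buffer[CHUNK] __attribute__((aligned(128)));  /* lives in local store */

void fetch_and_process(uint64_t ea_src)   /* ea_src: effective address in main RAM */
{
    const unsigned int tag = 1;

    /* Kick off the DMA: main memory -> local store. */
    mfc_get(buffer, ea_src, CHUNK, tag, 0, 0);

    /* Wait until every transfer with this tag has completed; after this,
     * buffer is sitting in 0-wait-state local store. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* ... compute on buffer at full clock speed, then DMA the results back ... */
}
```

A cache does that fetch for you on a miss; on an SPE the programmer schedules it, which is both the extra work and the extra control being discussed above.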
 
inefficient said:
PC-Engine said:
Correct me if I'm wrong but doesn't the 512kb of cache in PPE in CELL have direct access to main memory while the LS of the SPEs have to go through the cache to get to main RAM?

"through the cache" you say that as if that were a bad thing :)

And you can just lock the L2 cache if you want to actually just bypass it.

The point I'm making is this: how can LS function as a cache if it doesn't have a direct connection to main RAM? If it can function as a cache, then why does it need to go through another pool of cache to get to main RAM? AFAIR LS can't bypass the 512kB and go straight to RAM. Isn't the point of a cache to cache data from main RAM so that you don't have to go off-chip, since that's a lot slower? If LS doesn't have a direct path to main RAM, then how can it fill itself and operate as a cache? In essence the 7 SPEs will be fighting with the PPE over the 512kB of actual cache.
 
Can we get back to the topic at hand? I think the thread starter is asking whether or not it was a design mistake to have a shared cache for the three processors. In particular, I would like to know if there is a way to partition the cache so that each processor gets a specified chunk of cache space. If there is no way to partition the cache, then is there a way to prevent one processor from filling the cache with its pages of memory, which then push out the pages belonging to the other two processors?
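I don't know of a documented way to hard-partition Xenon's L2 per core offhand, but the "design around it" approach mentioned earlier can at least be sketched in software: keep each thread's working set to roughly its share of the 1MB, so no single core streams enough data to evict the others. A purely illustrative sketch, where the tile size, dummy workload, and data set are all assumptions:

```c
/* Purely illustrative: "design around" a shared 1MB L2 by keeping each
 * thread's tiles to roughly a third of it. */
#include <stddef.h>
#include <stdio.h>

#define SHARED_L2_BYTES (1024u * 1024u)
#define CORES           3u
#define TILE_BYTES      (SHARED_L2_BYTES / CORES)   /* ~341KB share per core */

/* Stand-in for real per-tile work so the sketch compiles on its own. */
static float process_tile(const float *data, size_t count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

/* One core's worker: walk the data in tiles no larger than this core's
 * L2 share, so it never streams enough to evict the other cores' data. */
static float worker(const float *data, size_t total)
{
    const size_t tile = TILE_BYTES / sizeof(float);
    float acc = 0.0f;
    for (size_t i = 0; i < total; i += tile) {
        size_t n = total - i;
        if (n > tile)
            n = tile;
        acc += process_tile(data + i, n);
    }
    return acc;
}

int main(void)
{
    static float data[1u << 20];  /* 4MB of zero-initialized dummy input */
    printf("%f\n", worker(data, sizeof data / sizeof data[0]));
    return 0;
}
```

That doesn't stop a badly behaved thread from thrashing the L2, it just reduces the odds that a well-behaved one does, which is why an actual partitioning or locking mechanism would still be the more interesting answer to the question above.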
 