This question may already have been answered, but maybe someone could summarize for me the similarities and dissimilarities between the SPE Local Store and the L1 and L2 caches on a typical processor (e.g. a Core 2 Duo).
My understanding (correct me if I am wrong) is that Local Store is basically fast local RAM. Very fast. It is a working space/scratch pad with very low latency. It isn't like an L2 cache in that it doesn't mirror system memory; you have to DMA data in from system memory yourself.
L2 appears, in general, to have higher latency than Local Store, but it stays coherent with system memory. L2 requires little management from the programmer.
Level 1 cache, from my meager knowledge, tends to be much faster than L2 cache but also much smaller. I have frequently seen it split into two parts: instruction and data.
My main interest for discussion (besides getting corrected on my understanding of these technologies) is how they compare and contrast on a technical level: execution and latency issues, capabilities, etc.
The second part of interest is how we see the future unfolding with regard to LS and caches. Will there be some convergence? Could we see L1 caches grow in size to mimic LS? Could LS concepts be brought to PC CPUs, or does the business reality of higher-level coding make such an approach unrealistic?
IBM recently announced their roadmap with Cell2 on it, with 32 SPEs and 2 PPEs, but no details on the cache arrangements. My guess is that with SOI, IBM will be taking a serious look at Z-RAM, which could result in inflated LS and cache sizes. Intel mentioned server chips in 2010 with 3 levels of cache in a unique arrangement: each core having 32kB data and 32kB instruction caches, 512kB of L2 cache (16MB total), and clusters of 4 cores sharing a common pool of 3MB of L3 cache (24MB total). And then there is Terascale.
So there seems to be a lot of movement in the processor industry (more, it seems, than over the last five years combined), and I am curious how caches and memory will play out in future designs and what we may, or may not, see in the 2010-2012 timeframe.