Operation of cache hierarchies (?)

I don't program and have only pieced together what I've read in forums or online articles.

Do I have the general principles right?


K8
Live code and data are loaded directly into the level 1 caches and executed as required. When new code or data are needed, the old lines are evicted into the level 2 cache, either for later retrieval or to be ultimately written back to main memory. This is why the K8 shows relative insensitivity to L2 cache size and, with very limited prefetch, very high sensitivity to main memory latency.
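
For what it's worth, that "victim" behaviour can be sketched in a few lines of C. This is only a toy model, not how the real hardware is organised: tiny direct-mapped caches, hypothetical sizes, lines identified by address alone, no timing. The point is just that L1 and L2 never hold the same line, and L2 only ever receives what L1 evicts.

#include <stdio.h>

#define L1_SETS 4
#define L2_SETS 16
#define INVALID (~0u)

static unsigned l1[L1_SETS];
static unsigned l2[L2_SETS];

static void access_line(unsigned addr)
{
    unsigned s1 = addr % L1_SETS;
    if (l1[s1] == addr) { printf("%2u: L1 hit\n", addr); return; }

    unsigned s2 = addr % L2_SETS;
    if (l2[s2] == addr) {
        printf("%2u: L2 hit, promoted back to L1\n", addr);
        l2[s2] = INVALID;                 /* exclusive: the line leaves L2 */
    } else {
        printf("%2u: miss, filled from memory straight into L1\n", addr);
    }

    unsigned victim = l1[s1];             /* whatever L1 evicts...          */
    l1[s1] = addr;
    if (victim != INVALID)
        l2[victim % L2_SETS] = victim;    /* ...is written into L2 (victim) */
}

int main(void)
{
    for (int i = 0; i < L1_SETS; i++) l1[i] = INVALID;
    for (int i = 0; i < L2_SETS; i++) l2[i] = INVALID;

    /* line addresses chosen to collide in the tiny L1 */
    unsigned pattern[] = { 0, 4, 0, 8, 4, 0 };
    for (unsigned i = 0; i < sizeof pattern / sizeof pattern[0]; i++)
        access_line(pattern[i]);
    return 0;
}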


P4
Live code and data are loaded into the level 2 cache; working copies of instructions and integer data only are loaded into the respective level 1 caches, while floating-point data are accessed directly from the level 2 cache. Completed results are written back to the level 2 cache, replacing the originals. An efficient prefetch mechanism minimises sensitivity to main memory, the goal being to have the P4 operate as much as possible from cache, so performance correlates directly with level 2 cache size.
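
The counterpart toy sketch for that inclusive arrangement, under the same hypothetical sizes: a miss fills the L2 first and a copy of the line is then pushed up into L1, so L2 always holds everything L1 holds.

#include <stdio.h>

#define L1_SETS 4
#define L2_SETS 16
#define INVALID (~0u)

static unsigned l1[L1_SETS], l2[L2_SETS];

static void access_line_inclusive(unsigned addr)
{
    unsigned s1 = addr % L1_SETS, s2 = addr % L2_SETS;
    if (l1[s1] == addr) { printf("%2u: L1 hit\n", addr); return; }

    if (l2[s2] == addr)
        printf("%2u: L2 hit\n", addr);
    else {
        printf("%2u: miss, filled from memory into L2\n", addr);
        l2[s2] = addr;
    }
    l1[s1] = addr;   /* copy the line up into L1; the line it displaces
                        needs no write into L2, since L2 already has it */
}

int main(void)
{
    for (int i = 0; i < L1_SETS; i++) l1[i] = INVALID;
    for (int i = 0; i < L2_SETS; i++) l2[i] = INVALID;

    unsigned pattern[] = { 0, 4, 0, 8, 4, 0 };   /* same access pattern as before */
    for (unsigned i = 0; i < sizeof pattern / sizeof pattern[0]; i++)
        access_line_inclusive(pattern[i]);
    return 0;
}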


Core 2 Duo
I'm making a half-educated guess that it more closely follows the K8 model of operation, with the advantages of an extremely advanced prefetch mechanism and out-of-order allocation of FSB operations minimising main memory latency.
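
Prefetch sensitivity is something anyone can roughly measure without knowing the internals. The usual test, sketched below in C: walk a buffer far larger than any of these caches once sequentially (a pattern a hardware prefetcher can follow) and once as a randomised pointer chain (which it can't). The buffer size and timing method here are arbitrary choices for the sketch, not tuned to any particular chip; on a CPU with a good prefetcher the sequential walk should come out far faster.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16u * 1024 * 1024)   /* 16M pointers, far larger than any of these caches */

static size_t rnd_below(size_t n)   /* crude xorshift PRNG; plenty for a sketch */
{
    static unsigned long long x = 88172645463325252ull;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return (size_t)(x % n);
}

static void walk(const size_t *next, const char *label)
{
    clock_t t0 = clock();
    size_t i = 0;
    for (size_t step = 0; step < N; step++)
        i = next[i];                        /* every load depends on the previous one */
    printf("%-10s %.3f s (ended at %zu)\n",
           label, (double)(clock() - t0) / CLOCKS_PER_SEC, i);
}

int main(void)
{
    size_t *next = malloc((size_t)N * sizeof *next);
    if (!next) return 1;

    /* Sequential chain 0 -> 1 -> 2 -> ... : a pattern a prefetcher can follow. */
    for (size_t i = 0; i < N; i++) next[i] = (i + 1) % N;
    walk(next, "sequential");

    /* One big random cycle (Sattolo's algorithm): every load is an
     * unpredictable miss that no prefetcher can hide.               */
    for (size_t i = N; i-- > 0; ) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = rnd_below(i), tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    walk(next, "random");

    free(next);
    return 0;
}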


I'm curious what expectations people have of the impact of a level 3 cache being incorporated into the upcoming K8L. Any improvement in the execution units aside, my assumption is that it will be implemented with an advanced prefetch mechanism (probably not quite up to Core 2 Duo's level) and so act as a prefetch buffer (as opposed to the L2, which currently acts as a victim buffer), therefore having a sizeable impact on performance. I'm also guessing the performance impact would make it architecturally worthwhile on dual cores, not just quad cores, though die-size constraints might make that uneconomic.
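
To be clear about what I mean by a prefetch buffer, here's a toy C illustration of the idea, purely hypothetical and not a claim about the real K8L: on each demand access the next sequential line is also pulled into a small L3, so a later sequential access finds it there instead of paying the full trip to main memory.

#include <stdio.h>

#define L3_SETS 8
#define INVALID (~0u)

static unsigned l3[L3_SETS];

static void demand_load(unsigned addr)
{
    if (l3[addr % L3_SETS] == addr)
        printf("%u: served from L3 (prefetched earlier)\n", addr);
    else
        printf("%u: miss, fetched from main memory\n", addr);

    l3[(addr + 1) % L3_SETS] = addr + 1;   /* prefetch the next line into L3 */
}

int main(void)
{
    for (int i = 0; i < L3_SETS; i++) l3[i] = INVALID;

    for (unsigned a = 0; a < 6; a++)       /* a simple sequential walk */
        demand_load(a);
    return 0;
}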


Please correct me where I'm wrong, and shoot down my theory.
 
Core 2 is kind of like Pentium M and NetBurst together, so I'm not sure your comparison to the K8 is correct; I'd look more towards Dothan, Madison, and Presler for examples of how it handles its FSB and cache. (Intel has been quoted as saying that the cache design is large and efficient enough to ensure the FSB is never 100% laden with data.) Also consider that Core 2 utilizes a shared L2, which the K8 doesn't, IIRC.
 