L0, L1 and L2 Cache

Tahir2

Hey guys..

Is it possible for someone to explain the difference between L0 (I've heard that term a few times recently, but never before), L1 and L2 cache?

I thought L1 and L0 sound very similar - is it the same thing using different terminology?

Thanks in advance... have a nice one!
 
I've read some discussion about an L0 operand cache.
I've also read about an L0 cache that was used for accesses to the stack.
I suppose the stack cache could be faster because it specializes for a certain pattern of memory access (possibly dispensing with the support for unneeded addressing modes, I'm not sure).
 
The term L0 apparently is used by anyone however they'd like.

Usually, it seems that the L0 does sit between the CPU core and the L1 cache. Why this doesn't make the L0 the L1, except in the case where someone is counting from 0, I don't know.

Sometimes, the cache is very small, somehow specialized, or otherwise not quite as generally useful as the L1.
 
L1 is optional; registers are not. If registers are L0, then either L0 can be very dissimilar to the other levels, or L0 has to be something else.

x86 processors use far more registers "under the hood" than the 8 or 16 architectural ones, using tricks like register renaming, just as they decode x86 instructions into something else rather than executing them natively.
So maybe those "hidden" registers are L0, or maybe not. Or L-1? :p

Is a Pentium 4's trace cache L0, and the P4 doesn't have any L1 instruction cache?
 
I've read some patents regarding L0 and execution units.
It seems like it could be tiny amounts of memory tied to specific units, rather than the all-inclusive approach of L1.

I have no idea, however... I think I first saw this term on Beyond3D recently, I just can't find the reference.
 
L0 doesn't make sense to me so I'll leave that alone, but since no one answered the rest of the question I'll give a short answer.

The difference between cache levels is latency. Ideally there would be one really large cache close to the execution pipeline, but this isn't possible, so a small low-latency cache (L1) is placed near the pipeline and a larger, higher-latency cache (L2) is placed further away.
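
To put rough numbers on that trade-off, here is a minimal sketch of the usual average-memory-access-time calculation. The function name `amat` and all latencies and hit rates below are illustrative assumptions, not measurements of any real CPU, and the model assumes each level is only looked up after the previous one misses.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, hit_rate) tuples, ordered from
            the level closest to the core outwards.
    memory_latency: cycles to reach main memory after every level misses.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency      # every access reaching this level pays its latency
        reach *= (1.0 - hit_rate)     # only the misses continue outwards
    return total + reach * memory_latency


# Assumed numbers: 3-cycle L1 hitting 95%, 15-cycle L2 hitting 80% of L1 misses,
# 200 cycles to main memory.
l1_only = amat([(3, 0.95)], memory_latency=200)
l1_and_l2 = amat([(3, 0.95), (15, 0.80)], memory_latency=200)
print(f"L1 only:   {l1_only:.2f} cycles")    # about 13 cycles
print(f"L1 and L2: {l1_and_l2:.2f} cycles")  # about 5.75 cycles
```

With these made-up figures the L2 more than halves the average access time even though it is five times slower than the L1, which is the whole point of stacking levels.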
 
Like every cache, an L0's goal is to improve access latency. It is easy to imagine an L0 cache with a 1-cycle latency (instead of the 2-4 cycle latencies of today's L1 caches). In turn, since the L0 cache masks L1 access latency, the L1 cache can be made bigger, slower or more associative, i.e. more featureful. L1 cache is traditionally split between instructions and data. L0 could show even more separation, for example one L0 buffer for the FP units and one L0 buffer for the integer units. If the core is multi-threaded (SMT/CMT/SoEMT...) each thread may have its own dedicated L0 "buffers".
The typical size for an L0 array would be 256 bytes -- a kind of 1D queue/stack with performance characteristics similar to the register file itself.
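
As a back-of-the-envelope check on that idea, the sketch below works out how often a 1-cycle L0 would have to hit before it pays for a slower L1 behind it. The function name and the cycle counts are assumptions for illustration; the model also assumes the L1 is only probed after an L0 miss and that everything beyond the L1 behaves the same in both designs.

```python
def break_even_l0_hit_rate(l0_latency, fast_l1_latency, slow_l1_latency):
    """Minimum L0 hit rate at which L0 + slow L1 matches a fast L1 alone.

    With an L0:    every access pays l0_latency, and L0 misses also pay slow_l1_latency.
    Without an L0: every access pays fast_l1_latency.
    Solving l0 + (1 - h) * slow = fast for h gives the break-even hit rate.
    """
    return 1.0 - (fast_l1_latency - l0_latency) / slow_l1_latency


# Assumed numbers: 1-cycle L0, a 3-cycle L1 without L0, a bigger 4-cycle L1 with L0.
needed = break_even_l0_hit_rate(l0_latency=1, fast_l1_latency=3, slow_l1_latency=4)
print(f"L0 must hit more than {needed:.0%} of the time to come out ahead")
```

Under these assumptions a tiny buffer only needs to catch about half of all accesses, which is plausible for something specialised like a stack or operand cache.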
 
Searching further has only reinforced my initial position that L0 is whatever someone wants it to be.

Assuming they aren't just counting the first level of cache as level 0, people use the term L0 when they have an idea that involves sticking extra memory in front of the L1, and they don't want to change things from what everyone else calls the caches by renaming the L1 as L2.
 
Thanks for the reply.

It is interesting to see how Intel and AMD differ in their cache architecture. Intel has stuck with 32KiB each of L1 instruction and data cache per core and increasing amounts of L2 cache for their Core 2 series of processors. Nehalem (Core i7) changed all that with a modest L2 cache and a new level of cache (the L3 cache).

AMD, on the other hand, has stuck to 128KiB of L1 cache (64KiB instruction plus 64KiB data) since the days of the original Athlon, introduced in 1999, and a maximum of 1MiB of L2 cache per core. Recently we have seen the introduction of large amounts of L3 cache in AMD processors which, even with 40+ cycle latencies, has helped performance (compare the Athlon II X2 2xx with the Phenom II X2 5xx).

One explanation for the Core 2 processors needing so much L2 cache was the old GTL+ bus design. With QPI, Intel has taken a cue from AMD's architecture and gone for a point-to-point system for its chipset designs. Of course, we all know AMD took the design from the Alpha EV6 bus.

L0 just cropped up out of nowhere a few times recently on these boards. I found it interesting that we discuss technology and throw these terms around, and sometimes we don't realise what they may actually mean until much later on (and I guess there are a few people on here who know a lot more than they are allowed to let on).

A good friend of mine (much more technically inclined than myself) agrees with 3dilettante and believes marketing droids have had a go at engineering documents (again) and come up with the term L0 cache for what looks like registers.

However, I am not so convinced. I don't believe L1 cache is tied to specific execution units, so L0 cache is different in that respect. But is it simply register space, in other words?
 
Quite short memories here?

Cyrix found some good use of "L0" cache for their 6x86 architecture -- the 256-byte line for instructions:

[Image: Cyrix 6x86 architecture diagram]


Note the L1 cache was unified, i.e. code & data in one place.
Some would call it a quasi-L2 implementation, but since the 256-byte "extension" to the instruction queue can't really be qualified as L1 by any means, it's dubbed an L0+L1 design.
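
For a feel of what such a tiny instruction buffer buys, here is a toy simulation of a 256-byte fully associative line buffer with LRU replacement. The 32-byte line size, the LRU policy, the class name `LineBuffer` and the fetch patterns are all assumptions for illustration, not a description of the actual Cyrix hardware.

```python
from collections import OrderedDict


class LineBuffer:
    """Toy fully associative instruction line buffer with LRU replacement."""

    def __init__(self, size_bytes=256, line_bytes=32):
        self.line_bytes = line_bytes
        self.capacity = size_bytes // line_bytes  # 8 lines with the assumed sizes
        self.lines = OrderedDict()                # line tag -> True, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Record one instruction fetch; return True on a hit in the buffer."""
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)           # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                          # would go to the unified L1
        self.lines[tag] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)        # evict the least recently used line
        return False


def run_loop(body_bytes, iterations, base=0x1000, fetch_bytes=4):
    """Fetch a loop body of body_bytes repeatedly; return (hits, misses)."""
    buf = LineBuffer()
    for _ in range(iterations):
        for addr in range(base, base + body_bytes, fetch_bytes):
            buf.fetch(addr)
    return buf.hits, buf.misses


print(run_loop(body_bytes=128, iterations=10))  # small loop fits in the buffer
print(run_loop(body_bytes=512, iterations=10))  # loop twice the buffer size thrashes
```

A loop that fits in the buffer only misses on its first pass, while a loop larger than 256 bytes misses on every new line under LRU, so the unified L1 behind it still matters.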
 
Quite short memories here?

Cyrix found some good use of "L0" cache for their 6x86 architecture -- the 256-byte line for instructions:
I've never really delved into Cyrix history.
That would be just one of dozens of things called an L0.

Note the L1 cache was unified, i.e. code & data in one place.
Some would call it a quasi-L2 implementation, but since the 256-byte "extension" to the instruction queue can't really be qualified as L1 by any means, it's dubbed an L0+L1 design.
Another example where someone stuck some fiddly SRAM in front of the L1 and called it an L0.
 