How big was P4 trace cache?

Discussion in 'PC Hardware, Software and Displays' started by msxyz, Apr 26, 2016.

  1. msxyz

    Newcomer

    Just a historical curiosity: how big (in bits or bytes) was the Pentium 4 cache that held the decoded x86 instructions? Official documents state it holds 12K micro-ops, with 8-way associativity.

    As for the micro-op size, their format, and how many micro-ops were generated per x86 instruction, I don't think I've ever seen a definitive answer. Some sources say each micro-op was 100 bits long. That would mean roughly 150 kilobytes for 12K entries. Looking at this picture of a 180nm P4, however, the trace cache (left side) seems smaller than either half of the 256KB L2 cache.

    www.tayloredge.com/museum/processor/2000_Pentium4.jpg
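A quick back-of-the-envelope check of the 150 KB figure. It assumes the unconfirmed 100-bit micro-op width and counts only the micro-op payload, ignoring tags, valid bits and any trace-linking metadata, so real array area would be larger. The function name is just for illustration.

```python
# Assumed inputs: "12K" taken as 12 * 1024 entries, and the unconfirmed 100-bit width.
UOP_ENTRIES = 12 * 1024
BITS_PER_UOP = 100

def payload_bytes(entries: int, bits_per_entry: int) -> int:
    """Raw micro-op storage in bytes, excluding tags, valid bits and trace metadata."""
    return entries * bits_per_entry // 8

size = payload_bytes(UOP_ENTRIES, BITS_PER_UOP)
print(size, size / 1024)  # 153600 150.0
```

If "12K" means 12,000 instead, the result is 150,000 bytes; either way it lands around 150 KB.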

    It's also not clear to me why the cache holds a number of micro-ops that is not a "nice" power of 2. I would understand it if the cache had 3- or 6-way associativity. Was part of the cache disabled to improve yields?
     
  2. CarstenS

    Veteran Subscriber

  3. 3dilettante

    Legend Alpha

    The trace cache had to serve a number of purposes, with a likely constraint being the number of uops Intel found optimal, or found it was capable of delivering, per cycle, which per Agner's document turned out to be three. The trace cache served as the high-throughput instruction fetch path, and it would physically be adapted to match what Intel needed along those lines.
    The number of lines was a power of two, but the number of entries per line was not. The number of bits per entry was initially not a power of two, which might have been needed to keep the cache physically small enough with the process at the time.

    Longer trace lines would suffer more from poorer utilization when traces failed to properly fit, and the cache needed to be fast enough to service instruction fetch for a high-clock processor. The goal was likely to create a fetch path that was fast, had decent storage utilization, had good fetch bandwidth with the software that would be running on it, and had as much capacity as they could deliver. It seems that what Intel found it could do wasn't a power of 2. Cutting off extra capacity just to make the numbers round out would be detrimental, whereas upping the capacity to make it the next power of two might have been costlier than the expected gain.
     
  4. Ethatron

    Regular Subscriber

    Maybe they used trits. 3^5 = 243 ≈ 2^8 = 256. It's a common trick in compression, used in coders that are more discrete than arithmetic coding.
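A minimal sketch of the trit-packing idea, purely illustrative: five base-3 digits fit in one byte because 3^5 = 243 ≤ 256, using about 7.92 of the 8 available bits. Nothing in the thread confirms the P4 encoded micro-ops this way, and `pack_trits`/`unpack_trits` are hypothetical names.

```python
from itertools import product

def pack_trits(trits: list[int]) -> int:
    """Pack exactly five base-3 digits (each 0..2) into a value 0..242 (fits in one byte)."""
    if len(trits) != 5 or any(t not in (0, 1, 2) for t in trits):
        raise ValueError("expected five trits in the range 0..2")
    value = 0
    for t in trits:  # most significant trit first
        value = value * 3 + t
    return value

def unpack_trits(value: int) -> list[int]:
    """Inverse of pack_trits: recover the five trits, most significant first."""
    if not 0 <= value < 3 ** 5:
        raise ValueError("value out of range for five trits")
    trits = []
    for _ in range(5):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]

print(pack_trits([2, 1, 0, 2, 1]))  # 196
# Sanity check: every 5-trit combination survives a round trip.
assert all(unpack_trits(pack_trits(list(t))) == list(t) for t in product(range(3), repeat=5))
```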
     
  5. 3dilettante

    Legend Alpha

    Per Agner, there are 2K lines in the trace cache. So far, that's a power of 2.
    There are 6 entries per line, which is where we lose the nice math.
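Combining Agner's figures (2K lines, 6 micro-ops per line) with the 8-way associativity from the first post gives a rough geometry check. It assumes associativity is counted in whole trace lines, i.e. each set holds 8 lines; that mapping is an assumption here, not something stated in the thread, and the helper names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def trace_cache_geometry(lines: int, uops_per_line: int, ways: int) -> dict:
    # Assumption: each set holds `ways` whole trace lines.
    if lines % ways != 0:
        raise ValueError("lines must divide evenly into sets")
    sets = lines // ways
    return {
        "sets": sets,
        "capacity_uops": lines * uops_per_line,
        # A set-index width only makes sense when the set count is a power of two.
        "set_index_bits": sets.bit_length() - 1 if is_power_of_two(sets) else None,
    }

geo = trace_cache_geometry(lines=2048, uops_per_line=6, ways=8)
print(geo)  # {'sets': 256, 'capacity_uops': 12288, 'set_index_bits': 8}
print(is_power_of_two(geo["sets"]), is_power_of_two(geo["capacity_uops"]))  # True False
```

The line and set counts stay powers of two, so indexing is unaffected; only the 6-entry line width pushes total capacity to 12288 = 3 × 4096.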
     
  6. msxyz

    Newcomer

    Thanks for the links. They answered many of my questions and more. I'm still amazed by the complexity of the P4's design. In hindsight, if Intel wanted to go for higher frequencies above everything else, why not use a simpler design like the PPE? You'd lose some efficiency with older code optimized for other architectures, but the CPU would be smaller and run cooler. Are there any factors I'm neglecting?
     
  7. 3dilettante

    Legend Alpha

    The PPE was not a particularly good processor, other than it was the sort of core that could be designed with a truncated timeline and a reduced budget, in order to fit a general-purpose core with Sony and Toshiba's eventually unsuccessful SPE philosophy.
     
  8. msxyz

    Newcomer

    Does anybody have a copy of Intel presentations held at the IDF 2000/2001 about the P4 architecture? I see they're mentioned more than once but all the links I've found (pointing to an Intel FTP) are now dead. :/
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.