Processor cache architectures

Discussion in 'PC Hardware, Software and Displays' started by Infinisearch, Sep 17, 2015.

  1. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Does anybody have a good article or resource on CPU cache architectures? Basically I want to know about buses (à la the P6 backside bus) and data flow (on an L1 miss check L2, on an L2 miss then... and what the maximum latency for a memory access is). Also, if anyone knows a good article on DDR operation and timing, that would be appreciated as well. Thanks.
     
  2. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    Is the following of any help? It's a reasonable introduction.
    ?

    I did have a link to a page which was a "superset" in the sense that this talk was referenced, but for the moment I can't find where I've stored it!
     
  3. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,986
    Likes Received:
    847
    Location:
    Planet Earth.
  4. Infinisearch

    Thank you both. I watched the video, but it doesn't have the depth I'd like; I'm in the middle of looking at the PDF.
     
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
  6. Infinisearch

    Thanks sebbbi, I forgot about RWT; it answered some of my questions. That is the level of article I was looking for, but Jaguar doesn't have an L3, which is one of the things I was curious about (namely Intel's L3 slice architecture). I checked their Haswell microarchitecture article, but it glosses over the L3 since it's part of the system architecture; it talks about it as if either you have prior knowledge or they discussed it before. Guess I'll dig some more for that specifically.

    Thanks once again... Rodéric, I thought your paper would be useful, so I posted it in a thread on gamedev.net; sorry if I beat you to it.
     
  7. sebbbi

    Read the Sandy Bridge analysis for detailed Intel L3 cache information:
    http://www.realworldtech.com/sandy-bridge/8/
     
  8. rapso

    Newcomer

    Joined:
    May 6, 2008
    Messages:
    215
    Likes Received:
    27
    @Infinisearch
    What questions about the L3 do you have? Maybe some forum member has the answers ;)
     
  9. Simon F

    Bingo: the link that I wanted reappeared in my Twitter feed: https://gist.github.com/ocornut/cb980ea183e848685a36
     
  10. Rodéric

    Knowledge should be shared freely :) (And I was on holiday so didn't check emails or forums at all ^^)
     
  11. Infinisearch

    Basically I was wondering about the implementation of L3s in AMD and Intel CPUs. In addition, Intel's modern L3s are hooked up as cache slices on a ring bus, and I was wondering what the latency penalty for a non-local slice is and how the tags are handled. Is the tag RAM local to the slice? Are tags replicated across slices? Stuff like that, plus any tidbits on layout constraints with regard to buses.
     
  12. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    542
    Likes Received:
    171
    The systems AMD and Intel use are very different.

    Each hop on the ring bus costs one cycle, both for the request and for the response. Since the ring bus is unidirectional, with requests flowing in the opposite direction to responses, a request often has to cross the system agent/memory controller hops too.
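
    To put rough numbers on that, here is a minimal sketch of the hop arithmetic. The ring layout is an assumption made purely for illustration: stop 0 is the system agent, stops 1..4 are core/L3-slice pairs, requests travel towards lower stop numbers (wrapping through stop 0) and responses travel the other way. The function names are hypothetical, and the result counts only ring hops, not the time spent looking up the slice itself.

```python
def request_hops(core_stop: int, slice_stop: int, n_stops: int) -> int:
    """Hops for a request travelling towards lower stop numbers, wrapping at 0."""
    return (core_stop - slice_stop) % n_stops


def response_hops(core_stop: int, slice_stop: int, n_stops: int) -> int:
    """Hops for the response travelling the opposite way, back to the core."""
    return (core_stop - slice_stop) % n_stops


def round_trip_hop_cycles(core_stop: int, slice_stop: int, n_stops: int,
                          cycles_per_hop: int = 1) -> int:
    """Ring cycles spent on hops only (request plus response)."""
    hops = (request_hops(core_stop, slice_stop, n_stops)
            + response_hops(core_stop, slice_stop, n_stops))
    return hops * cycles_per_hop


# Assumed layout: 5 stops, stop 0 = system agent, stops 1..4 = core/slice pairs.
print(round_trip_hop_cycles(3, 1, 5))  # short way: 3 -> 2 -> 1 and back
print(round_trip_hop_cycles(1, 3, 5))  # long way: 1 -> 0 (system agent) -> 4 -> 3 and back
```

    Under these assumptions the penalty depends on which way round the ring the target slice sits, which is why the system agent hops show up in some accesses and not in others.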

    Tags are located on the same slice as the cache lines they cover.

    Intel uses the L3 as a cache that is fully inclusive of all the levels below it. Physical addresses are striped evenly across all L3 slices; that is, a core cannot restrict itself to its local slice, and all cores access all L3 slices evenly. Each tag entry in L3 has a bit for every possible consumer of the cache.
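
    As an illustration of the striping, the sketch below maps physical addresses to slices. The modulo mapping and the 64-byte line size are assumptions standing in for whatever hash the hardware really uses, and `l3_slice` is a hypothetical name, not anything from Intel documentation.

```python
LINE_BYTES = 64  # assumed cache-line size


def l3_slice(phys_addr: int, n_slices: int) -> int:
    """Pick the L3 slice for a physical address (placeholder modulo mapping)."""
    line_number = phys_addr // LINE_BYTES
    return line_number % n_slices


# Walk 64 KiB linearly and count how many lines land on each of 4 slices.
counts = [0] * 4
for addr in range(0, 64 * 1024, LINE_BYTES):
    counts[l3_slice(addr, 4)] += 1
print(counts)  # [256, 256, 256, 256]
```

    The point is only that the slice is chosen by the address, not by which core is asking, so every core ends up spreading its accesses over all slices.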

    For example, if Core #1 wanted to read a cache line whose address naturally resides in L3 slice #3 and which was currently held as modified in the L1 of Core #2, the following would happen:
    • Core #1 sends a request upstream with the address. It will cross the system agent, then hop back down the other side until it reaches L3 #3.
    • The line is currently held in some cache, so L3 has a valid tag entry for it. That tag entry currently lists it as modified by Core #2, so the data currently held locally in L3 is not valid. L3 #3 sends another request upstream, addressed to Core #2.
    • Core #2 sets the line as shared and sends the dirty data downstream to L3 #3.
    • L3 #3 receives the data, updates its own copy, sets the tag bits to shared, held by Core #1 and held by Core #2, and sends the data downstream to Core #1.
    • The data hops its way around the ring bus until it reaches Core #1, where it is read into L1 and used.
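
    The steps above can be sketched as a toy model of one L3 tag entry. This is only an illustration of the sequence described in this post, not Intel's actual protocol: `L3Entry`, `read_line` and the `l1` dictionary are hypothetical names, ring hops are ignored, and only reads are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class L3Entry:
    data: int                                       # copy of the line stored in the L3 slice
    holders: Set[int] = field(default_factory=set)  # one "bit" per core that may hold the line
    modified_by: Optional[int] = None               # core holding a dirty copy, if any


def read_line(core: int, entry: L3Entry, l1: Dict[int, int], log: List[str]) -> int:
    """Serve a read from `core`; `l1` maps core id -> that core's private copy."""
    owner = entry.modified_by
    if owner == core:
        return l1[core]  # the core would have hit in its own L1
    log.append(f"core {core} -> L3: read request")
    if owner is not None:
        # The L3 copy is stale: fetch the dirty data from the owning core,
        # which keeps the line but drops it to shared.
        log.append(f"L3 -> core {owner}: request for dirty line")
        entry.data = l1[owner]
        entry.modified_by = None
        log.append(f"core {owner} -> L3: dirty data, line now shared")
    entry.holders.add(core)
    l1[core] = entry.data
    log.append(f"L3 -> core {core}: data")
    return entry.data


# Core #2 holds the line modified with value 42; Core #1 reads it.
entry = L3Entry(data=0, holders={2}, modified_by=2)
l1 = {2: 42}
log: List[str] = []
value = read_line(1, entry, l1, log)
print(value, sorted(entry.holders), entry.modified_by)
print("\n".join(log))
```

    After the read, the entry lists both cores as holders, nobody holds the line as modified, and the L3 copy matches what Core #2 had in its L1.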
     
    #12 tunafish, Oct 1, 2015
    Last edited: Oct 1, 2015
  13. Infinisearch

    Thanks tunafish, I had already read the article sebbbi had posted when you posted. But thanks all the same.
     