AMD Bulldozer Core Patent Diagrams

Discussion in 'PC Industry' started by Raqia, Apr 16, 2009.

  1. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    No wonder the Windows scheduler is confused.

    Cheers
     
  2. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Probably the low idle clocks and TurboCORE are dragging the timings out here. The test loads one pair of cores at a time until all unique pairings are exhausted.
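For anyone wondering how many runs that works out to, here is a minimal sketch of the pairing order. It assumes the tool simply walks every unordered pair of logical CPUs once; the helper name `core_pairs` is made up for illustration and is not taken from the test program.

```python
from itertools import combinations

def core_pairs(n_cores):
    """Return every unordered pair of distinct core indices, each exactly once."""
    return list(combinations(range(n_cores), 2))

# Hypothetical six-core system: print the pairs in the order they would be tested.
pairs = core_pairs(6)
for a, b in pairs:
    print(f"CPU{a}<->CPU{b}")
print(len(pairs), "pairs in total")
```

With n cores this gives n*(n-1)/2 runs, so the run count grows quadratically with the core count.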
     
  3. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    Shouldn't the cores pick up speed quite fast once they get any load? The test loads each core for several seconds, isn't that enough time?
     
  4. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    That sounds reasonable... This software was written at a time when there was no turbo, and CnQ and EIST were only for notebook computers. But it's still too high compared to others. My friend's X5570 with EIST on only reaches about 70ns.
     
  5. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,264
    Likes Received:
    813
    PII x6 1055T @ 3.7 & 2.4NB, turbo off
    CPU0<->CPU1: 94.4nS per ping-pong
    CPU0<->CPU2: 91.6nS per ping-pong
    CPU0<->CPU3: 91.6nS per ping-pong
    CPU0<->CPU4: 93.6nS per ping-pong
    CPU0<->CPU5: 93.3nS per ping-pong
    CPU1<->CPU2: 93.8nS per ping-pong
    CPU1<->CPU3: 93.3nS per ping-pong
    CPU1<->CPU4: 95.2nS per ping-pong
    CPU1<->CPU5: 96.7nS per ping-pong
    CPU2<->CPU3: 91.2nS per ping-pong
    CPU2<->CPU4: 91.8nS per ping-pong
    CPU2<->CPU5: 92.3nS per ping-pong
    CPU3<->CPU4: 92.3nS per ping-pong
    CPU3<->CPU5: 95.0nS per ping-pong
    CPU4<->CPU5: 95.4nS per ping-pong
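To make the listing above easier to compare against other chips, here is a small sketch that summarizes it. The numbers are copied from the results as posted; the variable names are only for illustration.

```python
from statistics import mean

# Latency in ns per ping-pong, keyed by (CPU a, CPU b), copied from the listing above.
latency_ns = {
    (0, 1): 94.4, (0, 2): 91.6, (0, 3): 91.6, (0, 4): 93.6, (0, 5): 93.3,
    (1, 2): 93.8, (1, 3): 93.3, (1, 4): 95.2, (1, 5): 96.7,
    (2, 3): 91.2, (2, 4): 91.8, (2, 5): 92.3,
    (3, 4): 92.3, (3, 5): 95.0,
    (4, 5): 95.4,
}

fastest = min(latency_ns, key=latency_ns.get)
slowest = max(latency_ns, key=latency_ns.get)
avg = mean(latency_ns.values())
spread = latency_ns[slowest] - latency_ns[fastest]

print(f"fastest pair: CPU{fastest[0]}<->CPU{fastest[1]} at {latency_ns[fastest]} ns")
print(f"slowest pair: CPU{slowest[0]}<->CPU{slowest[1]} at {latency_ns[slowest]} ns")
print(f"average: {avg:.1f} ns, spread: {spread:.1f} ns")
```

The spread between the best and worst pair is only a few ns, so on this chip no pair of cores stands out as being closer together than the rest.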
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The relevant portions of the memory controller and core arbitration logic were moved on-die with Nehalem.
    The northbridge has been a shadow of its former self since.
     
  7. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    On that subject, isn't AMD's uncore structure different from Intel's?
    What is the core arbitration logic?
     
  8. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    Maybe we can turn CnQ and turbo off and see what happens.
     
  9. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
  10. How is it even possible that they initially "mistook" the number of transistors by that much?

    Could this have been a reason for some layoffs in the marketing department?
     
  11. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    AFAIK, the exact count of the planar elements for any IC comes from the manufacturing foundry first. But some miscommunication between AMD departments is probably to blame.
     
  12. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    I'd love to know how AMD's transistor density grew going from 65 to 32nm, if those numbers are correct.
     
  13. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Llano's density is heavily skewed by the presence of a highly compact structure like the IGP, which takes a hefty chunk of the transistor budget.
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    AMD is still not maintaining a consistent count. The number of transistors per module it disclosed earlier is 213M, so that x4 plus 400M in the L3 is already enough to hit 1.2B with nothing left for the rest of the die, so something still seems off.

    Going by 1.2B, the density scaling is notably inferior to Intel, probably due to that bloated uncore.

    The Anandtech count for SB may not be comparable to AMD's wonky count. They are using the schematic count of 995M, while physically it has 1.16B.
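The module-plus-L3 arithmetic can be written out as a quick check. This is only a sketch using the figures quoted in this post (213M per module, four modules, 400M for the L3, 1.2B revised total); the variable names are illustrative.

```python
# All figures in millions of transistors, as quoted in the post above.
MODULES = 4
PER_MODULE_M = 213
L3_M = 400
REVISED_TOTAL_M = 1200

modules_plus_l3_m = MODULES * PER_MODULE_M + L3_M
left_for_uncore_m = REVISED_TOTAL_M - modules_plus_l3_m

print(f"modules + L3: {modules_plus_l3_m}M")
print(f"left for everything else: {left_for_uncore_m}M")
```

A negative remainder means the modules and L3 alone already exceed the revised total, before counting the memory controller, HyperTransport links, or any other uncore logic.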
     
  15. fehu

    Veteran

    Joined:
    Nov 15, 2006
    Messages:
    2,068
    Likes Received:
    992
    Location:
    Somewhere over the ocean
    Bulldozer!
    Now with 40% less transistor!
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Pretty much wherever AMD's revised count is showing up, it's being disputed by people who can perform arithmetic.

    If AMD is using a schematic-level count for BD at 1.2B as opposed to the physical count, perhaps it is not comparable to the count given for each module, which may have been using a physical count.
    That could open up a little leeway in the totals per die, but since 100M of each module must be just L2 cache cells, it's not leaving much room for the logic and everything else on the die that's not cache.
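The 100M figure for the L2 can be reproduced with a back-of-the-envelope sketch. It assumes standard 6-transistor SRAM cells, 2 MiB of L2 per module and 8 MiB of L3, and it counts data bits only (no tags, ECC, or redundancy). The helper name is made up for illustration.

```python
def sram_data_transistors(size_mib, transistors_per_cell=6):
    """Transistors in the data array of an SRAM of the given size (data bits only)."""
    bits = size_mib * 1024 * 1024 * 8
    return bits * transistors_per_cell

l2_per_module = sram_data_transistors(2)   # assumed 2 MiB L2 per module
l3_total = sram_data_transistors(8)        # assumed 8 MiB shared L3

print(f"L2 data cells per module: {l2_per_module / 1e6:.1f}M transistors")
print(f"L3 data cells:            {l3_total / 1e6:.1f}M transistors")
```

Under those assumptions the data arrays alone land close to the 100M and 400M figures used in the thread, and tags plus ECC would only push them higher.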
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    How many bits of error detection and correction are there per cache line? How many bits in a cache line?
     
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    From what I know, up to K10, AMD used to have ECC for the L1D in an 8:1 ratio, i.e. eight ECC bits for every 64 bits of protected data, using the 64-bit Hamming SEC/DED method. The ECC bits were organized in separate banks alongside the main L1D array. For the lower cache levels I don't have any reliable information about the protection implementations.

    p.s.: In Bulldozer, AMD removed the ECC protection for the L1D caches due to the inclusive relation to the L2, so now any error in the L1D will trigger a data reload from the L2, which is ECC protected.
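The eight-bits-per-64 ratio falls straight out of the Hamming bound. Below is a minimal sketch of that calculation, assuming a plain Hamming code extended with one overall parity bit for double-error detection; the function names are illustrative only.

```python
def sec_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

def secded_check_bits(data_bits):
    """SEC check bits plus one overall parity bit for double-error detection."""
    return sec_check_bits(data_bits) + 1

check = secded_check_bits(64)
print(f"{check} check bits per 64 data bits, overhead {check / 64:.1%}")
```

If a 64-byte line were protected per 64-bit word in this way (an assumption, not something confirmed for the L2 or L3), that would be 8 words of 8 check bits each, or 64 extra bits on top of the 512 data bits.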
     
  19. Color me Dan

    Regular

    Joined:
    May 19, 2007
    Messages:
    300
    Likes Received:
    1
    Location:
    Sweden
    Does anyone feel like humoring a layman/outside observer?

    I wonder if there is any way for performance to improve over time through "easily" implemented code optimization for the Bulldozer uArch, for example in compilers and/or (I guess) libraries. Could microcode be updated (really out of my depth here), and would that have any meaningful impact on performance?

    The Anandtech review mentions that Windows 8 ought to have a better scheduler that takes the modular CPU architecture into account which ought to improve performance somewhat. That's what made me think about it as it sort of suggested that some problems could stem from how the CPU is seen, and thus used, by software.

    No doubt there are serious flaws in design that will have to be rectified, I just wonder how much of the performance penalty stems from the architecture directly and how much is due to simple novelty.
     
  20. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    It is possible to get some performance boost once the OS manages to distribute threads equally over modules, but it only helps as long as you don't load all the cores, and even then the benefit is often tiny.

    The biggest problem seems to be the godawful cache architecture, and the only thing that will fix it is a redesigned chip, which is not going to happen for at least a couple of years.
     