AMD Bulldozer review thread.

Discussion in 'PC Industry' started by I.S.T., Oct 12, 2011.

  1. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    You could have saved yourself the trouble and just posted it in the doom and gloom thread. :(
     
  2. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    I wonder how well a 16-core Bobcat CPU at 3 GHz would do vs. a 16-core Interlagos.

    A bright side is that there do seem to be some clear bottlenecks in this architecture when it comes to cache conflicts. At least that's a straight path forward as far as corrections go.
     
  3. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,079
    Likes Received:
    1,519
    Location:
    France
  4. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
  5. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
  6. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,533
    Likes Received:
    139
    Probably worse, given even narrower FP, half speed L2, more primitive front-end and so on.
     
  7. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    You might even be able to squeeze in 24-32 Bobcat cores (given its 2.5-5 watt TDP) at the same power envelope. I imagine a full-speed cache and a higher clock rate might also do such a CPU wonders; its branch prediction is supposed to be quite advanced. From these reviews, it seems like BD is just pushing for throughput, so would a many-core Bobcat push past BD's performance per watt?

    I feel some mild disappointment after being promised improved IPC, but as far as pure processor design goes, AMD is always quite good considering its roughly 1/10th research budget vs. Intel. However, it seems like GF's process just can't compare with Intel's. It's kind of a shame that AMD is only projecting 10-15% for PD whereas Intel is projecting something like 20% for IVB (much of which must be a result of its 22nm FinFET process). For now, Intel's process advantage seems insurmountable, but here's hoping its investment in ATI / Fusion yields it the competitive advantage and profits it needs to at least keep competing for the next 5 years.
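The core-count guess above is easy to sanity-check with back-of-the-envelope arithmetic. The per-core TDP range is the one quoted in the post; the 125 W envelope is the FX-8150's rated desktop TDP (an assumption for this sketch):

```python
# Rough check of how many Bobcat cores fit in a desktop power envelope.
# Per-core TDP range is from the post; 125 W is the FX-8150's rated TDP.
bobcat_tdp_high = 5.0    # watts per core, worst case
bobcat_tdp_low = 2.5     # watts per core, best case
envelope = 125.0         # watts

min_cores = int(envelope // bobcat_tdp_high)
max_cores = int(envelope // bobcat_tdp_low)
print(min_cores, max_cores)   # 25 50
```

In practice the uncore (shared caches, memory controller, interconnect) would eat a sizable slice of that budget, so the 24-32 estimate in the post is on the plausible end of this range.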
     
  8. GZ007

    Regular

    Joined:
    Jan 22, 2010
    Messages:
    416
    Likes Received:
    0
    The problem is that if they pair 2 BD modules in the next desktop Fusion, they will have even worse performance than the A8 Fusion. :shock: And the same high TDP.

    In multi-socket server workloads with 1000+ threads, BD probably runs fine (just not on Windows :lol:). That's probably how it was designed. Too bad it falls flat on the desktop with its lackluster single-thread IPC (in games, for example).
     
  9. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,974
    Likes Received:
    933
    Location:
    Somewhere over the ocean
    This is Barcelona 2; let's just wait for Piledriver.
     
  10. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,533
    Likes Received:
    492
    Location:
    Varna, Bulgaria
    From what I can gather reading the reviews and benches, BD's most preferred workload type seems to be massively threaded integer applications with good memory locality -- likely good news for server-grade performance too. Anything that exercises the cache subsystem and the shared FPU in a more complex manner drags the new architecture back. The good aspect here is the clearly improved DRAM performance, evident in the more bandwidth-constrained situations. The other likely bottleneck is the OS scheduler: not being well suited to BD's new cache and core organization, it can wreck both TurboCORE performance and power management.
     
  11. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,627
    Likes Received:
    1,060
    BD might be a decent server chip, but I'm really surprised how much AMD dropped the ball with BD as a desktop chip.

    Things to improve:
    1. Fix the false aliasing L1 caches (increase associativity to pad up the bits in the index/tags)
    2. Optimize L2 cache for desktop, smaller and faster.
    3. Optimize L3 access. In terms of cycles, BD's L3 takes two and a half times longer to access than Sandy Bridge's. Lowering L2 latency will help here too, or start the L3 access in parallel with the L2 access on an L1 miss.
    4. Fix AVX performance. AVX being slower than SSE is just ... wtf?
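Point 3 can be put in rough numbers. A minimal sketch of average memory access time, with assumed latencies and miss rates (illustrative only, not measured BD figures), comparing a serial L2-then-L3 probe against starting the L3 probe in parallel on an L1 miss:

```python
# Average memory access time (AMAT), serial vs. parallel L3 probe.
# All numbers are assumptions for illustration, not measured BD figures.
L1_HIT, L2_LAT, L3_LAT, MEM_LAT = 4, 21, 65, 195   # cycles
M1, M2, M3 = 0.10, 0.50, 0.30                      # miss rates per level

# Serial: the L3 probe starts only after the L2 miss is detected,
# so an L2 miss pays L2_LAT + L3_LAT.
serial = L1_HIT + M1 * (L2_LAT + M2 * (L3_LAT + M3 * MEM_LAT))

# Parallel: the L3 probe starts alongside the L2 probe on an L1 miss,
# so an L2 miss pays only max(L2_LAT, L3_LAT) = L3_LAT.
parallel = L1_HIT + M1 * ((1 - M2) * L2_LAT + M2 * (L3_LAT + M3 * MEM_LAT))

print(round(serial, 3), round(parallel, 3))   # 12.275 11.225
```

The trade-off is that the parallel probe burns L3 bandwidth and power on every L1 miss, including the ones the L2 would have caught, which is presumably why it isn't free.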

    Cheers
     
  12. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,627
    Likes Received:
    1,060
  13. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    Doesn't each core in a module have its own private L1 cache, so there shouldn't be any thrashing there? Especially in the instruction cache, which shouldn't even contain any data that can be changed by other cores.
     
  14. Farid

    Farid Artist formely known as Vysez
    Veteran Subscriber

    Joined:
    Mar 22, 2004
    Messages:
    3,844
    Likes Received:
    108
    Location:
    Paris, France
    It's really like the Athlon era was a fluke for AMD CPUs. These BD benchmark results are some sad stuff.

    I mean, that Super Pi (1M) result of 11.8 sec with a BD overclocked to 7.5 GHz on liquid nitrogen is some redonkulous matter.
    http://www.overclockers.com/amd-fx-8150-bulldozer-processor-review

    And that power consumption compared to the much better Intel SB chip is also something else:
    [image: power consumption comparison chart]

    Time to jump on the "Wait for <next architecture >" bandwagon, I guess.

    Would be good, yes, if Xeons didn't still command better performance at a much lower TDP. And the price difference isn't all that much in favour of AMD when the whole price of a server is considered.
     
  15. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,627
    Likes Received:
    1,060
    Each core has its own data cache. The I-cache is shared and only two-way set associative. That is effectively one way per context of a module and, apparently, prone to thrashing.
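A minimal sketch of why that two-way shared I-cache thrashes. The geometry (64 KiB, 2-way, 64-byte lines) matches Bulldozer's published L1I parameters; the addresses are hypothetical:

```python
# Set-index aliasing in a small-associativity shared instruction cache.
LINE_BYTES = 64
WAYS = 2
CACHE_BYTES = 64 * 1024
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets

def set_index(addr):
    # Which cache set a given address's line maps to.
    return (addr // LINE_BYTES) % SETS

# Two threads whose hot code sits at the same offset within different
# naturally aligned regions map to the same set (hypothetical addresses):
thread0_code = 0x0040_0000 + 0x1234
thread1_code = 0x0080_0000 + 0x1234
print(set_index(thread0_code) == set_index(thread1_code))   # True
```

With only two ways, one such conflicting line per thread already fills the set; as soon as either thread fetches a second hot line at that index, it evicts its sibling's, and the two contexts ping-pong.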

    Cheers
     
  16. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    OK, thanks for clearing that up.
    Has anyone seen any benchmarks comparing it with and without the OS patch?
     
  17. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,533
    Likes Received:
    492
    Location:
    Varna, Bulgaria
    Something interesting I found, from hardware.fr's review:

    [image: hardware.fr cache latency and throughput chart]

    "Latence" = Latency
    "Debit" = Throughput
     
  18. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,533
    Likes Received:
    139
    No patch is out, and AMD's proposed fix for Linux is a bit dubious. Also, by AMD's own admission it's at best 3%, and only for special cases, so it's not a life-saver.

    A small note about the reviews. There's a set of x264 binaries being mentioned that are supposedly the XOP codepath and the AVX codepath, or something like that. They're not: they're just the latest dev branch from Dark Shikari (as of a few days ago), compiled with gcc 4.6.2 (MinGW, really) with -march={bdver1,corei7-avx}. If people actually checked the encoder's output, they'd see that they get XOP with the supposedly AVX-only one too. Just a useful tidbit.
     