AMD Bulldozer review thread.

I wonder how well a 16-core Bobcat CPU at 3 GHz would do vs. a 16-core Interlagos.

A bright side is that there do seem to be some clear bottlenecks in this architecture when it comes to cache conflicts. At least that's a straight path forward as far as corrections go.
 

Probably worse, given Bobcat's even narrower FP, half-speed L2, more primitive front-end, and so on.
 

You might even be able to squeeze 24-32 Bobcat cores (given their 2.5-5 W TDP) into the same power envelope. I imagine a full-speed cache and a higher clock rate might also do such a CPU wonders; its branch prediction is supposed to be quite advanced. From these reviews, it seems like BD is just pushing for throughput, so would a many-core Bobcat push past BD's performance per watt?

I feel some mild disappointment after being promised improved IPC, but as far as pure processor design goes, AMD is always quite good considering its research budget is roughly a tenth of Intel's. However, it seems like GF's process just can't compare with Intel's. It's kind of a shame that AMD is only projecting 10-15% for PD, whereas Intel is projecting something like 20% for IVB (much of which must come from its 22nm FinFET process). For now, Intel's process advantage seems insurmountable, but here's hoping its investment in ATI / Fusion yields the competitive advantage and profits AMD needs to at least stay competitive for the next 5 years.
 
The problem is that if they pair 2 BD modules in the next desktop Fusion, it will have even worse performance than the A8 Fusion. :oops: And the same high TDP.

In multi-socket server workloads with 1000+ threads, BD probably runs fine (just not on Windows :LOL:). That's probably how it was designed. Too bad that on the desktop it falls flat with its lackluster single-thread IPC (in games, for example).
 
From what I can gather from the reviews and benches, BD's preferred workload type seems to be massively threaded integer applications with good memory locality -- it's likely that this is good news for server-grade performance too. Anything that exercises the cache subsystem and the shared FPU in a more complex manner drags the new architecture back. The good aspect here is the clearly improved DRAM performance, which is evident in the more bandwidth-constrained situations. The other likely bottleneck is the OS scheduler not being well suited to the new cache and core organization in BD, potentially wrecking both TurboCORE performance and power management.
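
To make that contrast concrete, here's a quick OpenMP sketch of my own (kernel shapes, sizes and iteration counts are purely illustrative, not taken from any review): the first kernel is the friendly case, per-thread integer work on a small private working set, while the second runs independent FP multiply-add chains, the kind of work both cores of a module have to share one FPU for.

/* bd_workloads.c -- toy microbenchmark, build with: gcc -O2 -fopenmp bd_workloads.c
 * int_kernel: per-thread integer work on a 16 KiB private buffer (sized to the L1D).
 * fp_kernel:  four independent FP multiply-add chains per thread, so the two
 *             threads of a module end up competing for the shared FPU. */
#include <stdio.h>
#include <omp.h>

#define WSET  4096            /* 4096 ints = 16 KiB per thread */
#define ITERS 20000

static long int_kernel(void)
{
    long sum = 0;
    #pragma omp parallel reduction(+:sum)
    {
        volatile int buf[WSET];            /* volatile keeps the loads around */
        for (int i = 0; i < WSET; i++) buf[i] = i;
        for (int it = 0; it < ITERS; it++)
            for (int i = 0; i < WSET; i++)
                sum += buf[i] ^ it;        /* plain ALU work on local data */
    }
    return sum;
}

static double fp_kernel(void)
{
    double acc = 0.0;
    #pragma omp parallel reduction(+:acc)
    {
        double a0 = 1.0, a1 = 1.1, a2 = 1.2, a3 = 1.3;
        for (long it = 0; it < (long)ITERS * 4096; it++) {
            a0 = a0 * 1.0000001 + 0.5;     /* independent mul+add chains */
            a1 = a1 * 1.0000001 + 0.5;
            a2 = a2 * 1.0000001 + 0.5;
            a3 = a3 * 1.0000001 + 0.5;
        }
        acc += a0 + a1 + a2 + a3;
    }
    return acc;
}

int main(void)
{
    double t0 = omp_get_wtime();
    long   s  = int_kernel();
    double t1 = omp_get_wtime();
    double a  = fp_kernel();
    double t2 = omp_get_wtime();
    printf("int kernel: %.2f s (sum %ld)\n", t1 - t0, s);
    printf("fp  kernel: %.2f s (acc %g)\n",  t2 - t1, a);
    return 0;
}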
 
BD might be a decent server chip, but I'm really surprised how much AMD dropped the ball with BD as a desktop chip.

Things to improve:
1. Fix the false aliasing in the L1 caches (increase associativity to cover the aliasing bits in the index/tags); see the sketch after this list.
2. Optimize the L2 cache for desktop: smaller and faster.
3. Optimize L3 access. In terms of cycles, BD's L3 takes two and a half times longer to access than Sandy Bridge's. Lowering L2 latency will help here too, or start the L3 access in parallel with the L2 access on an L1 miss.
4. Fix AVX performance. AVX being slower than SSE is just ... wtf?
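
For point 1, the index math in a nutshell (my own back-of-the-envelope illustration of plain set aliasing, not the exact predictor mechanism AMD uses): with BD's 16 KiB, 4-way, 64 B/line L1D there are only 64 sets, so any addresses a multiple of 4 KiB apart share a set, and once more than four of them are hot they evict each other. More associativity, or hashing more address bits into the index, spreads them out.

/* l1_alias.c -- set-index arithmetic for a 16 KiB, 4-way, 64 B/line cache.
 * 16384 / (4 * 64) = 64 sets, index = address bits [11:6], tag = bits 12 and up,
 * so addresses that differ only by multiples of 4 KiB all map to the same set. */
#include <stdio.h>

#define LINE_SIZE   64
#define WAYS        4
#define CACHE_SIZE  (16 * 1024)
#define SETS        (CACHE_SIZE / (WAYS * LINE_SIZE))   /* 64 */

static unsigned set_index(unsigned long addr)
{
    return (addr / LINE_SIZE) % SETS;
}

int main(void)
{
    /* Hypothetical hot addresses, 4 KiB apart (think: the same offset in
     * different pages or stack frames) -- they all collide on one set. */
    for (int i = 0; i < 6; i++) {
        unsigned long addr = 0x10040UL + (unsigned long)i * 4096;
        printf("addr 0x%06lx -> set %u\n", addr, set_index(addr));
    }
    return 0;
}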

Cheers
 
also no I-cache thrashing.
Didn't each core in a module have its own private L1 caches, so there shouldn't be any thrashing there, especially in the instruction cache, which shouldn't even contain any data that can be changed by other cores?
 
It really seems like the Athlon era was a fluke for AMD CPUs. These BD benchmark results are some sad stuff.

I mean, that Super Pi (1M) result of 11.8 sec from a liquid-nitrogen-cooled 7.5 GHz overclocked BD is some redonkulous matter.
http://www.overclockers.com/amd-fx-8150-bulldozer-processor-review

And that power consumption compared to the much better Intel SB chip is also something else:
[power consumption chart]


Time to jump on the "Wait for <next architecture>" bandwagon, I guess.

it's likely that this is good news for server-grade performance too.
Would be good, yes, if Xeons didn't still command better performance at a much lower TDP. And the price difference isn't all that much in favour of AMD when the whole price of a server is considered.
 
Didn't each core in a module have its own private L1 caches, so there shouldn't be any thrashing there, especially in the instruction cache, which shouldn't even contain any data that can be changed by other cores?

Each core has its own data cache. The I-cache is shared and only two-way set-associative. That is effectively one way per context of a module and, apparently, prone to thrashing.
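
To put rough numbers on that, here's a toy model (mine, and nothing hardware-accurate: simple round-robin fill, no prefetch, no way prediction) of two contexts repeatedly streaming disjoint 48 KiB code footprints through a 64 KiB, 2-way, 64 B/line shared I-cache:

/* l1i_share.c -- toy model of a shared 64 KiB, 2-way, 64 B/line I-cache (512 sets).
 * Each context loops over its own 48 KiB of code. Alone, repeat passes hit;
 * with both contexts active, their lines fight over the two ways and a large
 * share of every pass misses again -- the thrashing being discussed. */
#include <stdio.h>
#include <string.h>

#define LINE  64
#define WAYS  2
#define SETS  (64 * 1024 / (WAYS * LINE))     /* 512 */
#define FOOT  (48 * 1024 / LINE)              /* 48 KiB of code, in lines */

static unsigned long tag[SETS][WAYS];
static int victim[SETS];                      /* simple round-robin fill */

static int touch(unsigned long line_addr)
{
    unsigned s = line_addr % SETS;
    for (int w = 0; w < WAYS; w++)
        if (tag[s][w] == line_addr) return 1;  /* hit */
    tag[s][victim[s]] = line_addr;             /* miss: fill a way */
    victim[s] ^= 1;
    return 0;
}

static double miss_rate(int contexts)
{
    long misses = 0, accesses = 0;
    memset(tag, 0, sizeof tag);
    memset(victim, 0, sizeof victim);
    for (int pass = 0; pass < 8; pass++)
        for (int c = 0; c < contexts; c++)
            for (unsigned long i = 0; i < FOOT; i++) {
                /* disjoint code regions per context */
                unsigned long line_addr = (c + 1) * 0x100000UL / LINE + i;
                misses += !touch(line_addr);
                accesses++;
            }
    return 100.0 * misses / accesses;
}

int main(void)
{
    printf("one context : %.1f%% misses\n", miss_rate(1));
    printf("two contexts: %.1f%% misses\n", miss_rate(2));
    return 0;
}

Run alone, a context's repeat passes hit almost everywhere; with both contexts active, roughly two thirds of every pass misses again in this toy model, which is the "one way per context" effect in action.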

Cheers
 
OK, thanks for clearing that up.
Has anyone seen any benchmarks comparing it with and without the OS patch?
 

No patch is out yet, and AMD's proposed approach for Linux is a bit dubious. Also, by AMD's own admission it's at best 3%, and only in special cases, so it's not a life-saver.

A small note about the reviews. There's a set of x264 binaries being mentioned that are supposedly the XOP codepath and the AVX codepath, or something like that - they're not. They're just the latest dev branch from Dark Shikari (as of a few days ago), compiled with gcc 4.6.2 (MinGW, really) with -march={bdver1,corei7-avx}. If people actually checked the encoder's output, they'd see that they get XOP with the supposedly AVX-only one too. Just a useful tidbit.
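
For anyone wondering how both binaries end up on the same path: x264 picks its hand-written asm at run time from CPUID, so the corei7-avx build still lights up the XOP kernels on a Bulldozer chip. A minimal check of the relevant bits (my sketch, not x264's actual detection code; a complete AVX check would also verify OS support via OSXSAVE/XGETBV):

/* cpucheck.c -- report AVX and XOP support via CPUID, build with gcc on x86.
 * AVX: CPUID leaf 1, ECX bit 28. XOP: CPUID leaf 0x80000001, ECX bit 11.
 * (OS-level AVX state support check omitted for brevity.) */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        printf("AVX: %s\n", (ecx & (1u << 28)) ? "yes" : "no");

    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        printf("XOP: %s\n", (ecx & (1u << 11)) ? "yes" : "no");

    return 0;
}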
 