RWT Analyzes Bulldozer Benchmarks

AlNom

Moderator
Recently, benchmarks for AMD's eagerly awaited Bulldozer architecture have leaked online. So far, they have raised more questions about the performance of future products than they have answered.

David Kanter of RealWorldTech, a longtime friend who is always eager to discuss CPU architecture and performance, takes a look at the test system and benchmarks and explains why performance is difficult to estimate precisely. He then analyzes the benchmark results and draws several conclusions about Bulldozer's microarchitecture and performance, and what they may mean for future products.

Well, we certainly aren't going to spoil it for you, but we do encourage you to head over and read the thorough analysis for yourself! Anyone remotely interested in Banana Dong (*ahem* B3D codename for Bulldozer) shan't be disappointed.
 
I've tried to imagine what a bad case would be for BD.

I suppose it would be code that didn't use FMA, shuffled a lot (cutting FP throughput in half, since shuffles occupy one of the two FP issue ports), had two threads slamming the write pipe with scattered writes that didn't coalesce in the write coalescing cache, and possibly wasn't blocked optimally for the smaller L1 data cache.
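To see where the "half" comes from, here is a toy issue-port bound. It assumes (a simplification, not AMD's documented scheduler) two FP issue ports, FP math that can issue on either port, shuffles that can issue on only one of them, one op per port per cycle, and no latency or dependency stalls. The function names are made up for illustration.

```python
from fractions import Fraction

# Toy issue-port model (assumption, not Bulldozer's documented scheduler):
# two FP issue ports, FP math may issue on either port, shuffles may issue
# on only one of them, one op per port per cycle, latency and dependencies ignored.

def cycles_lower_bound(fp_ops: int, shuffles: int) -> Fraction:
    """Minimum cycles to issue one loop iteration under the toy model."""
    both_ports_bound = Fraction(fp_ops + shuffles, 2)  # every op needs a slot on some port
    shuffle_port_bound = Fraction(shuffles)            # shuffles all compete for one port
    return max(both_ports_bound, shuffle_port_bound)

def fp_ops_per_cycle(fp_ops: int, shuffles: int) -> Fraction:
    """FP math throughput (ops per cycle) implied by the lower bound."""
    return Fraction(fp_ops) / cycles_lower_bound(fp_ops, shuffles)

if __name__ == "__main__":
    for shuffles in (0, 4, 8, 16):
        rate = float(fp_ops_per_cycle(8, shuffles))
        print(f"8 FP ops + {shuffles:2d} shuffles -> {rate:.2f} FP ops/cycle")
```

With no shuffles the bound allows 2 FP ops per cycle; with one shuffle per FP op it drops to 1, which is the halving mentioned above. Past that point the single shuffle port becomes the bottleneck and FP throughput falls further.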
 
A shuffle moves values around within a SIMD register, or between registers.
The XBAR unit can permute vectors more flexibly than AVX's shuffles can, but it also occupies one of the two FP issue ports.
That flexibility could reduce instruction count: a single permute can move values within and between vectors in one operation, where several less general shuffles would be needed to achieve the same result.
However, that permute is part of XOP, so it may be a very useful instruction that won't get used as much as it could.
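A scalar model makes the instruction-count argument concrete. `pshufb` follows SSSE3 PSHUFB semantics on a 16-byte register; `permute2` is a hypothetical, simplified two-source permute in the spirit of XOP's VPPERM (it ignores VPPERM's extra per-byte operation bits and does not try to match its exact encoding). Gathering arbitrary bytes from two registers takes one `permute2`, versus two one-source shuffles plus an OR.

```python
from typing import List, Tuple

Vec = List[int]  # one 128-bit register modeled as 16 byte lanes (0..255)

def pshufb(src: Vec, ctrl: Vec) -> Vec:
    """One-source byte shuffle with PSHUFB semantics: a control byte with
    bit 7 set zeroes the lane, otherwise its low 4 bits pick a byte of src."""
    return [0 if c & 0x80 else src[c & 0x0F] for c in ctrl]

def por(a: Vec, b: Vec) -> Vec:
    """Lane-wise bitwise OR."""
    return [x | y for x, y in zip(a, b)]

def permute2(a: Vec, b: Vec, ctrl: Vec) -> Vec:
    """Hypothetical two-source permute (VPPERM-like, simplified): the low
    5 bits of each control byte index the 32 bytes of a followed by b."""
    both = a + b
    return [both[c & 0x1F] for c in ctrl]

def gather_one_source(a: Vec, b: Vec, ctrl: Vec) -> Tuple[Vec, int]:
    """Same result using only one-source shuffles: shuffle a and b separately,
    zeroing lanes that belong to the other source, then OR them together.
    Assumes ctrl entries are 0..31. Returns (result, instruction count),
    treating ctrl_a and ctrl_b as precomputed constants."""
    ctrl_a = [c if c < 16 else 0x80 for c in ctrl]        # lanes sourced from a
    ctrl_b = [c - 16 if c >= 16 else 0x80 for c in ctrl]  # lanes sourced from b
    return por(pshufb(a, ctrl_a), pshufb(b, ctrl_b)), 3   # 2 shuffles + 1 OR

if __name__ == "__main__":
    a = list(range(16))        # bytes 0..15
    b = list(range(100, 116))  # bytes 100..115
    interleave_lo = [i // 2 + (16 if i % 2 else 0) for i in range(16)]
    print(permute2(a, b, interleave_lo))           # 1 instruction
    print(gather_one_source(a, b, interleave_lo))  # same bytes, 3 instructions
```

The interleave is only a convenient test pattern (SSE2's PUNPCKLBW already does that particular one in a single instruction); the gap matters for arbitrary byte patterns that no fixed-function unpack covers.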
 