AMD Bulldozer Core Patent Diagrams

Are we comparing this to an OC'ed SB chip?
They regularly scale to similar clock speeds with less voltage, which would give an almost 20% perf lead.
Something is wonky, either with these results, or with BD.
 
Probably SuperPi likes caches, as Wolfdale has 3MB of L2 per core.
It's more likely that the main iterative algorithm in SPI benefits from the loop detector cache/buffer in Nehalem and the more advanced L0 uop cache in SNB.
Merom/Conroe probably also take a bonus to some degree from their more primitive loop caching implementation.
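For context on what "the main iterative algorithm" looks like: Super PI is generally described as using the Gauss-Legendre algorithm. The following is a minimal Python sketch of that iteration using the standard `decimal` module. It is not SuperPi's actual code, and it does not demonstrate any loop buffer. It only shows that the hot path is one short loop body executed repeatedly, with each pass roughly doubling the number of correct digits. The function name `gauss_legendre_pi` is hypothetical.

```python
from decimal import Decimal, localcontext


def gauss_legendre_pi(digits: int, iterations: int) -> Decimal:
    """Approximate pi with the Gauss-Legendre (AGM) iteration.

    Each pass of the loop roughly doubles the number of correct digits,
    so the same small loop body runs again and again on long numbers.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10  # guard digits against accumulated rounding
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        for _ in range(iterations):
            a_next = (a + b) / 2
            b = (a * b).sqrt()            # uses the previous a
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        return (a + b) ** 2 / (4 * t)


if __name__ == "__main__":
    print(gauss_legendre_pi(50, 6))
```

Five or six iterations are already enough for 50 digits here; the real benchmark spends its time on multiplying and square-rooting very large numbers inside this kind of loop.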
 
Why did SuperPi become popular? If people want something easy, the SunSpider benchmark is even easier to get and run.
 
You couldn't run a JavaScript benchmark a decade ago on a fresh Windows 98 install; it is a moving target, and it only runs on another very complex moving target (a web browser).

You can't run it locally either. SuperPi is a 60KB zip file containing a binary that you launch right away. I've even tried it in Wine; it runs instantly and flawlessly, just as on any Windows OS from 95 to 7.
It also diverges right away when your PC is unstable, and can be left running as a heat test.
Shit... I'm having a second instance of SuperPi diverge :)
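The "diverges when your PC is unstable" property comes from the result being fully deterministic: any computation error propagates into the final answer, which then no longer matches a known-good value. A minimal sketch of that comparison idea follows. The workload `deterministic_workload` and the helper `stability_test` are hypothetical stand-ins, not how SuperPi checks itself; on a genuinely unstable machine the interpreter might simply crash instead.

```python
from typing import Callable, List


def deterministic_workload(n: int) -> int:
    """Hypothetical CPU-bound workload whose result is fully deterministic."""
    x = 1
    for i in range(1, n + 1):
        # 64-bit multiply-add step. Multiplying by an odd constant mod 2**64
        # is a bijection, so once an intermediate value is wrong it can never
        # reconverge to the correct sequence: the error shows up in the result.
        x = (x * 6364136223846793005 + i) & 0xFFFFFFFFFFFFFFFF
    return x


def stability_test(workload: Callable[[int], int], n: int, runs: int) -> List[int]:
    """Repeat the workload and return the indices of runs that disagree with run 0."""
    reference = workload(n)
    return [run for run in range(1, runs) if workload(n) != reference]


if __name__ == "__main__":
    mismatches = stability_test(deterministic_workload, 200_000, 5)
    print("stable" if not mismatches else f"diverged on runs {mismatches}")
```

Running several copies at once, as with multiple SuperPi instances, puts load on more cores and raises temperatures, which is what makes it usable as a heat test.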
 
The problem, as was said a few posts back, is that the results are possibly meaningless, because some CPUs have hardware that works unrealistically well with SuperPi.

Cinebench might be more useful though and I saw that they ran that.
 
The L2 is part of the BD module itself, hence it must be a core part of the design. Since all leaks show a slow L2, I am afraid this bit might be broken beyond what metal spins can fix.

BD+1 then. :p
 
The optimization manual hints at it as well; it seems their L2 is fubar to a certain extent, and they claim BD v2 improves on this.
 
Why do the BD and Nehalem quad- and hex-core CPUs have an all-cores-active clock that's higher than what they set as the "base clock"?
 
See the bottom of page 109, the sub-chapter about streaming instructions.
That's not about a slow L2 in particular. If I got that right, it's more about (performance) problems with syncing between the write-combine buffer, the write coalescing cache and the L2. I don't think previous AMD chips were particularly fast there compared to the competition, so "up to 4 times slower" (for multiple streams) and "6 times slower" (for non-complete cachelines, granted that's not something nontemporal stores are really designed for) quite possibly means it's a disaster (if you hit these conditions).
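To make the "non-complete cachelines" case concrete: nontemporal stores are collected per cache line in write-combining buffers, and a line that gets written completely can generally be flushed as one full-line write, while a partially written line needs more expensive handling. The sketch below only does the bookkeeping, counting how many lines a contiguous store range covers fully versus partially. The 64-byte line size is an assumption (typical for x86), and `line_coverage` is a hypothetical helper, not anything from AMD's manual.

```python
from typing import Tuple

CACHE_LINE = 64  # bytes; assumed line size


def line_coverage(start: int, length: int, line: int = CACHE_LINE) -> Tuple[int, int]:
    """Split the store range [start, start + length) into cache lines that are
    written completely and lines that are only partially written."""
    if length <= 0:
        return 0, 0
    end = start + length
    full = partial = 0
    for ln in range(start // line, (end - 1) // line + 1):
        lo = max(start, ln * line)        # first byte written in this line
        hi = min(end, (ln + 1) * line)    # one past the last byte written
        if hi - lo == line:
            full += 1
        else:
            partial += 1
    return full, partial


if __name__ == "__main__":
    print(line_coverage(0, 256))   # aligned stream: only complete lines
    print(line_coverage(8, 256))   # misaligned by 8 bytes: partial lines at both ends
```

An aligned 256-byte stream touches four complete lines, while the same 256 bytes starting 8 bytes in leaves a partial line at each end, which is the situation the manual warns about.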
 
Yes, you are correct, I apologise for the limited interpretation I attached to that snippet.
 


This was on a motherboard where Turbo is not fully supported, so SuperPi, for instance, scores 19s vs 14.5s on an Asus CHVF.
Also, Cinebench 10 should be around 28k at stock on this BD sample.
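To put the 19s vs 14.5s SuperPi figures in relative terms, here is a small worked calculation. Since SuperPi reports time, the ratio of the two times is the speedup. `compare_times` is just an illustrative helper name.

```python
from typing import Tuple


def compare_times(slow_s: float, fast_s: float) -> Tuple[float, float]:
    """Return (speedup, fraction of time saved) for two runs of the same benchmark."""
    speedup = slow_s / fast_s                 # how many times faster the fast run is
    time_saved = (slow_s - fast_s) / slow_s   # share of the slow run's time removed
    return speedup, time_saved


if __name__ == "__main__":
    s, f = compare_times(19.0, 14.5)
    print(f"speedup {s:.2f}x, {f:.1%} less time")
```

So the board without working Turbo takes about 31% longer (roughly 1.31x), which is the same as the CHVF result needing about 24% less time.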

Anyway, at least these are some results from the B1 spin.

From leaked benchmarks I saw on the net, the memory clocks really well, hitting 2580MHz on a 2133MHz stick using normal air cooling. Something not possible on previous AMD CPUs.

Also, a few people were saying even B1 can hit 4.7-4.9GHz all-core stable at around 1.5V, again air cooled. Power consumption in this situation was close to a Phenom II X6 at 4.1GHz and 1.44V.
Newer spins are even better clock-wise.

Take all of the above with a tiny bit of salt, as I'm just passing on unconfirmed info here.
 
More important would be whether that actually helps performance. For Phenom IIs it matters very, very little whether you have DDR2-800 or DDR3-1600; the CPU is simply too slow to move more data.

It looks, though, like Turbo (unlike on other AMD CPUs, including Llano) at least does something significant now.
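For scale, the theoretical peak bandwidth gap between the memory types mentioned in this thread is easy to compute: transfers per second times 8 bytes per transfer on a 64-bit channel, times the number of channels. This sketch assumes a dual-channel controller (as on AM3/AM3+ boards) and reads the "MHz" figures quoted above as effective data rates in MT/s. These are peak numbers only; whether the CPU can actually consume them is exactly the question being argued.

```python
def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    data_rate_mts: effective transfer rate, e.g. 1600 for DDR3-1600
    channels:      assumed dual-channel memory controller
    bus_bytes:     a 64-bit channel moves 8 bytes per transfer
    """
    return data_rate_mts * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for name, rate in [("DDR2-800", 800), ("DDR3-1600", 1600), ("DDR3 at 2580", 2580)]:
        print(f"{name}: {peak_bandwidth_gbs(rate):.2f} GB/s peak")
```

DDR3-1600 doubles the peak of DDR2-800 (25.6 vs 12.8 GB/s dual-channel), so the claim above is that Phenom II cannot turn that doubling into real-world speed.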
 
On Phenom II, memory clock matters if you push the NB clock up, especially with your example of DDR2-800. Granted, if you stick to the default NB clock of 2000MHz, then going past DDR3-1333 brings little to no benefit, but with the NB at 2.8GHz, going from 1333MHz memory to 1600/1833MHz brings measurable improvements. This of course assumes CAS latencies are maintained; otherwise it's a moot point on AMD.
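The "maintaining CAS latencies" caveat is easiest to see by converting CL from clock cycles to nanoseconds. DDR transfers twice per I/O clock, so one cycle lasts 2000 / data rate ns. The CL values below (CL9, CL11) are hypothetical examples chosen for illustration, not figures from this thread.

```python
def cas_latency_ns(cl: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds.

    The I/O clock in MHz is data_rate / 2, so one cycle lasts
    1000 / (data_rate / 2) = 2000 / data_rate ns.
    """
    return cl * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    for name, cl, rate in [("DDR3-1333 CL9", 9, 1333),
                           ("DDR3-1600 CL9", 9, 1600),
                           ("DDR3-1600 CL11", 11, 1600)]:
        print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
```

Keeping CL9 while moving from 1333 to 1600 cuts the CAS latency from about 13.5 ns to 11.25 ns, but loosening to CL11 at 1600 gives 13.75 ns, slightly worse than where you started.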

Just to list a few apps that benefit measurably from higher memory clocks:
- Cinema 4D
- SuperPi
- nearly all game engines when not GPU-limited
- Frybench

I would say Intel i7s are less sensitive to memory speed thanks to their excellent prefetchers and cache structure. On AMD, the lower the latency, the better the performance.
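To illustrate why a good prefetcher reduces sensitivity to memory speed, here is a toy stride prefetcher: once it sees the same address stride twice in a row, it fetches the next line ahead of use, so predictable access streams mostly hit data that is already on its way. This is a deliberately simplified, hypothetical model (`simulate_stride_prefetcher` and the 64-byte line are assumptions), not a description of Intel's actual prefetcher design.

```python
from typing import Iterable, Optional, Set

LINE = 64  # bytes; assumed cache line size


def simulate_stride_prefetcher(addresses: Iterable[int]) -> float:
    """Return the fraction of accesses whose cache line had already been prefetched."""
    prefetched: Set[int] = set()
    last_addr: Optional[int] = None
    last_stride: Optional[int] = None
    hits = total = 0
    for addr in addresses:
        total += 1
        if addr // LINE in prefetched:
            hits += 1
        if last_addr is not None:
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                # Stride confirmed twice in a row: fetch the next line early.
                prefetched.add((addr + stride) // LINE)
            last_stride = stride
        last_addr = addr
    return hits / total if total else 0.0


if __name__ == "__main__":
    sequential = range(0, 100 * LINE, LINE)       # streaming access
    scattered = [0, 6400, 128, 99968, 512]        # no repeating stride
    print(simulate_stride_prefetcher(sequential))
    print(simulate_stride_prefetcher(scattered))
```

The sequential stream gets covered after a short warm-up, while the scattered pattern gets no help at all; in the latter case every access pays full memory latency, which is where lower-latency memory shows up most.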

And to address the first part of your post, I would hope so. With 8 integer cores and 256-bit AVX instructions, I do believe higher memory speed will be required to extract as much performance as possible.
 