AMD Bulldozer Core Patent Diagrams

1110072150d58dedb8d9f88b0b.jpg

111007215009ca15bc7ac63582.jpg

1110072150afdd16d70c377f58.jpg

111007215037eccbe47e4df6ff.jpg
 
So we have consumer packaging with product inside...where are the game benchmarks? :)
 
I wonder how good the watercooler is. Generally low-end watercoolers have been significantly worse than good air coolers.
 
I've never seen a retail CPU with liquid cooling included. Has this been done in the past?
 
AMD needs a 5GHz FX4xxx to match a 3.7GHz Deneb :rolleyes:
As a desktop CPU, going by the performance leaked so far, I can't see it staying at its launch price for long. It's more a competitor to the Core i3 and AMD's own 3-core parts than an improved replacement for Deneb.

I hope for their sake that server workloads are more forgiving, otherwise it's another K5.
 
It's a pity they don't preassemble the waterblock onto a CPU without a heatspreader; that would be a huge boost in cooling performance.
 
The thing sure is quite fast in well-threaded integer workloads. That gives some hope for server performance.
 
That's just 2 modules, half a BD. The FX4xxx is probably a dirt-cheap salvaged SKU.

I think AMD made a mistake in counting one module as 2 full cores. People compare 4 cores vs 8 cores and say it's shit. No, it just works as intended and it's late to the party.
 
They may have tried at one point. Some early marketing slides had counted each module as a core, either through an early error or a possible bluff.
Unlike GPUs, the definition of a CPU core is more robust, and there are some notable companies that would be risking many millions of dollars if AMD got away with it.

As the physical path that a thread of execution takes through silicon, the cores in a module are mostly separate, and the parts that are shared are not required to be unique. The instruction issue and control circuits are physically separate.
 
Well, the average Joe will see a new-uarch CPU beaten by old PIIs, and imo this will give a really bad impression. Marketing shot itself in the foot, imo.
 
One curious result that grabbed my attention is AIDA's Sin-Julia benchmark. This small bench is pure multi-threaded x87 code that fits entirely in the cache (256KB of data per thread). It looks like, contrary to my early assumptions, legacy FP throughput in BD is much weaker.
 
I am hoping somebody (Techreport? Hardware.fr?) will do proper synthetic benchmarking to see where the problem lies, because this is definitely not what was promised (i.e. 50% more perf for 33% more cores). The cache subsystem looks like the first suspect.
 
First, they didn't specify what clock frequencies they were comparing. Second, they dropped it from 50% to something like 33% a few weeks ago.

My guess is they hoped to release BD at around 5GHz non-turbo and way over that with turbo, but GF's manufacturing process wasn't up to the task.
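
For what it's worth, the per-core gain implied by those figures can be worked out in a few lines. This is only a sketch: it takes "33% more cores" at face value as a 4/3 core ratio (for example 8 cores versus 6, which is my assumption, not something stated above), it ignores clock speed entirely, and the function name `per_core_gain` is made up for illustration.

```python
def per_core_gain(perf_gain, core_gain):
    """Relative per-core throughput change implied by a total performance
    gain and a core-count gain, both given as fractions (0.50 means +50%)."""
    return (1.0 + perf_gain) / (1.0 + core_gain) - 1.0


# Assumption: "33% more cores" means a 4/3 core ratio, e.g. 8 cores vs 6.
CORE_GAIN = 1.0 / 3.0

# Original claim as quoted in the thread: 50% more perf for 33% more cores.
original_claim = per_core_gain(0.50, CORE_GAIN)

# Revised claim mentioned in the thread: roughly 33% more perf.
revised_claim = per_core_gain(1.0 / 3.0, CORE_GAIN)

print(f"original claim: {original_claim:+.1%} per core")
print(f"revised claim:  {revised_claim:+.1%} per core")
```

Under those assumptions the original claim works out to about 12.5% more throughput per core, and the revised 33% figure works out to no per-core gain at all.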
 
Oops...

Actually, we already have such an issue known for Bulldozer, and NO benchmarked system has the patch installed!

The shared L1 cache is causing cross-invalidations across threads, so the prefetched data is incorrect in too many cases and must be fetched again. The fix is a "simple" memory alignment and (possibly) tagging system in the Windows/Linux kernel.

I reviewed the code for the Linux patch and was astonished by just how little I know of the Linux kernel... lol! In any event, it could easily cost 10% in terms of single threaded performance, possibly more than double that in multi-threaded loads on the same module due to the increased contention and randomness of accesses.

Not sure if ordained reviewers have been given access to the MS patch, but I'd imagine (and hope) so! Last I saw, the Linux kernel patch was still being worked on by AMD (publicly) and Linus was showing some distaste for the method used to address the issue. One comment questioned the performance cost but had received no replies... but you don't go re-working kernel memory mapping for anything less than 5-10%... just not worth it!
Source
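
To make the alignment idea concrete, here is a rough sketch. It is not the actual Linux or Windows patch. It assumes the aliasing depends on virtual-address bits [14:12] (that is how I recall the Linux patch description, but nothing in this thread confirms it), and `ALIAS_MASK`, `alias_bits`, `align_up` and the two example addresses are invented for illustration. It shows that two mappings whose addresses differ in those bits end up with identical bits once both are rounded up to a 32KB boundary.

```python
PAGE_SHIFT = 12
# Assumption: virtual-address bits [14:12] take part in the cache index.
ALIAS_MASK = 0x7 << PAGE_SHIFT
# Rounding to 2**15 bytes (32KB) clears bits [14:0], including [14:12].
ALIGN_BOUNDARY = 1 << 15


def alias_bits(addr):
    """Return the value of address bits [14:12]."""
    return (addr & ALIAS_MASK) >> PAGE_SHIFT


def align_up(addr, boundary=ALIGN_BOUNDARY):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


# Two hypothetical page-aligned mapping addresses for the same shared code.
map_a = 0x7F0000003000
map_b = 0x7F0000015000

print("before:", alias_bits(map_a), alias_bits(map_b))

aligned_a = align_up(map_a)
aligned_b = align_up(map_b)

print("after: ", alias_bits(aligned_a), alias_bits(aligned_b))
```

The cost of this kind of approach is some wasted virtual address space per mapping, which may be part of why the method drew objections.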
 
Seems like the sort of thing you'd expect them to have thought of in the design stage & have solved long before having physical hardware :???:

I'd really been expecting much higher L3/northbridge clocks; still only 2.2GHz is pretty lame.
 