AMD Bulldozer Core Patent Diagrams

I do not expect gamers to care for Llano's GPU performance. That niche will probably go for an Intel CPU and a discrete GPU.
Corporations will not care what frame rate Sandy Bridge has in Crysis.

Corporate buyers and value OEMs with no particular need for graphics performance will find Llano's GPU performance and feature set an interesting checkbox, but that is predicated on the CPU being available in volume. A competing design that will meet their low expectations will be conveniently available.

Mobile users may benefit from Llano, but Sandy Bridge is being aggressively pushed there as well. AMD's lateness is setting it up to fight for the minority share it has enjoyed in the best of times. That is an improvement over the near-zero share it is declining toward, but the opportunity for far more has been lost.

Instead we have Ontario launched first, which addresses one market that may have peaked in growth potential (netbooks, etc.) and which will be capped by the CULV variants of Intel's chips in the low-end laptop market.
Granted, it does have more time uncontested, and Intel seems intent on chilling customer sentiment with its early press releases.
 
That niche will probably go for an Intel CPU and a discrete GPU.
That depends on how close Llano can come to the 5670's performance, which is where the majority of GPU volume is.

OT, I am curious about the possibility of using the integrated GPU as a specialized frontend to feed the discrete GPU. I wonder if anyone will explore that possibility.
 
The 5670 has 64 GB/sec in memory bandwidth. If a buyer splurged and took advantage of Llano's DDR3-1600 support, it might give the on-die GPU a bit over a third of the bandwidth before taking into account CPU bandwidth consumption.
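For reference, the arithmetic works out roughly as follows (a back-of-the-envelope sketch assuming a standard dual-channel, 64-bit-per-channel DDR3 configuration; sustained bandwidth would be lower than these peak figures):

```python
# Peak-bandwidth comparison: dual-channel DDR3-1600 vs. a Radeon HD 5670.
channels = 2
bytes_per_transfer = 8           # 64-bit channel width
transfers_per_sec = 1600e6       # DDR3-1600 = 1600 MT/s

llano_bw = channels * bytes_per_transfer * transfers_per_sec / 1e9  # GB/s
hd5670_bw = 64.0                 # GB/s, per the 5670's published spec

print(f"DDR3-1600 dual-channel peak: {llano_bw:.1f} GB/s")
print(f"Fraction of HD 5670 bandwidth: {llano_bw / hd5670_bw:.0%}")
```

That gives 25.6 GB/s, or 40% of the 5670's 64 GB/s, before the CPU takes its cut.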

AMD claims to have revamped the memory controller, though what it could manage with such a bandwidth gap is beyond me.
There is some weakness in Llano's implementation versus Sandy Bridge: the GPU on Intel's chip is much more tightly integrated into the cache subsystem, which probably has performance benefits.

The bandwidth problem may constrain Llano to some extent, not that it should have any trouble beating Sandy Bridge's GPU. It does appear that Intel has not stated which desktop lines will have full GPU throughput, so the gap may grow or shrink based on that.

AMD has hinted at some kind of magic sauce coming down the driver development pipeline for Llano and discretes, but mysterious hints are easy to make and hard to hold accountable for.
 
I wonder what the chances are that Bulldozer or something similar is Microsoft's next Xbox CPU architecture. It does seem like a good fit: the throughput-first server model seems to have worked for both the Xbox 360 and PS3 in the current generation, so why not go down that route again?
 
The weak CPU method worked, but I don't think developers would have turned their noses up at stronger scalar performance.
Without a firm die size it would be hard to say exactly by how much Bulldozer would dwarf Xenon. It would likely be over-engineered for the target workload, though it would be massively more performant.

It would come down to how much could be stripped from the design to save die space, and how much of the freedom to shrink and port the design, which it has enjoyed with Xenon, Microsoft would be willing to relinquish. Bulldozer is not described as being synthesizable, and I do not think Microsoft can gain the same rights to tweak and shrink an x86 core that it has for Xenon.


I did notice an odd claim in Kanter's article. AMD claimed they weren't using multi-level branch predictors. I don't know if there is a mismatch in terminology, or if AMD is trying something new.
A two-level predictor can be made to hit high levels of accuracy, though it does require an extra layer of indirection to predict a branch.
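For context, the two-level idea can be sketched as a gshare-style predictor: a global history register provides the extra layer of indirection into a table of 2-bit saturating counters. This is a textbook illustration, not AMD's actual design, and all the names and sizes below are made up for the sketch:

```python
# Minimal gshare-style two-level branch predictor sketch.
# Level 1: global branch history register.
# Level 2: table of 2-bit saturating counters indexed by (PC XOR history).
class GsharePredictor:
    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global history register
        self.counters = [1] * (1 << index_bits)   # start weakly not-taken

    def _index(self, pc):
        # The indirection through history is what makes it "two-level".
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask

p = GsharePredictor(index_bits=8)
for _ in range(10):           # train on an always-taken branch at pc=0x40
    p.update(0x40, True)
print(p.predict(0x40))        # learns "taken" for this (pc, history) pair
```

The cost of the indirection is that the predictor must be consulted with the current history before the branch resolves, which is part of why predictor latency matters on a long pipeline.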

Hopefully AMD is not trying to get cute with the predictor, given how critical it will be for a speed racer with a long pipeline. Part of the problem is that AMD has not shown leadership in that regard since K8, so we need to hope it has found inspiration in an area where it has performed without distinction for a decade.
 
Getting the right performance and power at the start would take away much or all of the need to shrink and port the design in the future. From what I can tell, Microsoft doesn't seem that keen on hardware tweaking after launch; most of that desire belongs in Sony's domain. Beyond this, the timescale for hardware shrinks might fall to every three years, given that everyone besides Intel seems to be slowing down their process shifts.

If not Bulldozer for outright performance, then why not Bobcat? But then they would lose the scalar performance, and Amdahl's law is a bother.

Edit: Theoretically, I wonder if it's possible to stitch two different types of x86 cores onto the same closed platform's CPU die. Would that even be practical or worthwhile if they had enough time to implement it? Something like, say, one Bulldozer module and six Bobcat cores.
 
Microsoft has shrunk Xenon before.
It just re-implemented its Xbox chips to have the CPU and GPU on-die, and it looks like MS did a significant amount of that work itself.
That's actually a step ahead of Bulldozer, which won't have a Fusion variant for a while, and I doubt MS is going to get permission to fiddle with an x86 core.

There may be other things about Bulldozer that make it less suitable for the market. If its cache coherence protocol is modified to allow better multi-socket scaling, or a directory is put in place, that would be useless for a console and would needlessly impact performance at that level.

That would also go towards the question of having a Bulldozer and Bobcat on the same die. It won't work well if they don't have the same protocols. We do know they do not support the same set of vector extensions.
 
AMD shows die of Orochi, a 32nm 8-core Bulldozer

[Image: orochidie1.jpg]

[Image: orochidieshot1.jpg]


Oh boy, oh boy! :D
 
The perspective-corrected image is mind-blowing.

Is this another attempt at some kind of hierarchy?

Did they keep some of the old Shanghai cores around, just in case BD crashed and burned?

Or did they talk Intel into putting in some of its old cores to save themselves? :)

Or is it that GF's 32nm process can only make about half the die according to its customers' specs? :)
 
Apparently this is some sort of future Fusion part, based on the Bulldozer architecture, so it may not be representative of the upcoming first-generation desktop and server Bulldozer SKUs. :rolleyes:
 
The L3 looks like it is segmented between modules, so there could be some variable latency between a local tile and a non-local tile. Hopefully that means there is better average latency, given that Bulldozer's L2 has latencies closer to an L3 already.

It looks like a more square die than Istanbul, and one side is able to take the full length of the DDR interface.
That is longer than the analogous side of an Istanbul chip, which had to run the interface around two corners.

If the length of that side is enough to get an idea of the scale, Orochi seems likely to be larger than Westmere, which is 248 mm².

I haven't found a good number for the dimensions for Istanbul's interface pads, particularly if they were straightened out.
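The kind of scale estimate being attempted above can be sketched as follows: measure one feature of known physical length on the die shot, derive mm-per-pixel, and scale the whole die. All numbers below are hypothetical placeholders, not measurements of the Orochi shot:

```python
# Rough die-area estimate from one known linear feature in a die shot.
# All numeric inputs are illustrative placeholders, not measured values.
def estimate_area(known_feature_mm, feature_px, die_w_px, die_h_px):
    mm_per_px = known_feature_mm / feature_px
    return (die_w_px * mm_per_px) * (die_h_px * mm_per_px)

# Example: a 17 mm reference edge spanning 850 px, with the die shot
# measuring 900 x 850 px, implies an 18 x 17 mm die:
area = estimate_area(17.0, 850, 900, 850)
print(f"{area:.0f} mm^2")   # 306 mm^2 with these placeholder inputs
```

The obvious caveat is that any error in the reference length gets squared in the area estimate, which is why a solid number for something like the DDR pad dimensions matters.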
 