Haswell vs Kaveri

Update: AMD sent along an official statement (which they've issued previously) on Kaveri availability: "AMD's ‘Kaveri’ high-performance APU remains on track and will start shipping to customers in Q4 2013, with first public availability in the desktop component channel very early in Q1 2014. ‘Kaveri’ features up to four ‘Steamroller’ x86 cores, major heterogeneous computing enhancements, and a discrete-level Graphics Core Next (GCN) implementation – AMD’s first high-performance APU to offer GCN. ‘Kaveri’ will be initially offered in the FM2+ package for desktop PCs. Mobile ‘Kaveri’ products will be available later in the first half of 2014." If we read "customers" as the large OEMs that make desktops, then we may or may not have actual Kaveri hardware in hand for testing this year, but we'll wait and see.
 
The slide says "new next gen" but it could refer to Kaveri being GCN 1.1 while Kabini & Temash could be GCN 1.0
Kabini/Temash is GCN 1.1.
Anyway, it looks like the article was corrected, as it now specifically mentions "high-performance APU".
 
I vaguely remember reading that SteamrollerA was for AM3+ (4+ modules, L3 cache, Opteron and Phenom, plans now scrapped) and that SteamrollerB was for APUs with HSA glue all along.

Please correct me if I'm wrong ...

So there will be no more advancements on the AM3+ platform?
 
If they don't have a great bandwidth solution I'm not sure why anyone other than the usual extreme budget shoppers (and confused fankids) will care about this new APU. Unless Steamroller B-type is a real IPC hotrod (not holding my breath).
 
So there will be no more advancements on the AM3+ platform?

There is a chance of a power-optimized Piledriver revision appearing not only for servers but also for AM3+. I think it was codenamed Warsaw.

There are no official plans that I'm aware of for Steamroller on AM3+, and there's no point doing Excavator for an old socket, so I think AM3 has reached its peak.
AMD is very focused on heterogeneous computing, and it makes sense to move away from pure CPU cores to CPU+GPU hUMA-enabled solutions to promote the new direction. More and more software can take advantage of GPU compute, and with Kaveri and future APU revisions more and more algorithms will be able to efficiently tap into massive GPU resources.

BTW I have no insider info and only go on what I've managed to read and analyse on the internet and from AMD Partner communications.
 
Ah. What I meant to ask is "will AMD still be releasing big and powerful 8+core desktop CPUs?".
I suppose the answer is yes, but all their CPUs will include GPUs from now on..?
 
Ah. What I meant to ask is "will AMD still be releasing big and powerful 8+core desktop CPUs?".
I suppose the answer is yes, but all their CPUs will include GPUs from now on..?
An 8+ core chip with a GPU included would be extremely huge even on 28 or 22nm.

There were some rumors about a 6 core + GPU high-end Kaveri SKU. However, it's pretty safe to say they will stick with 4 cores and a GPU for some time.
 
But it would be nice if they simply did it with a 128SP GPU. Maybe this is at odds with their strategy, though: while there is demand for a strong CPU and a weak GPU (e.g. Photoshop, music production), they're set on their HSA path.

Maybe they will, with socket FM3, a newer process and the AM3 socket officially dead.
 
An 8+ core chip with a GPU included would be extremely huge even on 28 or 22nm.

There were some rumors about a 6 core + GPU high-end Kaveri SKU. However, it's pretty safe to say they will stick with 4 cores and a GPU for some time.

Vishera is 315mm^2 at 32nm, Pitcairn/Curacao is 212mm^2 at 28nm.

Two of those together (plus glue, minus redundant memory controllers) could perfectly well be 300mm^2 or less using 20nm (GloFo and TSMC will use 20nm, not 22nm).
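
To put rough numbers on that estimate, here is a back-of-the-envelope sketch. It assumes ideal scaling, where area shrinks with the square of the node ratio. That is an optimistic assumption: real shrinks fall short of it (I/O and analog blocks barely scale, and node names are not literal feature sizes), and it ignores both the glue logic added and the redundant memory controllers removed. The function name is made up for illustration.

```python
def scaled_area_mm2(area_mm2, from_nm, to_nm):
    """Ideal shrink (assumption): area scales with the square of the node ratio."""
    return area_mm2 * (to_nm / from_nm) ** 2

# Die sizes quoted in the post above
vishera_20nm = scaled_area_mm2(315.0, from_nm=32, to_nm=20)
pitcairn_20nm = scaled_area_mm2(212.0, from_nm=28, to_nm=20)
combined_20nm = vishera_20nm + pitcairn_20nm

print(f"Vishera at 20nm:  {vishera_20nm:.1f} mm^2")
print(f"Pitcairn at 20nm: {pitcairn_20nm:.1f} mm^2")
print(f"Combined:         {combined_20nm:.1f} mm^2")
```

Under those idealized assumptions the sum lands at roughly 231mm^2, so "300mm^2 or less" still leaves some margin for less-than-perfect scaling.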

It would turn out quite an interesting chip, as long as they figure out how to boost bandwidth. Just supporting higher DDR3 speeds as they've done until now won't cut it, obviously.
 
An 8+ core chip with a GPU included would be extremely huge even on 28 or 22nm.

There were some rumors about a 6 core + GPU high-end Kaveri SKU. However, it's pretty safe to say they will stick with 4 cores and a GPU for some time.

The L3 takes up a lot of space on Piledriver, and the APUs wouldn't have any.

I have my doubts they'll go with an octo core with graphics though. There doesn't really seem to be any point.
 
The L3 takes up a lot of space on Piledriver, and the APUs wouldn't have any.
We can't be sure about the presence of L3 in future APUs..
They could be targeting a huge L3 that is coherent with the GPU as a way to compensate for the memory bandwidth, a la Xbone and Crystalwell (though the latter comes in the form of L4).

I have my doubts they'll go with an octo core with graphics though. There doesn't really seem to be any point.

It would be a quad-module, not octo-core in the usual sense, so it's 8 INT units with 4 FP units.

But please explain why there's no point in making an 8-core APU, because Microsoft, Sony and AMD seemed to think otherwise for the next-gen consoles.
 
It's economics basically. Microsoft and Sony are only paying ~$80 for theirs but AMD needs to be selling high end FX's for $200+. Most of the market for $200+ chips will be using discrete graphics cards and the lack of L3 on a "quad module" APU would just hurt x86 performance. AMD isn't in the position to blow transistor and power budget on graphics in this kind of chip.

It's just about possible we could see 6 cores or 3 modules + graphics at some point down the line.
 
It's economics basically. Microsoft and Sony are only paying ~$80 for theirs but AMD needs to be selling high end FX's for $200+. Most of the market for $200+ chips will be using discrete graphics cards (...)

I disagree. A stand-alone, replaceable APU with a performance that matches/surpasses the next-gen consoles and could fit in a Mini-ITX board with a low-profile case (no discrete card) is what Valve envisions for the future of PCs and their "Steam Box".
Big ATX cases with full-size motherboards and large discrete graphics cards will be pushed to a niche even faster during this gen, IMO.


It's just about possible we could see 6 cores or 3 modules + graphics at some point down the line.

I still don't get your thing with finding 3 modules reasonable but 4 modules "impossible".
The next-gen consoles will push for engines taking advantage of multi-core CPUs a lot more than last gen, and the next-gen consoles will both support 8 cores.
 
Well, a quad core Kaveri APU should be able to match the consoles already. 8 Jaguar cores at 1.6 GHz aren't exactly giving earth shattering cpu performance - it's ~3GHz i3 level. If AMD can't match that with Kaveri they might as well just lay off their big core division already.

I'm not saying an octo core with graphics is impossible, I'm just saying that when you have a performance deficit to make up vs your competition, the last thing you need to be doing is blowing the transistor and power budget on a large graphics portion that isn't really powerful enough to suit the intended ~$200 market. They need to be getting as much performance out of the CPU as possible.
 
The CPU is the weakness, but the temptation is still quite strong. GDDR5 and an Opteron-ish quad-channel interface could enable integrated Pitcairn-class performance. If that is possible in 2014, most people still using discrete graphics could switch to an APU.
 
Well, a quad core Kaveri APU should be able to match the consoles already. 8 Jaguar cores at 1.6 GHz aren't exactly giving earth shattering cpu performance - it's ~3GHz i3 level. If AMD can't match that with Kaveri they might as well just lay off their big core division already.

Matching the theoretical performance may not be enough to offset the advantage of low-level optimization in consoles.. Even if everyone is using x86, probably with SSE4.2 and AVX..

I'm not saying an octo core with graphics is impossible, I'm just saying that when you have a performance deficit to make up vs your competition, the last thing you need to be doing is blowing the transistor and power budget on a large graphics portion that isn't really powerful enough to suit the intended ~$200 market. They need to be getting as much performance out of the CPU as possible.

As I said, the evolution of the PC market shows a much more promising future (more profitable, higher volumes, etc.) for high-performance APUs than for big fat CPUs that need discrete GPUs to display graphics.

Not a single laptop is sold without an APU nowadays. I'm pretty sure that most laptop manufacturers would prefer to pay a bit more for a higher performing iGPU that matches a mid-range discrete GPU, so they can make smaller PCBs in order to fit larger batteries and/or a smaller chassis.
Intel knows this, which is why they're charging an arm and a leg for any of the Haswells with Crystalwell.

In the desktop space, take away the enthusiast gamers and you'll see many people who would settle for a very small, living-room friendly PC.
 
It is unclear which strategic path AMD chose wrt bandwidth constraints.
I hoped for a long while that they would use GDDR5M with Kaveri, but it seems that won't happen anytime soon. It is sad, as a quick read of a review of the DDR3-powered HD7750 shows how bad it is => pretty much a waste of silicon.
Hybrid memory cubes should be available next year, though they require (if I get it right) a rework of the memory subsystem (iirc the memory controller is off chip, in the bottom layer of the HMC). Overall it is not really "PC like"; GDDR5M fits the picture better. Price is unknown, could it be costly?
GDDR5M may be costly and may require a different mobo, and the roadmap is unclear (to me at least).

Overall, when I look at the size of AMD's APUs vs Intel's chips on one side and something like the XO on the other, it makes me wonder if the best solution would be something akin to what we are seeing in the XO.
There is no need to push further than 4 big cores, so the area devoted to the CPU in 20nm (and later) APUs should go down; on the other hand, allocating it to the GPU without the bandwidth necessary to feed it is wasteful.
A "positive" of this approach is that it doesn't eat too hard into their discrete market. With the area taken by the CPU going down and with HMC, they would have a lot of room for a pretty solid GPU, which would lower the sales of their discrete lines, which might be more profitable.
 
I do wonder what features the upcoming FM3 will bring, and when.
It was leaked that the BIOS guide for Kaveri mentions four 64-bit DCTs, pointing to a 256-bit memory interface. Either that or 2x DDR3 and 2x GDDR5 are implemented.
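
For a sense of what a 256-bit interface would mean, here is a minimal sketch of the standard peak-bandwidth arithmetic (bus width in bytes times transfer rate). The memory speeds below are illustrative assumptions, not anything from the leak: DDR3-2133 for the APU cases, and a 256-bit GDDR5 bus at 4800 MT/s as a stand-in for a Pitcairn-class card. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s: bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Illustrative configurations (speeds are assumptions, see above)
dual_channel_ddr3 = peak_bandwidth_gbs(2 * 64, 2133)   # two 64-bit channels
quad_channel_ddr3 = peak_bandwidth_gbs(4 * 64, 2133)   # four 64-bit channels
gddr5_256bit = peak_bandwidth_gbs(256, 4800)           # discrete-card style bus

print(f"2 x 64-bit DDR3-2133: {dual_channel_ddr3:.1f} GB/s")
print(f"4 x 64-bit DDR3-2133: {quad_channel_ddr3:.1f} GB/s")
print(f"256-bit GDDR5 @ 4800: {gddr5_256bit:.1f} GB/s")
```

With those assumed speeds, doubling the channel count doubles peak bandwidth, but four DDR3 channels would still sit well below a 256-bit GDDR5 bus. These are theoretical peaks, not sustained figures.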

Then there is this official communication from AMD:
AMD's ‘Kaveri’ high-performance APU remains on track and will start shipping to customers in Q4 2013, with first public availability in the desktop component channel very early in Q1 2014. ‘Kaveri’ features up to four ‘Steamroller’ x86 cores, major heterogeneous computing enhancements, and a discrete-level Graphics Core Next (GCN) implementation – AMD’s first high-performance APU to offer GCN. ‘Kaveri’ will be initially offered in the FM2+ package for desktop PCs. Mobile ‘Kaveri’ products will be available later in the first half of 2014.
 
A "positive" of this approach is that it doesn't eat too hard into their discrete market. With the area taken by the CPU going down and with HMC, they would have a lot of room for a pretty solid GPU, which would lower the sales of their discrete lines, which might be more profitable.

Integration enables higher profits through improved efficiency and by cutting off other vendors. Plus, in this particular case, it is so far the segment where AMD can be the technology leader relatively easily. I think there is more profit in APUs.
 
http://www.phoronix.com/scan.php?page=article&item=intel_broadwell_linux&num=1

Ben Widawsky in publishing the initial Broadwell support said, "Broadwell represents the next generation (GEN8) in Intel graphics processing hardware. Broadwell graphics bring some of the biggest changes we've seen on the execution and memory management side of the GPU. There are equally large and exciting changes for the userspace drivers."

Ben additionally said that the eighth-generation Broadwell graphics "dwarf any other silicon iteration during my tenure, and certainly can compete with the likes of the gen3->gen4 changes."

At a technical level, some of the architectural changes going from Haswell to Broadwell graphics include the silicon no longer supporting a force-wake write FIFO but most writes need to explicitly wake the GPU, interrupt registers have been completely reorganized, PTEs format and cachability settings have been changed to more closely resemble x86 PTEs/PAT, the address space has increased, and page table structures have changed for the per-process GTT.
 