Haswell vs Kaveri

R5000 to R6000 was a minor update from VLIW5 to VLIW4. Even Gen6->Gen7 brought much bigger changes.

This
"brings some of the biggest changes we've seen on the execution and memory management side of the GPU"
and this
"the amount of changes on BDW dwarfs any other silicon iteration during my tenure"
and this
"for actually making mesa work with the absurd amount of changes on the EUs"
doesn't sound like something minor you want to imply. And by the way, wrong thread. We don't have a Broadwell thread yet, I know.
 
R5000 to R6000 was a minor update from VLIW5 to VLIW4. Even Gen6->Gen7 brought much bigger changes.
Just like Alexko said, when I said r5xx->r6xx I meant it (if you're talking about technical things, you really don't care about marketing names, which sometimes don't align with architecture generations at all).


doesn't sound like something minor you want to imply
I am _not_ implying these changes are minor. Just that they aren't quite as big as gen3->gen4 was. Because if you look at these two, you'd have trouble figuring out those two archs are somehow related at all. I don't dispute that the changes may be bigger than anything else since gen4.
 
So Haswell -> Broadwell will be a bigger change than Sandy Bridge -> Ivy Bridge?
 
I am _not_ implying these changes are minor. Just that they aren't quite as big as gen3->gen4 was. Because if you look at these two, you'd have trouble figuring out those two archs are somehow related at all. I don't dispute that the changes may be bigger than anything else since gen4.

With gen3->gen4 do you mean GMA3000->GMA4000? Can you explain more about why you think so? Can you judge all the Gen8 changes and improvements from this incomplete mesa code?


So Haswell -> Broadwell will be a bigger change than Sandy Bridge -> Ivy Bridge?

At least he claims it.
 
With gen3->gen4 do you mean GMA3000->GMA4000? Can you explain more about why you think so?
Gen3 was what was in the i915 and i945 chipsets. Usually called GMA 900 or GMA 950, but also things like GMA 3150 and IIRC GMA 3000 indeed (but NOT GMA X3000, what did I just say about marketing names...). In other words, a DX9-capable architecture with no vertex shader units at all. Gen4 was the i965 chipset, whose original name was GMA X3000, but there are other chipsets sailing under gen4 (usually called gen4x as they are not 100% identical, though it was mostly bug fixes); g35/g45 come to mind.
Can you judge all the Gen8 changes and improvements from this incomplete mesa code?
No. But you can see from that code that it is still somewhat similar to gen7. Try looking at gen3 code and find similarities there...
 
Man, Intel has made so much progress on the graphics front in the last 5 years. The GMA9XX was the most terrible, awful thing you can imagine. It was barely serviceable even for light office use.
 
It is unclear which strategic path AMD chose wrt bandwidth constraints.
I hoped for a long while that they would use GDDR5M with Kaveri; it seems that won't happen anytime soon. That's sad, as a quick read of a review of a DDR3-powered HD7750 shows how bad it is => pretty much a waste of silicon.
Hybrid Memory Cubes should be available next year, though they require (if I get it right) a rework of the memory subsystem (IIRC the memory controller is off-chip, on the bottom layer of the HMC). Overall it is not really "PC-like"; GDDR5M fits the picture better. Price is unknown, could it be costly?
GDDR5M may be costly, requires a different mobo, and the roadmap is unclear (to me at least).

It is a major change from today's paradigm, but if it offers comparable latency and much better power characteristics, density, and bandwidth, HMC sounds very, very desirable. It's been a while since memory has jumped in performance as much as CPUs or GPUs have over the past two decades.

We're seeing adoption of this already: Power 8's architecture is specifically designed for this new wave even though it won't initially use HMC, and each of Power 8's off-die Centaur memory controllers has the added benefit of acting like an L4 cache as well. Instead of the 8 controllers for the big-iron Power 8, one or two similar controllers would probably suffice for a consumer part, since stacked RAM density should be high enough. You could also imagine the interface eventually being entirely on one package, or on top of the APU die with a fixed amount of stacked DRAM, either acting as the main memory or, if density is insufficient, as another tier of memory between the APU and the DRAM (see the RSX die in the PS3 Slim).

I think the cost of having all the RAM on package should be less than the cost of adding a pair of DIMMs, even if it is based on HMC technology. I guess Intel's Iris Pro lot prices don't attest to this quite yet, but judging by the price of the MacBook Air, they're just ratcheting up prices for people who don't place sizable orders. I agree that such technology won't make it to market if it doesn't have a compelling cost-to-benefit ratio in today's business environment.
 
Does anyone know if we should expect details about Kaveri during AMD's developer summit?
 
http://www.computerbase.de/news/2013-11/amds-kaveri-ab-14.-januar-2014-mit-856-gflops/


856 GFLOPS for the whole APU means ~750 GFLOPS for the GPU alone. What happened to the 1050 GFLOPS claim from AMD?

Perhaps this is a cut-back version of Kaveri with only 512 shaders. I'll make the wild guess that they originally planned something with 768 shaders on a different socket, or BGA with GDDR5, but cut down on execution risk by using their existing socket. The limited memory bandwidth would have meant that 768 shaders couldn't have been properly fed, so they went with 512 instead.
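As a quick sketch of the guess above (the 512/768 shader counts and the 720 MHz clock are the thread's speculation, not confirmed specs), peak GPU throughput is just shaders × 2 FLOPs per cycle (one FMA) × clock:

```python
# Back-of-the-envelope GPU peak for a hypothetical 512- or 768-shader Kaveri.
# Assumes the 720 MHz clock reported later in the thread and one fused
# multiply-add (2 FLOPs) per shader per cycle.
def gpu_gflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz

print(round(gpu_gflops(512, 0.72)))  # 737 GFLOPS -- close to the ~750 figure
print(round(gpu_gflops(768, 0.72)))  # 1106 GFLOPS -- close to the 1050 claim
```

A 512-shader part at 720 MHz lands right around the ~750 GFLOPS implied by the 856 GFLOPS APU figure, while 768 shaders would land near AMD's earlier 1050 GFLOPS claim.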
 
The clocks are lower than expected, especially for the GPU: 720 MHz according to the last slide.

Glofo a shit.

Without a TDP being given we cannot determine that.

If AMD has decided that the target market for such a chip primarily wants 45W APUs at the expense of some performance then the lowered CPU and GPU clocks make a lot more sense.
 
856 GFLOPS for the whole APU means ~750 GFLOPS for the GPU alone. What happened to the 1050 GFLOPS claim from AMD?
Only ~100 GFLOPS for the CPU? No AVX2/FMA? I guess it's only 2 "modules" of FP compute, but that's still well below a dual-core Haswell.

If you add up the theoretical peak of a 4770R you get something like:
CPU: 32 FLOPs/cycle * 4 cores * 3.9 GHz = 499 GFLOPS
GPU: 16 FLOPs/EU/cycle * 40 EUs * 1.3 GHz = 832 GFLOPS
Total: 1331 GFLOPS
Now even at 65W it probably can't maintain those clock speeds with everything powered up, but we're talking theoretical here to start with.

And of course raw FLOPS isn't everything (or even much), but the 100 GFLOPS for the CPU is surprising to me. Makes it a bit more obvious why they are so driven to offload stuff to the GPU. A 7:1 ratio is a bit more serious than 2:1 :)

Overall this seems well south of the next generation consoles too. I expected something a bit closer as a flagship to be honest, even taking into account the bigger cores.
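The 4770R peak numbers above can be reproduced with a couple of lines (assuming, as the post does, AVX2 FMA at 32 single-precision FLOPs/cycle/core and Gen7.5 EUs with two 4-wide FMA pipes, i.e. 16 FLOPs/EU/cycle):

```python
# Theoretical single-precision peak for an i7-4770R (Haswell CPU + Iris Pro 5200).
# FLOPs/cycle * units * clock (GHz) gives GFLOPS directly.
cpu = 32 * 4 * 3.9   # AVX2 FMA: 32 FLOPs/cycle, 4 cores, 3.9 GHz turbo
gpu = 16 * 40 * 1.3  # 16 FLOPs/EU/cycle, 40 EUs, 1.3 GHz max
print(round(cpu), round(gpu), round(cpu + gpu))  # 499 832 1331
```

The CPU:GPU ratio here is about 1:1.7, versus roughly 1:7 for the Kaveri figures being discussed.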
 
Well there are two FMA-capable 128-bit FPUs per module, so four in total. 4 FPUs × 4 FMAs = 16 FMAs = 32 FLOPs/cycle.

32 FLOPs/cycle × 3.5 GHz (a reasonable clock-speed assumption) = 112 GFLOPS.
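Spelled out step by step (the 3.5 GHz clock is this post's assumption, and each 128-bit FPU is taken to do 4 single-precision FMAs, i.e. 8 FLOPs, per cycle):

```python
# Peak SP FLOPS for the Kaveri CPU side under the assumptions above:
# 2 modules, each with 2 FMA-capable 128-bit FPUs.
fpus = 2 * 2                    # modules * FPUs per module
flops_per_cycle = fpus * 4 * 2  # 4 SP FMAs per 128-bit FPU, 2 FLOPs per FMA
print(flops_per_cycle)          # 32
print(flops_per_cycle * 3.5)    # 112.0 GFLOPS at an assumed 3.5 GHz
```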
 
[Slide image from FINAL_Lisa_Opening_Keynote_Draft_-_v12.1tb.pdf]
 
Did I miss the link to the full slide deck? Or are all you guys just at the conference? :) If the latter, anything else interesting yet?
 
Well there are two FMA-capable 128-bit FPUs per module, so four in total. 4 FPUs × 4 FMAs = 16 FMAs = 32 FLOPs/cycle.

32 FLOPs/cycle × 3.5 GHz (a reasonable clock-speed assumption) = 112 GFLOPS.

I thought each module only had a single FPU ever since Bulldozer.

1 TFLOPS shouldn't be hard to attain with the desktop chip. A ~18% overclock would reach that, and the latest unlocked APUs tend to be easily pushed higher than that.
The performance advantage over Haswell should be maintained because of AMD's better foothold in drivers and developer relations, but this thing will end up competing with Broadwell for most of its lifetime.
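A quick sanity check on that overclock claim (assuming FLOPS scale linearly with clock from the 856 GFLOPS baseline):

```python
# Does a ~18% overclock on an 856 GFLOPS part reach 1 TFLOPS,
# assuming throughput scales linearly with clock?
baseline_gflops = 856
print(round(baseline_gflops * 1.18))        # 1010 -- just past 1 TFLOPS
print(f"{1000 / baseline_gflops - 1:.0%}")  # 17% is the exact requirement
```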

Still no news about how they're going to handle memory bandwidth, which makes us all think they'll just go with dual-channel DDR3 and call it a day.

Unless Kaveri miraculously lowers power consumption relative to its predecessor, it won't fit notebooks/ultrabooks, and with 2 CPU modules it doesn't seem to fit desktops that well either.
Either AMD has plans to convince game developers to make heavy use of the iGPU for GPGPU tasks (TressFX on the iGPU?) or I see Kaveri as a chip with no place where it really fits.
People who don't play games will settle for less, and people who play games will need more. Kaveri ended up coming at the same time as the next-gen consoles, so recommended spec requirements for games in the near future will soar.
 