Haswell vs Kaveri

And yet, it would be epic badassery if there were 512 MB of RAM stacked on the die at full clock with a fat connection. I'm not counting on that, though...

That would be awesome. Though as I think about it, I'm starting to see the appeal of using SRAM even if it was only 64MB. Would essentially be a huge L4 cache that could be shared with the CPU/GPU, and could be used as a framebuffer the way Xbox360 uses its eDRAM. The low latency and huge bandwidth could make for some interesting efficiency gains in the pipeline...
 
Bear in mind that we're talking about a mainstream, notebook-oriented part, so cost and power are important concerns.
 
Really? That seems tiny if it's DRAM on an interposer; I wouldn't think they'd need an interposer at all for only 64 MB of DRAM. I mean, in 2013 they should have 4 Gbit DRAM chips, and a 2 Gbit die is ~55 mm^2 raw nowadays IIRC? That's got to be ~14 mm^2 at most for 64 MB of DRAM, so what's the interposer for?

Unless of course this info is completely wrong and its much more than 64MB... I was expecting 512MB or 1GB stacked next to the die, personally.
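For what it's worth, that die-size guess can be sanity-checked against the thread's own numbers (the ~55 mm^2 figure for a 2 Gbit die is a poster's recollection, not a verified spec):

```python
# Rough die-area estimate for 64 MB of DRAM, scaled linearly
# from a 2 Gbit die. The 55 mm^2 baseline is the IIRC figure above.
baseline_area_mm2 = 55.0        # assumed area of a 2 Gbit DRAM die
baseline_mbit = 2048.0
capacity_mbit = 64 * 8          # 64 MB = 512 Mbit
area_mm2 = baseline_area_mm2 * capacity_mbit / baseline_mbit
print(f"~{area_mm2:.1f} mm^2")  # ~13.8 mm^2
```

Either way, 64 MB of DRAM really is tiny, interposer or not.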
I was expecting a much larger amount as well. But SRAM seems too costly for that, even if made on an older process. DRAM or even eDRAM looks more likely than SRAM, IMO.

They seem to have traded off capacity for much larger bandwidth. I hope they can hit ~100 GB/s if they're going with only 64 MB. The comment about on-die framebuffers feeding the display during low-power states is also interesting; they'd need ~8 MB just for that, and I'm not sure where they'll find the room given that 32nm dual-core parts have 4 MB of L3 total.
 


Haswell most likely to have 50% more ALUs, according to the die-shot from an early running sample.
 
Sorry, but I don't understand that language. What did they say?

AMD bought a memory company not that long ago; they could produce GDDR5 modules.

I don't think all the memory would be GDDR5, but one module could be.
Basically, Kaveri could support two 128-bit buses: one for GDDR5 and one for DDR3.

Two buses wouldn't make sense for a consumer level part like Kaveri, it'd be very expensive and most consumers don't need the flexibility of being able to upgrade.

They should just make Kaveri boards with GDDR5 soldered on (much like a graphics board); they'd get much better memory bandwidth by using a fat bus and a memory controller similar to what they use in GPUs, instead of a bottlenecked DIMM interface. I can see lower-end parts having 8 GB and higher-end parts 16 GB with 100+ GB/s, compared to ~30 GB/s today.
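Peak theoretical bandwidth is just bus width times effective transfer rate, which shows why a GPU-style GDDR5 bus blows past a dual-channel DIMM interface. The data rates below are illustrative assumptions, not figures from the thread:

```python
def peak_bw_gbs(bus_bits: int, gtps: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) * effective GT/s."""
    return bus_bits / 8 * gtps

# Dual-channel DDR3-1600 (2 x 64-bit @ 1.6 GT/s) -- roughly "today":
print(peak_bw_gbs(128, 1.6))   # 25.6 GB/s
# A 256-bit GDDR5 bus at an assumed 4 GT/s effective:
print(peak_bw_gbs(256, 4.0))   # 128.0 GB/s
```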
 
Well, density for GDDR5 memory chips is not the same as for DDR3. Is more than 4 GB doable? At what bus width? And wouldn't the extra latency hurt CPU performance?

Two buses would fit easily in Llano/Trinity. Even a 64-bit bus would help, giving the chip an extra ~30 GB/s of bandwidth.
 
Sounds fake, if you ask me.

The 7970 is <4 TFLOPS today. And it eats >200W.

EDIT: surreal ->fake

The 7970M (Pitcairn) gets 2,176 GFLOPS in 100W, on 28nm, or 21.76 GFLOPS/W.

Ideal scaling would take that to 43.52 GFLOPS/W on 20nm, and 87.04 on 14nm.

So you'd only need <12W to get to 1TFLOPS on 14nm. Of course, ideal scaling is a pipe dream these days, but with a power budget of ~75W for a desktop chip and full, aggressive bi-directional power management, it seems doable.

They'd need to do something big about memory bandwidth, though.
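The scaling arithmetic above, spelled out (assuming the optimistic rule that perf/W doubles with each full node, which the post itself calls a pipe dream):

```python
# 7970M (Pitcairn) baseline, then naive 2x-per-node scaling.
gflops, watts = 2176.0, 100.0
eff_28nm = gflops / watts        # 21.76 GFLOPS/W at 28 nm
eff_20nm = eff_28nm * 2          # 43.52 GFLOPS/W, ideal 20 nm
eff_14nm = eff_20nm * 2          # 87.04 GFLOPS/W, ideal 14 nm
watts_for_1tflops = 1000.0 / eff_14nm
print(f"{watts_for_1tflops:.1f} W for 1 TFLOPS")  # ~11.5 W
```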
 
The 7970M (Pitcairn) gets 2,176 GFLOPS in 100W, on 28nm, or 21.76 GFLOPS/W.

Ideal scaling would take that to 43.52 GFLOPS/W on 20nm, and 87.04 on 14nm.

So you'd only need <12W to get to 1TFLOPS on 14nm. Of course, ideal scaling is a pipe dream these days, but with a power budget of ~75W for a desktop chip and full, aggressive bi-directional power management, it seems doable.

They'd need to do something big about memory bandwidth, though.

I thought the high GT parts were for mobile.
 
The 7970M (Pitcairn) gets 2,176 GFLOPS in 100W, on 28nm, or 21.76 GFLOPS/W.

Ideal scaling would take that to 43.52 GFLOPS/W on 20nm, and 87.04 on 14nm.

So you'd only need <12W to get to 1TFLOPS on 14nm. Of course, ideal scaling is a pipe dream these days, but with a power budget of ~75W for a desktop chip and full, aggressive bi-directional power management, it seems doable.

They'd need to do something big about memory bandwidth, though.

The interposer tech is supposed to be on the way with Haswell. Let's hope they can provide lots of bandwidth by then.

GPU performance of that order could eliminate the need for discrete GPUs altogether, even for high-DPI displays.
 
The 7970M (Pitcairn) gets 2,176 GFLOPS in 100W, on 28nm, or 21.76 GFLOPS/W.

Ideal scaling would take that to 43.52 GFLOPS/W on 20nm, and 87.04 on 14nm.

So you'd only need <12W to get to 1TFLOPS on 14nm. Of course, ideal scaling is a pipe dream these days, but with a power budget of ~75W for a desktop chip and full, aggressive bi-directional power management, it seems doable.

They'd need to do something big about memory bandwidth, though.

Remember you're comparing 28nm TSMC to 22nm Intel. In that situation I wouldn't be surprised if Intel could have ideal scaling compared to the TSMC manufactured stuff.

Memory bandwidth is always the bigger problem with iGPUs, and it isn't clear how (or if) they plan to solve it. Although I hear there may be some crazy eDRAM-type stuff coming down the pipe. The question then is how to make use of it while staying within the D3D API, or whether each iGPU architecture will require custom code to use it.
 
I was expecting a much larger amount as well. But SRAM seems too costly for that, even if made on an older process. DRAM or even eDRAM looks more likely than SRAM, IMO.

They seem to have traded off capacity for much larger bandwidth. I hope they can hit ~100 GB/s if they're going with only 64 MB. The comment about on-die framebuffers feeding the display during low-power states is also interesting; they'd need ~8 MB just for that, and I'm not sure where they'll find the room given that 32nm dual-core parts have 4 MB of L3 total.

With an interposer, the RAM chip would have to be custom-built, I'd think, so the density of what's readily available in DDR3 and DDR4 is less relevant.

I think Intel kept things simple for themselves: they're strong at making SRAM, it can be done on every process, and it had to be low-cost and shipped quickly while still being some of the latest tech. Maybe this is the first mass-market memory-on-interposer product?
 
Two buses wouldn't make sense for a consumer level part like Kaveri, it'd be very expensive and most consumers don't need the flexibility of being able to upgrade.

They should just make Kaveri boards with GDDR5 soldered on (much like a graphics board); they'd get much better memory bandwidth by using a fat bus and a memory controller similar to what they use in GPUs, instead of a bottlenecked DIMM interface. I can see lower-end parts having 8 GB and higher-end parts 16 GB with 100+ GB/s, compared to ~30 GB/s today.

I could see this on mid range, high end laptops, and then on a few micro ATX, mini ITX mobos with a generous (for the seller) price tag.

Two buses would just make the chip and socket much more expensive; CPUs that wide are made and sold on sockets G34 and LGA 2011.
A single memory controller, or a pair of 64-bit ones, supporting both GDDR5 and DDR3 would be more reasonable, much like the CPU I'm using, which supports both DDR2 and DDR3.
 
With an interposer, the RAM chip would have to be custom-built, I'd think, so the density of what's readily available in DDR3 and DDR4 is less relevant.

I think Intel kept things simple for themselves: they're strong at making SRAM, it can be done on every process, and it had to be low-cost and shipped quickly while still being some of the latest tech. Maybe this is the first mass-market memory-on-interposer product?

I'm pretty sure that wherever this memory ends up, it's going to be some sort of DRAM.
 
Ideal scaling would take that to 43.52 GFLOPS/W on 20nm, and 87.04 on 14nm.

Ideal scaling died years ago. For the last few process generations, Intel has claimed ~20% higher performance OR >30% lower power at the same performance per node. It's likely the same for the other vendors.

I don't even know how Haswell will double performance on the 15W Ultrabook parts. If Anandtech's measurements are accurate, it takes ~4W for the CPU cores, 9W for the iGPU, and 4W for the rest of the chip in typical games. To bring that down to 15W while doubling iGPU performance, you need:

18W worth of performance squeezed into 7W, or a 2.6x improvement in perf/watt. Intel says Ivy Bridge delivers 2x perf/watt at the same performance level as Sandy Bridge, but it does not reach 2x perf/watt at higher performance, obviously because they had to combine lowered clocks with the process advance to halve power. How will they manage that at higher performance?
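The budget arithmetic in that paragraph, made explicit (the component wattages are the thread's estimates from Anandtech-style measurements, not official figures):

```python
# 15 W Ultrabook budget vs. today's measured split under typical games.
cpu_w, igpu_w, rest_w = 4.0, 9.0, 4.0
budget_w = 15.0
igpu_budget_w = budget_w - cpu_w - rest_w    # 7 W left for the iGPU
equiv_work_w = 2 * igpu_w                    # 2x today's 9 W of iGPU work
required_gain = equiv_work_w / igpu_budget_w
print(f"{required_gain:.1f}x perf/W needed")  # ~2.6x
```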
 