Haswell vs Kaveri

At 65W it could suit DTR devices (>=16" and 3kg), but Intel probably would not ship it to vendors with such plans.
 
L4 cache starts at $468, AMD isn't competing with it (unless you include discrete, which isn't what we're talking about in this thread).
This thread is about Haswell vs Kaveri. Both are highly bandwidth starved APUs. This is the perfect thread to discuss APU bandwidth bottlenecks. Big L4 cache is the solution Intel chose to overcome the bandwidth issue. Microsoft (with AMD) chose a similar solution for Xbox One APU (ESRAM). Sony (with AMD) instead chose to use GDDR5 in their APU to provide the bandwidth needed. The current (high) price of Haswell with L4 is caused by lack of competition. If competition shows up, the price will go down.

It's definitely interesting to see what kind of solution AMD uses for Kaveri and for their future APUs. It's perfectly valid to assume that AMD wants to keep the GPU performance crown in APUs, as that's what they have always done. If AMD aims for the GPU performance crown, they must have a solution that provides better bandwidth than Haswell's L4. There's no way around this. Not now and not in the future.

The simplest solution for the bandwidth problem would be to use a triple-channel (192-bit) DDR3 memory controller. Nehalem and Westmere did this without considerable cost issues (the cheapest Nehalems with triple-channel memory controllers had a launch price of ~$150). I am quite sure this would be a much cheaper solution to the problem than a huge 128 MB L4 cache... but of course not as energy efficient either.
 

The main downside here is that the cost of the motherboard would go up, and in any case the cost of the die would go up, even for low-end SKUs for which OEMs might not want to populate more than two channels.
 

Very good and insightful article, I must admit.

This thread is about Haswell vs Kaveri. Both are highly bandwidth starved APUs. This is the perfect thread to discuss APU bandwidth bottlenecks. Big L4 cache is the solution Intel chose to overcome the bandwidth issue. Microsoft (with AMD) chose a similar solution for Xbox One APU (ESRAM). Sony (with AMD) instead chose to use GDDR5 in their APU to provide the bandwidth needed. The current (high) price of Haswell with L4 is caused by lack of competition. If competition shows up, the price will go down

Apple wanted this particular solution, not Intel themselves. It is a normal supplier-customer relationship: the bigger the customer, the more influence it has over the supplier (in this case Intel).
 
If AMD aims for the GPU performance crown, they must have a solution that provides better bandwidth than Haswell's L4. There's no way around this. Not now and not in the future.

The simplest solution for the bandwidth problem would be to use a triple-channel (192-bit) DDR3 memory controller.

Agreed, 2x 64-bit DDR3-2133 is going to limit Kaveri.

I've often thought that triple-channel memory would be the best solution for the DDR3 generation for AMD, and would provide a common platform for more core-heavy/shader-light high-end products.

There isn't going to be an AM4, so if there is any future for high(ish)-end chips from AMD, it will be leveraging their mainstream platform.
 
Very good and insightful article, I must admit.



Apple wanted this particular solution, not Intel themselves. It is a normal supplier-customer relationship: the bigger the customer, the more influence it has over the supplier (in this case Intel).

Source?
 
If triple channel is done you would probably have to have a new, bigger socket coexisting with the FM2+ one. i.e. back to the socket 754 vs 939 days.
 
I realise it's a bit late to expect that now that we have FM2+; I'm just saying I thought it would have been the most sensible direction to go in.
 
The simplest solution for the bandwidth problem would be to use a triple-channel (192-bit) DDR3 memory controller. Nehalem and Westmere did this without considerable cost issues (the cheapest Nehalems with triple-channel memory controllers had a launch price of ~$150). I am quite sure this would be a much cheaper solution to the problem than a huge 128 MB L4 cache... but of course not as energy efficient either.
There are obviously quad-channel desktop sockets on the Intel side (and presumably Opterons as well?), and some are not even unreasonably priced (i.e. similar to the dual-channel ones). I'm not sure this would be a particularly good plan for mobile, though, both in terms of size and power. I'm also not totally convinced it would end up much cheaper once you factor in the extra DIMMs and socket complexity. Obviously it's hard to speculate on manufacturing costs separately from the retail costs of the various parts, although the latter are not too relevant to architectural discussions.

I was originally quite skeptical of the "big cache" plan, but it works incredibly well in practice (way better than a simple framebuffer scratch pad like the Xbox's, and obviously a lot larger). Thus I'm not really sure what direction things will take in the long run... it seems like neither Intel nor AMD is lining up to put high-bandwidth solutions in the majority of iGPUs, but perhaps it creeping in on the high end will cause some trickle-down in future generations as even the "smaller" iGPUs get faster.
 
fuboi was obviously talking about the mobile segment, because on desktop the HD 8670D is competitive with Iris Pro.
 

(Back from the beach) Yeah, I was. It was a brain fart: I was thinking about APUs, and I forgot the riff-raff buys APUs for desktops ;). And I didn't know about the HD 8670D, nice. Off-topic: after checking a review, the HD 8670D gets beaten by a lowly GT 640, meh.
 
What about GDDR5M?
http://www.hardware.fr/news/13287/gddr5m-4-gbps-sodimm-kaveri.html

The news is that it comes in a SO-DIMM format. The chip interface is 16-bit instead of 32-bit, which allows capacity to be doubled, so a 64-bit SO-DIMM in clamshell mode can hold 4GB, for up to 8GB total in dual channel. The clock is lower than on a graphics card because signal timings have to be looser to work reliably, which gives 4.0 Gbps.

So, if Kaveri gets that, it's 64GB/s of memory bandwidth, versus 34GB/s for DDR3-2133 (all theoretical figures).
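For anyone who wants to check those two figures, here is a minimal sketch of the arithmetic. It assumes a 128-bit (2x 64-bit) interface in both cases and counts 1 GB as 10^9 bytes; the function name `peak_bandwidth_gbs` is made up for illustration, and these are theoretical peaks only.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: total width of the memory interface in bits.
    transfer_rate_mtps: per-pin data rate in megatransfers per second
                        (4.0 Gbps per pin is 4000 MT/s).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps * 1e6 / 1e9


# Assumption: both configurations use two 64-bit channels (128 bits total).
configs = {
    "DDR3-2133, 2 x 64-bit": (128, 2133),
    "GDDR5M 4.0 Gbps, 2 x 64-bit": (128, 4000),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

The same function can be fed other widths, e.g. 192 bits for a triple-channel controller, to compare the options discussed earlier in the thread.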
 
GDDR5M is old news; we discussed this in the past. AMD has already denied GDDR5, and it disappeared from the manual.
 
Did they deny GDDR5 but not GDDR5M? They could be burying it to avoid creating an Osborne effect, assuming it gets launched later.

Otherwise, I can accept that it's DDR3 only, so be it.
 
From what I gather from reading posts on another forum by The Stilt, Kaveri has a 256-bit memory controller which can work as 1x 256-bit or 4x 64-bit. The question is, will AMD release desktop or laptop motherboards with the full memory interface width available? Will it be on FM2+, or maybe an FM3 socket? The last option is that there is a bug and AMD will sit on it until the next revision, the same as they did with the first Phenom II 940, which had a DDR3 interface in silicon but fused off.

Just to clarify, The Stilt is only hinting, but he has access to BIOS development documentation for upcoming AMD processors, and I suspect there are memory controller registers for four 64-bit channels described in it.
 
From what I gather from reading posts on another forum by The Stilt, Kaveri has a 256-bit memory controller which can work as 1x 256-bit or 4x 64-bit.
Operating it as one 256-bit controller doesn't make any sense and would be very detrimental to performance. With the current 8n-prefetch memory technologies (DDR3, DDR4 and GDDR5 all use 8n prefetch), it would basically mean a 256-byte access granularity. Combined with 64-byte cache lines, that's a recipe for very bad bandwidth efficiency outside of streaming applications. It's much better to be able to carry out four independent memory accesses concurrently on the four channels.
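The granularity numbers can be sketched in a few lines. This is only an illustration: the function names are made up, and the "useful fraction" assumes the worst case of an isolated cache-line fetch where the rest of the burst is thrown away (streaming access would do much better).

```python
def access_granularity_bytes(channel_width_bits: int, prefetch: int = 8) -> int:
    """Minimum bytes moved per access: channel width times burst length (8n prefetch)."""
    return channel_width_bits * prefetch // 8


def useful_fraction(channel_width_bits: int, cacheline_bytes: int = 64,
                    prefetch: int = 8) -> float:
    """Share of a burst that is actually wanted when fetching one cache line.

    Assumption: nothing else in the burst is needed (non-streaming access).
    """
    burst = access_granularity_bytes(channel_width_bits, prefetch)
    return min(1.0, cacheline_bytes / burst)


for width in (64, 256):
    burst = access_granularity_bytes(width)
    print(f"{width}-bit channel: {burst} bytes per access, "
          f"{useful_fraction(width):.0%} useful for a 64-byte cache line")
```

With four independent 64-bit channels each burst matches one 64-byte cache line, while a single ganged 256-bit channel moves 256 bytes for every line requested.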
 
I'm sure OEMs love the idea of paying more for RAM (GDDR5M costs more than DDR3, I bet).

If they pay $30 more for the RAM and then $150 less for the APU (compared to an Intel equivalent), why not?
 
Operating it as one 256-bit controller doesn't make any sense and would be very detrimental to performance. With the current 8n-prefetch memory technologies (DDR3, DDR4 and GDDR5 all use 8n prefetch), it would basically mean a 256-byte access granularity. Combined with 64-byte cache lines, that's a recipe for very bad bandwidth efficiency outside of streaming applications. It's much better to be able to carry out four independent memory accesses concurrently on the four channels.

That's true, but the option apparently is there as it currently is in all AMD processors (Ganged/Unganged modes).

Let me quote him this once (I hope he doesn't mind):
Currently all of the AMD CPUs & APUs use 2x 64-bit (unganged) mode by default so I see no reason why it would change.
Unganged DCTs are more flexible and possibly could be powered down in low power states.

Ganging up the DCTs would reduce the congestion in certain rare scenarios.
In case of 4x 64-bit DCTs these scenarios will never occur due to the monstrous bandwidth.
At DDR-1600 and above the bandwidth will never be fully saturated anyway.

If we assume that the SR will indeed feature a quad channel dram interface, of course ;)
