This thread is about Haswell vs Kaveri. Both are highly bandwidth-starved APUs, so this is the perfect thread to discuss APU bandwidth bottlenecks. A big L4 cache is the solution Intel chose to overcome the bandwidth issue. Microsoft (with AMD) chose a similar solution for the Xbox One APU (ESRAM). Sony (with AMD) instead chose to use GDDR5 in their APU to provide the bandwidth needed. The current (high) price of Haswell with the L4 is caused by lack of competition. If competition shows up, the price will go down.

The L4 cache starts at $468, and AMD isn't competing with it (unless you include discrete GPUs, which isn't what we're talking about in this thread).
The simplest solution for the bandwidth problem would be to use a triple-channel (192-bit) DDR3 memory controller. Nehalem and Westmere did this without considerable cost issues (the cheapest Nehalems with triple-channel memory controllers had a launch price of ~$150). I am quite sure this would be a much cheaper solution to the problem than a huge 128 MB L4 cache... but of course not as energy efficient either.
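As a rough sanity check on the numbers in this discussion, here is a back-of-the-envelope peak-bandwidth comparison (a minimal sketch in Python: it only multiplies bus width by data rate, ignores real-world efficiency, and the console figures are the commonly quoted configurations):

```python
# Back-of-the-envelope peak DRAM bandwidth: bus width (bytes) x transfer rate.
# Real-world throughput is always lower; this only frames the comparison.

def peak_gbs(bus_bits: int, mega_transfers: float) -> float:
    return (bus_bits / 8) * mega_transfers * 1e6 / 1e9

print(peak_gbs(128, 1600))   # dual-channel DDR3-1600   -> ~25.6 GB/s (Haswell/Kaveri today)
print(peak_gbs(192, 1600))   # triple-channel DDR3-1600 -> ~38.4 GB/s (the suggestion above)
print(peak_gbs(256, 2133))   # 256-bit DDR3-2133        -> ~68.3 GB/s (Xbox One, plus ESRAM)
print(peak_gbs(256, 5500))   # 256-bit GDDR5            -> ~176 GB/s  (PS4)
```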
If AMD aims for the GPU performance crown, they must have a solution that provides better bandwidth than Haswell's L4. There's no way around this. Not now, and not in the future.
Very good and insightful article, I must admit.
Apple wanted this particular solution, not Intel themselves. It is a normal supplier-customer relationship where the bigger the customer, the more influence it has over the supplier (in this case Intel).
So there are obviously quad-channel desktop sockets on the Intel side (and presumably Opterons as well?), and some are even not unreasonably priced (i.e. similar to the dual-channel ones). I'm not sure this would be a particularly good plan for mobile, though, both in terms of size and power. I'm also not totally convinced it would end up much cheaper once you factor in the extra DIMMs and socket complexity. Obviously it's hard to speculate on manufacturing costs separately from the retail prices of the various parts, although the latter is not too relevant to architectural discussions.
The i5-4570R isn't a mobile CPU.
fuboi was obviously talking about the mobile segment, because on the desktop the HD 8670D is competitive with Iris Pro.
From what I gather reading other forum posts by The Stilt, Kaveri has a 256-bit memory controller which can work as 1x 256-bit or 4x 64-bit.

Operating it as one 256-bit controller doesn't make any sense and would be very detrimental to performance. It would basically mean that with the current 8n-prefetch memory technologies (DDR3, DDR4 and GDDR5 all use 8n prefetch) one would have a 256-byte access granularity. Combined with 64-byte cache lines, that's a recipe for very bad bandwidth efficiency outside of streaming applications. It's much better to be able to carry out four independent memory accesses concurrently on the four channels.
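A quick worked example of the granularity arithmetic above (a sketch only; the constants are the ones stated in the post, not a model of any actual memory controller):

```python
# Minimum burst size = channel width (bytes) * prefetch depth.
# With 64-byte cache lines, anything larger than 64 bytes wastes bandwidth
# on scattered (non-streaming) access patterns.

CACHE_LINE = 64      # bytes
PREFETCH   = 8       # DDR3/DDR4/GDDR5 all use 8n prefetch

def burst_bytes(channel_bits: int) -> int:
    return (channel_bits // 8) * PREFETCH

def worst_case_efficiency(channel_bits: int) -> float:
    """Fraction of a burst that is useful when fetching a single cache line."""
    return min(1.0, CACHE_LINE / burst_bytes(channel_bits))

print(burst_bytes(256), worst_case_efficiency(256))  # 256 bytes -> 25% useful
print(burst_bytes(64),  worst_case_efficiency(64))   # 64 bytes  -> 100% useful
```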
I'm sure OEMs love the idea of paying more for RAM (GDDR5M costs more than DDR3, I bet).
Currently all of the AMD CPUs & APUs use 2x 64-bit (unganged) mode by default so I see no reason why it would change.
Unganged DCTs are more flexible and could possibly be powered down in low-power states.
Ganging up the DCTs would reduce the congestion in certain rare scenarios.
In the case of 4x 64-bit DCTs these scenarios will never occur due to the monstrous bandwidth.
At DDR3-1600 and above the bandwidth will never be fully saturated anyway.
If we assume that the SR will indeed feature a quad-channel DRAM interface, of course.
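For scale, a quick calculation of what such a quad-channel (4x 64-bit) DDR3 interface would deliver on paper (a sketch under the assumption above; peak theoretical figures only):

```python
# Peak bandwidth of the assumed quad-channel (4 x 64-bit) DDR3 interface
# at a few common data rates.

def peak_gbs(channels: int, channel_bits: int, mega_transfers: float) -> float:
    return channels * (channel_bits / 8) * mega_transfers * 1e6 / 1e9

for mt in (1600, 1866, 2133):
    print(f"DDR3-{mt}: {peak_gbs(4, 64, mt):.1f} GB/s")
# DDR3-1600: 51.2 GB/s, DDR3-1866: 59.7 GB/s, DDR3-2133: 68.3 GB/s
```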