Haswell vs Kaveri

According to the numbers in the same review, 1600->1866 is a decent jump but 1866->2133 isn't huge.
I admit this is a bit puzzling. 1600->1866 is more than just a "decent" jump by those numbers: on average it is near-perfect scaling (one benchmark shows little gain, another shows superlinear scaling, and the rest show roughly a 17% increase, in line with the bandwidth increase), while 1866->2133 shows only a very modest increase. You'd certainly expect scaling to get worse once you leave the more or less fully bandwidth-limited region, but not to that degree. We probably need to wait for results from more reputable sites before drawing any conclusions...
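For reference, the bandwidth side of that comparison is simple arithmetic. A minimal sketch, assuming theoretical peak bandwidth scales linearly with the nominal DDR3 data rate (the function name is made up for illustration):

```python
def bandwidth_gain(old_mts: int, new_mts: int) -> float:
    """Relative increase in theoretical peak bandwidth, as a fraction.

    Assumes peak bandwidth is proportional to the nominal data rate (MT/s).
    """
    return new_mts / old_mts - 1.0


# Compare the two memory speed steps discussed in the review.
for old, new in [(1600, 1866), (1866, 2133)]:
    print(f"DDR3-{old} -> DDR3-{new}: +{bandwidth_gain(old, new):.1%}")
```

The second step is a slightly smaller bandwidth increase than the first (about 14% versus about 17%), but that difference alone looks too small to account for the drop in scaling.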
 
Laptops with embedded 2GB DDR3 that only the GPU can access?

Or it could simply be broken and Kaveri will never, ever use the third and fourth channels.
It looks like a big waste of transistors and die area, though.

There were some rumors about Kaveri supporting GDDR5 in the past, could the "extra memory controller" be related to that?
 
Laptops with embedded 2GB DDR3 that only the GPU can access?
I wouldn't think it would have to be GPU-only. Just have some RAM on the motherboard working on two channels and the DIMMs on the other two.

Also, some 17" notebooks have 4 SODIMM slots.
 
Has there been any discussion on the possible improvements to the APU/dGPU Crossfire "Dual Graphics" performance with Kaveri?

Was the decision not to use GDDR5 an economic or architecture decision? If economic, and the die space does exist, perhaps a higher price point Kaveri for desktop in the future?
 
Now that we have 4 DRAM channels confirmed, the most likely answer is that they wanted this instead for the server derivatives of Kaveri. Additionally, adding GDDR5 interfaces would have made the chip too big.

Let's see if we'll have a 4-channel chip on the desktop, but I wouldn't hold my breath.
 
Now that we have 4 DRAM channels confirmed, the most likely answer is that they wanted this instead for the server derivatives of Kaveri. Additionally, adding GDDR5 interfaces would have made the chip too big.

Let's see if we'll have a 4-channel chip on the desktop, but I wouldn't hold my breath.

We already know Kaveri's 1P server derivative, Berlin, only uses a dual-channel DDR3 memory interface.
The 2P and 4P markets will continue to be served by minor revisions of the current Abu Dhabi and Seoul Piledriver-based chips until 2015.
 
We also had a few benchmarks leaked from SiSoft Sandra showing a prototype Kaveri system sporting a 4x64-bit memory interface, which I linked to a long time ago.
This means such a system is capable of running Windows and benchmarks, so I do expect AMD to follow up on FM2+ fairly shortly.

I can think of a few alternative explanations for the Sandra quad-channel leak:
- software reading all four DCTs while physically only 2 were active (unlikely, as there were also Kaveri leaks with a 128-bit memory bus)
- the current revision is not fully stable in quad channel, and we might get a new revision fixing it for a new platform, or never at all because Excavator is the priority
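To put the 128-bit versus 4x64-bit difference in numbers, here is a rough sketch of theoretical peak bandwidth. It assumes DDR3-2133 on every channel, which is my assumption and not something the leaks state, and the helper name is hypothetical:

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


# 128-bit = dual channel (2x64), 256-bit = quad channel (4x64).
# DDR3-2133 is an assumed speed, chosen only for illustration.
for width in (128, 256):
    print(f"{width}-bit @ DDR3-2133: {peak_bandwidth_gbs(width, 2133):.1f} GB/s")
```

These are paper figures only; sustained bandwidth would be lower, and nothing here says the extra channels can actually be enabled.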
 
Maybe a graphics-card-like BGA version with onboard memory is in the works. Sockets and DIMM slots seem like needless things for non-enthusiast consumer PCs.
 
AMD didn't comment on this at all during the formal Kaveri announcement.

From here, the second pair of memory channels is either dormant or broken/dead.
If it's dead, story ends here.
If it's dormant, AMD will still announce new platforms that will greatly enhance Kaveri's iGPU performance.

I wouldn't take those 1600-1866-2133 benchmarks as an indication of anything, really.
 
If it's dormant, AMD will still announce new platforms that will greatly enhance Kaveri's iGPU performance.

This would be significant for AMD as they attempt to move up the PC food chain with the APU and would at least explain the attempt...if they are indeed dead. A more powerful 4 channel memory APU and much improved "Dual Graphics" iGPU/dGPU Xfire could have taken significant share from Intel and dGPU from Nvidia...and still might if a refresh is coming soon.
 
I always thought the PHYs were required to be on the outside of the chip due to the i/o pads being there, so how is that supposed to work?

The PHY section is still on the edge of the chip. The only thing between the inner PHY and the edge is more PHY, and an interface layered like this can also be found for chips like Sandy Bridge EP or the XBox One APU.
The routing necessary is likely more complex (edit: the routing from the PHY to the pads), but it is doable. Since it's not mingling regular silicon with IO, the connections for the inner PHY at least have common routing goals with the outer PHY, and they won't interfere with the logic that is on the inside of the chip.

Granted, if there are additional interfaces on the chip, the expectation would be that there would be an occasion to use them.
 
Yeah well, it has gotten a lot better in the past few years once Intel started taking it seriously :)
Just a quick update. I haven't managed to install Windows yet, but with the help of a colleague of mine who has a 2013 MacBook Air, I was at least able to determine the difference in performance between the HD5000 and the HD5100. I used Unigine Heaven 4. Quality set to high, no tessellation, no AA; resolution 1280x800 (half the Retina display).

MacBook Air (Core i5-4250U @ 1.9 GHz): Min 11.8 Avg 15.7 Score 396
MacBook Pro (Core i5-4258U @ 2.4 GHz): Min 13.1 Avg 19.0 Score 472

So you were right; the HD5100 is between 10 and 20% faster. The temp of the CPU in the Air reached 76°C after a few test runs, while the CPU in the Pro soared up to 83°C but, after a while, the fan kicked in at a higher speed and the temp settled down to 78°C.
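For anyone who wants the exact deltas, a quick sketch using only the numbers posted above (the dictionary layout and function name are just for illustration):

```python
# Unigine Heaven 4 results as posted above.
air = {"min": 11.8, "avg": 15.7, "score": 396}  # HD5000, Core i5-4250U
pro = {"min": 13.1, "avg": 19.0, "score": 472}  # HD5100, Core i5-4258U


def percent_faster(base: float, other: float) -> float:
    """How much faster `other` is than `base`, in percent."""
    return (other / base - 1.0) * 100.0


for metric in air:
    delta = percent_faster(air[metric], pro[metric])
    print(f"{metric:>5}: HD5100 is {delta:.1f}% faster")
```

That works out to roughly 11% on minimum fps, 21% on average fps and 19% on the overall score, so the average sits just above the 10 to 20% range.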
 
The PHY section is still on the edge of the chip. The only thing between the inner PHY and the edge is more PHY, and an interface layered like this can also be found for chips like Sandy Bridge EP or the XBox One APU.
The routing necessary is likely more complex (edit: the routing from the PHY to the pads), but it is doable. Since it's not mingling regular silicon with IO, the connections for the inner PHY at least have common routing goals with the outer PHY, and they won't interfere with the logic that is on the inside of the chip.

Granted, if there are additional interfaces on the chip, the expectation would be that there would be an occasion to use them.

Come on, why don't you try your hand at the guessing game? :)
 
The PHY section is still on the edge of the chip. The only thing between the inner PHY and the edge is more PHY, and an interface layered like this can also be found for chips like Sandy Bridge EP or the XBox One APU.
The routing necessary is likely more complex (edit: the routing from the PHY to the pads), but it is doable. Since it's not mingling regular silicon with IO, the connections for the inner PHY at least have common routing goals with the outer PHY, and they won't interfere with the logic that is on the inside of the chip.
OK, I missed that Sandy Bridge EP does indeed look just the same. I guess I got too used to looking at GPU die shots, which go to great lengths to put all the PHYs at the edges - sometimes on 3 sides out of 4 :).

Granted, if there are additional interfaces on the chip, the expectation would be that there would be an occasion to use them.
Yes, that is indeed strange. But with no new sockets announced you have to wonder... Or look at those kernel patches, which seem to indicate there won't ever be Kaveris that have all memory channels activated... Dark silicon FTW :).
 
So you were right; the HD5100 is between 10 and 20% faster. The temp of the CPU in the Air reached 76°C after a few test runs, while the CPU in the Pro soared up to 83°C but, after a while, the fan kicked in at a higher speed and the temp settled down to 78°C.
Neat, thanks for the results. The interesting thing is that in terms of the graphics hardware, they are both effectively the same thing (2 slices, GT3)... and indeed they are also the same as Iris Pro. The differences in performance come entirely from power envelope, thermal dissipation (form factor) and, in the case of Iris Pro, the on-chip EDRAM.

Thus the performance delta also depends on CPU use and such during the benchmark. Higher CPU usage will tend to separate the chips with greater power budgets further (i.e. your 5100 will likely look relatively even better than the MBA in a game w/ heavy CPU utilization). All of these chips are heavily power-constrained which is why the most interesting comparisons to Kaveri will be iso-TDP/form-factor. Previous APUs didn't really reach down into the power budgets of Haswell significantly, but Kaveri's lower TDPs should at least overlap the Iris Pro chips.
 
Come on, why don't you try your hand at the guessing game? :)

Assuming this really does have 256 bits worth of DDR3 PHY:

The PHY are physically overengineered for an explicitly on-package or interposer-based memory situation.
If AMD were planning on some sort of 2.5D or on-package solution to the exclusion of a 4-channel socket or 4-channel BGA mounting, they could have reduced the physical connections. However, for the sake of risk management and flexibility, it might be better to have the capability of driving off-package channels and just not use it than to be stuck without.

In the case of a 2-channel socket and on-package sideband memory, half of the PHY would be overprovisioned.
Even so, it could also be possible for the sake of flexibility or copy-pasting that the design would just double an already existing PHY rather than make a smaller interface right next to the larger one.
The savings would likely be minimal because the bigger PHY would just force an area of dead silicon if the smaller interface had to sit next to it.

At any rate, the pictured chip packages don't seem to provide enough room for any sort of packaged setup at the outset, and I'm not sure there are enough pins for a socketed FM2 quad-channel solution, since it is about a thousand pins short of known quad-channel AMD sockets.


Kaveri, at least initially, only has DCT0 and DCT3 active. I'm not sure how the numbering maps to the physical PHY. My first instinct is to think half of each PHY array is active, due to a quirk in how AMD has split its memory controllers. That's hardly anything beyond uneducated speculation, as I don't know how AMD physically places or connects them.

Without a socket, a package could get the necessary ball count for quad channel.
Possibly, the chip has the necessary routing from the PHYs to pads for all channels, with half of those links not connected or otherwise off when in a socket format and available for use if FCBGA.
Without that, it would require a different chip revision to make use of them. The lack of any mention of quad-channel at this point might mean the latter scenario is the case.

Extra channels could bring bandwidth benefits, or a hedge if Iris turned out to be better than Kaveri.
The modest benefits from bandwidth changes in the leaked benchmarks might mean this is more of a capacity play if the chip were to interface with a point to point memory standard like GDDR5 (if those rumors were correct) or if a descendant were to work with DDR4.
Extra channels could translate into higher capacity than dual-channel solutions. This might matter for a cloud server solution, where AMD Opterons can still be interesting because they can address more dense DIMMs.
Having channels to spare might allow this limited benefit to continue in a future revision, or if AMD needs to extend the life of its DDR3 cloud chips should the DDR4 chips slip.
 
This feels just like Deneb C2 which launched as Phenom II 940 BE with only DDR2 support and later was re-launched as Phenom II 945 supporting both DDR2 and DDR3 sockets.

Was it also true that 65nm Agena (Phenom I) already had a DDR3 controller implemented that was never validated? (The CPU registers contained values for DDR3 operation, if I remember correctly.)
 