AMD Carrizo / Toronto

Yes, that is most likely related.

IMG0045881_1.jpg

http://www.hardware.fr/news/13971/amd-carrizo-carrizo-l-mi-2015.html

Carrizo-L apparently replaces the 20nm Nolan/Amur design we were expecting. That's rather disappointing. And it doesn't seem to support HSA.
 
Or the name was just changed to Carrizo-L for the Puma+ part, since Carrizo-L has Puma+ cores.
 
Or the name was just changed to Carrizo-L for the Puma+ part, since Carrizo-L has Puma+ cores.
'Skybridge-x86' was supposed to be on a 20nm process and have full HSA support. Carrizo-L has neither.

I suspect Carrizo-L is just 2014's Beema chip on a new package (with FP4's second memory channel pins left unused, if that is feasible).
 
In the Anandtech article, along with this slide, there are these two paragraphs:
One example of the efficiency improvement was provided by AMD’s Voltage Adaptive Operation. Rather than compensate for voltage variations which wastes energy, this technology takes the average operating voltage and detects when the voltage increases beyond a smaller margin. To compensate for this increase, the CPU speed is reduced until the voltage drops below the threshold and then the CPU speed is moved back up.
The changes in speed are designed to be so minute that it does not affect overall performance, however it might only take an errant voltage delivery component to consistently make the voltage go above that threshold, causing erratic slowdown that might be statistically significant. It will be interesting to see how AMD implements the latest version of this feature.
What I cannot understand is how this feature is going to cause a slowdown.
As I understand it, the point of this feature is that power supplies sometimes drop a bit below the requested voltage, and as a result chips usually run with a voltage margin, wasting power. In Carrizo, the margin is replaced with the chip slowing down a bit during those rare moments when the voltage drops for an instant.
So the result I see is actually the chip being faster on average, as less power is wasted and the average frequency can be increased.
Even in the case of systematically increased voltage (which seems rather impossible – VRMs probably have a feedback loop and are tested), there should only be slowdowns if the TDP is reached.
So what I am asking is what is wrong with my understanding?
 
So the result I see is actually the chip being faster on average, as less power is wasted and the average frequency can be increased.

Less energy is also wasted, which allows for more performance in a restricted power envelope.

Cheers
 
'Skybridge-x86' was supposed to be on a 20nm process and have full HSA support. Carrizo-L has neither.

I suspect Carrizo-L is just 2014's Beema chip on a new package (with FP4's second memory channel pins left unused, if that is feasible).

Could Skybridge be coming 2H of 2014? 20nm doesn't seem ready for prime time yet, so hopefully Carrizo-L is in addition to Skybridge and not instead of it.
 
In the Anandtech article, along with this slide, there are these two paragraphs:

What I cannot understand is how this feature is going to cause a slowdown.
As I understand it, the point of this feature is that power supplies sometimes drop a bit below the requested voltage, and as a result chips usually run with a voltage margin, wasting power. In Carrizo, the margin is replaced with the chip slowing down a bit during those rare moments when the voltage drops for an instant.
So the result I see is actually the chip being faster on average, as less power is wasted and the average frequency can be increased.
Even in the case of systematically increased voltage (which seems rather impossible – VRMs probably have a feedback loop and are tested), there should only be slowdowns if the TDP is reached.
So what I am asking is what is wrong with my understanding?
I think that article got it backwards and that your understanding is more correct. I would characterize it as the design's realized standard clocks being higher than they would otherwise be due to the reduced amount of guard-banding for voltage and clock.
It wouldn't be for TDP, but for voltage droop, which can have serious consequences for the core's functional correctness far more quickly than it would matter for TDP. The logic runs the risk of just computing things wrong if not for measures like this or padding in the voltage floor.
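
As a rough illustration of that trade-off between padding the voltage and briefly slowing the clock, here is a toy model. Everything in it is an assumption for the sake of the example: the P ~ V^2 * f power rule is only the usual first-order approximation, and the voltages, droop rate, and slowdown figures are invented, not AMD numbers.

```python
# Toy comparison of a fixed voltage guard band versus adaptive clocking.
# Every number here is a made-up illustration, not an AMD specification.

def dynamic_power(voltage: float, freq_ghz: float, k: float = 1.0) -> float:
    """Dynamic power in arbitrary units, using the first-order P ~ k * V^2 * f rule."""
    return k * voltage ** 2 * freq_ghz


def fixed_guard_band(freq_ghz: float, v_min: float, guard_band: float):
    """Always supply v_min + guard_band so that a worst-case droop is survivable."""
    voltage = v_min + guard_band
    return freq_ghz, dynamic_power(voltage, freq_ghz)


def adaptive_clocking(freq_ghz: float, v_min: float, small_margin: float,
                      droop_fraction: float, slowdown: float):
    """Supply a smaller margin and cut the clock by `slowdown` while a droop lasts.

    droop_fraction is the (hypothetical) share of time spent in a droop event.
    """
    voltage = v_min + small_margin
    avg_freq = freq_ghz * (1.0 - droop_fraction * slowdown)
    # Slight overestimate of power: voltage is treated as constant during droops.
    return avg_freq, dynamic_power(voltage, avg_freq)


fixed_freq, fixed_power = fixed_guard_band(2.0, v_min=1.00, guard_band=0.10)
adapt_freq, adapt_power = adaptive_clocking(
    2.0, v_min=1.00, small_margin=0.03, droop_fraction=0.01, slowdown=0.05
)

print(f"fixed:    {fixed_freq:.4f} GHz at {fixed_power:.3f} power units")
print(f"adaptive: {adapt_freq:.4f} GHz at {adapt_power:.3f} power units")
print(f"power saved: {1 - adapt_power / fixed_power:.1%}")
```

With inputs like these, the adaptive case gives up a tiny fraction of average clock in exchange for a noticeably lower power figure. In a power-limited part that saving could be spent on higher clocks, which is the "faster on average" argument.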

I don't think errant voltage delivery is going to be a big deal if the platform is to spec, unless AMD skimps on the electrical specifications. If the power delivery is robust, this may allow a design to mildly tolerate events that might disrupt others.
The reduced average load might mean that over time AMD could increase the amount of logic in the chip, and thus raise the rate of voltage droop events to the point that the difference in sustained performance versus expected performance is more measurable.
 
Why is Kaveri so not-dense? It's not like it's revision 1 of this architecture....
 
This sounds like the result of the new high-density circuit library they're using for their APUs, probably bringing techniques used for GPUs over to APUs, since other aspects such as the metal layers also seem more GPU-like. (The dramatic increase in the number of shaders from the Radeon 3800 to the 4800 must have been the result of a similar transition in their standard circuit libraries.)
 
Is Excavator still planned for desktop? Because from that slide, above 20 W it's actually worse than Steamroller. Or will there be a high-performance process for Excavator on desktop instead of high density? If they go the high-density route, I don't see this architecture being a fit for desktop, unless they just ignore single-threaded performance and go with many small cores.
 
Because from that slide, above 20 W it's actually worse than Steamroller.
That's per core pair. So 40W for a quad. Plus the GPU. Plus the uncore/chipset. IMHO, 65W TDP chips are probably similar or better than Kaveri.
A dual core Haswell-U has no problem reaching 2.4GHz in realistic workloads with a 15W TDP.
 
Is Excavator still planned for desktop? Because from that slide, above 20 W it's actually worse than Steamroller. Or will there be a high-performance process for Excavator on desktop instead of high density? If they go the high-density route, I don't see this architecture being a fit for desktop, unless they just ignore single-threaded performance and go with many small cores.
Not at the moment. They will release "Kaveri refresh" parts on desktop with slightly higher clocks.
 
With this slide:

http://i.imgur.com/4RQwUDB.jpg

5% greater IPC at 40% less power for the Excavator cores, so 1.05x perf / 0.6x power, which means 1.75x the perf/W of the Steamroller cores. Even if we take a sizeable chunk of that off as best-case scenario/marketing, that could still be ~1.4x perf/W on the same node, which is an impressive result. Time will tell how true that is.
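
For anyone who wants to check the arithmetic, here is a quick sketch. It takes the slide's 5% and 40% figures at face value and, like the post, treats the IPC gain as a performance gain at equal clocks. The function name is just a made-up helper for illustration.

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/W given relative performance and relative power."""
    return perf_ratio / power_ratio


# Slide figures: +5% IPC (taken as +5% perf) at 40% less power.
claimed = perf_per_watt_ratio(1.05, 0.60)

# The more cautious ~1.4x estimate from the post.
discounted = 1.4

print(f"claimed perf/W gain: {claimed:.2f}x")
print(f"share of the claim kept at 1.4x: {discounted / claimed:.0%}")
```

So the ~1.4x figure amounts to knocking about a fifth off the headline claim.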
 
Based on the fourth slide here, chances are that it is true at very low power, maybe for 15W SKUs or below. There should be substantial gains at 25W and more modest ones up to 35W, and likely very little or nothing beyond that point.
 
I wonder how far the design methodology being rolled into Excavator would be from having a module slapped into a GPU, just for giggles if nothing else. The trend since Llano has been the successive abandonment of a large swath of physical distinctiveness of CPU silicon versus the ASIC component of the APU.

If slide 6 is reasonably accurate in the visual depiction of Steamroller (first even remotely useful micrograph of this at all?) versus Excavator, I am struck by how the logic and SRAM components would look far more at home in a die shot of a GPU than they would a CPU, similar to the look of the highly synthesized Jaguar APUs.
The one nice bonus Excavator might have over putting in Jaguar cores is that Carrizo appears to promise a level of HSA conformance that has not been indicated elsewhere, and the necessary interconnect might tag along better coming from that APU.
 
We even have a (tiny, blurry) die shot of Carrizo, courtesy of Planet3DNow:

Excavator_Carrizo_Die_Shot_Artikelbild_300x270.jpg

http://www.planet3dnow.de/cms/14242-amd-praesentiert-excavator-und-carrizo-auf-der-isscc-2015/

The CPU cores are quite small, the GPU is hard to distinguish from the northbridge and the DDR3 PHYs look really large to me. I think the southbridge is close to the top-left corner because there seems to be a bunch of I/O over there, but I'm really not sure. I can't make out anything else of interest.
 