AMD Carrizo / Toronto

The Big Picture:

[annotated die shot: plSAib1.png]
 
Ah, thank you fellix! With the GPU missing from the first picture, it's no wonder I had trouble distinguishing it from the northbridge! :LOL:

And now the memory PHYs don't look particularly large anymore. In retrospect I really should have noticed that something was very wrong.

Is there anything wrong with that?

When Kaveri was first released, there was some speculation, based on estimated scaling factors, leaks and some ambiguous statements in official documentation, that Kaveri featured a 256-bit wide memory interface of which 50% was unused, i.e. that each stripe was 128-bit wide.

It's now clear that this wasn't the case.
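For scale, here's the back-of-the-envelope arithmetic behind what was at stake in that speculation (a rough sketch; theoretical peak numbers only, with DDR3-2133 as Kaveri's top officially supported speed being the one hard figure assumed):

```python
# Rough peak-bandwidth arithmetic for the actual vs. speculated interface
# widths (theoretical maxima only; DDR3-2133 is Kaveri's top supported speed).
def peak_bw_gbps(width_bits, transfers_per_sec):
    """Theoretical peak bandwidth in GB/s for a given interface width."""
    return width_bits / 8 * transfers_per_sec / 1e9

print(f"128-bit @ DDR3-2133: {peak_bw_gbps(128, 2133e6):.1f} GB/s")  # ~34.1
print(f"256-bit @ DDR3-2133: {peak_bw_gbps(256, 2133e6):.1f} GB/s")  # ~68.3
```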
 
Let's hope AMD manages a Kepler-to-Maxwell type of improvement with regard to power consumption; they need it.

Now, I think their positioning is completely off; that chip is designed as if they were still fighting Intel. They aren't: there is no need for them to score a "win" even on the GPU side of the APU. Intel is two nodes ahead, and the performance of both its CPU lines (Atom and Core) is out of AMD's reach. They don't really have the brand recognition to pull that off either.

I don't expect to win any argument on the matter, but clearly it is time for them to build products based on more economical considerations. If they want to keep pushing out pretty big chips that can't live up to their promises, whether because of power constraints or bandwidth starvation, more power to them; they will end in another inventory write-off, business as usual I would say.
The reality is that those APUs will end up against Pentiums and Celerons. How does AMD expect to be profitable going against Intel's salvaged parts while using a 4-core-CPU, 8-CU configuration (a pretty big chip)? They won't be profitable and will hardly be competitive; an inventory write-off will follow, along with lots of "big" APUs sold cheap as Athlons with half the APU disabled. For reference, IIRC those late Core M processors are ~90 mm².

Everything they are doing is fine: denser circuitry, more advanced power management, architectural improvements, etc. BUT the product is wrong in my view; it is more than it needs to be. Actually, the same applies to their lesser SoC line: fully enabled versions are pretty rare compared to the two-core / salvaged versions.
I do get that they have been going up against Intel for decades, trying to match its performance (exceeding it at one point), etc. BUT they can no longer do that; they have to change their ways.
The chip is twice as big as it should be, in my opinion.
 
It's a bit sad that the bus architecture isn't as streamlined as even Sandy Bridge's:

http://techreport.com/review/27853/amd-previews-carrizo-apu-offers-insights-into-power-savings/3

For example, for most intents and purposes, AMD's Kaveri is kind of a Radeon glued to a Bulldozer core. Rather than sharing a memory controller over an interconnect fabric that maintains memory coherency—as one would expect with the SoC approach—Kaveri's GPU has three paths to memory: a 512-bit Radeon memory bus, a 256-bit "Fusion Compute Link" into CPU-owned memory, and another 256-bit link into CPU-owned memory that maintains coherency. In theory, with a proper coherent fabric, these three links could be merged into one, saving power, reducing complexity, and quite probably improving performance. The use of a proper interconnect fabric would also allow AMD to swap in newer or larger graphics IP without requiring as much customization.

AMD surely chose to build Kaveri as it did for good reasons, most notably because it needed to deliver a product to the market in a certain time frame. Still, one can't help but note that Intel's original Sandy Bridge chip had a common ring interconnect joining together the CPU cores, graphics, shared last-level cache, and I/O. From a certain perspective, although it wasn't meant for this mission, Sandy Bridge's basic architecture was arguably a better fit for AMD's HSA execution model than Kaveri.

Makes me think that AMD's digestion of its ATI acquisition is an ongoing process (although the large switch to GPU-style fabrication would be a sign of progress).
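To put the article's point in concrete terms, here's a toy tally of the three link widths it lists (illustrative only; the per-link clock rates aren't given, so this counts raw transfer width per clock rather than bandwidth):

```python
# Toy tally of Kaveri's three GPU paths to memory, as described in the
# article above (widths only; per-link clock rates are not assumed).
paths = {
    "512-bit Radeon memory bus": 512,
    "256-bit Fusion Compute Link": 256,
    "256-bit coherent link": 256,
}
for name, bits in paths.items():
    print(f"{name}: {bits // 8} bytes/clock")
total = sum(paths.values())
print(f"Total link width kept alive: {total}-bit "
      f"({total // 8} bytes/clock) across three separate paths")
# The article's argument: a single coherent fabric connection could,
# in theory, replace all three, saving power and wiring.
```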
 
Makes me think that AMD's digestion of its ATI acquisition is an ongoing process (although the large switch to GPU-style fabrication would be a sign of progress).
Looking back, AMD should have merged with Nvidia instead of buying ATI, and then installed JHH as the new CEO, as he demanded. History would have been very different.
 
It's a bit sad that the bus architecture isn't as streamlined as even Sandy Bridge's:
Major changes in CPU interconnects are a significant undertaking. The complicated bus arrangement is how AMD was able to shoehorn the GPU memory subsystem into the crossbar+request queue setup AMD's CPUs have been using in some form since the K8.
Intel's transition from Nehalem to Sandy Bridge introduced the ring bus and a revamped cache protocol that looks like it might find a mesh-based successor in the next MIC. A lot of cores have come and gone with that foundation still in place.

AMD is overdue for a change, but a lot of groundwork would have to be done on the CPU, GPU, and uncore to make the two sides more compatible. At present, the GCN cache hierarchy appears to be too primitive and the CPU side too unscalable for there to be something like a shared cache.

Some of the research AMD's put forward allows for a continued split between coherent and non-coherent traffic, so I'm not sure how that meshes with the article's definition of a "proper" coherent interconnect.
 
Intel's transition from Nehalem to Sandy Bridge introduced the ring bus and a revamped cache protocol that looks like it might find a mesh-based successor in the next MIC.

Hmm, what makes you believe that they would move to a mesh topology from ring?
 
Hmm, what makes you believe that they would move to a mesh topology from ring?
Knights Landing will have a 2D mesh.
https://software.intel.com/en-us/articles/what-disclosures-has-intel-made-about-knights-landing

Whether that will filter down to the mainline anytime soon is unclear, but it is possible that the bandwidth requirements of the vector processors have met the scalability limits of the original ring bus.
The highest-end Xeons may also have the bandwidth and performance level to benefit from the revamped topology.
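To illustrate why a mesh might appeal as node counts climb, here's a quick sketch comparing average hop counts on a bidirectional ring versus a 2D mesh under uniform random traffic (a toy model; it ignores contention, link widths, and routing details):

```python
# Toy comparison: average shortest-path hop count between random node pairs
# on a bidirectional ring vs. a 2D mesh (uniform traffic, no contention).
def ring_avg_hops(n):
    """Average shortest-path distance on a bidirectional ring of n nodes."""
    return sum(min(d, n - d) for d in range(n)) / n

def mesh_avg_hops(side):
    """Average Manhattan distance on a side x side 2D mesh."""
    n = side * side
    total = sum(abs(a % side - b % side) + abs(a // side - b // side)
                for a in range(n) for b in range(n))
    return total / (n * n)

for side in (4, 6, 8):  # 16, 36, and 64 nodes
    n = side * side
    print(f"{n:2d} nodes: ring {ring_avg_hops(n):5.2f} hops, "
          f"mesh {mesh_avg_hops(side):5.2f} hops")
# The ring's average distance grows linearly with node count (~n/4),
# the mesh's only with the square root (~2*side/3).
```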
 
It seems like CPUs would benefit more from tight, predictable latencies than from massive bandwidth; I don't think they typically saturate the bandwidth offered up by dual-channel DDR3 (at least with pre-AVX2 workloads without massive FMAC throughput). Xeons seem to do just fine with a multi-ring bus. Inter-GPU interconnects might be a different story, though.
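For reference, the ceiling in question (a back-of-the-envelope sketch; these are theoretical peaks, and real sustained bandwidth is lower):

```python
# Back-of-the-envelope peak bandwidth for a dual-channel DDR3 setup
# (64-bit channels; theoretical maximum, not sustained bandwidth).
def ddr_peak_gbps(channels, width_bits, transfers_per_sec):
    return channels * width_bits / 8 * transfers_per_sec / 1e9

print(f"DDR3-1600 x2: {ddr_peak_gbps(2, 64, 1600e6):.1f} GB/s")  # ~25.6 GB/s
print(f"DDR3-1866 x2: {ddr_peak_gbps(2, 64, 1866e6):.1f} GB/s")  # ~29.9 GB/s
```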
 
The high-end Xeons already have more than a dual-channel DDR3 setup and higher core counts, and AVX and FMAC are ongoing features for designs coming down the pipeline. It seems like they would be the next designs to approach the interconnect performance realm that MIC is in.

It has also been the case, going back to the ring bus on AMD's R600, that a ring has less predictable dynamic behavior when bus segments come under contention. A grid would offer additional routes for traffic.

edit:
I figured it might be possible for the smaller, more mobile-oriented designs to get by without such a transition for a while. On the other hand, it occurred to me that parts of the bus could be deactivated with more granularity when it's a grid.
 
Why is the Northbridge so huge?
It seems grossly inefficient compared to those tiny cores & nearly as big as or bigger than the GPU.

I mean, it's presumably because of
Kaveri's GPU has three paths to memory: a 512-bit Radeon memory bus, a 256-bit "Fusion Compute Link" into CPU-owned memory, and another 256-bit link into CPU-owned memory that maintains coherency.
but I thought that AMD had taken so long to get its first Fusion stuff to market specifically because they were building a properly integrated Northbridge?
This is what, the 3rd-generation APU? And it's still a kludge of different paths to memory...
 
The bulk of the unified north bridge concerned with the memory subsystem is sandwiched between the two CPU modules. Much of the area covered by the NorthBridge label is the on-die southbridge, IO, and accelerators.
 
Ah well that makes more sense.
Intel still has a bunch of that off in a separate SB, right?
 
Ah well that makes more sense.
Intel still has a bunch of that off in a separate SB, right?
Depends on the chip; they have full-blown SoCs too, but the higher-performance parts, including everything Broadwell (Core M etc.), have it separated, yeah. On Broadwell-Y it's on the same package but a separate die.
 
Ah well that makes more sense.
Intel still has a bunch of that off in a separate SB, right?

I think it's same package, different die, at least for currently available stuff.
 
I think it's same package, different die, at least for currently available stuff.
Higher-performance parts (the newest Broadwell-based generation), yes, but they have real SoCs in the Atom world (Bay Trail, Cherry Trail, etc.).
 
http://www.anandtech.com/show/9246/amds-carrizo-l-apus-unveiled-12-25w-quad-core-puma

- 4 Puma+ cores
- Apparently, a GPU with the same 2-CU GCN configuration we've seen for over two years in that line of APUs
- DDR3L-1866

Carrizo-L seems to be a tiny evolution of Beema, which was itself a tiny evolution of Kabini.
It's like Intel's "ticks", but without the process shrink, and two in a row.
I thought that AMD cancelling Krishna back in late 2011 would mean they'd get more time and agility to advance their low-power APUs.
Yet here we are in 2015, announcing what seems to be little more than a rebranding of 2013's chips.

There wasn't a strong market for these chips in 2013. And since they don't exist in a vacuum, I'd say there's a much smaller market for these chips now. The new Cherry Trail line will likely crush these SoCs in both performance and performance/watt.
 
Jaguar/Puma will be replaced by a low-power Zen quad-core, so they must have preferred to move all their resources to that.
 
There wasn't a strong market for these chips in 2013. And since they don't exist in a vacuum, I'd say there's a much smaller market for these chips now. The new Cherry Trail line will likely crush these SoCs in both performance and performance/watt.
Cherry Trail doesn't really offer much of an improvement in maximum CPU clocks, so at the highest clocks Carrizo-L should still be quite a bit faster.
I think the GPU may still be competitive too, though most likely just because of the higher TDP. You are, however, right that perf/W is probably not so hot compared to Cherry Trail. Of course, across the whole Carrizo-L TDP range Intel would rather sell you a Broadwell CPU instead (even the desktop Bay Trail chips did not go past 10 W).
But it seems the small cat cores are dead (everything will be Zen or K12 in that market segment in the future too), hence no resources for a really new chip; it's more or less the old chip in a new socket.
 