AMD RyZen CPU Architecture for 2017

Possibly, they want to integrate GPU cores into the count, making them available for general processing via HSA? Or won't it be necessary to assign a core# to them?
 
The more interesting part is the (older, but I didn't notice that patch at the time) confirmation that the LLC now lives among the cores, not on the NB, which likely has latency implications. I wonder if writes from a core preferentially go to its own L3 slice, or if they use a scheme similar to Intel's, where each cache line belongs to a specific L3 slice depending on its address and all cores access all of the L3 evenly.
Knowing the L3 ID seems less useful if all locations are hashed across all the L3s in a socket. It would be useful for exploiting locality in a non-uniform access setup, which hashing across L3 IDs would interfere with.
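To make the distinction concrete, here is a minimal sketch of the two mapping policies; the slice count and the hash function are made up for illustration, not the real designs:

```c
/* Minimal sketch (not AMD's or Intel's actual hash): two ways an L3 could
 * map a request to a slice.  "Hashed" spreads every address across all
 * slices in the socket; "local" prefers the requesting core's own slice. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 8   /* hypothetical: one slice per core */

/* Toy hash over the cache-line address bits -- real designs XOR a larger,
 * carefully chosen set of physical address bits to balance the slices. */
static unsigned hashed_slice(uint64_t paddr)
{
    uint64_t line = paddr >> 6;               /* 64 B cache lines */
    return (unsigned)((line ^ (line >> 3) ^ (line >> 7)) % NUM_SLICES);
}

/* Locality-preserving alternative: a line allocated by a core lands in that
 * core's own slice, so later hits stay close by. */
static unsigned local_slice(unsigned requesting_core)
{
    return requesting_core % NUM_SLICES;
}

int main(void)
{
    uint64_t addr = 0x12345680;
    printf("hashed: slice %u\n", hashed_slice(addr));
    printf("local (core 3): slice %u\n", local_slice(3));
    return 0;
}
```

With hashing, a core's data ends up spread across every slice in the socket, so the slice ID of any one L3 tells you little; with a local policy, the ID is exactly what you'd want to know to exploit locality.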

Possibly, they want to integrate GPU cores into the count, making them available for general processing via HSA? Or won't it be necessary to assign a core# to them?
It doesn't seem too out of line for where the higher-count workstation/server CPUs are getting to.
I don't know about involving the CUs. The ID is being used in the context of the CPU socket and L3 hierarchy, and basing part of it on the APIC ID would mark a change, as CUs don't individually plug into that. The GPU as a whole might hook into things from an IO standpoint, but CUs usually hide behind layers of abstraction. There were indications that Polaris is introducing even more hardware scheduling, which might insulate individual CUs further.
 
So the L0 cache that appeared in other slides really is a micro-op cache after all, and its functionality resembles the micro-op cache Intel has been using since Sandy Bridge.

I am drawing a blank on slides for an L0 cache for Zen.
Historically, I've seen more about an L0 operand cache or stack cache, which is the wrong cache type for the context where the only L0 reference is listed.
As an Icache TLB, the L0 TLB might not be tied to a cache AMD calls an L0, since TLBs in prior CPUs have multiple levels hanging off of the L1.
It might be a micro-TLB used for way prediction or for reducing the number of times the core needs to check tags. That's where I've seen mentions of the term "microtag" that showed up in the expanded error reporting.
That might also be expanded to serve as a uop-cache hit check, if it shares Intel's constraint that the uop cache map to the L1.
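Roughly what I mean, as a hypothetical sketch; the microtag width, set layout, and lookup policy are guesses for illustration, not the documented design:

```c
/* Sketch of partial-tag ("microtag") way prediction, assuming a hypothetical
 * 8-way set that stores full tags plus a few low tag bits per way.  Only the
 * predicted way's full tag gets checked, instead of all eight. */
#include <stdint.h>
#include <stdbool.h>

#define WAYS       8
#define UTAG_BITS  8           /* hypothetical microtag width */
#define UTAG_MASK  ((1u << UTAG_BITS) - 1)

struct set {
    uint64_t tag[WAYS];        /* full tags */
    uint8_t  utag[WAYS];       /* partial tags checked first */
    bool     valid[WAYS];
};

/* Returns the matching way, or -1 on a predicted miss.  A microtag match
 * that then fails the full-tag compare is the kind of mismatch expanded
 * error reporting could flag. */
static int lookup(const struct set *s, uint64_t tag)
{
    uint8_t utag = (uint8_t)(tag & UTAG_MASK);
    for (int w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->utag[w] == utag)      /* cheap first filter */
            return (s->tag[w] == tag) ? w : -1;     /* one full compare */
    }
    return -1;
}

int main(void)
{
    struct set s = {0};
    s.valid[2] = true;
    s.tag[2]   = 0xABCD;
    s.utag[2]  = 0xABCD & UTAG_MASK;
    return lookup(&s, 0xABCD) == 2 ? 0 : 1;
}
```

The payoff is a single full tag compare (and data way read) per lookup; the cost is that an aliased or stale microtag can misdirect the lookup, which is the sort of thing you'd want machinery to detect.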


The oddity with the FMA FP0+FP3 and FP1+FP3 pipe sharing was dropped, although why it was mentioned in the first place is curious. It seems too elaborate for a typo.
If FP0 and FP1 are FMA pipes, it might explain why IADD is possible on them as well as FP3.

I tried looking up the context of the error reporting for shadow tag ECC errors. At least one place I found them mentioned was a paper on non-uniform cache allocation, where shadow tags are used to determine which cores' workloads would benefit most from expanding the "local" allocation given to them in a last-level cache. That leads to lower latency within the local partition, and to not having to check across all the tags in the common case.
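The gist of that scheme, as a toy sketch; the structures and the rebalance policy here are invented for illustration, not taken from the paper or from AMD:

```c
/* Toy sketch of shadow-tag-driven partition sizing: per-core shadow tags
 * track what *would* have hit if that core's local LLC allocation were
 * larger, and spare capacity goes to whichever core would benefit most. */
#include <stdint.h>

#define CORES 4

struct core_monitor {
    uint64_t real_hits;      /* hits in the core's current local partition */
    uint64_t shadow_hits;    /* misses the shadow tags say would have hit
                                with a bigger allocation */
    unsigned alloc_ways;     /* current local allocation */
};

/* Called periodically: hand spare ways to the core whose shadow tags show
 * the largest number of recoverable misses, then start a new epoch. */
static void rebalance(struct core_monitor mon[CORES], unsigned spare_ways)
{
    unsigned best = 0;
    for (unsigned c = 1; c < CORES; c++)
        if (mon[c].shadow_hits > mon[best].shadow_hits)
            best = c;
    mon[best].alloc_ways += spare_ways;
    for (unsigned c = 0; c < CORES; c++)
        mon[c].shadow_hits = mon[c].real_hits = 0;
}

int main(void)
{
    struct core_monitor mon[CORES] = {
        { .real_hits = 900, .shadow_hits =  10, .alloc_ways = 4 },
        { .real_hits = 400, .shadow_hits = 300, .alloc_ways = 4 },
        { .real_hits = 800, .shadow_hits =  50, .alloc_ways = 4 },
        { .real_hits = 100, .shadow_hits =  20, .alloc_ways = 4 },
    };
    rebalance(mon, 2);               /* core 1 should end up with 6 ways */
    return mon[1].alloc_ways == 6 ? 0 : 1;
}
```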

The error reporting for multi-way hits in the L3 and L2 is something I am not familiar enough with to know whether it is peculiar to a segmented L3, or a possible transient state in the long memory pipeline that could always happen but has not previously been subject to reporting.

The error-reporting and checkpointing functionality in general seems to tie into Zen's server focus. The error reporting is something that either will not be present in all new CPUs, or can be toggled.
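As one example of error reporting that can be toggled: on x86, machine-check banks are enabled per error source through the MCi_CTL MSRs. A quick sketch using Linux's /dev/cpu/0/msr interface and the legacy MCA register layout; Zen's expanded machine-check architecture may well place its banks at different MSR addresses:

```c
/* Dump each machine-check bank's enable mask via /dev/cpu/0/msr
 * (requires root and the msr kernel module).  Uses the legacy MCA layout:
 * MCG_CAP reports the bank count, MCi_CTL enables reporting per bank. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_MCG_CAP  0x179   /* low 8 bits: number of MCA banks */
#define IA32_MC0_CTL  0x400   /* bank i's control MSR is 0x400 + 4*i */

static int rdmsr(int fd, uint32_t msr, uint64_t *val)
{
    return pread(fd, val, sizeof *val, msr) == (ssize_t)sizeof *val ? 0 : -1;
}

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    uint64_t cap, ctl;
    if (rdmsr(fd, IA32_MCG_CAP, &cap)) { perror("rdmsr"); return 1; }

    unsigned banks = (unsigned)(cap & 0xff);
    for (unsigned i = 0; i < banks; i++) {
        if (rdmsr(fd, IA32_MC0_CTL + 4 * i, &ctl) == 0)
            printf("MC%u_CTL = %#llx%s\n", i, (unsigned long long)ctl,
                   ctl ? "" : "  (reporting disabled)");
    }
    close(fd);
    return 0;
}
```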
 
Hmm, so the GPU/HBM of the APU is confirmed as a separate die on the MCM package.
Well, that crushes my dream of using the HBM on the APU as a massive full-bandwidth L3/L4 à la Crystalwell :(

Unless the consumer version goes for an 8-core CPU on one half of the die and a GPU on the other half... but then that would probably not be a big enough GPU to justify HBM -> it would just use the system DDR, I guess?

Edit: on the other hand Crystalwell has 100GB/s too, though on a more direct link & higher clock.
Going off-chip to the GPU die and then off that again to the much lower-clocked HBM sounds like the latency would be pretty bad.
 
I was wondering what you were talking about, and that is a server product, not a consumer APU (announced in the link above).

It's rather huge, goes in dual-socket motherboards, and could easily cost $2000 as a ballpark figure.
If you want some Crystalwell-type memory, even a severely overpriced Broadwell-K or a Kaby Lake + eDRAM part will be cheaper still.

I also thought, what are they thinking? If the GPU is off-die, would AMD have to call that "Unfusion"? :)
But seeing that the 32-core MCM goes up to four sockets and the 16-core + GPU goes to two sockets, AMD must have designed some new interconnect that can be used between CPUs or between CPU and GPU; multiple links are likely used in the CPU + GPU MCM.

There doesn't seem to be a word on it, but here we have AMD's answer to NVLink.
 
AMD’s upcoming AM4 socket will be based on a µOPGA design with 1331 pins

AMD has been a devout supporter of Pin Grid Array socket types, and it looks like the AM4 will be no different. OPGA stands for Organic Pin Grid Array (the 'organic' refers to the plastic substrate the silicon die is attached to, out of which the pins protrude), and according to this report, the company is deploying a new standard called the µOPGA socket. The 'micro' in the term indicates that AMD will be using pins of a smaller diameter, which will presumably be weaker than OPGA-based pins. Going up from 940 pins to 1331 is an increase of approximately 40%, and it is implied that AMD will be decreasing the distance between the pins.

This means that while the µOPGA AM4 socket size will remain approximately the same, it will be much more fragile than previous OPGA-based iterations. AMD hopes to use this particular socket for all its mainstream and enthusiast platforms, including APUs. AMD's AM4 will combine the best points of the AM1, AM3+ and FM2 sockets. These will be deployed in everything from budget AIO motherboards to the integrated PCH designs of Bristol Ridge.
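A quick back-of-the-envelope check on those numbers, assuming the socket area really does stay the same and the pins sit on a square grid:

```c
/* Sanity-check the article's figures: the pin-count increase from 940 to
 * 1331, and how much tighter the pin pitch would have to get if the socket
 * area stays constant with pins on a square grid. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double old_pins = 940.0, new_pins = 1331.0;

    double increase    = (new_pins - old_pins) / old_pins;   /* ~41.6% */
    double pitch_ratio = sqrt(old_pins / new_pins);          /* ~0.84  */

    printf("pin count increase: %.1f%%\n", increase * 100.0);
    printf("pitch at constant area: %.0f%% of the old pitch\n",
           pitch_ratio * 100.0);
    return 0;
}
```

That works out to about a 42% increase in pin count and roughly 16% less pin pitch, so the "approximately 40%" and "decreasing the distance between the pins" claims both check out.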
 
With 1331 pins I guess we can say it's probably not a quad-channel capable platform.
 
Edit: on the other hand Crystalwell has 100GB/s too, though on a more direct link & higher clock.
Keep in mind, Intel's Xeon E7 line of quad-channel-equipped Haswells is rated at 102GB/sec of main memory throughput when using DDR4-1866 memory. If you stack up sockets with a NUMA-aware app, you can see benchmark numbers exceed 180GB/s...
The new SMIv2: http://www.anandtech.com/show/9193/the-xeon-e78800-v3-review/3

The benchmarks: http://www.anandtech.com/show/9193/the-xeon-e78800-v3-review/10

While the latency isn't likely as low as HBM, that's still an incredible amount of throughput available right now.
 
It might have 32 PCIe lanes or so (fewer could be enabled on lower-end APUs if it makes sense).
That would explain having a few more pins than desktop Intel.

I believe that otherwise we're expecting dual channel.
If there's a gigantic eight-channel socket, it would make sense to have a quad-channel socket in between (similar to Intel's LGA 2011), or at least a dual-socket, dual-channel (each) platform; the latter would replace Socket C32.
But then again, AMD lacks some means and C32 is hardly even known about, so I guess the plan is: you want high end? Fuck it, let's bring out the giant eight-channel socket (like a high-level Gillette executive would say).
 
AMD’s upcoming AM4 socket will be based on a µOPGA design with 1331 pins
Why are they sticking to pin grid sockets? Surely it must complicate high-speed signalling if you have more capacitance in your socket.

Or triple channel like LGA1366?
That'd be nice! :D I still have my i7 920-based rig running strong. That phat array of DIMMs lined up never ceases to be impressive IMO, hehe.
 
AM4 is desktop and replaces both AM3 and FM2+. Said to be like AM1 too, so the CPU should integrate a good part of the chipset (SATA and USB running from the socket).

The server parts use an unknown, huge socket. The competition is going to big sockets too, if we consider Intel's future socket for Skylake-EP/Skylake-EX, or even "OpenPOWER" stuff.

Lower-end x86 servers might use AM4, but with no special emphasis on it (roll your own for home or small business? Sure. Filling datacenter racks, I think not).
 