The AMD Execution Thread [2007 - 2017]

Status
Not open for further replies.
If the more recently leaked slides about AMD's HPC APU are legitimate, AMD's answer to implementing two sets of memory controllers on the same chip is to avoid doing so entirely, by using an MCM hosting separate CPU and GPU silicon.

That might almost be good news.

Hopefully whatever communication exists between the CPU and GPU will be common to the consumer parts, meaning consumer Zen and Vega 11 (the small one?) might make it into laptops with only a custom interposer. You still save some power on memory and signalling and still save an awful lot of space.

If it's possible to say at this point, how do you rate the chances of being able to mount consumer Zen on an interposer? If it's a "standard" 16-core server Zen going onto those HPC APUs' interposers, wouldn't that indicate it's a solved problem?

EDIT: Although, would there actually be space for two dies and an HBM2 stack on a consumer package, under that consumer-sized heatspreader...
 
The slide in question was found by Fudzilla, I believe, and it was released more recently than the HPC APU slide posted previously. At least the code names and general feature names have been corroborated so far.
http://www.fudzilla.com/news/processors/38402-amd-s-coherent-data-fabric-enables-100-gb-s

The slide had a Greenland GPU and 2 HBM2 stacks on an interposer. As drawn, it seemed to indicate that this interposer is put onto an MCM that mounts the 16-core Zeppelin CPU separately.
Multiple GMI links connect the CPU to the GPU over the MCM substrate.
Zeppelin may not be designed to use an interposer, and the GMI bandwidth is relatively modest compared to what can be achieved with an actual interposer. I'm not sure the cheaper variants have better odds than the one destined for HPC.

It seems plausible that the CPU's GMI and PCIe connectivity would use the same physical interface, although prior to the OpenCL driver strings, people were using Vega interchangeably with Greenland.
A likely implementation for this would require that Greenland have a measurably wider PCIe/GMI bus than would be necessary for a consumer GPU, and since Vega is listed separately, it might not have that kind of capability.

I'm open to being pleasantly surprised, but it's possible that the engineering problems with the different memory types and CPU/GPU communication may not be addressed with Vega and consumer-space Zen. The HPC version may not be superior to the likely competition, but details are light.
 

West Coast Hitech L.P. stands to gain a lot of money if AMD's share price rises by 2020, while it loses nothing if it goes down. AMD is taking a 235 million USD charge to give them that option (warrant).
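The "gain a lot, lose nothing" asymmetry is just the payoff of a call-style warrant. A minimal sketch, assuming the fixed 5.98 USD purchase price quoted for the warrant later in the thread, and ignoring time value and the number of shares covered (the thread doesn't give it); the share prices in the loop are hypothetical:

```python
STRIKE_USD = 5.98  # fixed purchase price quoted for the warrant in this thread


def warrant_value_per_share(share_price: float, strike: float = STRIKE_USD) -> float:
    """Intrinsic value of exercising the warrant for one share."""
    # The holder only exercises when the market price is above the strike,
    # so the payoff is floored at zero: a falling share price costs them nothing.
    return max(0.0, share_price - strike)


if __name__ == "__main__":
    for price in (4.00, 5.98, 8.00, 12.00):  # hypothetical AMD share prices
        print(f"share price {price:5.2f} USD -> {warrant_value_per_share(price):5.2f} USD per share")
```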

They'll also have to either use their contractual wafer allotment or pay GloFo quarterly for any volume under the negotiated minimum.

It does make one wonder whether GloFo's 14 nm FF was significantly worse than Samsung's (despite sharing process tech) and/or TSMC's 16 nm FF, and whether this prompted AMD to amend their previous agreement in order to use another foundry, even if they have to pay to do so.

It really is too bad we don't have a Samsung-sourced Polaris 10/11 to test against the GloFo-sourced chip.

Would be hilarious if AMD ended up using excess capacity at Intel's Fabs. :D

Regards,
SB
 
I'd like to see that. Even 4GB of HBM would be a good start on a system that wouldn't need to use drivers to swap texture data over PCI-E. If the HPC APUs could use the HBM as a last level cache then everything would just fly, with fewer headaches than for a dGPU.
The limitation would be the speed of the GMI link, so roughly 100GB/s based on some of the leaked docs a while back. That is possibly in addition to system memory bandwidth. It's hard to tell what the maximum rate is, as the configurations were all CPU to GPU and the links only large enough to accommodate maximum system memory bandwidth. That's still a ton of bandwidth for a CPU to play with if it works. Problem being, it's likely not the CPU that needs the bandwidth. Most bandwidth-intensive tasks for a CPU using HSA would be better accelerated by the GPU portion.
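To put "links only large enough to accommodate maximum system memory bandwidth" in numbers: peak DRAM bandwidth is transfer rate × bytes per transfer × channels. A rough sketch, assuming DDR4-2400 with standard 64-bit (8-byte) channels; the memory speed and channel counts are illustrative assumptions, not figures from the leaked docs:

```python
GMI_LINK_GBS = 100.0  # rough GMI figure from the leaked documents discussed above


def dram_bandwidth_gbs(mega_transfers_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    # A standard DDR4 channel is 64 bits (8 bytes) wide.
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for channels in (2, 4, 8):  # assumed channel counts
        bw = dram_bandwidth_gbs(2400, channels)  # assumed DDR4-2400
        print(f"{channels} channels of DDR4-2400: {bw:6.1f} GB/s "
              f"({bw / GMI_LINK_GBS:.0%} of a ~100 GB/s link)")
```

Under these assumptions a dual- or quad-channel setup fits comfortably inside a ~100 GB/s link, while eight channels would exceed it.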

Not if you want performance out of it at a reasonable price. It just doesn't work that way; you need both for a lower-end and midrange laptop. Now, if you want to go higher end, a cut-down Greenland just doesn't fit the bill, does it?
Price could be interesting, as they might be able to run a system without any DRAM. A Chromebox, for example, typically comes in 2/4/8GB configurations. So some costs might be offset by not including DIMMs or possibly even their sockets. That could make for some interesting configurations and performance.

I'm assuming that AMD will already have to design two sets of memory controllers into the same chip for the HPC version, and solve any issues relating to CPU <-> GPU communication. Solutions should already have been produced and be working blocks on whichever process they're using for the HPC version. We're not talking about solving any new problems here, or even necessarily adding any new features.
Only one controller, using GMI links for the different chips, based on some leaked documents.

Zeppelin may not be designed to use an interposer, and the GMI bandwidth is relatively modest compared to what can be achieved with an actual interposer. I'm not sure the cheaper variants have better odds than the one destined for HPC.
Modest, but likely more than sufficient considering the typical CPU memory bandwidth. Overkill if the GPU handles acceleration for bandwidth intensive tasks.

It does make one wonder whether GloFo's 14 nm FF was significantly worse than Samsung's (despite sharing process tech) and/or TSMC's 16 nm FF, and whether this prompted AMD to amend their previous agreement in order to use another foundry, even if they have to pay to do so.
Was there an agreement that they had to use GF exclusively? If they can't produce Polaris fast enough to meet demand, meeting wafer requirements shouldn't be a huge concern. The article doesn't mention chip pricing either, so it's possible some of the costs are offset by cheaper production if there were yield/performance issues.
 
This new "better" agreement just shows how GF have AMD by the balls; it's pretty ridiculous that the original deal was even legal.
The sooner AMD break free from GF's grasp the better.
 
The slide in question was found by Fudzilla, I believe, and it was released more recently than the HPC APU slide posted previously. At least the code names and general feature names have been corroborated so far.
http://www.fudzilla.com/news/processors/38402-amd-s-coherent-data-fabric-enables-100-gb-s

The slide had a Greenland GPU and 2 HBM2 stacks on an interposer. As drawn, it seemed to indicate that this interposer is put onto an MCM that mounts the 16-core Zeppelin CPU separately.
Multiple GMI links connect the CPU to the GPU over the MCM substrate.

Zeppelin may not be designed to use an interposer, and the GMI bandwidth is relatively modest compared to what can be achieved with an actual interposer. I'm not sure the cheaper variants have better odds than the one destined for HPC.

Thanks.

So it looks like Greenland will be self-contained on an interposer. My (optimistic) outlook would be that if Greenland is available for an HPC part, it could be made available for a consumer part without any re-engineering needed.

"4+" TF isn't a whole lot, but I suppose it's going to be clocked for a package that already has 16 Zen cores on it, and that 100 GB/s should be enough for both CPU <-> GPU communication and for the GPU to plunder main memory bandwidth.

It seems plausible that the CPU's GMI and PCIe connectivity would use the same physical interface, although prior to the OpenCL driver strings, people were using Vega interchangeably with Greenland.
A likely implementation for this would require that Greenland have a measurably wider PCIe/GMI bus than would be necessary for a consumer GPU, and since Vega is listed separately, it might not have that kind of capability.

I'm open to being pleasantly surprised, but it's possible that the engineering problems with the different memory types and CPU/GPU communication may not be addressed with Vega and consumer-space Zen. The HPC version may not be superior to the likely competition, but details are light.

So I suppose for a "consumer super APU" the likely best hope would be consumer Zen on an MCM with Greenland. Power limits allowing, that should give you a fast 8-core CPU with something around RX 470 8GB performance, but with a lot of power saved over the 256-bit GDDR5 setup.
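The reason HBM2 can replace a 256-bit GDDR5 bus is that peak bandwidth is just bus width × per-pin data rate / 8: HBM2 trades a much wider bus for a much lower per-pin rate over short interposer traces. A rough sketch, assuming 7 Gb/s GDDR5 for the 256-bit card and 2 Gb/s per pin for HBM2 (both are assumptions, not figures from the thread); power is not modelled here:

```python
def bus_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) x per-pin data rate (Gb/s) / 8."""
    return bus_width_bits * gbps_per_pin / 8


if __name__ == "__main__":
    gddr5 = bus_bandwidth_gbs(256, 7.0)            # assumed 7 Gb/s GDDR5 on a 256-bit bus
    hbm2_one_stack = bus_bandwidth_gbs(1024, 2.0)  # one 1024-bit HBM2 stack at an assumed 2 Gb/s
    print(f"256-bit GDDR5 @ 7 Gb/s : {gddr5:.0f} GB/s")
    print(f"1 HBM2 stack @ 2 Gb/s  : {hbm2_one_stack:.0f} GB/s")
    print(f"2 HBM2 stacks          : {2 * hbm2_one_stack:.0f} GB/s")
```

Under those assumptions a single HBM2 stack already matches the GDDR5 setup's bandwidth.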

I'll keep dreaming for now, but I think getting something that clearly beats the PS4 Neo into an ultra-small form factor PC would be pretty cool.

The limitation would be the speed of the GMI link, so roughly 100GB/s based on some of the leaked docs a while back. That is possibly in addition to system memory bandwidth. It's hard to tell what the maximum rate is, as the configurations were all CPU to GPU and the links only large enough to accommodate maximum system memory bandwidth. That's still a ton of bandwidth for a CPU to play with if it works. Problem being, it's likely not the CPU that needs the bandwidth. Most bandwidth-intensive tasks for a CPU using HSA would be better accelerated by the GPU portion.

I suppose the key would be using the 8/16 GB of HBM2 wisely. Perhaps transferring in data ahead of time, or using it as an enormous cache?

Only one controller, using GMI links for the different chips, based on some leaked documents.

Excellent! *rubs hands together*

Was there an agreement that they had to use GF exclusively? If they can't produce Polaris fast enough to meet demand, meeting wafer requirements shouldn't be a huge concern. The article doesn't mention chip pricing either, so it's possible some of the costs are offset by cheaper production if there were yield/performance issues.

Hopefully AMD wouldn't have agreed to a contract whereby they had to use GF even if GF couldn't supply them with enough working dies....

AMD's biggest problem at the moment seems to be that they can't make enough Polaris 10. While performance is behind nVidia, they could still be selling well if they had the chips to sell, and the 470 4GB wouldn't be going for more than the 480 8GB was supposed to.
 
I suppose the key would be using the 8/16 GB of HBM2 wisely. Perhaps transferring in data ahead of time, or using it as an enormous cache?
It might not even need that much for a consumer level product. With a 100GB/s link to system memory most textures could be stored there.
 
So it looks like Greenland will be self-contained on an interposer. My (optimistic) outlook would be that if Greenland is available for an HPC part, it could be made available for a consumer part without any re-engineering needed.

Hum.. if Greenland is self-contained, then why does the OpenCL driver recognize Greenland, a Vega 10 and a Vega 11?
Could Greenland actually be the exact same graphics chip+interposer+HBM2 set as one of the Vegas, but the OpenCL driver recognizes two different GPUs because one has a 100GB/s link to the CPU (and main memory) and the other has PCIe 3.0's 16GB/s? I reckon that for HSA that could make a large difference.
 
Hum.. if Greenland is self-contained, then why does the OpenCL driver recognize Greenland, a Vega 10 and a Vega 11?
Could Greenland actually be the exact same graphics chip+interposer+HBM2 set as one of the Vegas, but the OpenCL driver recognizes two different GPUs because one has a 100GB/s link to the CPU (and main memory) and the other has PCIe 3.0's 16GB/s? I reckon that for HSA that could make a large difference.
Likely because of the memory model. There wouldn't necessarily need to be any HBM present. That may also be the case if the HBM is used like a cache, with a portion of system memory serving as the VRAM.
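The "HBM as a cache, system memory as VRAM" idea can be illustrated with a toy model: HBM holds recently used pages, and a miss pulls the page from system memory over the CPU-GPU link, evicting the least recently used one. This is a hypothetical sketch; the page granularity, capacity and LRU replacement policy are all assumptions for illustration, not a description of any actual AMD memory controller:

```python
from collections import OrderedDict


class HbmPageCache:
    """Toy LRU model of HBM caching pages that live in system memory (hypothetical)."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # resident page ids, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> bool:
        """Touch a page; return True on an HBM hit, False if it had to be fetched."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # fetched from system memory over the CPU<->GPU link
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False


if __name__ == "__main__":
    cache = HbmPageCache(capacity_pages=4)
    for p in [0, 1, 2, 0, 1, 3, 4, 0, 5, 1]:
        cache.access(p)
    print(f"hits={cache.hits} misses={cache.misses} resident={list(cache.pages)}")
```

The point of the toy is that only misses cost link bandwidth, so a working set that mostly fits in HBM would rarely touch the ~100 GB/s link.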
 
Hum.. if Greenland is self-contained, then why does the OpenCL driver recognize Greenland, a Vega 10 and a Vega 11?
Could Greenland actually be the exact same graphics chip+interposer+HBM2 set as one of the Vegas, but the OpenCL driver recognizes two different GPUs because one has a 100GB/s link to the CPU (and main memory) and the other has PCIe 3.0's 16GB/s? I reckon that for HSA that could make a large difference.

It's possible that Greenland is unrelated to Vega and is "Big Polaris" that was rumored to be removed from the desktop roadmap, possibly due to how close it was going to be to a potential Vega launch.

It would make sense to use Polaris over Vega as it's something that is a known quantity now versus having to implement both new CPU cores AND new GPU cores simultaneously.

Regards,
SB
 
Was there an agreement that they had to use GF exclusively? If they can't produce Polaris fast enough to meet demand, meeting wafer requirements shouldn't be a huge concern. The article doesn't mention chip pricing either, so it's possible some of the costs are offset by cheaper production if there were yield/performance issues.

For the most part, yes. AMD has mentioned there are trigger points that could allow them to manufacture chips at a foundry other than GF. I'm going to guess that it's related to wafer allocation and potentially if GF is unable to deliver working silicon.

This new amendment to their Wafer Supply Agreement with GLOBALFOUNDRIES allows them to bypass this as long as they pay a fee. As the press release mentioned, it's multi-tiered.
  1. 100 million USD paid in 25 million USD installments per quarter. That allows them to manufacture "certain products" at another foundry. So there may be some products that are still exclusive to GloFo.
  2. Additional quarterly payments starting in 2017 and presumably ending in Q4 2020 based on the volume of wafers "purchased from another foundry." Basically a tax per wafer used at another foundry. Regardless of whether AMD uses GloFo or someone else, GloFo will be getting money from AMD per wafer.
  3. West Coast Hitech L.P. is granted a warrant to purchase AMD stock at a fixed 5.98 USD. It can be exercised in whole or in part at any point before Feb. 29, 2020. If the stock price goes down, this means nothing. If the stock price goes up, that subsidiary of Mubadala Development Company PJSC has the potential to make a lot of money.
As such, AMD will record a one-time charge of 335 million USD due to points 1 and 3 above: 100 million USD for point 1 and 235 million USD for point 3.
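The one-time charge is simple addition of the two fixed items; the per-wafer payments in point 2 are excluded because they depend on future volume at other foundries. A minimal check of the arithmetic, using only the figures listed above:

```python
def one_time_charge_musd(installment: int = 25,
                         installments: int = 4,
                         warrant_charge: int = 235) -> int:
    """One-time charge in million USD from points 1 and 3 of the amendment."""
    foundry_fee = installment * installments  # point 1: four quarterly 25M installments
    return foundry_fee + warrant_charge       # point 3: the warrant charge
    # Point 2 (per-wafer payments from 2017 on) is volume-dependent and not included.


if __name__ == "__main__":
    print(f"{one_time_charge_musd()} million USD")
```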

It's still a very favorable contract for GloFo. In fact on the surface it appears to provide GloFo with far more money than they were due prior to the amendment.

But it allows AMD to bypass GloFo with compensation per wafer. That's key if GloFo is having unsatisfactory yields or lower-than-expected results (power or performance), which would directly impact AMD's ability to compete in the marketplace. It's so much in GloFo's favor that the first thing that comes to mind is that GloFo isn't doing nearly as well with 14 nm FF as Samsung's 14 nm or TSMC's 16 nm FF, and that AMD feels this is required in order to be competitive, and/or feels that GloFo will not be able to match Samsung and TSMC in transitioning to smaller nodes.

Regards,
SB
 
Am I misreading the situation, or does AMD seem to be consistently a little hobbled by GF?

The Xbox ONE S is manufactured on TSMC 16nm and is delivering large quantities (hundreds of thousands a month) of 900+ MHz (GPU) chips with 8 fully enabled Jaguar cores, with a total system power (at the wall) of under 80W while gaming. That's with only a single bin, and it's a 240 mm^2 chip.

Granted, there's a lot we can't tell from that, but it shows that AMD are banging out larger and more complex chips at TSMC, with at least "good" power efficiency even from a single bin, and in vast numbers. And the X1's 12* CUs will almost certainly be taking up less than 20% of that 240 mm^2, so TSMC's density isn't looking bad either.

* Edit: could actually be 14 CUs if it has two redundant ones like the 28nm chip.
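The "less than 20% of 240 mm^2" estimate implies an upper bound on area per CU. A small sketch of that arithmetic, treating the post's own 20% figure as the bound (it is an estimate, not a die-shot measurement) for both the 12-CU and 14-CU cases:

```python
XBOX_ONE_S_DIE_MM2 = 240.0  # die size quoted in the post
GPU_CU_SHARE_BOUND = 0.20   # the post's "less than 20%" estimate, used as an upper bound


def max_area_per_cu_mm2(cu_count: int,
                        die_mm2: float = XBOX_ONE_S_DIE_MM2,
                        share: float = GPU_CU_SHARE_BOUND) -> float:
    """Upper bound on area per CU if all CUs fit inside `share` of the die."""
    return die_mm2 * share / cu_count


if __name__ == "__main__":
    for cus in (12, 14):  # 12 active, or 14 if two are redundant
        print(f"{cus} CUs: at most {max_area_per_cu_mm2(cus):.2f} mm^2 per CU")
```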
 
But it allows AMD to bypass GloFo with compensation per wafer.
That's one heck of a bypass considering what AMD spends on wafers. That compensation should also only apply to expected margins; they shouldn't need to compensate for full revenue. The scale of that payment seems in line with reasonable profits for GF for the remainder of the agreement. You'd think they could have just bought a 5-year supply of 14nm interposers that can't be screwed up. AMD spent what, 155M on fabs last year? Even with a ~40% margin, that's 5 years of chips at the rate they've been purchasing them.
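The "5 years" figure can be reproduced by dividing the payment by GF's yearly margin on AMD's orders. A sketch assuming "that payment" means the 335 million USD one-time charge discussed earlier in the thread, plus the post's own ~155M annual spend and ~40% margin (all three inputs are the post's rough estimates, not reported figures):

```python
def years_of_margin(payment_musd: float = 335.0,
                    annual_wafer_spend_musd: float = 155.0,
                    margin: float = 0.40) -> float:
    """How many years of foundry profit a lump-sum payment corresponds to."""
    annual_profit = annual_wafer_spend_musd * margin  # assumed yearly GF margin on AMD's orders
    return payment_musd / annual_profit


if __name__ == "__main__":
    print(f"{years_of_margin():.1f} years of margin")
```

With those inputs the result is a little over five years, matching the estimate in the post.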
 