AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
A recent slide? I've been seeing a lot of posts recently about using multiple dies on an interposer, at AnandTech and other forums.
It was originally posted at Fudzilla, I think, which is why it has mostly been disregarded. It was only part of a slide, too, but the style was faithful to AMD's. It could just as well have been fake.
Anyway, the slide was posted around 6 months ago. It got a bit more credibility a while back when AMD talked about a "coherent 100GB/s bus" they're working on, which matches the one described in the slide as the link between the GPU and the "Zeppelin" CPU. The GPU is supposed to have 4 TFLOPS+ performance and apparently 2 stacks of HBM memory, while the CPU / whole module has 4 DDR4 memory channels.
Here's the slide:
20150810amdmcm.jpg
 
To add to this, it seems the thing is in fact real.
I just browsed through, and Fudzilla had released another slide even earlier, which mentions 16 Zen cores with 4x DDR4 and a Greenland GPU with 2x HBM2 stacks. Some other details have changed since (PCIE > GMI).
Also, the Zeppelins have now been confirmed:
http://dresdenboy.blogspot.de/2016/02/amd-zeppelin-cpu-codename-confirmed-by.html
 
So, "Greenland" is ~ Fiji performance with proper double-precision at 4 TFLOPS+?
Definitely maybe :yep2:
Of course, nothing says the Fudzilla slides are real at all, just that 2 features in them have now been confirmed in some way.
 
Just saw the dresdenboy post on reddit, so zeppelin it is. A fake slide would've used Zen instead, so there's more credibility.

Zeppelin looks like it's a CPU only, while I was thinking it was an APU with a discrete-card level of GPU performance added on the MCM.

If it's 4TF of SP then it could be the Polaris 10 chip, with the whole assembly going into laptops.
 
Polaris 10 is almost certainly around 2 TFLOPs because AMD has said time and again that it offers console-level performance. Plus, getting 4 TFLOPs into a ~40W GPU is probably not coming with the first generation of FinFET chips, since it would mean a 4-5x performance/power improvement over 28nm (unlikely). Also, Polaris 10 uses GDDR5, whereas the chip in that diagram uses HBM.

The first rumor with the CU count for each chip did indeed speak of a full-fledged Polaris 10 going into the Raven Ridge APU, but it doesn't seem possible that it's the same chip as in that slide.

And then Polaris 11 is supposed to replace Hawaii, which does between 5 and 6 TFLOPs. Perhaps the chip going into this Zeppelin MCM is a third GPU (Tonga replacement?), i.e. another GCN GPU coming in 2017, or maybe a very downclocked and/or cut-down Polaris 11.
 
I was going by Fury X doing 8.6TFLOPs at 600mm2, so 4TFLOPS should be doable for a chip 1/4th the size on 14nm which puts it in the same ballpark as Cape Verde. Power would depend on the frequencies.

But you're right, 270X does 2.69TFLOPs and Polaris 10 would likely end up close to it since the scaling won't be that simple. The slide also mentions 4+ TFLOPs, so it should be a bigger chip.
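For anyone who wants to check the arithmetic in this exchange, here is a rough Python sketch of the usual peak-throughput estimate for GCN parts: shaders x 2 FLOPs per clock (one fused multiply-add) x clock speed. The shader counts and clocks for Fury X and 270X are the public specs; the 2x density gain from 28nm to 14nm is purely an assumption for illustration and is not from the slide or from AMD.

```python
def peak_sp_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak single-precision TFLOPS, counting one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_ghz / 1000.0

# Public specs: Fury X has 4096 shaders, R9 270X has 1280, both at 1.05 GHz.
fury_x = peak_sp_tflops(4096, 1.05)
r9_270x = peak_sp_tflops(1280, 1.05)

# Hypothetical chip at a quarter of Fiji's area on 14nm.
# ASSUMPTION: 14nm packs about twice as many shaders per mm2 as 28nm.
assumed_density_gain = 2.0
quarter_area_shaders = int(4096 * 0.25 * assumed_density_gain)
quarter_area_chip = peak_sp_tflops(quarter_area_shaders, 1.05)

print(f"Fury X:             {fury_x:.2f} TFLOPS SP")
print(f"R9 270X:            {r9_270x:.2f} TFLOPS SP")
print(f"1/4-area 14nm chip: {quarter_area_chip:.2f} TFLOPS SP "
      f"({quarter_area_shaders} shaders, assumed)")
```

Under that density assumption the quarter-size chip lands a little above 4 TFLOPS at Fury X clocks, which is the ballpark being argued here. A different density figure or clock moves the result proportionally.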
 
Unless Fiji has a hidden 2:1 DP ratio (like Hawaii) that we don't know about, then 4 TFLOPs DP in an AMD GPU is still far away. And I don't think AMD would develop a MCM with a Fiji in it. Cooling that thing (>400W combined TDP?) would be hell.

>4 TFLOPs SP are probably well below Polaris 11's target though, so there is no easy piece that fits that puzzle. Unless this specific MCM implementation uses a very downclocked Polaris 11, towards 650MHz or so.


The funny thing is that GPU is sitting on top of an interposer, which is on top of a MCM substrate, which is on top of a (probably socketed) motherboard.
Inception jokes should follow.
 
I haven't seen Polaris 10 or 11 linked to the name Greenland, however.
Fiji doesn't have a cache-coherent 100 GB/s interface or two stacks of HBM capable of 500 GB/s bandwidth, so the chip is going to be heavily reworked anyway.
The IO capability of that GPU would be interesting to see. There would be three tiers of connection speed in that setup, with the interposer, package, and off-package IO. Would or could that GPU use those MCM links if it is deployed outside of that context?
 
The funny thing is that GPU is sitting on top of an interposer, which is on top of a MCM substrate, which is on top of a (probably socketed) motherboard.
Inception jokes should follow.
I was thinking on similar lines :yes:
A PCB base package rather than silicon presumably?
 
I haven't seen Polaris 10 or 11 linked to the name Greenland, however.
Fiji doesn't have a cache-coherent 100 GB/s interface or two stacks of HBM capable of 500 GB/s bandwidth, so the chip is going to be heavily reworked anyway.
The IO capability of that GPU would be interesting to see. There would be three tiers of connection speed in that setup, with the interposer, package, and off-package IO. Would or could that GPU use those MCM links if it is deployed outside of that context?

We know all the chips are going to be reworked and will feature new memory buses, etc., as it is. Direct comparison to previous chips is pointless and always has been. AMD would have needed to re-tape out all the previous designs on 14/16nm FinFET, as well as the new ones, just to naively "carry over" old designs, and there is obviously zero reason to do that.

Regardless, if the 4 teraflop performance figure is correct, and previous rumors are correct, then we can perhaps assume that the three chips will have 1536/3072/4608 stream processors (24/48/72 compute units) with accompanying memory buses. Whether the buses support GDDR5X is unknown, though perhaps not (GDDR5X won't go into production until the middle of this year, right?). Regardless, a 128-bit bus for the smallest chip, Polaris 10, would still be fast enough to hit just above the overclocked 380X target if it used SK Hynix's 8 Gbps GDDR5, even without GDDR5X (are they even making that, though? There were announcements but...)

Though I honestly don't know why Polaris 10 would have a 128-bit bus while 11 would have a 384-bit bus. Doubling the compute units but tripling the bus width suggests either 384-bit is too much or 128-bit is too little. Polaris 12 would, assuming AMD's efficiency claims are accurate and making a reasonable assumption about FinFET clock speeds, deliver 50%+ more performance than the Fury X at 225-250 watts of TDP, right in line with AMD's claimed efficiency gains. A 50% increase would also allow 12 teraflops SP and allow a more reasonable ratio of roughly 1:3 to hit 4 teraflops DP. Though since it wasn't demonstrated at CES, who knows when it'll be released (delayed due to yield problems? It would be the biggest chip).
 
Unless Fiji has a hidden 2:1 DP ratio (like Hawaii) that we don't know about, then 4 TFLOPs DP in an AMD GPU is still far away.
What on earth makes you think we're talking about Fiji in the Fudzilla picture?

Greenland is not one of the Polaris chips that have been announced. Additionally, if it has Fiji's SP ALU count then it could easily have 2:1 SP:DP for more than 4 TFLOPS.
 
What on earth makes you think we're talking about Fiji in the Fudzilla picture?

I wrote that the only current GPU that could be able to do 4 TFLOPs DP would be Fiji, and that does not seem like a probable choice.


Greenland is not one of the Polaris chips that have been announced. Additionally, if it has Fiji's SP ALU count then it could easily have 2:1 SP:DP for more than 4 TFLOPS.

Then it's not coming before mid-2017.
 
People out on the street now use the term GPU for cards and not chips?
GPUs, chips, I mean the same thing by both: the actual GPU chip, not cards based on a GPU chip (I expect at least 4 cards from those 2 Polaris GPUs).
Anyway, my point was that AMD doesn't necessarily count the GPU chip in the possible HPC APU/MCM/whatever-it-is among the "2 Polaris GPUs/chips" (some sites have now started calling it "Vega 10").
 
Now now, they've only said they'll release 2 Polaris chips this year, that doesn't mean they'll release 2 GPUs, integrated included, this year.
Perhaps I'm wrong but to me, GPU always meant the chip. The same GPU can go into several graphics cards or embedded graphics solutions.
Polaris 10 and 11 are two distinct GPUs. I wrote earlier in this thread that AMD could take up to 8 distinct graphics solutions from these 2 GPUs (Pro + XT versions of each chip in both mobile and desktop).


I also didn't exclude the possibility that one of these Polaris chips is in that MCM solution (it could be 4 TFLOPS SP for a Polaris 11 at ~650MHz). I said that if it does 4 TFLOPS DP then it's definitely not coming in 2016, because that would mean it's a chip with much higher performance than Polaris 11. Polaris 11 is a Hawaii replacement, so even with a 2:1 ratio it should do 3 TFLOPs DP @ 1GHz if it comes with 48 CUs and they're all enabled.
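The two figures in that paragraph can be reproduced with a short Python sketch. It assumes the standard GCN layout of 64 lanes per compute unit and one FMA (2 FLOPs) per lane per clock. The 48 CU count and the 2:1 SP:DP ratio are the speculation from this thread, not confirmed specs.

```python
LANES_PER_CU = 64  # GCN: 4 SIMD units x 16 lanes per compute unit

def peak_tflops(cus: int, clock_ghz: float, dp_fraction: float = 1.0) -> float:
    """Peak TFLOPS for a GCN-style GPU.

    dp_fraction is the DP rate relative to SP (0.5 for a 2:1 SP:DP ratio).
    """
    return cus * LANES_PER_CU * 2 * clock_ghz * dp_fraction / 1000.0

# Speculated configuration from the thread: 48 CUs, all enabled.
dp_at_1ghz = peak_tflops(48, 1.0, dp_fraction=0.5)
sp_at_650mhz = peak_tflops(48, 0.65)

# Clock needed for exactly 4 TFLOPS SP on 48 CUs.
clock_for_4tf_ghz = 4.0 * 1000.0 / (48 * LANES_PER_CU * 2)

print(f"48 CUs @ 1 GHz, 2:1 SP:DP: {dp_at_1ghz:.2f} TFLOPS DP")
print(f"48 CUs @ 650 MHz:          {sp_at_650mhz:.2f} TFLOPS SP")
print(f"Clock for 4 TFLOPS SP:     {clock_for_4tf_ghz * 1000:.0f} MHz")
```

So a 48 CU part gives roughly 3 TFLOPS DP at 1GHz with half-rate DP, and roughly 4 TFLOPS SP at about 650MHz, which is where both numbers above come from.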

Regardless, Jawed was the only one expressing certainty that those are 4TFLOPS DP because the slide is aimed at a HPC audience (I don't see HPC anywhere in that fragment of a slide though, I guess he took it from the quad-channel RAM).
What I see is 2 stacks of HBM2 so there's a good chance it's Polaris 11 + Zen in a MCM. The top-end Polaris is probably using 4 stacks to reach 1TB/s.
The 4 channels of DDR4... that would actually make sense for AM4 if AMD wants to make full use of integrated GPUs in their APUs. That way, even the lower-end models could skip the HBM and interposer and still get 100GB/s of bandwidth, which could suffice for say a Polaris 10.
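As a sanity check on the bandwidth figures, here is a minimal Python sketch of the bus width x transfer rate calculation. The DDR4-3200 speed grade and the 2 Gbps per pin HBM2 rate are assumptions chosen for illustration; the slide fragment does not state either.

```python
def bandwidth_gb_s(bus_width_bits: int, rate_gt_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times giga-transfers per second."""
    return bus_width_bits / 8 * rate_gt_s

# Four 64-bit DDR4 channels. ASSUMPTION: DDR4-3200 (3.2 GT/s).
ddr4_quad_channel = bandwidth_gb_s(4 * 64, 3.2)

# One HBM2 stack has a 1024-bit interface. ASSUMPTION: 2 Gbps per pin.
hbm2_one_stack = bandwidth_gb_s(1024, 2.0)
hbm2_two_stacks = 2 * hbm2_one_stack
hbm2_four_stacks = 4 * hbm2_one_stack

print(f"4x DDR4-3200 channels: {ddr4_quad_channel:.1f} GB/s")
print(f"2 HBM2 stacks:         {hbm2_two_stacks:.0f} GB/s")
print(f"4 HBM2 stacks:         {hbm2_four_stacks:.0f} GB/s")
```

With those assumed speeds, quad-channel DDR4 comes out right around the 100GB/s mark, two HBM2 stacks give about 500GB/s, and four stacks give about 1TB/s. Slower DDR4 grades would fall short of 100GB/s.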


I haven't seen Polaris 10 or 11 linked to the name Greenland, however.
You weren't supposed to, either. Greenland would be a part of the "Arctic Islands" family and RTG has changed their codename schemes away from the "Islands" mess and on to "Star Name + performance number".
You'll probably never hear from "Greenland" again, since it should have had its codename changed to "Polaris something".
 
Regardless, Jawed was the only one expressing certainty that those are 4TFLOPS DP because the slide is aimed at a HPC audience (I don't see HPC anywhere in that fragment of a slide though, I guess he took it from the quad-channel RAM).
The clue is in the non-PCI Express link twixt CPU and GPU, which, apart from the fact that a MCM of CPU and GPU is never going to be a consumer product, is a pure HPC solution.

What I see is 2 stacks of HBM2 so there's a good chance it's Polaris 11 + Zen in a MCM.
What was once Greenland is now Vega, isn't it?
 