AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

If they did go as large as they could on the interposer, however, that would also raise questions.

Assuming you want each stack aligned with the GPU, and going by the picture, which shows as little margin as possible, the math gets fuzzy in the absence of implementation details like how much area is lost to spacing requirements or other components.
Let's just say being flanked by HBM takes roughly 12mm (rounding up) off one dimension of a GPU mounted on a big interposer, say 26x32mm.
The other dimension of the resulting 14 x ~32 or ~26 x 20 area is the length that the 7.3mm-long HBM stacks plus some margin have to fit into. If the GPU were not significantly shorter in that dimension as well, the layout would look a little bare.
 
The picture I linked earlier seems to show a long edge. I say that because two adjacent HBM chips take up 14mm of GPU side and there's about 5mm separating them. There appears to be about another 5mm of interposer from the main HBM chip to the bottom edge of the interposer, which I believe is just visible in the bottom right corner. So three gaps of 5mm plus 14mm takes us to ~30mm.

My estimate of 0.25mm spacing twixt the HBM module and the large chip now seems too small; I reckon it's more like 0.5mm. The gap between the edge of the interposer and the HBM module seems to be about the same, for what it's worth. So it seems likely to me that the GPU would be about 1mm smaller than the long dimension of the interposer, and the earlier estimate of 30mm for the long dimension seems reasonable.

That would then leave the other dimension in the region of 13mm. So that would lead to a GPU die size of something like 390mm².
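
A quick back-of-envelope recap of that arithmetic (a sketch only; the stack length, gaps and rounding are the eyeballed figures above, nothing measured):

```python
# Back-of-envelope recap of the layout arithmetic above; every figure is an
# eyeballed estimate from the picture, not a measured value.
hbm_len = 7.0   # mm, long side of one HBM stack (from the 5x7mm footprint)
gap     = 5.0   # mm, apparent spacing between stacks / to the interposer edges

interposer_long = 2 * hbm_len + 3 * gap   # two stacks + three ~5mm gaps = 29, call it ~30mm
gpu_long  = 30.0                          # mm, rounded long dimension
gpu_short = 13.0                          # mm, what is left of the other dimension
print(interposer_long)                    # 29.0 -> "~30mm"
print(gpu_long * gpu_short)               # 390.0 -> the ~390mm² die-size estimate
```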

Here's an actual production interposer using UMC's 65nm tech, described on page 9 as "~775mm²", which works out to 31x25mm :p :

www.semiconjapan.org/en/sites/semiconjapan.org/files/docs/SPR7_25_Xilinx_SureshRamalingam_0.pdf

The really interesting question is how much die space is gained from the deletion of the GDDR5 PHY. I'm not too sure of a size estimate for the PHY in Hawaii; something like 80% of the size of Tahiti's 74mm², so ~60mm²?

Could we be talking about the PHYs for HBM being something like 20mm²? Saving 40mm². Making Fiji the equivalent of a 430mm² conventional chip? Smaller than Hawaii.
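
Spelling that guess out (a sketch; the 80%-of-Tahiti and 20mm² figures are the guesses above, and Hawaii's ~438mm² is its commonly quoted die size):

```python
# Spelling out the PHY-savings guess above (all areas are guesses except Hawaii's die size).
tahiti_phy = 74.0                 # mm², Tahiti's GDDR5 PHY
hawaii_phy = 0.8 * tahiti_phy     # ~60mm², the "80% of Tahiti" guess
hbm_phy    = 20.0                 # mm², guessed HBM PHY area
saving     = hawaii_phy - hbm_phy # ~40mm²

fiji_hbm   = 390.0                # mm², die-size estimate from the picture
fiji_equiv = fiji_hbm + saving    # ~430mm² "GDDR5-equivalent" chip
hawaii_die = 438.0                # mm², commonly quoted Hawaii die size
print(round(hawaii_phy), round(saving), round(fiji_equiv), fiji_equiv < hawaii_die)
# ~59, ~39, ~429, True -> slightly smaller than Hawaii
```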

45% more CUs in Fiji than in Hawaii, with the CUs taking >50% of the die, would be impossible to fit.

So, I have to wonder if AMD is building Fiji with low double precision capability. Perhaps AMD will go for 1/16th DP rate to squeeze in all those CUs. Then there's the small question of the cost of double the ROPs, double the triangle rate and the new cost for delta colour compression.

The theoretical transistor counts just don't add up for a mere 390mm² chip.

Even if we switch dimensions, 19x24 = 456mm², which would be equivalent to a 496mm² GDDR5 chip. That's only 13% bigger than Hawaii.
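
To put numbers on why it doesn't add up at identical CU sizes (a sketch, assuming the usual 44-CU Hawaii, the rumoured 64-CU Fiji, and Hawaii's ~438mm² die):

```python
# Why 45% more CUs are hard to fit in a chip only ~13% bigger than Hawaii (GDDR5-equivalent).
hawaii_die  = 438.0    # mm², commonly quoted Hawaii die size
hawaii_cus  = 44       # assumption: Hawaii's CU count
fiji_cus    = 64       # assumption: the rumoured Fiji CU count (~45% more)
cu_fraction = 0.5      # the ">50% of the die" figure from above

hawaii_cu_area = cu_fraction * hawaii_die                      # ~219mm² of CUs in Hawaii
extra_cu_area  = hawaii_cu_area * (fiji_cus / hawaii_cus - 1)  # ~+100mm² just for the extra CUs
equiv_growth   = 496.0 - hawaii_die                            # ~+58mm² of total equivalent growth
print(round(extra_cu_area), round(equiv_growth))  # the CUs alone outgrow the whole area budget
```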

The only way I can reconcile that picture with a 19x24mm GPU is if the HBM chips are asymmetric along the long side of the GPU. The HBM module that's only just visible has its top edge aligned with the top edge of the GPU, so that both are 0.5mm away from the top edge of the interposer. That would make the long edge we can see 7 + 5 + 7 + 5 = 24mm.
 
This slide is still relevant. It's from Synapse Design, who do some of the physical design work for AMD GPUs. AMD very likely has some GPU that's >500mm², and at this point I think it's a good bet that that one is Fiji.
 
The picture I linked earlier seems to show a long edge. I say that because two adjacent HBM chips take up 14mm of GPU side and there's about 5mm separating them. There appears to be about another 5mm of interposer from the main HBM chip to the bottom edge of the interposer, which I believe is just visible in the bottom right corner. So three gaps of 5mm plus 14mm takes us to ~30mm.
That could be the bottom edge. Not knowing what a GPU interposer should look like at its lower corner, I'm not sure how to interpret the patterning, or that particular shade of beige at the bottom margin of the picture.

One question I have, looking at the Xilinx pdf, is what requires 5mm or so of spacing. The FPGA's various components don't need that much space, nor did Kalahari's memory stand-ins.

Perhaps it is more important for the GPU's internal architecture to balance its data paths instead of cramming them too close together, or there are other reasons like better physical reliability if components are not all off on one side.
Otherwise, it might be possible to pack them a bit closer, though having a long dimension that is 3-4x the length of the memory may be useful when a GPU comes along with higher bandwidth needs.

Could we be talking about the PHYs for HBM being something like 20mm²? Saving 40mm². Making Fiji the equivalent of a 430mm² conventional chip? Smaller than Hawaii.
Hynix gave a picture of the base die's ballout. I can't think of a reason why the GPU's side would be massively different, so I think it may be between 24-30mm² for all 4 interfaces. This is a rough guesstimate, as I haven't tried to pixel-count to any level of accuracy. The slides say roughly 6.0x3.3mm, and the PHY is perhaps a little over 1/3 of that area.
http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-3-Technology-epub/HC26.11.310-HBM-Bandwidth-Kim-Hynix-Hot Chips HBM 2014 v7.pdf (page 19)
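
Roughly, that guesstimate works out as follows (a sketch; the 6.0x3.3mm figure is from the Hynix slide, the 1/3 PHY share is eyeballed):

```python
# Guesstimate of the HBM PHY area on the GPU side, working from the Hynix base-die ballout.
base_die_if  = 6.0 * 3.3   # mm², rough interface region per stack ("roughly 6.0x3.3mm")
phy_fraction = 1 / 3       # eyeballed share of that region that is PHY
stacks       = 4           # four HBM interfaces on the GPU

per_interface = base_die_if * phy_fraction   # ~6.6mm²
total         = per_interface * stacks       # ~26mm², inside the 24-30mm² range above
print(round(per_interface, 1), round(total, 1))
```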
 
So, I have to wonder if AMD is building Fiji with low double precision capability. Perhaps AMD will go for 1/16th DP rate to squeeze in all those CUs.

Unfortunately I suspect it will be 1/16.
AMD sets their fp64 performance to be "much more than Nvidia". I don't think they have a particular target value or a commitment to pass on advancements outside of their FirePro range.
If the R9 390X has an fp32 rate of ~8192 GFLOPS (http://www.techpowerup.com/gpudb/2633/radeon-r9-390x.html) then 1/16th gives an fp64 rate of 512 GFLOPS, which is still much more than Nvidia (barring the one Titan).
Unless something changes re Nvidia or Intel going forward, I suspect fp64 will end up being capped at ~500-700 GFLOPS, no matter how fast fp32 advances.
Personally I'd prefer a 1 TFLOPS cap
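
For reference, here's the same calculation at the other common ratios (a sketch using the ~8192 GFLOPS fp32 figure from the TechPowerUp entry above):

```python
# fp64 throughput at various DP ratios, using the rumoured ~8192 GFLOPS fp32 figure above.
fp32_gflops = 8192
for ratio in (2, 4, 8, 16):
    print(f"1/{ratio}: {fp32_gflops / ratio:.0f} GFLOPS fp64")
# 1/16 gives the 512 GFLOPS quoted above; 1/2 would give 4096 GFLOPS
```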
 
Unfortunately I suspect it will be 1/16.
AMD sets their fp64 performance to be "much more than Nvidia". I don't think they have a particular target value or a commitment to pass on advancements outside of their FirePro range.
You're assuming that AMD knew during the design of Fiji that GM200 would have reduced FP64 performance. I'll be very surprised if Fiji doesn't support 1/4 or even 1/2 rate in a professional version.
 
Anyway, 4GB, today, for an enthusiast card, they might as well not fucking bother at all. It's not enough for some games right now, much less what's coming in the future.

The Titan X can't sustain 60fps in The Witcher 3, a new game, at 1920x1080!
A new card with 12GB of RAM costing 1200 euro seems silly at best when it can't do better in new games.
It sits with 9GB doing nothing while you game, so I'd rather have a card that is faster and lets me utilise all of its RAM.
4K gaming is still years away for a single-card setup.
 
This slide is still relevant. It's from Synapse Design, who do some of the physical design work for AMD GPUs. AMD very likely has some GPU that's >500mm², and at this point I think it's a good bet that that one is Fiji.
The picture could be showing a GPU with HBM2 modules: 8GB in 2 modules with 512GB/s bandwidth...

That's the only other way I can think of to explain this picture, if it's a picture of an AMD GPU with HBM modules and the GPU is >500mm² and there's a single interposer.

The alternative is that we're talking about a GPU mounted upon multiple independent mini interposers, e.g. 2 interposers that run the length of the long side of the GPU, but are only wide enough to support all of the power and interconnect duties for both the GPU and the HBM modules (microbumps/TSVs should be dense enough to supply the two or three hundred amps of current required by the GPU). In this scenario we'd have a GPU that is mostly mounted on some kind of underfill, with two narrow strips of interposer along the long edges to connect to the HBM modules.

In that design, the interposer dimension limits (31 x 25mm) are relaxed. Each interposer might be 31 x 15mm. You could then get a GPU in the region of 550mm² or even larger, with HBM modules along two sides of the GPU.
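
A rough feasibility check of that scenario (a sketch; the 0.5mm margin and the derived ~18mm short edge are illustrative, only the 31mm limit and the ~550mm² target come from the post):

```python
# Rough feasibility check of the two-strip mini-interposer idea; the margin and the
# resulting short edge are illustrative numbers, not from any leak.
strip_len, strip_w = 31.0, 15.0   # mm, each narrow interposer strip
margin = 0.5                      # mm, illustrative edge margin

gpu_long  = strip_len - 2 * margin   # long edge still bounded by the ~31mm reticle limit
gpu_short = 550.0 / gpu_long         # short edge needed for a ~550mm² die
# The short edge is no longer capped by the 25mm interposer limit, since most of the
# die would sit on underfill between the two strips.
print(gpu_long, round(gpu_short, 1), round(gpu_long * gpu_short))
```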
 
That could be the bottom edge. Not knowing what a GPU interposer should look like at its lower corner, I'm not sure how to interpret the patterning, or that particular shade of beige at the bottom margin of the picture.
Agreed. But, knowing the maximum dimensions of an interposer are either 31mm or 25mm, there really isn't much choice about that part of the picture...

EDIT: Although with 3 HBMs along that edge we'd be talking about 7 + 5 + 7 + 5 + 7mm = 31mm. 6 chips in total and 768GB/s? Or, where's the fourth HBM chip?
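
The bandwidth side of that checks out (a sketch; 128GB/s per first-gen HBM stack is just the 512GB/s-over-4-stacks figure from earlier):

```python
# Edge length and bandwidth if there were three stacks along that edge (six in total).
hbm_len, gap = 7.0, 5.0            # mm
edge = 3 * hbm_len + 2 * gap       # 7+5+7+5+7 = 31mm, right at the interposer limit
per_stack_gb_s = 128               # GB/s per first-gen HBM stack (512GB/s / 4 stacks)
print(edge, 6 * per_stack_gb_s)    # 31.0mm and 768 GB/s for six stacks
```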

One question I have, looking at the Xilinx pdf, is what requires 5mm or so of spacing. The FPGA's various components don't need that much space, nor did Kalahari's memory stand-ins.
Yes, I was surprised by that spacing too.

Hynix gave a picture of the base die's ballout. I can't think of a reason why the GPU's side would be massively different, so I think it may be between 24-30mm² for all 4 interfaces. This is a rough guesstimate, as I haven't tried to pixel-count to any level of accuracy. The slides say roughly 6.0x3.3mm, and the PHY is perhaps a little over 1/3 of that area.
http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-11-day1-epub/HC26.11-3-Technology-epub/HC26.11.310-HBM-Bandwidth-Kim-Hynix-Hot Chips HBM 2014 v7.pdf (page 19)
I assume that the PHY in the GPU is larger because it alone drives the address and command interfaces and other bits and bobs, whereas the HBM base die only has to drive data back to the GPU.

But, I can't work out the labelling on that picture. Why are there channels numbered 0, 1, 4 and 5? Why are there 5 areas for each channel? I think the numbering is a copy-paste error.

Also, the PHY has to include HBM power. At the very least I would guess that the four thin strips in the centre of the "channel 0, 1, 4 and 5" area are the power, with each of those 4 strips being power for a single memory die in the stack.

So it seems to me that the PHY area estimate needs to be lowered a smidgen for the GPU, since power is more widely dispersed across the whole surface of GPUs and wouldn't need to be so acutely localised. (Though one could argue that the difference is so tiny, e.g. 0.1mm², that it's pointless to even raise this.)

So, erm, back to square 1.
 
That's the only other way I can think of to explain this picture, if it's a picture of an AMD GPU with HBM modules and the GPU is >500mm² and there's a single interposer.

I guess you are talking about the 26x32mm interposer?
I would imagine a GPU of 21x25mm would be the absolute largest they could do, depending on the spacing requirements and based on AMD's 5x7mm figure for an HBM IC.
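
That ceiling would square with the >500mm² claim from the Synapse slide (a sketch of one way to arrive at 21x25mm, assuming stacks flank both long edges with ~0.5mm margins):

```python
# One way the 21x25mm ceiling could come about: 5x7mm stacks flanking both long
# edges of a 26x32mm interposer, with ~0.5mm margins (the margins are my assumption).
interposer_short, interposer_long = 26.0, 32.0   # mm
hbm_depth = 5.0    # mm, depth a flanking stack takes off one dimension
margin    = 0.5    # mm, assumed spacing

gpu_a = interposer_long - 2 * hbm_depth - 2 * margin   # ~21mm with HBM on both long edges
gpu_b = interposer_short - 2 * margin                  # ~25mm
print(gpu_a, gpu_b, gpu_a * gpu_b)                     # 21 x 25 = 525mm², i.e. >500mm²
```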
 
Unfortunately I suspect it will be 1/16.
AMD sets their fp64 performance to be "much more than Nvidia". I don't think they have a particular target value or a commitment to pass on advancements outside of their FirePro range.
When I wrote that, I assumed that AMD had decided not to make Fiji a FirePro compute-monster chip, because such a chip would need to support 16GB of memory.

Of course, there's a theory that simply by attaching HBM2 modules to Fiji, you get a compute-monster chip with 16GB of memory.

Instead I think AMD looks at this as a test chip for HBM, which is on 28nm (easy to test on). A bit like Tonga seems to be a test for colour delta compression (the chip is absurdly large for its performance, terribly unbalanced in its fundamentals).

I don't know if AMD can get HBM2 and the next process node to synchronise, say in spring or summer 2016, but I would expect it to, and for a compute-monster chip with 16GB of memory to be derived from that, with >10 TFLOPS single-precision performance and therefore ~5 TFLOPS double-precision for the compute monster.
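
For what it's worth, those numbers imply a 1/2 DP rate on the derived chip (a trivial sketch):

```python
# The compute-monster numbers above imply a 1/2 DP rate for that derived chip.
fp32_tflops = 10.0   # ">10 TFLOPS single-precision"
fp64_tflops = 5.0    # "~5 TFLOPS double-precision"
print(fp32_tflops / fp64_tflops)   # 2.0, i.e. a 1/2 rate rather than Fiji's suspected 1/16
```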
 
I guess you are talking about the 26x32mm interposer?
I would imagine a GPU of 21x25mm would be the absolute largest they could do, depending on the spacing requirements and based on AMD's 5x7mm figure for an HBM IC.
I'm talking about a 31 x 25mm interposer, with 2 HBM modules solely on one side, which I think allows for a 25 x 24mm GPU.
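
One way those numbers could fit together (a sketch; the stack orientation and 0.5mm margins are my assumptions):

```python
# One reading of the single-edge layout: the stacks' 5mm depth comes off the 31mm
# dimension, with ~0.5mm margins (the orientation and margins are my assumptions).
interposer_long, interposer_short = 31.0, 25.0   # mm
hbm_depth = 5.0    # mm, depth of the HBM strip on that one edge
margin    = 0.5    # mm, assumed spacing

gpu_a = interposer_long - hbm_depth - 2 * margin   # ~25mm
gpu_b = interposer_short - 2 * margin              # ~24mm
print(gpu_a, gpu_b, gpu_a * gpu_b)                 # ~25 x 24 = 600mm²
```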
 
I just noticed a new picture when checking AMD's HBM page: http://www.amd.com/en-us/innovations/software-technologies/hbm

[Image: 6315_ASIC_HBM_polys-partial.jpg]

Nice piece to speculate about, so I guess it was worth sharing here :)

It would be compatible with a 50x50mm packaging, 32x26mm interposer, 4 HBM modules and some ~20x24mm GPU...
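
A rough fit check of that reading (a sketch; the flanking layout and margins are my assumptions, only the dimensions above are given):

```python
# Rough fit check for the 50x50mm package / 32x26mm interposer / 4-stack reading;
# the layout and margins here are my assumptions, only the dimensions above are given.
interposer_long, interposer_short = 32.0, 26.0   # mm
hbm_w, hbm_l = 5.0, 7.0                          # mm, footprint of one stack
gpu = (20.0, 24.0)                               # mm, GPU guess from the picture

print(gpu[0] * gpu[1])                           # ~480mm² die
# Two stacks flanking each long edge leave 32 - 2*5 = 22mm for the GPU plus margins,
# and two 7mm stacks plus a gap fit within the 26mm edge.
print(interposer_long - 2 * hbm_w, 2 * hbm_l)
```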
 
I wonder how much the savings are on the GPU's I/O perimeter padding with HBM? Probably enough to compensate for the tighter memory integration on the same package.
 