AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

D3D12 is 7-9 months away? Launching a true D3D12 card at the same time seems very likely.

We already know that HBM is not going to allow >4GB RAM in the first iteration.

So neither of these things is inherently surprising.

I'm still dubious that HBM is coming this spring, but Alexko's sequence is the first thing I've seen that makes it seem like a possibility to me. I still think the performance is going to be disappointing if it's only 50% faster while using delta-compression + HBM.


From what I know, that was the HBM1 (4-Hi stack) qualification progress in 2014. SK Hynix should now be able to produce HBM2 (8GB).
 
Wasn't the qualification done in late 2014? It's a little early to hop to the next gen.
HBM Gen2 should double bandwidth per pin, which would make the leaked results even more disappointing. A 4-stack of HBM Gen2 would have as much bandwidth as Hawaii's L2-L1 crossbar.
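Back-of-envelope, assuming the commonly quoted 1024-bit interface per stack (the per-pin rates here are ballpark figures, not confirmed specs):

```python
# Per-stack HBM bandwidth from per-pin rate, assuming a 1024-bit interface.
def stack_bandwidth_gbs(gbps_per_pin, pins=1024):
    """Per-stack bandwidth in GB/s."""
    return pins * gbps_per_pin / 8

for gen, rate in (("HBM Gen1", 1.0), ("HBM Gen2", 2.0)):
    per_stack = stack_bandwidth_gbs(rate)
    print(f"{gen}: {per_stack:.0f} GB/s per stack, {4 * per_stack:.0f} GB/s for 4 stacks")
# HBM Gen1: 128 GB/s per stack, 512 GB/s for 4 stacks
# HBM Gen2: 256 GB/s per stack, 1024 GB/s for 4 stacks
```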
 
Wasn't the qualification done in late 2014? It's a little early to hop to the next gen.
HBM Gen2 should double bandwidth per pin, which would make the leaked results even more disappointing. A 4-stack of HBM Gen2 would have as much bandwidth as Hawaii's L2-L1 crossbar.

Well, maybe it was my error to call it HBM2. The HBM1 qualification process was about a 1GHz minimum (if I remember correctly) with the 4-Hi stack; that was to validate the process, and that validation took many months, even years. That said, it was only the minimum requirement for validation, and in the meantime SK Hynix and AMD have continued to develop the technology. 8GB and 1.25GHz seem quite reasonable to me. (I'm not saying the 380-390 GPUs will have 8GB; I'm just saying that 8GB of HBM can be produced today, so it's not limited to 4GB. It doesn't make a lot of difference now: the technology is validated, and the evolution will advance really, really fast from here.) So if the first GPUs have a 4-Hi HBM1 stack with 4GB, I won't be surprised to see the same architecture in Q3 with 8GB of HBM for the pro or gaming market.

AMD, with their top line of FirePro, have set the bar really high on the memory bandwidth side (and on peak compute power): 512-bit / 16GB on a single GPU. They will need to be able to move to HBM2 quickly if they want to continue this trend with their new GPUs, and Lisa's words about aggressiveness in the professional market were really clear.

You mention the crossbar; what does HBM's bandwidth capacity imply for the cache, or for the crossbar, in the next generation of GPUs? (It's a humble question.)



That said, Alexko's flu-induced speculation is interesting.
 
It's a simplified marketing diagram that appears to overestimate the die savings and may exaggerate the amount of area that is available on the interposer. The other memory types are memory dies that are mounted on chip packages. The package and ball area are not required for a memory type that is meant to mount directly onto a silicon interposer.
The silicon interposer itself is a large and comparatively simple silicon chip, and it does become less economical with size.

Xilinx has an FPGA product that has slices mounted on a 775mm2 interposer, and those products are not cheap.

Let's say we go with the rumors of a 500-600mm2 GPU.
The following leak gives a 42mm2 area for an HBM stack.
http://www.eteknix.com/sk-hynixs-high-bandwidth-memory-presentation-leaks/

At the upper end, 600 + 4x42 = 768mm2 of GPU and memory area, and the known prototype arrangements leave a decent amount of area that cannot be readily used.
8 stacks would take up the area of a mid-to-high-range GPU all on their own. Interposers are by design large relative to what is mounted on them, but even the rumor mill has been more modest than a silicon interposer stretching an extra 300-400mm2 to host a giant GPU and a massive slate of memory.
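Rough numbers using that 42mm2 figure:

```python
# Area budget sketch: rumored GPU die plus HBM stacks on one interposer,
# using the ~42mm2 per-stack figure from the leaked SK Hynix slide.
GPU_MM2 = 600    # upper end of the rumored 500-600mm2 range
STACK_MM2 = 42

for stacks in (4, 8):
    total = GPU_MM2 + stacks * STACK_MM2
    print(f"{stacks} stacks: {stacks * STACK_MM2} mm2 of memory, {total} mm2 total")
# 4 stacks: 168 mm2 of memory, 768 mm2 total
# 8 stacks: 336 mm2 of memory, 936 mm2 total
```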
 
At the upper end, 600 + 4x42 = 768mm2 of GPU and memory area, and the known prototype arrangements leave a decent amount of area that cannot be readily used.
How large can a GPU get and still be able to fit HBM on an interposer? We know there's a maximum die size that can be manufactured (~660mm2, people say). To hit that, you probably want a square die, not a rectangular one as shown in the example HBM arrangement image posted in this thread; otherwise it would seem to me that one of the dimensions of the die would grow outside the max reticle size.

So with a square GPU, you'd need a fatter interposer to fit the HBM stacks... Can such a large interposer even be made? Don't they use the same silicon etching gear to make them as with integrated circuit chips (albeit older, obsolete equipment, I assume)?
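To put some hypothetical numbers on it (the die is assumed reticle-max and square, and I'm guessing stack dimensions of roughly 5.5x7.7mm from the leaked ~42mm2 figure):

```python
import math

# Hypothetical layout: a square, reticle-limited GPU with columns of HBM
# stacks flanking two opposite sides, as in the arrangement image above.
DIE_MM2 = 660.0
die_side = math.sqrt(DIE_MM2)            # ~25.7mm per side
stack_depth = 7.7                        # guessed long side of a ~42mm2 stack

# The interposer must span the die plus one stack depth on each side.
interposer_w = die_side + 2 * stack_depth  # ~41.1mm
interposer_h = die_side                    # ~25.7mm
print(f"minimum interposer: ~{interposer_w:.1f} x {interposer_h:.1f} mm")
# ~41.1 x 25.7 mm: one dimension already well past a ~26x32mm reticle field.
```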
 
I'm trying to find a link to an alleged leak of some of Globalfoundries' test modules and some other presentations on it.
I think the last time I looked at some of the numbers, the largest rumored GPU size might have had problems with fitting the area of more than two stacks.
The highest-end rumored GPU could suffer from even worse capacity and bandwidth constraints.

Economics, in terms of higher cost per mm2 and yield, seem to be the major limiters of interposer size.
I've seen it pointed out that if the interposer's own process is coarse enough (Amkor talked about a 65nm process with a few metal layers), it is possible to "stitch" together multiple zones. Each zone would be an area compatible with the reticle.
Very large sensors do something like this, although I'm not familiar enough with that to know whether the lower bound of that methodology is compatible with the needs of the memory interconnect. Alignment becomes too hard if the geometry gets too fine.
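As a sketch of the arithmetic, assuming a ~26x32mm maximum field (the figure cited a few posts down):

```python
import math

# How many reticle exposures a "stitched" interposer would need,
# assuming a 26x32mm maximum field per exposure.
FIELD_W, FIELD_H = 26, 32

def fields_needed(w_mm, h_mm):
    return math.ceil(w_mm / FIELD_W) * math.ceil(h_mm / FIELD_H)

print(fields_needed(26, 32))  # 1 exposure: fits the reticle as-is
print(fields_needed(41, 26))  # 2 exposures: e.g. a square die flanked by HBM
```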

There are also interposer materials besides silicon, like glass, which have different limits and trade-offs. Since AMD and its closest partners haven't talked about that, I haven't really considered other materials.


edit:

Here's something I think is similar to what I looked at back then:
http://www.3dincites.com/2014/11/globalfoundries-3d-ducks-row/

The prototype interposer is ~830 mm2, and people were guessing at taking Hawaii and increasing it by 50% or more. The area starts to cut really close if things are taken that far.
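Running those numbers (Hawaii is ~438mm2; the ~42mm2 stack figure is from the leak above):

```python
# Hawaii +50% plus four HBM stacks against the ~830mm2 prototype interposer.
HAWAII_MM2 = 438
INTERPOSER_MM2 = 830
STACK_MM2 = 42

big_gpu = HAWAII_MM2 * 1.5         # 657 mm2
used = big_gpu + 4 * STACK_MM2     # 825 mm2
print(f"GPU {big_gpu:.0f} + stacks {4 * STACK_MM2} = {used:.0f} mm2 "
      f"of {INTERPOSER_MM2} mm2 ({used / INTERPOSER_MM2:.0%} used)")
# ~825 of ~830 mm2, with essentially nothing left for placement slack.
```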
 
The prototype interposer is ~830 mm2, and people were guessing at taking Hawaii and increasing it by 50% or more. The area starts to cut really close if things are taken that far.
The interposer is non-square. It could easily be square, e.g. 32x32mm.
 
Possibly not.
The 26x32 dimension may not be coincidental.
Per the following, that dimension is the largest front-end reticle field. There are steppers with wider fields, but the lines they draw are coarser.
http://thor.inemi.org/webdownload/2014/Substrate_Pkg_WS_Apr/08_TechSearch.pdf

The Globalfoundries prototype looks to be pushing up to the limit of the foundry's steppers.
Stitching seems to come up for proposed interposers that go beyond this, with two regions of higher density that are bridged by less dense wiring. The economics of that would be affected because of the additional complexity and the per mm2 cost of the interposer.
If there were a way to practically seat a GPU across two sub-interposers, that could give a lot more area to play with. The memory channels wouldn't need to cross out of the region where the finer process is used.
 
Would not thermal expansion be an issue with a split interposer, or would it effectively make no practical difference vs. a single interposer/traditional substrate?
 
The interposer would be split in terms of what geometry of process its joining wires would come from. The silicon itself would be contiguous.
Warping has been cited as a challenge for large interposers, so much of the concern may just be due to the size.
I suppose there would be some physical difference given the larger wires, but it would be relatively balanced on both sides of the divide. There very well could be a physical effect, but I don't know how to characterize it.

There may be more pressing concerns like how that might disrupt the connections for the GPU straddling the divide, or difficulty aligning both regions and the GPU.
 
Well, maybe it was my error to call it HBM2. The HBM1 qualification process was about a 1GHz minimum (if I remember correctly) with the 4-Hi stack; that was to validate the process, and that validation took many months, even years. That said, it was only the minimum requirement for validation, and in the meantime SK Hynix and AMD have continued to develop the technology. 8GB and 1.25GHz seem quite reasonable to me. (I'm not saying the 380-390 GPUs will have 8GB; I'm just saying that 8GB of HBM can be produced today, so it's not limited to 4GB. It doesn't make a lot of difference now: the technology is validated, and the evolution will advance really, really fast from here.) So if the first GPUs have a 4-Hi HBM1 stack with 4GB, I won't be surprised to see the same architecture in Q3 with 8GB of HBM for the pro or gaming market.

HBM1 isn't 1GHz; it's <=1.2 Gb/s per pin. HBM2 is targeting ~2 Gb/s.
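A quick capacity sketch to go with that (the 2Gb-per-die figure for first-gen HBM is my assumption, based on the commonly reported 1GB per 4-Hi stack):

```python
# Per-stack HBM capacity in GB: dies in the stack x density per die (Gbit) / 8.
def stack_capacity_gb(dies, die_gbit):
    return dies * die_gbit / 8

print(stack_capacity_gb(4, 2))  # 1.0 GB/stack -> 4GB across four stacks
print(stack_capacity_gb(8, 2))  # 2.0 GB/stack -> an 8-Hi stack doubles it to 8GB
```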


It's a simplified marketing diagram that appears to overestimate the die savings and may exaggerate the amount of area that is available on the interposer. The other memory types are memory dies that are mounted on chip packages. The package and ball area are not required for a memory type that is meant to mount directly onto a silicon interposer.
The silicon interposer itself is a large and comparatively simple silicon chip, and it does become less economical with size.

Xilinx has an FPGA product that has slices mounted on a 775mm2 interposer, and those products are not cheap.

Xilinx is using a silicon interposer, while it is unlikely that any consumer-level part will be. The most likely interposer type for consumer deployment is an organic interposer, and those have significantly fewer restrictions on size.


How large can a GPU get and still be able to fit HBM on an interposer? We know there's a maximum die size that can be manufactured (~660mm2, people say). To hit that, you probably want a square die, not a rectangular one as shown in the example HBM arrangement image posted in this thread; otherwise it would seem to me that one of the dimensions of the die would grow outside the max reticle size.

So with a square GPU, you'd need a fatter interposer to fit the HBM stacks... Can such a large interposer even be made? Don't they use the same silicon etching gear to make them as with integrated circuit chips (albeit older, obsolete equipment, I assume)?

The often-quoted 660mm^2 is a result of the field area of the actual light source path, which is generally on the order of 22x33mm. Most silicon interposers are made using older processes like 90nm and 65nm. But for a consumer-level product, one would assume that companies would be targeting organic interposers. That's certainly what Intel is doing, including combining the organic interposer into the actual package substrate. Intel has already announced an organic substrate that can support both HBM and HMC. And the KNL design is likely going to be using upwards of 4 HMC stacks on an organic substrate, along with a likely ~600+ mm^2 logic die.
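The reticle fields quoted in this thread don't quite agree (26x32mm earlier, 22x33mm here), so for what it's worth, here's what each implies:

```python
# Max die area implied by each quoted reticle field, plus the largest
# square die that would fit (relevant to the square-vs-rectangular question).
for w, h in ((26, 32), (22, 33)):
    print(f"{w}x{h}mm field: {w * h} mm2 max die, {min(w, h) ** 2} mm2 max square")
# 26x32mm field: 832 mm2 max die, 676 mm2 max square
# 22x33mm field: 726 mm2 max die, 484 mm2 max square
```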
 
The NVidia Pascal GPU mockup has quite a large, square interposer with four HBM chips on the substrate. HBM != HMC, but for comparison, it does show an example of a large ~600mm^2 square GPU with local RAM interconnect. Visually, the HBM dies look a lot larger than in the HMC "aspirin" diagram. NVidia calls it "3D chip on wafer" integration, but I don't know if that implies an organic or silicon substrate.
 
The often-quoted 660mm^2 is a result of the field area of the actual light source path, which is generally on the order of 22x33mm. Most silicon interposers are made using older processes like 90nm and 65nm. But for a consumer-level product, one would assume that companies would be targeting organic interposers. That's certainly what Intel is doing, including combining the organic interposer into the actual package substrate. Intel has already announced an organic substrate that can support both HBM and HMC. And the KNL design is likely going to be using upwards of 4 HMC stacks on an organic substrate, along with a likely ~600+ mm^2 logic die.
Is this different from EMIB? My understanding of that packaging tech is that it embeds silicon channels into the substrate.
 
Is this different from EMIB? My understanding of that packaging tech is that it embeds silicon channels into the substrate.

The embedded portion can be a variety of different technologies, from ceramic to high-density organic to silicon. The main advantage is requiring the high-density interconnect only for those signals which need it, and not forcing off-package signals to go through an interposer as well. Overall, it should result in a significant cost reduction as well as reduced latency and equipment utilization.
 
The embedded portion can be a variety of different technologies, from ceramic to high-density organic to silicon. The main advantage is requiring the high-density interconnect only for those signals which need it, and not forcing off-package signals to go through an interposer as well. Overall, it should result in a significant cost reduction as well as reduced latency and equipment utilization.
I had not read about the possibility of an organic bridge being a possible option.
It is an elegant way to get around the way interposers upend the value of real estate for memory and the primary chip, and it does play to Intel's strength in packaging.

I guess it slipped my mind in this thread due to the thinness of discussion on the part of AMD's most likely GPU manufacturing partners, where I assumed an interposer was closer to their comfort zone despite the cost penalty. I would imagine if things go the way they normally do in terms of catch-up that a competing EMIB-like or straight organic interposer would not be available to AMD for quite some time.
 
AMD Working On Something “Crazy” For GDC: http://wccftech.com/amd-working-crazy-gdc/

Brad Wardell revealed that AMD is working on something “Crazy” for E3, plus a new demo for GDC that will make Starswarm look “primitive”. Wardell is the founder, president, and CEO of Stardock, a well-established game publisher and developer.

Wardell partnered with Oxide Games in 2013 to create Nitrous, a next generation game engine designed for complex real-time strategy games.


We’ll be in the AMD booth, I can’t tell you the specifics but AMD has something crazy they’re working on.

When I look at hyperware, or whether something is going to be successful, I usually go: was the idea, in hindsight, obvious? And AMD is working on something that in hindsight will be like, duh, why didn’t we have this already?



New DX12 Stardock Demo Coming At GDC
We’re going to be showing something at GDC that will make Starswarm look really primitive, because Starswarm is something we whipped up in like two months.

That should be in the Microsoft booth there at GDC to be able to show just the massive difference.
Said Brad Wardell during an interview with The Inner Circle.
