AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

3dilettante · Mar 16, 2015

Gipsel said:
As the separate stacks would sit in very close proximity on the same logic die, I wouldn't expect much of a timing difference.

So this is standard stacks with standard base layers, that are themselves stacked on a die, that is set on the interposer rather than some kind of custom larger base die.

And the signals from the GPU have to run through the PHYs on the base die anyway. Routing the data and address lines to the TSV contacts of the second stack just a few millimeters away (the size of a 2 GBit die) probably costs not much more time than the signals would take inside a larger 4 GBit die if you address another bank there.

It is physically a different chip in a standard that does not promise that independent channels are necessarily in sync. The link die could impose some additional synchronization between stacks that operate as if they are alone, such as a possible corner case with refresh timings shifting between halves of the same channel.
The standard shouldn't care, but the memory controller would stand to benefit from being aware of possible differences in things like bank activation limits.
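To illustrate the point about bank activation limits, here is a minimal Python sketch of a DDR-style rolling four-activate window (tFAW). The class name, the window length, and the assumption that each physical die enforces its own window are all hypothetical, not taken from the HBM specification. It only shows why a controller that knows a channel spans two dies could issue more activations than one that treats the channel as a single die.

```python
from collections import deque


class ActivationWindow:
    """Rolling four-activate-window (tFAW-style) limiter for one DRAM die.

    Hypothetical sketch: max_acts and window_cycles are illustrative values,
    not figures from the HBM specification.
    """

    def __init__(self, max_acts=4, window_cycles=40):
        self.max_acts = max_acts
        self.window = window_cycles
        self.history = deque()  # cycle numbers of recent ACT commands

    def can_activate(self, cycle):
        # Forget activations that have fallen out of the rolling window.
        while self.history and cycle - self.history[0] >= self.window:
            self.history.popleft()
        return len(self.history) < self.max_acts

    def activate(self, cycle):
        if not self.can_activate(cycle):
            return False
        self.history.append(cycle)
        return True


# Assumption: a logical channel whose banks live on two dies, with one window per die,
# versus a controller that conservatively applies a single shared window.
per_die = {0: ActivationWindow(), 1: ActivationWindow()}
shared = ActivationWindow()

# Eight back-to-back ACTs at cycles 0..7, alternating between die 0 and die 1.
issued_per_die = sum(per_die[c % 2].activate(c) for c in range(8))
issued_shared = sum(shared.activate(c) for c in range(8))
print(issued_per_die, issued_shared)  # 8 4
```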

Jawed · Mar 16, 2015

Perhaps "dual link interposing" means that there's a small interposer between the GPU interposer and a pair of HBM stacks, which is only present for 8GB HBM. With a 4GB HBM configuration, the secondary interposer is not present and each HBM stack interfaces directly to the GPU interposer.

The secondary interposer is dumb, but each base die in the pair of stacks is now connected to its mate and there's a protocol for these two dies to share the 1024-bit bus back to the GPU.

Gipsel · Mar 16, 2015

willardjuice said:
But what's stopping GCN 1.1/1.2 from being 12_0?

~~Lack of proper conservative rasterization support?~~
Oops, that's 12_1 already.

3dilettante said:
So this is standard stacks with standard base layers, that are themselves stacked on a die, that is set on the interposer rather than some kind of custom larger base die.

No, I meant 4 memory dies stacked on a shared custom base die (stacked on the Si interposer). You basically distribute half of the banks of a channel to a different die. The clocking would be of course synchronized.
If that is not good enough, it should be possible with low effort on the base die to group the two channels (with 8 banks each) from each 2 GBit die into a virtual 128-bit channel with 16 banks. In that case you stay on the same die, so your timing concerns should mostly vanish. Channels 0-3 as seen from the GPU would be on one stack, channels 4-7 on the other one. It would basically mimic an 8-Hi stack (or two 4-channel 4-Hi stacks), just implemented as two 4-Hi stacks next to each other with a custom base die.
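A minimal sketch of the channel grouping described above, assuming the geometry from the post: two 4-Hi stacks of 2 GBit dies on one custom base die, two physical channels of 8 banks per die. `map_virtual` and `PhysicalBank` are hypothetical names. Every GPU-visible channel resolves to a single die, which is the property the timing argument relies on.

```python
from typing import NamedTuple

# Hypothetical geometry: two 4-Hi stacks on one custom base die,
# each die exposing two physical channels with 8 banks each.
DIES_PER_STACK = 4
PHYS_CHANNELS_PER_DIE = 2
BANKS_PER_PHYS_CHANNEL = 8
VIRTUAL_CHANNELS = 2 * DIES_PER_STACK  # 8 channels as seen from the GPU
BANKS_PER_VIRTUAL_CHANNEL = PHYS_CHANNELS_PER_DIE * BANKS_PER_PHYS_CHANNEL  # 16


class PhysicalBank(NamedTuple):
    stack: int
    die: int
    phys_channel: int
    bank: int


def map_virtual(channel: int, bank: int) -> PhysicalBank:
    """Map a GPU-visible (channel, bank) pair to a physical location."""
    if not 0 <= channel < VIRTUAL_CHANNELS:
        raise ValueError(f"channel {channel} out of range")
    if not 0 <= bank < BANKS_PER_VIRTUAL_CHANNEL:
        raise ValueError(f"bank {bank} out of range")
    # Channels 0-3 land on stack 0, channels 4-7 on stack 1; one die per channel.
    stack, die = divmod(channel, DIES_PER_STACK)
    # Banks 0-7 use the die's first physical channel, banks 8-15 its second.
    phys_channel, phys_bank = divmod(bank, BANKS_PER_PHYS_CHANNEL)
    return PhysicalBank(stack, die, phys_channel, phys_bank)


print(map_virtual(5, 11))
# PhysicalBank(stack=1, die=1, phys_channel=1, bank=3)
```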

An active interposer handling this with two standard stacks (including a standard base die) would probably not be cost-efficient, considering its size will probably sit right at the reticle limit (26x32mm). The same goes for an additional layer underneath two standard stacks, albeit this remains a possibility. Another idea would be to daisy-chain two HBM stacks on a single interface, with a similar splitting of the channels between the stacks: the first base die routes requests for the upper half of the channels to the second base die, and the connections between the two stacks run through the interposer. But this would also necessitate a custom base die and would complicate the timing issues.

Kaotik · Mar 16, 2015

It needs to be done in a fashion where the GPU only sees 4x1024-bit buses. IMO the most likely theory is that 2 stacks of DRAM share the same logic under them: instead of 8x1-stack, it's 4x2-stack.
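A rough capacity check for the 4x1 versus 4x2 reading, assuming first-generation 2 GBit dies in 4-Hi stacks (1 GB per stack) and unchanged per-pin rates. The helper name `config` is hypothetical. Under those assumptions the GPU-visible bus stays at 4096 bits while capacity doubles, so peak bandwidth would be the same for the 4 GB and 8 GB parts.

```python
GBIT_PER_DIE = 2         # die density mentioned in the thread
DIES_PER_STACK = 4       # 4-Hi stack
BITS_PER_INTERFACE = 1024
INTERFACES = 4           # what the GPU sees: 4 x 1024-bit


def config(stacks_per_interface):
    """Return (capacity in GB, GPU-visible bus width in bits)."""
    gbytes = INTERFACES * stacks_per_interface * DIES_PER_STACK * GBIT_PER_DIE / 8
    bus_bits = INTERFACES * BITS_PER_INTERFACE  # independent of stacks per interface
    return gbytes, bus_bits


print(config(1))  # (4.0, 4096)
print(config(2))  # (8.0, 4096)
```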

lanek · Mar 16, 2015

Whatever the dual-link solution is (we should discover it soon), they did not develop it in one month, as some articles want to suggest.

Alexko · Mar 16, 2015

Kaotik said:
Could it mean just that Carrizo features full DX12 too?

Probably. HardWare.fr hinted at this a while back.

3dilettante · Mar 16, 2015

Gipsel said:
No, I meant 4 memory dies stacked on a shared custom base die (stacked on the Si interposer). You basically distribute half of the banks of a channel to a different die. The clocking would be of course synchronized.

Just to make sure I'm interpreting this correctly, this would be 2 4-hi stacks without a base layer stacked onto a custom shared base layer. The layers/dies/chips/stacks terms are flying around thick in this discussion.

I'm curious with the rise in channels and the potential for a compromised mounting on the interposer whether a salvage SKU could exist where there is a ~~non-integral amount of memory~~ (edit: non-multiple of 8 channels), rather than tossing a large amount of finished silicon at the end of the process.
We already have partially disabled GPUs without a full complement of memory channels. The oddity here would be that the chips would be physically there.

AlNom · Mar 16, 2015

I think I need an uttargram representation. I'm not following the stacks/layers/channels either.

Jawed · Mar 16, 2015

A stack might be better called a module, since the base die is integral to the memory dies above it. Each of the 4 memory dies in a stack has two channels, each of which has a 128-bit data bus, so the combination of 4 dies, 2 channels and 128-bits is 1024 bits.
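Jawed's arithmetic as a short Python check; the helper name `stack_interface` is hypothetical, and its defaults are the figures from the post.

```python
def stack_interface(dies=4, channels_per_die=2, bits_per_channel=128):
    """Return (channels, data bus width in bits) for one HBM stack."""
    channels = dies * channels_per_die
    return channels, channels * bits_per_channel


channels, bits = stack_interface()
print(channels, bits)  # 8 1024
print(4 * bits)        # 4096 bits across four stacks
```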

Gipsel · Mar 16, 2015

3dilettante said:
Just to make sure I'm interpreting this correctly, this would be 2 4-hi stacks without a base layer stacked onto a custom shared base layer.

Yes.

3dilettante said:
The layers/dies/chips/stacks terms are flying around thick in this discussion.

That's why I tried to restrict myself to a somewhat consistent use of just dies and stacks.

3dilettante said:
I'm curious with the rise in channels and the potential for a compromised mounting on the interposer whether a salvage SKU could exist where there is a ~~non-integral amount of memory~~ (edit: non-multiple of 8 channels), rather than tossing a large amount of finished silicon at the end of the process.
We already have partially disabled GPUs without a full complement of memory channels. The oddity here would be that the chips would be physically there.

In principle, HBM allows for less than 8 channels in a stack. If that would make sense for a salvage model compared to using just 3 stacks, I don't know. But maybe someone will offer HBM stacks with let's say 6 channels and one die less at a lower price point than a fully featured 8 channel stack in the future, who knows?
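A rough comparison of the salvage options, assuming 128 bits and 128 MB per channel (a 1 GB, 8-channel first-generation stack) and a hypothetical 6-channel stack with one die fewer. The `totals` helper is illustrative only.

```python
CHANNEL_BITS = 128
MB_PER_CHANNEL = 128  # assumed: 1 GB stack / 8 channels


def totals(channels_per_stack):
    """Return (channels, bus width in bits, capacity in MB) for a list of stacks."""
    channels = sum(channels_per_stack)
    return channels, channels * CHANNEL_BITS, channels * MB_PER_CHANNEL


for name, layout in [
    ("four full stacks", [8, 8, 8, 8]),
    ("three full + one 6-channel stack", [8, 8, 8, 6]),
    ("three full stacks", [8, 8, 8]),
]:
    ch, bits, mb = totals(layout)
    print(f"{name}: {ch} channels, {bits}-bit bus, {mb} MB")
```

Under these assumptions a cut-down stack keeps 30 of 32 channels, whereas dropping a whole stack falls to 24.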

3dilettante · Mar 16, 2015

lanek said:
Whatever the solution of dual link is ( we should discover it soon ), they have not developp it in 1 month as some article want to suggest it.

Given the lead times in manufacturing, even if development time was near-zero, manufacturing and integration into a product probably couldn't be done on such short notice either.
This seems like it could have been initiated earlier, and might represent a preliminary version of the expanded functionality that is part of HBM2 proper.

Jawed · Mar 16, 2015

Why would you assemble a module from one or more known-bad memory dies? Every memory die is tested and only the good ones are picked for assembly into a stack. The idea that you find out after you've assembled the stack that it doesn't work is absurd.

3dilettante · Mar 16, 2015

Jawed said:
Why would you assemble a module from one or more known-bad memory dies? Every memory die is tested and only the good ones are picked for assembly into a stack. The idea that you find out after you've assembled the stack that it doesn't work is absurd.

My scenario is a compromised mounting on the interposer. The stacks should be provided on a known-good basis (unless there's a value RAM option), but the mounting process itself is an additional step with some 0.xxxx% error rate.
At that point, why toss the interposer and anything else already mounted if it turns out one channel has a few bad bumps?
The same might happen if one memory channel out of 32 is dodgy on the GPU.
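One way to put numbers on the salvage argument is a simple binomial model: if each stack mounting fails independently with probability p, the chance that every mounting on a k-stack interposer is good is (1 - p)^k, and the chance that exactly one is bad is k * p * (1 - p)^(k - 1). The independence assumption and p = 0.5% below are hypothetical placeholders, not measured figures. The second number is roughly the pool a channel-disabled SKU could draw from instead of scrapping the assembly.

```python
from math import comb


def assembly_outcomes(stacks, p_fail):
    """Probability that no mounting fails, and that exactly one fails.

    Hypothetical model: each stack mounting fails independently with p_fail.
    """
    p_ok = 1.0 - p_fail
    all_good = p_ok ** stacks
    exactly_one_bad = comb(stacks, 1) * p_fail * p_ok ** (stacks - 1)
    return all_good, exactly_one_bad


all_good, one_bad = assembly_outcomes(stacks=4, p_fail=0.005)
print(f"fully working: {all_good:.4%}, one bad mounting (salvage candidate): {one_bad:.4%}")
```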

3dilettante · Mar 16, 2015

Gipsel said:
But maybe someone will offer HBM stacks with let's say 6 channels and one die less at a lower price point than a fully featured 8 channel stack in the future, who knows?

I was thinking more about salvage for defects in assembly, but that would be one way to salvage DRAM stacks, assuming the incremental cost for the sorting is outweighed by the revenue brought in.

liquidboy · Mar 16, 2015

The design of the HBM could be based on the PIM work they (AMD Research) did in 2014. We discussed this a while ago, just can't remember/find which thread it was in.

"Throughput-Oriented Programmable Processing in Memory" - AMD Research 2014

silent_guy · Mar 16, 2015

Jawed said:
Why would you assemble a module from one or more known-bad memory dies? Every memory die is tested and only the good ones are picked for assembly into a stack. The idea that you find out after you've assembled the stack that it doesn't work is absurd.

There is such a thing as packaging yield loss (usually in the very low single-digit percentages). I don't know if there's an official term, but let's call it micro-bonding yield? I expect the latter to be worse than the former.

3dilettante · Mar 16, 2015

liquidboy said:
The design of the HBM could be based on the PIM work they (AMD Research) did in 2014. We discussed this a while ago, just can't remember/find which thread it was in.

The stacked memory may have come first. The paper references die stacking, HMC, and HBM as items in its recent past. HBM has slides going back to 2010.

Silent_Buddha · Mar 16, 2015

Kaotik said:
Isn't the current information suggesting that all GCN chips support Tier 3 Resource Binding (17% of Steam DX12 hardware* supporting it)? They just lack some of the other DX12 features that Fiji would have.

*DX12 hardware = hardware that supports DX12, even if it's limited to 11.x feature levels

IIRC, there are multiple items that are classified from Tier 1-3. Resource Binding is only one of them. While Hawaii (should be Hawai'i) has that for resource binding, it likely lacks it for some others.

Regards,
SB

3dilettante · Mar 16, 2015

3dilettante said:
The stacked memory may have come first. The paper references die stacking, HMC, and HBM as items in its recent past. HBM has slides going back to 2010.

Correction to the above, that would be 2011 for the slides, 2010 is the purported start date.

LordEC911 · Mar 17, 2015

liquidboy said:
We discussed this a while ago, just can't remember/find which thread it was in

Page 6 of this thread.
