Unique to the 2.5D packaging is who eats the cost for logistics and faults in the components before they get to board partners. Apart from the re-selling factor, how different is this from Nvidia using GDDR5X, which only Micron produces?
There's only one provider of HBM2 for AMD cards so far, and it's Hynix. There's not a lot of possible confusion from that area.
Perhaps bundling memory in the old days introduced some additional logistics and perhaps some level of warranty for the rare bad module, which would account for at least a small amount of additional cost beyond just providing a packaged GPU.
If a board partner has less than 100% success rate of attaching said items to a PCB, it's not AMD's problem.
The Fury package has a significantly more extensive logistics train due to how many separate vendors are responsible for each component, the shipping between them and AMD, and the less than 100% success rate of integration.
Each piece either reduces the fault rate (old process interposer) or places the onus on a partner (known-good stack), but the integration steps themselves yield less than 100%. As the manufacturer driving early implementation, AMD may have eaten a decent amount of the logistics cost for coordinating and shipping between the widely separated manufacturing and integration points. Well-established things like putting a package together have very high yields, whereas back then the overall interposer assembly process only promised something like >95%, I think.
Going by the Hawaii vs Fiji estimate, AMD would be on the hook for $183 for each faulty module, at a significantly higher rate than the failure rate for plain packaging. Ideally that rate is low in absolute terms, but regular packaging yields are so high that any blip at each step is going to be measurably worse.
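To put rough numbers on that, here is a back-of-envelope sketch of how per-step yields compound and what that does to the expected loss per assembly. Only the $183 figure and the ">95% overall" ballpark come from the discussion above; the individual step yields, the plain-packaging yield, and the function names are my own assumptions for illustration, and the model assumes any failed step scraps the whole assembly.

```python
from math import prod


def compound_yield(step_yields):
    """Probability that an assembly survives every step (steps assumed independent)."""
    return prod(step_yields)


def expected_scrap_cost(step_yields, cost_at_risk):
    """Expected loss per assembly started, assuming one failed step scraps everything."""
    return (1.0 - compound_yield(step_yields)) * cost_at_risk


# $183 is the Hawaii vs Fiji estimate quoted above.
COST_AT_RISK = 183.0

# ASSUMED yields, for illustration only (not vendor data).
plain_package = [0.999]                   # one mature packaging step
interposer_assembly = [0.99, 0.98, 0.98]  # three steps that compound to roughly 95%

for name, steps in [("plain package", plain_package),
                    ("interposer assembly", interposer_assembly)]:
    y = compound_yield(steps)
    loss = expected_scrap_cost(steps, COST_AT_RISK)
    print(f"{name}: yield {y:.4f}, expected scrap cost ${loss:.2f} per unit started")
```

With made-up inputs like these, the interposer route loses on the order of dollars per unit started versus cents for plain packaging, which is the "measurably worse" point: each step looks fine on its own, but the product of several sub-100% steps is not.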
One change with HBM2 was provisioning extra signals for redundancy, which seems consistent with integration losses being measurable enough.
The choice of going with 2 stacks, besides possibly making the interposer requirements lower and smaller, may also be a measure for reducing integration losses. Some phenomena like warping are influenced by interposer size and number of components being integrated.
The lead time for all of this on top of the fab+packaging route would presumably be longer as well, and the stakes are higher when it comes to binning and product mix since faults or mispredictions for any individual element get bound together in a single assembly.
The inflexibility of this seems to explain why AMD may be aiming to double down on scalability with Navi, and to deconstruct things further with chiplets and active interposers in its HPC proposal.