On another note: is there any indication of how hot HBM gen2 is allowed to run before the chips throttle and/or suffer severe degradation of signal quality?
AMD's proposals for more tightly integrated HBM have a stated desire to stay at 85C or below. Above 85C, the DRAM must begin to increase its refresh rate to compensate for the capacitors in the arrays leaking charge more readily. Some of AMD's TOP-PIM research proposed a curve of doubling the refresh rate for every 10C above the threshold.
Those higher refresh rates would start to cut into the availability of the DRAM rather significantly, and harm perf/W.
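To put rough numbers on the refresh curve described above, here is a minimal sketch. It assumes the doubling is continuous (2^((T-85)/10)) rather than stepped, and `base_fraction`, the share of time spent refreshing at or below 85C, is a hypothetical placeholder value, not a figure from AMD or the HBM spec.

```python
THRESHOLD_C = 85.0      # refresh scaling starts above this temperature (per the post)
DOUBLING_STEP_C = 10.0  # refresh rate doubles for every 10C above the threshold


def refresh_multiplier(temp_c: float) -> float:
    """Refresh rate relative to the rate at or below the threshold.

    Assumes a continuous exponential curve; a real device may step instead.
    """
    if temp_c <= THRESHOLD_C:
        return 1.0
    return 2.0 ** ((temp_c - THRESHOLD_C) / DOUBLING_STEP_C)


def refresh_busy_fraction(temp_c: float, base_fraction: float = 0.05) -> float:
    """Fraction of time the DRAM is tied up refreshing, capped at 1.0.

    base_fraction is a hypothetical overhead at or below the threshold.
    """
    return min(1.0, base_fraction * refresh_multiplier(temp_c))


if __name__ == "__main__":
    for t in (80, 85, 95, 105, 115):
        print(f"{t}C: x{refresh_multiplier(t):.1f} refresh rate, "
              f"{refresh_busy_fraction(t):.0%} of time refreshing")
```

Whatever the true baseline is, the point is that the overhead compounds: every 10C above the threshold doubles the time the arrays are unavailable for reads and writes.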
The more recent spec for HBM that includes what is labelled HBM2 leaves a fair amount up to the device vendor, although a temperature-compensating refresh mode is present. I think HBM originally couldn't readily report temperatures past 90-95C, but it seems the latest thermal readouts can max out at ~125C. That sounds like it's in the realm of shutting down, though, both because GPUs aren't that tolerant and because HBM2 has an emergency shutdown sensor that would take things down at some point.
Has anybody noticed the absence of any comments from @sebbbi on the matter? Usually he has at least some preliminary thoughts. I mean, he has the Vega FE now, so he is the best-equipped person to make at least some educated guesses. Are his hands tied by an NDA? Even Rys has had to make some comments, and he works for RTG!
Rys gave him the card, and as the legend goes it was done over a beer. He said further disclosures would be up to Rys.
I'm pretty sure agreements over a beer rank somewhere in the hierarchy of oath-swearing methods used in myth.
Sure, it could be broken, but "an hour of wolves, and shattered shields", etc.
- HBM2 temps are the real limiter for FE when stock... Once they hit 90C, the card throttles, and they hit 90C pretty easily. You have to keep GPU temps below 80C to keep it from running away... I hope there is some voltage control or power management that can deal with this in the future. Doesn't look like GN bothered to check how it's doing with their modded card.
The HBM2 is trying to dissipate heat when it is right next to a 220W+ source, and its main thermal contact is the copper base that is being heated by said GPU perhaps a mm away.
Even without its own power dissipation, the stack would naturally rise to the surrounding temperature, which the GPU raises to 80-90C under load. There may be a limit to what can be done, since the big controlling factor is how heavily the GPU is using memory.
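The reasoning above amounts to a simple lumped thermal model: the stack sits at roughly the temperature of the shared copper base, plus whatever rise its own dissipation adds through some thermal resistance. The sketch below assumes a single steady-state resistance `r_th_c_per_w` between the stack and the base; that resistance and the per-stack power figures in the usage are hypothetical illustrations, not measured Vega FE values. Only the 90C throttle point comes from the thread.

```python
HBM_THROTTLE_C = 90.0  # HBM2 temperature at which the card reportedly throttles


def hbm_temp(base_temp_c: float, hbm_power_w: float, r_th_c_per_w: float) -> float:
    """Steady-state stack temperature: shared base temperature (set mostly by the
    GPU) plus the rise from the stack's own power through one thermal resistance."""
    return base_temp_c + hbm_power_w * r_th_c_per_w


def max_base_temp(hbm_power_w: float, r_th_c_per_w: float,
                  limit_c: float = HBM_THROTTLE_C) -> float:
    """Highest base/GPU-side temperature that keeps the stack at or under the limit."""
    return limit_c - hbm_power_w * r_th_c_per_w


if __name__ == "__main__":
    # Hypothetical values: 6 W per stack, 1.5 C/W from stack to base.
    print(hbm_temp(80.0, 6.0, 1.5))
    print(max_base_temp(6.0, 1.5))
```

In this model the GPU sets the floor and memory traffic sets the margin on top of it, which is why heavier memory use pushes the allowable GPU temperature down.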
We could possibly consider this an additional secondary weakness of AMD's choice to use HBM, particularly with GPUs that are so much less efficient.
8-hi stacks may make things warmer for some of the layers than a 4-hi stack would, but I've wondered before whether something could be done to give the stacks a thermal contact less directly influenced by the GPU, without making the hop to a closed-loop cooler.