Jawed
Nano demonstrates that Fiji's power is driven substantially more by clocks (hence voltage) than by leakage (though we can't isolate the leakage power component of the differences seen). The Fury example showed possibly half a watt or more per degree C above 40C, on a Fury X with a water cooler and a temperature ceiling that had dropped 30C below Hawaii's. Extrapolating from 18W over a 25C swing in a 40C-to-65C test up to the 290X's 95C suggests an appreciable amount of power budget to play with, although it may be the case that HBM will constrain that upper range anyway.
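The extrapolation above can be sketched as back-of-envelope arithmetic. Note this assumes the slope from the 18W/25C figure stays linear all the way to 95C, which it probably doesn't, since leakage grows super-linearly with temperature:

```python
# Hypothetical linear extrapolation of the Anandtech Fury X numbers:
# 18 W of extra draw over a 25 C swing (40 C -> 65 C).
watts_per_degree = 18 / 25            # ~0.72 W per degree C

# Extend the same slope up to the 290X's 95 C temperature target.
extra_at_95c = watts_per_degree * (95 - 40)
print(f"{extra_at_95c:.1f} W")        # ~39.6 W above the 40 C baseline
```

If anything that understates it, since the per-degree cost should climb as leakage compounds with temperature.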
Additionally, a 290X built to Fiji "design rules" would suffer substantially less, if for no other reason than that it is smaller (74% of the area, and probably substantially less than that being the active high-leakage circuitry, due to the difference in memory PHY area usage).
Finally, ~300W GPUs in an era where boost clock margins over base run as high as 20% would seem to imply that liquid cooling is here to stay.
If anything, Nano is proof that chips should go to 600mm² before they go 10%+ higher in clocks.
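A rough illustration of why a wide, low-clocked die like Nano wins on efficiency, using the common first-order model that dynamic power scales with f·V², plus the crude assumption that voltage must rise roughly linearly with clock near the top of the curve (an assumption for illustration, not measured Fiji data):

```python
# First-order dynamic power model: P ~ C * V^2 * f.
def relative_power(clock_scale, voltage_scale=None):
    if voltage_scale is None:
        # Crude V ~ f assumption near the top of the V/f curve.
        voltage_scale = clock_scale
    return clock_scale * voltage_scale ** 2

# A 10% clock bump costs ~33% more dynamic power under this model...
print(f"{relative_power(1.10):.2f}x")   # ~1.33x

# ...whereas ~33% more area at the same clock and voltage costs
# ~33% more power while avoiding the voltage penalty entirely.
```

Under this cube-law sketch, spending transistors is roughly linear in power while spending clocks is roughly cubic, which is the Nano argument in miniature.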
Yes, sadly it's not possible to isolate that from Anandtech's data, since there was no idle power consumption measurement at 65C versus 40C, and Anandtech didn't repeat the test for Nano.

Without fuller power gating, the "off" transistors still leak, with temperature and voltage components worsened by the most in-demand logic around them.
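The temperature component of that leakage can be sketched with the common rule of thumb that subthreshold leakage roughly doubles every ~10C. The doubling interval here is an assumed illustrative parameter, not a measured Fiji value; the real exponent depends on the process and the Vt mix:

```python
# Rule-of-thumb exponential leakage model: leakage doubles every ~10 C.
# DOUBLING_INTERVAL_C is an assumption for illustration, not a Fiji figure.
DOUBLING_INTERVAL_C = 10.0

def leakage_scale(temp_c, ref_temp_c=40.0):
    """Leakage at temp_c relative to leakage at ref_temp_c."""
    return 2 ** ((temp_c - ref_temp_c) / DOUBLING_INTERVAL_C)

print(f"{leakage_scale(65):.1f}x")    # ~5.7x the 40 C leakage at 65 C
```

Even with a more conservative doubling interval, the "off" transistors' contribution grows quickly enough with temperature that the 40C-vs-65C gap can't be read as a pure clock/voltage effect.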
This article is pretty interesting:

http://www.techdesignforums.com/practice/technique/arm-tsmc-16nm-finfet-big-little-design/

Besides that, FinFET has much better control at lower voltages, which benefits both static and active power. The foundries are comparing 14/16nm against 28nm, with the largest benefit being the roughly 2x improvement in power efficiency, since 20nm planar got most of the density improvement but very little power scaling.
The finFET process gives you a very good performance gain compared to planar. However, it imposes a number of challenges. One of the key challenges will be dynamic power, which does not scale as well as the leakage power. This created a lot of implementation challenges for the team.
Also, allowing lower-Vt cells for synthesizing dynamic hotspots leads to great reductions in dynamic power for a small increase in leakage power.
The increasing power density of 16FF designs relative to the 40nm and 28nm generations cause problems for die utilization in two ways. One is simply the increased competition for metal from the power rails and signal tracks [...] "The effective power-metal area that can be packed into the design is much less than for 28nm" [...] "You need to find the right metal length and spacing in order to not waste routing tracks."
With the caveat:
[...] expected to double performance from 1GHz on 40nm to 2GHz or more on 16nm
So the article is written from the perspective of substantially higher clocks (let's say 50% higher going 28nm -> 16nm?), which is not what GPUs are trying to do (though clearly NVidia pivots on a much higher base clock than AMD on the same node currently).
I think a neat example is that NVidia has much longer ALU pipelines than AMD's 4 cycles, though I don't know what that number is in Maxwell 2.

Maxwell seems to be more responsive to voltage tweaks, or just overclocking in general. The simplified scheduling probably means the pipeline has lower complexity in multiple stages, but I'm not sure which element of the physical design could be different.
My suspicion is that NVidia uses longer pipelines throughout the chip for all functional areas, which implies less interconnect traffic per cycle of unit processing. That, I'm guessing, allows NVidia to use either a lower-power routing library or to bias its design towards lower-power cells.

The transistor portion of the hybrid nodes is going to be significantly better than 28nm, whereas the wires are less so.
The GPU designs at the hybrid nodes should adapt to that reality, although it would be an interesting exercise to know how well the current architectures would fare if transplanted as-is. Maxwell seems generally more comfortable at 28nm than GCN, and sporadic tests by some sites seem to show less variation in power draw with temperature, for some reason.
I dunno, after all these years we still know practically nothing about the micro-architectural power-vs-density and power-vs-performance trade-offs in GPUs. Almost everything we know about the progression through nodes and node technologies comes from CPU designs, where ALU counts have changed far less over the last decade (compared with GPUs) and idle power consumption has become more important, since custom blocks have been deployed for high-performance features (such as video decompression).