Agreed on all points. It's just that there have been reviews claiming a particular product had a particular power draw because that was what they'd measured in FurMark or OCCT. Thankfully, I haven't seen this recently, but it has happened. Other times they would measure power consumption with those applications without bothering to point out what you've just detailed, so readers would misread the results as somehow representative of real-world gaming. I suspect this still happens regularly, though hopefully not among the best reviewers.
Given how modern GPUs will push to the limit of their power budget whenever they can, it's also likely that if a new reviewer tried such a flawed comparison now, cards with similar board power ratings would not show much of a difference.
Since stress tests like that simply throw more work at the silicon the more effectively it can churn through it, it would be unclear what such a comparison was actually measuring.
At that time, what I did find disappointing was that some cards needed driver updates to bring their power load back into spec. That indicated to me that the amount of hardware and logic in GPUs had outpaced their ability to manage themselves.
And while I agree that some other real-world applications might conceivably approach this level of power consumption for some scenes and settings, or that some games on some boards can sometimes spike up (almost) as high, you're just not going to see the same energy consumption over an hour of gaming as you would over an hour of fuzzy donutting, or probably not even over a minute of gaming/donutting.
Perhaps not on a sustained basis, though some spikes can last for a while. I won't penalize a card that runs right at its specified limit for an extended period, and I accept that these specifications are not iron-clad laws against ever spiking above them.
For modern cards, the problematic examples seem to involve cooling solutions that run right at the edge of being inadequate, or power spikes that can trip the VRMs' thermal fail-safe or, in rare instances, damage them.
I once worked on a 3D application that was getting some very negative feedback from users, especially MacBook users, because its opening menu would apparently turn their laptops into space heaters and cause freezes from overheating. It turned out the graphics engine was outputting something like 500 FPS because there was almost nothing to render, and somehow that pushed the graphics card into a very uncomfortable thermal situation.
So I just enabled V-sync, and that saved the day (and the planet*), but it seemed to me that the machines should have been able to handle that better.
*OK, maybe not, but at least I tried.
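For anyone curious why capping the frame rate helps so much, here's a minimal sketch of a software frame cap in Python. To be clear, this is a swapped-in technique, not V-sync itself (V-sync ties presentation to the display refresh in the driver/swap chain), and the function name run_capped, the 60 FPS target and the dummy render callback are all hypothetical. The idea is the same, though: once a frame is done, the loop sleeps until the next frame slot instead of immediately rendering another one, so a trivial menu scene stops generating hundreds of frames per second.

```python
import time

def run_capped(render_frame, target_fps=60.0, duration_s=1.0):
    """Hypothetical frame-capped loop: call render_frame() at most target_fps
    times per second for roughly duration_s seconds; return the frame count."""
    frame_budget = 1.0 / target_fps          # seconds allotted to each frame
    frames = 0
    start = time.perf_counter()
    next_deadline = start
    while time.perf_counter() - start < duration_s:
        render_frame()                        # the (possibly trivial) work per frame
        frames += 1
        next_deadline += frame_budget         # schedule the next frame slot
        sleep_for = next_deadline - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)             # idle instead of rendering extra frames
        else:
            next_deadline = time.perf_counter()  # fell behind: resync, don't burst
    return frames

# Usage: an empty "menu" render still only produces about 30 frames in half a second.
print(run_capped(lambda: None, target_fps=60.0, duration_s=0.5))
```

Scheduling against a running deadline rather than sleeping a fixed amount after each frame keeps small sleep overshoots from accumulating, and the resync branch avoids a burst of catch-up frames after a slow one.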
That reminds me that StarCraft II's menu did this as well, with some reports of overheating cards and claims of damage that I never saw verified.
It's that sort of bug that makes me dislike vendors brushing off demanding software as "power virus" loads, because doing so blurs the line between software that people purposely run to stress their hardware and buggy or unpredictable software loads that present risks consumers are not equipped to analyze.
Never mind that vendors can hand-wave certain scenarios away as unlikely today and then fail to update that appraisal when a bigger GPU or stronger CPU comes along, adding more active silicon or removing the bottlenecks that kept other programs from becoming more demanding, or when a new API massively reduces the CPU bottlenecks that might have kept some games in check.
edit:
For the purposes of documenting my earlier claims about the usage of the term power virus:
2004 mention of power virus loads with regard to Itanium and others:
http://computerbanter.com/archive/index.php?t-44656.html
2001 discussion that serves as a second-hand reference to a likely Intel tech marketing statement about the P4 and a power virus load used to characterize its TDP:
https://www.realworldtech.com/forum/?threadid=4067&curpostid=4081
A more recent paper (circa 2010) that lists programs considered power viruses, including a set targeting older Pentium and K6 cores from the 1990s:
https://lca.ece.utexas.edu/pubs/ganesan_pact10.pdf