"Power Virus" naming conundrum, Watts the Resistance? *outage*

Please do not use the term "power virus"; that's marketing terminology that condemns high-load test tools and makes people afraid of them.
It would be an unfortunate co-opting of a term that's been around for some time. I've seen references to it going back to the early 2000s, and I think I remember off-hand references some years before that, in the context of some of the single-core processors of the day.
Power viruses were described as software written with a mix of operations chosen to produce pathologically high power consumption, often relying on internal knowledge of sometimes undocumented behaviors. As a term of art, it came up in the context of validating CPU hardware or establishing a power consumption threshold somewhere below the theoretical maximum that could be derived from the raw current, voltage, and physical characterization of a chip family.
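For a sense of the general recipe (and only that), here's a minimal sketch of a generic high-load kernel: every hardware thread runs a loop of dependent floating-point math plus an occasional strided store. This is a made-up illustration, not an actual power virus; the programs described above leaned on design-specific and sometimes undocumented knowledge that generic code like this doesn't have.

```cpp
// Illustrative stress-load sketch only; NOT a power virus in the original
// sense. It just keeps every hardware thread busy with FP math and a little
// memory traffic, which is the broad shape of load being described.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

std::atomic<bool> stop_flag{false};

void burn(std::vector<double>& buf) {
    double a = 1.1, b = 0.7;
    std::size_t i = 0;
    while (!stop_flag.load(std::memory_order_relaxed)) {
        // A chain of dependent multiply-adds keeps the FP units occupied.
        for (int k = 0; k < 1024; ++k) { a = a * b + 0.1; b = 0.5 * b + 0.25; }
        // An occasional strided store adds some cache/memory activity and
        // keeps the compiler from discarding the arithmetic.
        buf[(i * 64) % buf.size()] = a + b;
        ++i;
    }
}

int main() {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::vector<double>> bufs(n, std::vector<double>(1 << 20));
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n; ++t) pool.emplace_back(burn, std::ref(bufs[t]));
    std::this_thread::sleep_for(std::chrono::seconds(30));  // run a fixed window
    stop_flag = true;
    for (auto& th : pool) th.join();
}
```

The ingredients are mundane; what made the in-house tools special was knowing exactly which combinations of units to light up at once, which generic code like this can only stumble into.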

At the time, there could be justified worry about using them in an uncontrolled environment: transistor budgets and designs hadn't yet bloated to the point where just a fraction of the die could routinely violate thermal limits, and thermal protection measures and DVFS were either in their infancy or absent altogether.

These days, I'd think similar fears being voiced by vendors should themselves be a cause for concern, since at least in theory their DVFS and other on-die protection schemes should be capable enough to prevent such problems.
That there have been products in more recent years with problems like that is something I've taken as a sign of some deficiency or failure in the design.
 
Now, IHVs often just use it as a way to dismiss very high load applications as irrelevant to actual power draw. As an unfortunate co-opting of the term, it's a regrettable means to a legitimate end, which is to explain that applications like Furmark or OCCT, which are designed specifically to draw as much power as possible from a chip, aren't representative of real-world power consumption, i.e., power drawn by useful applications.
 
I haven't seen reviews that use Furmark or OCCT as their choice in perf/W comparisons. What I've been interested in is the tests where Furmark or OCCT cause boards to go past the specs their chips should abide by.
These are products that sample electrical and thermal behavior every hundredth or thousandth of a second, or in some cases at the granularity of instruction issue. If a power-management solution cannot hold consumption to a specification with comparatively forgiving tolerances over seconds or minutes, then there's a deficiency in the effectiveness of the solution or in the honesty of the specification.
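As a purely conceptual illustration of the kind of control loop I mean (real management firmware samples far faster and does far more than this), a rolling-average power governor can be sketched in a few lines; the power samples and the DVFS state index here are hypothetical stand-ins, not any vendor's actual scheme:

```cpp
// Toy rolling-average power governor; a conceptual sketch, not any vendor's
// actual algorithm. The caller feeds in a power sample each tick; the governor
// steps a hypothetical DVFS state down when the average exceeds the limit.
#include <cstddef>
#include <deque>
#include <numeric>

struct Governor {
    double limit_watts;           // the specification the board should hold to
    std::size_t window;           // number of samples in the rolling average
    std::deque<double> samples;
    int dvfs_state = 0;           // 0 = highest clocks/voltage; larger = throttled

    // Call once per sampling tick (e.g. every few milliseconds).
    void tick(double watts) {
        samples.push_back(watts);
        if (samples.size() > window) samples.pop_front();
        double avg = std::accumulate(samples.begin(), samples.end(), 0.0) /
                     static_cast<double>(samples.size());
        if (avg > limit_watts)
            ++dvfs_state;                         // shed clocks/voltage
        else if (avg < 0.9 * limit_watts && dvfs_state > 0)
            --dvfs_state;                         // recover performance headroom
    }
};
```

Even something this crude would hold an average to a limit over seconds or minutes, which is why a board drifting past its specification under a sustained load says more about the solution or the specification than about the software running on it.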

Power viruses also have a historical precedent of being written in-house, using knowledge that outside programmers lack to exercise many circuits in combinations that would be virtually impossible to produce accidentally.
The marketing use of "power virus" now carries a weaker meaning, since it should be a warning sign that outside programmers were able to arrive at a program capable of breaking power limits, and a mark of deficiency for the set of mostly earlier chips that couldn't protect themselves without resorting to application detection.
On top of that, the intimation that it's just those apps is not entirely honest. There are games with levels or scenes that can produce the same or close-enough power spikes, or boards that shave costs on thermal protection or board elements like VRM monitoring/cooling to the point where specifications are blown past or functionality is compromised. That's in part because it didn't take hidden insight for something like a fuzzy donut to knock GPUs off their management game, and if it can be done that readily, it's not unreasonable for other combinations of demanding or even buggy software (or a flaky driver distribution) to put the physical product in a similar position.
Those shouldn't be acceptable escape cases for modern power management.
 
Is there an aversion to just calling them stress tests?
 
At least some of the original usage of "power virus" referenced internal stress test programs for CPU lines. The most explicit use of power virus came up in discussion of the power consumption of the Pentium 4, where such programs were used to justify its stated TDP versus the raw current and voltage specs for the chip.
Whether the internal tools for prior generations were called power viruses, or whether this was a retroactive name for them, is unclear to me, and it's somewhat difficult to scrounge up details for something that came up at least as far back as 2001.

In some respects, "power virus" in that instance was a PR move to justify how chips had scaled in transistor count and performance to the point that they could more conceivably blow past their limits, and at the time there were fewer safeguards.
CPU manufacturers have gotten progressively more creative with the bounds of their specifications through various forms of turbo or by using thermal margin to boost higher, though this has typically coincided with an increase in on-die management, since the enterprise space was less forgiving of something blowing past specifications that could influence cost and reliability.

So this weaker stance from the early 2000s has itself been weakened further, by lowering the bar for what counts as an excusable failure to manage DVFS in the face of third-party software. What I find more unfortunate is that this seems to have been done to apply a sort of guilt by association to any software that might spike power consumption, not just Furmark or OCCT. There have been games that pushed GPUs past their supposed spec, as well as instances of buggy software or drivers that have accidentally led to stability problems or damage to cards. That's not a "power virus" in my opinion, but a product that cannot manage itself in the face of the software base that exists.
 

Agreed on all points. It's just that there have been reviews claiming that some particular product had a particular power draw because that was what they'd measured on Furmark or OCCT. Thankfully, this isn't something I've seen recently, but it's happened. Or sometimes they would just measure power consumption with those applications without going through the trouble of pointing out what you've just detailed, so readers would misinterpret these results as somehow being representative of real-world gaming scenarios. I suspect this still happens on a regular basis, though hopefully not from the best reviewers.

And while I agree that some other real-world applications might conceivably approach this level of power consumption for some scenes and settings, or that some games on some boards can sometimes spike up (almost) as high, you're just not going to see the same energy consumption over an hour of gaming as you would over an hour of fuzzy donutting, or probably not even over a minute of gaming/donutting.

So in a nutshell, Furmark/OCCT are very valid, useful tools, but they are sometimes misused, or used without properly explaining their purpose, such that the results they provide can be misconstrued by readers unfamiliar with what they're really for.

[…]
CPU manufacturers have gotten progressively more creative with the bounds of their specifications through various forms of turbo or by using thermal margin to boost higher, though this has typically coincided with an increase in on-die management, since the enterprise space was less forgiving of something blowing past specifications that could influence cost and reliability.
[…]

And also less forgiving of something blowing up, which, frankly, was sometimes a real concern. :p


I once worked on a 3D application that was getting some very negative feedback from some users, especially MacBook users apparently, because the opening menu would turn their laptops into space heaters, seemingly causing some freezes due to overheating. It turned out that the graphics engine was just outputting something like 500 FPS, because there was almost nothing to render, and somehow that sent the graphics card into a very uncomfortable thermal situation.

So I just enabled V-sync and that saved the day (and the planet*) but it seemed to me that the machines should have been able to handle that better.


*OK, maybe not, but at least I tried.
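For what it's worth, the fix really was that mundane: cap the frame rate so an almost-empty scene can't spin the GPU flat out. A minimal limiter along those lines (with the actual rendering call left as a hypothetical placeholder) looks something like this:

```cpp
// Minimal frame-limiter sketch, illustrative only; the real application used
// the driver's V-sync instead. Capping the menu at ~60 FPS stops it from
// rendering hundreds of nearly empty frames per second.
#include <chrono>
#include <thread>

void run_menu_loop(bool& running) {
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::microseconds(16667);  // ~60 FPS
    while (running) {
        const auto start = clock::now();
        // render_menu_frame();  // hypothetical placeholder for the real draw call
        const auto elapsed = clock::now() - start;
        if (elapsed < frame_budget)
            std::this_thread::sleep_for(frame_budget - elapsed);  // idle out the rest of the frame
    }
}
```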
 
Agreed on all points. It's just that there have been reviews claiming that some particular product had a particular power draw because that was what they'd measured on Furmark or OCCT. Thankfully, this isn't something I've seen recently, but it's happened. Or sometimes they would just measure power consumption with those applications without going through the trouble of pointing out what you've just detailed, so readers would misinterpret these results as somehow being representative of real-world gaming scenarios. I suspect this still happens on a regular basis, though hopefully not from the best reviewers.
Given how modern GPUs will push to the limits of their power budget if they can, it's also likely that if any reviewer tried such a flawed comparison now, cards with similar board power ratings would not show much of a difference.
Since stress tests like that tend to generate more work for the silicon to churn through the more capable it is, what such a comparison would actually be measuring is unclear.
At that time, what I did find disappointing was that some cards needed driver updates to bring their power load back into spec. That indicated to me that the amount of hardware and logic in GPUs had outpaced their ability to manage themselves.

And while I agree that some other real-world applications might conceivably approach this level of power consumption for some scenes and settings, or that some games on some boards can sometimes spike up (almost) as high, you're just not going to see the same energy consumption over an hour of gaming as you would over an hour of fuzzy donutting, or probably not even over a minute of gaming/donutting.
Perhaps not on a sustained basis, though some can last for a while. I won't penalize a card that runs right at its specified limit for an extended period, and I accept that these specifications are not iron-clad laws against ever spiking above them.
For modern cards, the problematic examples seem to be related to cooling solutions that run right at the edge of being inadequate, or problems where spikes can either trip the thermal fail-safe of the VRMs or in rare instances damage them.


I once worked on a 3D application that was getting some very negative feedback from some users, especially MacBook users apparently, because the opening menu would turn their laptops into space heaters, seemingly causing some freezes due to overheating. It turned out that the graphics engine was just outputting something like 500 FPS, because there was almost nothing to render, and somehow that sent the graphics card into a very uncomfortable thermal situation.

So I just enabled V-sync and that saved the day (and the planet*) but it seemed to me that the machines should have been able to handle that better.


*OK, maybe not, but at least I tried.

That reminds me that Starcraft 2's menu did this as well, with some reports of overheating cards and claims of damage I did not see verified.
It's that sort of bug that makes me dislike vendors brushing off demanding software as "power virus" loads, because it blurs the line between software that people purposefully run to stress the hardware and buggy or unpredictable loads that present risks consumers are not equipped to analyze.
Never mind that vendors can hand-wave certain scenarios away as unlikely now, then fail to update that appraisal when a bigger GPU or stronger CPU comes along to add more active silicon or remove the bottlenecks that kept other programs from becoming more demanding, or when a new API arrives and massively reduces CPU bottlenecks that might have kept some games in check.


edit:
For the purposes of documenting my earlier claims about the usage of the term power virus:

A 2004 mention of power virus loads with regard to Itanium and others:
http://computerbanter.com/archive/index.php?t-44656.html

A 2001 discussion that serves as a second-hand reference to a likely Intel tech-marketing statement about the P4 and a power virus load used to characterize its TDP:
https://www.realworldtech.com/forum/?threadid=4067&curpostid=4081

A more recent paper (~2010) that includes a list of programs considered power viruses, including a set targeting older Pentium and K6 cores from the 1990s:
https://lca.ece.utexas.edu/pubs/ganesan_pact10.pdf
 
Yes, by now, the only thing that stress tests actually test is the hardware's ability to manage clocks and voltages in a way that guarantees everything remains within specifications. I believe that's the case for all modern boards, but I'm not 100% sure of that.

They do also provide a lower bound for the maximum power that can be drawn at a certain clock speed, which is interesting at least from an academic point of view.
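For anyone curious about pinning down that lower bound on their own card, a rough sketch is below. It assumes a POSIX system with an NVIDIA board and nvidia-smi on the PATH (other vendors expose similar telemetry through their own tools), and it simply keeps the highest board power reported while a stress test runs in another window:

```cpp
// Polls nvidia-smi once a second and records the highest board power seen.
// Rough sketch for a POSIX system; popen() is not portable everywhere.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>

double read_board_power_watts() {
    // power.draw is one of nvidia-smi's documented query fields.
    FILE* p = popen("nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits", "r");
    if (!p) return -1.0;
    double watts = -1.0;
    std::fscanf(p, "%lf", &watts);
    pclose(p);
    return watts;
}

int main() {
    double peak = 0.0;
    for (int s = 0; s < 60; ++s) {   // sample for a minute while the load runs
        peak = std::max(peak, read_board_power_watts());
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    std::printf("highest sampled board power: %.1f W\n", peak);
}
```

Sampling once a second will miss short spikes, so the number it prints is strictly a lower bound on what the board can draw, which is exactly the sense in which a stress test is informative.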
 