Is the difficulty of debugging complex games non-linear?

Games are much more complex than they used to be.

You have various subsystems running on multiple cores. Many of the subsystems are third-party, and you have very little technical insight into what goes on inside them. Running them across multiple cores makes debugging much, much harder than in the single-core days (i.e. you can't single-step through a parallel program in any meaningful way).
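To illustrate why single-stepping doesn't help, here's the classic "lost update" race, written as a deterministic Python simulation of one unlucky interleaving (purely illustrative; real races depend on timing, which is exactly the problem):

```python
# Deterministic simulation of a "lost update" data race: two threads
# both execute `counter += 1`, which is really read / add / write.
# One unlucky interleaving of those steps loses an increment.
def lost_update_demo():
    counter = 0
    a = counter      # thread A reads 0
    b = counter      # thread B reads 0, before A writes back
    counter = a + 1  # thread A writes 1
    counter = b + 1  # thread B writes 1, clobbering A's update
    return counter   # 1, not the expected 2

print(lost_update_demo())  # 1
```

On real hardware the interleaving changes from run to run, which is why a debugger that freezes every thread at once rarely reproduces the failure.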

The games themselves, in particular open-world ones, can't be exhaustively tested because there is a practically infinite number of states you can put the game in. Your QA team, with a few dozen people (at best), will find a fair chunk of bugs, but the millions of gamers who subsequently play the game will find bugs the QA team didn't. That's just simple statistics.
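A back-of-the-envelope sketch of why exhaustive testing is hopeless: even a toy world model with a few dozen independent boolean flags dwarfs what any QA team could ever visit (numbers are illustrative, not from any real game):

```python
# Even a trivial world model explodes combinatorially.
flags = 40                      # e.g. quest/door/item booleans
states = 2 ** flags             # distinct combinations of those flags
seconds_per_year = 60 * 60 * 24 * 365

# A 30-person QA team checking one state per second, nonstop:
years = states / (30 * seconds_per_year)
print(states)  # 1_099_511_627_776
print(round(years))
```

And real games have far more than 40 bits of mutable state, which is why QA samples the state space while millions of players effectively brute-force it.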

You do your best to harden each individual subsystem, you add margins (memory, processing time/resources) and you test the shit out of it. If you have time, you do more; if you don't, you do less.

Cheers
 
You have no idea how many toasters fail. You've got one, maybe two. You buy many more games, plus you don't discuss toasters with other people online.
 
I'm discussing toasters with you online ;)

PS: True, I don't know how many toasters fail, but that doesn't mean it's unreasonable to expect a fully working one.
 
You have no idea how many toasters fail. You've got one, maybe two. You buy many more games, plus you don't discuss toasters with other people online.
Two of my toasters and two of my microwave ovens have died. The latest toaster died before it was one year old. Our kitchen mixer also died (smoke coming out of it). It was brand new. Two of my Xbox 360s have died (RROD). Just two months ago my amplifier died. It was very expensive (and very good), and only two years old (obviously it died just after the warranty period ended). My bike frame also cracked half a year after the frame warranty ran out (it was aluminum, so welding was not possible). I have spent almost 2000 euros during the last three years fixing various problems in my car. But the worst of them all is that our brand new apartment had 40 issues (bugs) at the first-day inspection. The building company fixed 20 of them in the first months, but it took them 1.5 years to fix the remaining 20. Some of our neighbors had to sue them because they have even bigger problems.

And phones... phones are the worst. My iPhone died at 1.5 years old (Finnish law mandated a two-year warranty soon after... a little bit too late for me). The Galaxy Note I bought after it had two batteries that almost exploded (the battery got very hot and swollen). The store said they had never seen anything like that... but the same happened to my sister's Note 2, my wife's and a colleague's S4, and my friend's Note 3. The proximity (face) sensor of the phone also died (making it impossible to use the keyboard/screen while calling) and the camera lens got filled with dust (so much that all pictures were almost pure grey).

Obviously none of this means that games are allowed to have bugs :)
 
Two of my toasters and two of my microwave ovens have died. The latest toaster died before it was one year old. Our kitchen mixer also died (smoke coming out of it). It was brand new. Two of my Xbox 360s have died (RROD). Just two months ago my amplifier died. It was very expensive (and very good), and only two years old (obviously it died just after the warranty period ended). My bike frame also cracked half a year after the frame warranty ran out (it was aluminum, so welding was not possible).

How did you become cursed?
 
You have no idea how many toasters fail. You've got one, maybe two.

I'm discussing toasters with you online ;)

Two of my toasters and two of my microwave ovens have died. The latest toaster died before it was one year old.

FrakkinToasters.jpg



 
And who drove that expectation? In large part, publishers did.

Yes, I judge software by the standards I judge any other product (be it a game, a GPS, or a toaster).
If I buy something, I expect it to work. Now, some people think I'm unreasonable for wanting that because software is, err, special, but that's the viewpoint I hold.
My new LG monitor has light bleed and a dead pixel. LG's dead pixel policy allows up to seven before a panel counts as faulty. "They're not bugs, they're an engineering tolerance..." I can add a load of hardware faults across products that Sebbbi already listed. We also get hardware revisions of devices and products that tweak issues with the original. It's interesting that the military has been raised as an example of where bugs can't happen, because they do. Soldiers train to strip and rebuild their weapons in the field for exactly when a bug happens and it jams, for example.

And most importantly, few of those devices and products are anything like as complex as software. It's pretty ridiculous to expect a million lines of multithreaded code to be as perfect as a graphite pencil, or a basic heating element in a housing for toasting bread, maybe complicated by an extremely basic ASIC and a few lines of timing code. One need only look at product reviews to see there are always failures and faults behind those ubiquitous 1* ratings. I can't see how your position is in any way logically tenable. It makes an unfair comparison between complex and simple products, and doesn't tally with the fact that even simple products fail at both the design stage (poor phone reception due to antenna placement) and the fabrication stage.
 
My new LG monitor has light bleed and a dead pixel. LG's dead pixel policy allows up to seven before a panel counts as faulty. "They're not bugs, they're an engineering tolerance..."

Much of this is for economic reasons. LG could supply only zero-hardware-defect monitors by rejecting panels with dead/bright/flawed pixels and assemblies that result in light bleed, but the cost would need to be passed on to the consumer. But there is a middle ground; lots of manufacturers offer a zero-dead-pixel option: Dell call theirs the Premium Panel Guarantee, LG call theirs Zero Pixel.

Defects are things you have to accept if you're not prepared to pay more for higher qualification testing.

It's interesting that the military has been raised as an example of where bugs can't happen, because they do. Soldiers train to strip and rebuild their weapons in the field for exactly when a bug happens and it jams, for example.

Nobody claimed there were no bugs in military software; the command & control software used by many NATO countries is horrendous. The specific example I used was missile avionics. There are many software implementations where lives (or deaths) or catastrophic consequences are wholly in the control of software and where every effort must be made to eradicate errors. This is why, whenever there is anomalous behaviour in something critical, the system is recovered and tested so the fault can be carefully analysed, determined, learned from, and kept from being repeated.

But a weapon jam is not "a bug", i.e. it's rarely a flaw in the weapon itself. It's usually caused by particulates in the mechanism, or a malformed casing, or a dud. No more a "bug" than putting a warped CD/DVD into a player and having it jam. Stripping and cleaning weapons is accepted standard maintenance for a firearm. This isn't a great analogy ;)

But analogous deviations can also be planned for in software. There aren't a huge number of unique "mission critical" system types in the world, but the methodology for development is similar: you use well-tested and qualified hardware, software platforms, compilers and libraries. Then, like critical mechanisms in any application (aircraft, missiles, nuclear reactors etc), you have redundancy as well. There isn't one set of software running the auto-landing system (ILS) on an aircraft, there are several, all doing the same thing via purposely different software approaches developed in clean-room isolation.
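A minimal sketch of the voting idea behind that redundancy (my own illustration, not actual avionics code): several independently developed channels compute the same value, and a voter takes the strict majority, flagging disagreement so the system can fail over:

```python
from collections import Counter

def majority_vote(results):
    """Return the value a strict majority of redundant channels agree on.

    Raises if there is no strict majority, i.e. the channels disagree
    badly enough that the system should disengage or fail over.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among redundant channels")
    return value

# Three independent implementations of the same glideslope computation;
# one channel has glitched, and the voter masks the fault.
print(majority_vote([3.02, 3.02, 9.99]))  # 3.02
```

The point is that a fault has to hit a majority of the independently written channels at once before it reaches the output, which is far less likely than a bug in any single implementation.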

Most of what separates 'mission critical' software from the software that you and I use on our computers is this sort of redundancy (think of the critical logic undergoing constant double-blind testing) and the amount of 'high integrity' testing that occurs. Vast amounts of inputs are fed in and compared with expected outputs. Huge libraries of test data accumulate, based on real data drawn from whatever real-world application the software is intended for. Over and over, months and months of constant testing. Even for relatively minor changes - it's just part of the qualification process. And test data is rarely discarded, only augmented with more test data. What was the input, what was the output, how long did it take? Every data set is another barrage of testing to stimulate an unexpected response or anomaly in processing.
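The accumulate-and-replay discipline described above can be sketched like this (all names hypothetical; real qualification suites also record timing and environment):

```python
def replay_regression_suite(func, golden_records):
    """Replay accumulated (input, expected_output) records against func.

    Records are never discarded, only appended to, so every past anomaly
    stays in the barrage of tests forever. Returns the list of failures.
    """
    failures = []
    for inp, expected in golden_records:
        actual = func(inp)
        if actual != expected:
            failures.append({"input": inp, "expected": expected, "got": actual})
    return failures

# Suppose `scale` is the unit under test and these records came from the field:
golden = [(1, 2), (5, 10), (-3, -6)]
scale = lambda x: x * 2
print(replay_regression_suite(scale, golden))  # [] -> all records pass
```

Every anomaly found in service adds a record, so the suite only ever gets stricter between releases.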

Most programmers boggle at the prospect of large codebases with few errors because they never work on projects that require this. If you had asked me to sit down and think up a set of protocols for such testing before I worked in applications that required it, I doubt I could have come up with a hundredth of these systems. This is because the methodology for such testing has been ongoing, built up and refined for decades.

Just because you're not doing it, doesn't mean others aren't ;)

EDIT: grammar.
 
I've bought... at least 5 LCDs/TVs in the past decade and none had any dead pixels on arrival. My old TV developed one "stuck on" pixel after 6 years. It's either luck, or not a prominent problem.

Either way, at least in Germany (and probably other parts of Europe as well), you're always allowed to return shipped goods within 14 days (Fernabsatzgesetz). If I bought a defective panel, I'd not keep it. Most retailers also give you this option in-store, so as not to be at a disadvantage against online sellers.

But in any case... I've been into blackbox testing for several years now. Cars always ship with "buggy" ECU software. And a LOT of the time, it's not because it's hard to debug (well... I often wonder how the supplier of the software can't "repeat" a bug when I have no problem doing so in my laboratory setting and I give them all the details surrounding it), but because of politics. And it's gotten a lot more problematic in the past few years (for different reasons... company mergers, one company imposing its rules onto the other, etc.), resulting in much less robust software. In our case, we wonder how the other companies can actually sell cars this "defective". But I've not seen widespread reports saying people are actually getting defective cars a lot (just software here, not hardware recalls). I could be a lot more specific here, but NDAs don't allow me to.
 
Two of my toasters and two of my microwave ovens have died. The latest toaster died before it was one year old. Our kitchen mixer also died (smoke coming out of it). It was brand new. Two of my Xbox 360s have died (RROD). Just two months ago my amplifier died. It was very expensive (and very good), and only two years old (obviously it died just after the warranty period ended). My bike body also cracked half a year after the body warranty ran out (it had an aluminum body, so welding was not possible). I have spend almost 2000 euros during the last three years to fix various problems in my car. But the worst of them all is that our brand new apartment had 40 issues (bugs) on the first day inspection. The building company fixed 20 of them in the first months, but it took them 1.5 years to fix the remaining 20. Some of our neighbors had to sue them because they have even bigger problems.
...

Off Topic:
Have you inspected your electrical network? It sounds like you might have a problem there.
 
Obviously none of this means that games are allowed to have bugs :)
Sure. All it means is that the glamorizing of other industries, and the thought experiments people suggest in this thread, are bonkers.

Systems that have to tolerate errors (typical magic mentioned in this thread: cars, planes, spaceships, medical equipment) usually do so by redundancy. I can easily imagine Davros running 3 consoles in parallel and negotiating output for reliability. Oh, wait, no I don't. ;)
 
But are we really discussing the no-bug problem?? (I know Davros does, but he is special :D)

It is about game-breaking bugs. Bugs that hit most of the consumers. Bugs that don't let you use the product. I could not play Driveclub online for several weeks because it just did not work - I could not use half of the game that was promised, and the reason most people actually bought it: the sensational social online community.

An analogy would be: you buy a new car. And it just does not work... the engine does not start. And it is not only one specific car among a million other cars. It is as if every second consumer who bought this specific car cannot use it for driving (but it still plays music... like single player in Driveclub :)). And the constructors of the car would argue: but but but, cars did get so complex in recent years, development cycles are getting shorter and shorter to save money, but the consumers still are not willing to pay more money... meh.

And I am 100% sure that such releases are down to pure incompetence on the dev/management/publisher side and not related to the fact that products in the gaming industry, too, are getting more and more complex (who would have thought?!). The proof for my conclusion is that there are indeed games released, even car games, that have a working online component... thus, the complexity is not too high for at least some capable game devs out there.

And hence, for those dramatic cases (and these are the cases that have actually increased recently), I think that devs and publishers really need to get their shit together and improve their QA, and, if really needed, use the consumers as debuggers and testers by doing real(!) beta tests - not "a week before release demo show for the product, where it is clear that no feedback can be put back into the code as it is already gold" betas (BF4...)!
 
An analogy would be: you buy a new car. And it just does not work... the engine does not start. And it is not only one specific car among a million other cars. It is as if every second consumer who bought this specific car cannot use it for driving (but it still plays music... like single player in Driveclub :)). And the constructors of the car would argue: but but but, cars did get so complex in recent years, development cycles are getting shorter and shorter to save money, but the consumers still are not willing to pay more money... meh.
No-one's excusing game-killing bugs. There are three aspects to this thread.

1) Is the difficulty of debugging complex games non-linear? - Yes, debugging difficulty grows exponentially with software complexity.

2) Can/should software be bug free? - No, it's nigh impossible, especially within the economic constraints of the games industry.

3) Are game killing bugs acceptable? - No, and better QA, management and software engineering could reduce these issues to acceptable levels (nigh zero, with some short-lived bugs when the unpredictable occurs).

And hence, for those dramatic cases (and these are the cases that actually increase recently), I think that devs and publisher really need to get their shit together and improve their QA and if really needed use the consumers as debuggers and testers by doing real(!) beta tests and not only "a week before release demo show for the product where it is clear that no feedback can be put back into the code as it is already gold" - betas (BF4...)!
There does seem to be some industry fault here. I was chatting with a dev looking to get something on PSN and they were worried about bugs. I raised the issue of QA and they reported that as an indie, they are held to a standard by Sony. Clearly some big-name pubs are being given an unfair pass if smaller companies are being held to a quality standard before being allowed to sell.
 
No-one's excusing game-killing bugs. There are three aspects to this thread.

1) Is the difficulty of debugging complex games non-linear? - Yes, debugging difficulty grows exponentially with software complexity.

2) Can/should software be bug free? - No, it's nigh impossible, especially within the economic constraints of the games industry.

3) Are game killing bugs acceptable? - No, and better QA, management and software engineering could reduce these issues to acceptable levels (nigh zero, with some short-lived bugs when the unpredictable occurs).
4) Is it possible to ship with experience-destroying bugs despite great management, practices and testing? Yes, it is. But people who don't write code that ships to millions of people think otherwise.

I'll provide an anecdote. Back at Microsoft I was responsible for debugging Dr. Watson reports from our component. These get anonymized (really, they do) and automagically bucketed based on the stack trace and internal state. We've shipped with an undetected bug that generated hundreds of crashes daily. MS spends tons of money on testing; there are roughly as many people writing tests as there are writing production code, plus there are lab technicians running tests, plus there are gatherings with other companies so MS code can be pitted against their tools in order to catch problems early. Only through the crash reports could we identify problems that we never hit internally. So for several months I was debugging buckets of issues, from the most common to the rarest that still made sense. And "made sense" doesn't mean some arbitrary number of hits per month, but stack traces that could be understood and fixed. Because usually it was still impossible to get a local repro, and if a stack just looks "bad", there's nothing you can do. It's fun when you can identify a problem as a bit flip caused by heck knows what. These happen more often than I'd like.
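The bucketing step can be sketched like this (a simplification I'm inventing for illustration; a real pipeline also folds in module versions and internal state):

```python
import hashlib

def crash_bucket(stack_frames, top_n=5):
    """Assign a crash report to a bucket keyed on its top stack frames.

    Crashes with the same top-of-stack signature land in the same bucket,
    so thousands of daily reports collapse into a ranked list of issues.
    """
    signature = "|".join(stack_frames[:top_n])
    return hashlib.sha1(signature.encode()).hexdigest()[:12]

# Two reports from different machines with the same failing call path
# (frame names are made up) collapse into one bucket:
crash_a = ["game!Render::draw", "game!Scene::tick", "kernel32!BaseThreadInit"]
crash_b = ["game!Render::draw", "game!Scene::tick", "kernel32!BaseThreadInit"]
print(crash_bucket(crash_a) == crash_bucket(crash_b))  # True
```

Triaging then means sorting buckets by report count and working down the list, exactly the most-common-to-rarest order described above.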

So saying that it's possible to guarantee that some obscure bug won't destroy somebody's experience is pure nonsense.
 