What is the worst software bug in history?
[+68] [20] Amir Rezaei
[2011-01-27 11:31:51]
[ software bug history ]
[ http://programmers.stackexchange.com/questions/40477] [DELETED]

Measured, for example, by money lost and human suffering.

Note this is a specific question.

Last month automaker Toyota announced a recall of 160,000 of its Prius hybrid vehicles following reports of vehicle warning lights illuminating for no reason, and cars' gasoline engines stalling unexpectedly. But unlike the large-scale auto recalls of years past, the root of the Prius issue wasn't a hardware problem -- it was a programming error in the smart car's embedded code. The Prius had a software bug.

(16) The ultimate Y2K bug because it didn't really exist? - Pierre 303
Another vote for Y2K bug. - Eugene Mayevski 'EldoS Corp
@Pierre I wonder how much that cost companies all around the world. However, it created jobs. :) - Amir Rezaei
Fear is very powerful. It drives the economies. - Pierre 303
@Pierre Add to that "People". - Amir Rezaei
(4) I'm pretty sure that quite a lot of Y2K bugs existed, but most of them were fixed in time, since, you know, the year '100 didn't come unexpected. - ammoQ
(11) In number of people dead, money loss or some other criteria? - Thorbjørn Ravn Andersen
@Thorbjørn I have trouble saying what metric. But human suffering (deaths and economic loss) must be a good one. - Amir Rezaei
(1) @Amir, perhaps you should open a separate question for each kind of "worst" then. - Thorbjørn Ravn Andersen
@Thorbjørn Do you have a better word in mind? - Amir Rezaei
(4) @Amir, I am just suggesting that you are more explicit in your question about what "worst" implies. Look at the discussions about what "best" is. - Thorbjørn Ravn Andersen
(28) The Y2K bug DID exist - it just got fixed in time for the most part. There were some localized incidents (and some are still not fixed properly), but that's about it. Damned if you do, damned if you don't. - MetalMikester
@MetalMikester: I think Y2K would be better described as the most overhyped bug. It had potential to be a lot worse, but fortunately, it wasn't so bad. - FrustratedWithFormsDesigner
(1) It's a tie between Microsoft Windows ME and Microsoft B.O.B. :-D - Jesse McCulloch
(47) Quite puzzled that people think the Y2K bug didn't exist. I spent most of 1999 fixing legacy financial systems that most certainly did fail with Y2k dates (often these were spectacular, hilarious failures too). The fact that it fizzled out and for the most part the whole Y2K thing was considered overhyped (by everyone outside the industry) is a reassuring confirmation that our hard work with these ageing systems was worth it. :-) - robsoft
(17) The last one will be the worst: if( UnderAttack() ); { FireTheMissiles(); } - Travis Christian
@robsoft -- Everyone seems to have the memory retention of bats when it comes to software. - Rei Miyasaka
(2) @Travis: Or if (isUnderAttack = true) FireTheMissiles();. My work C++ compiler will display a warning for yours and not mine. - David Thornley
(2) @Travis try { if (UnderAttack()); {FireTheMissles()} } catch (LeTiredException x) { /* nailed it */ } - Paul Fisher
Metal: i'm talking about the one that would put the world into chaos :) - Pierre 303
(1) Interesting. The question was closed and Joel Spolsky targeted it on twitter ;) - Oscar Mederos
probably not the worst but surely the funniest github.com/MrMEEE/bumblebee/commit/… - systempuntoout
[+121] [2011-01-27 11:43:17] Dan McGrath [ACCEPTED]

We studied the Therac-25 [1] software bug at University. I'm not sure it is history’s worst, but it really does go to show how much of a threat to our lives software bugs can be.

The Therac-25 was a radiation therapy device in which a software bug resulted in patients being exposed to lethal doses. The bug was a race condition that would have been harmless in the previous model (Therac-20), whose electromechanical interlocks caught the fault; in the Therac-25 those interlocks had been replaced with software controls.

[1] http://en.wikipedia.org/wiki/Therac-25

(10) If I recall, it was because a program used a simple locking mechanism during atomic operations. Most of the time it would work, unless the code was interrupted at the exact spot where it flags that it's entering an uninterruptable area thereby turning on radiation and not turning it off immediately afterwards. It was a classic example at my university of why the simple locking mechanism bool isLocked; does not work. - Neil
(3) +1. I'd agree this was the worst, as it killed people while the others only cost money. I found the analysis a hard read, so: was the problem inadequate integration testing of software components known to work on previous products? If @Neil is right, this was also a factor: a classic TOCTOU race condition, in which time elapses between checking the lock and taking it, so another process can change the lock state after the first process has checked it but before it has actually locked. - therobyouknow
(2) They teach us about this one in the undergrad computer systems class at MIT. It's an example of how something we all encounter -- race conditions -- can have consequences many of us can hardly imagine -- killing people. - Lawrence Velázquez
Think about this when you go through one of those airport full-body scanners -- the government has tested that they shouldn't expose you to too much radiation. - jm01
(1) The Therac machine only killed six or so people I believe. I often wonder how many people would have otherwise died had the Therac machines never been built in the first place, and whether the Therac-25, despite its bugs, improved society at large.... - whatsisname
(4) And what have we learned since then? Such things are still happening. See, for instance: nytimes.com/2010/01/24/health/24radiation.html and nytimes.com/2010/01/27/us/27radiation.html - Kyralessa
(7) This is why it is a bad idea to toss mechanical interlocks as a cost-cutting measure... - SamB
(2) At least this was by accident. America do worse things in the name of experimentation on their own population...en.wikipedia.org/wiki/… - Maltrap
(6) Therac-25 was the first thing I thought of on reading the question. - Justin E. Morgan
Sounds like the worst mistake was a design/engineering one. Why on Earth would they forgo the mechanical interlocks when the potential consequences are death? For a problem like this, "Physical component X must be at position Y before Z happens", a mechanical solution is going to be much simpler and have fewer potential points of failure. - Tim Goodman
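The flag-based locking that Neil and therobyouknow describe can be sketched as follows. This is only an illustration of a check-then-set (TOCTOU) race, not the Therac-25 code; the task names and the generator-based interleaving are hypothetical, chosen so that the bad schedule is reproduced deterministically instead of depending on thread timing.

```python
import threading

is_locked = False  # naive flag, in the spirit of "bool isLocked;"


def naive_enter(name, entered):
    """Check-then-set with a gap between the check and the set."""
    global is_locked
    if not is_locked:        # check
        yield                # the task can be interrupted right here
        is_locked = True     # set, but another task may already be past the check
        entered.append(name)


def run_interleaved():
    """Force the unlucky schedule: both tasks check before either one sets."""
    global is_locked
    is_locked = False
    entered = []
    a = naive_enter("A", entered)
    b = naive_enter("B", entered)
    next(a)                  # A passes the check, then is interrupted
    next(b)                  # B passes the same check
    for task in (a, b):
        next(task, None)     # both resume, set the flag, and enter
    return entered


def run_atomic():
    """Same two tasks, but check and set happen in one indivisible step."""
    lock = threading.Lock()
    entered = []
    for name in ("A", "B"):
        if lock.acquire(blocking=False):  # atomic test-and-set
            entered.append(name)
    return entered


print(run_interleaved())  # both tasks end up inside the "protected" region
print(run_atomic())       # only the first task gets in
```

The point of the second function is that the test and the set cannot be separated; a plain boolean gives no such guarantee, which is the argument several commenters make for keeping a hardware interlock as well.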
1
[+67] [2011-01-27 11:53:56] thorsten müller

I think one of the most expensive was the Ariane 5 [1] bug (US$370 million).

The Pentium bug [2] was rather expensive too.

A bug in the software of the Patriot Missile [3] led to a failed intercept of an incoming Scud, which killed 28 US Army soldiers at Dhahran.

[1] http://en.wikipedia.org/wiki/Ariane_5_Flight_501
[2] http://en.wikipedia.org/wiki/Pentium_bug
[3] http://en.wikipedia.org/wiki/Patriot_missile#Failure_at_Dhahran

What's interesting about the Patriot missile situation is that there was already a patch (the oldest trick in the book - just reboot). - Uri
(5) @Jaap My understanding of the bug is that it was a floating-point error. The code repeatedly incremented a floating-point timer by an unrepresentable quantity like 0.1. After running for 100 hours, the accumulated error was enough (1/3 second, according to Wikipedia) to cause the missile to miss its target. Rebooting reset the timer counter and therefore the accumulated error. If they rebooted the missile daily, the error stayed within tolerance. - Michael E
(1) @Michael: Actually, the missile would have correctly tracked the target had it been fired. The problem is that the computer was comparing things across time (the difference between the correct clock and the bogus clock) and came to the conclusion the missile couldn't catch the target so it wouldn't fire in the first place. - Loren Pechtel
(6) For some reason "fixing" missiles by rebooting them daily is not quite what I expected from the US Army... - Thorbjørn Ravn Andersen
(6) The Patriot missile was created as an offensive weapon, so it would only be in a ready state for short periods of time. This didn't allow the bug to show up with that use case. It didn't become a problem until they re-tasked it as a defensive weapon (and running constantly) during the Gulf war to guard against SCUD missiles. - jmohr
@jmohr I seem to recall it being designed for defense against Russian attacks in West Germany/Berlin, for which the missile launchers would be packed up and moved every 6-12 hours or so, so that the enemy couldn't always know their exact locations. - Daemin
(1) I’m not sure if the Patriot Missile can be counted as a bug. The problem was known the whole time and the developers warned explicitly that the computer had to be restarted at well-specified intervals. It was the failure to adhere to these regulations that caused the malfunction. Saying that this is a software bug (rather than a shortcoming) is like saying that shooting yourself in the face with a revolver is a bug in the revolver. - Konrad Rudolph
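The size of the drift discussed in these comments can be sketched numerically. This is a rough illustration, not the Patriot code. It assumes the commonly cited analysis in which the clock counted tenths of a second and the constant 0.1 was chopped to 23 binary fractional bits in a fixed-point register (rather than held in floating point, as suggested above); the function names are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: 0.1 chopped to 23 binary fractional bits


def chopped_tenth(bits=FRACTION_BITS):
    """Return 0.1 truncated (not rounded) to `bits` binary fractional bits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Accumulated clock error after `hours` of uptime, one tick per 0.1 s."""
    ticks = hours * 3600 * 10
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(ticks * error_per_tick)


for hours in (8, 24, 100):
    print(hours, "hours ->", round(clock_drift_seconds(hours), 4), "seconds")
```

Under these assumptions the error is under a tenth of a microsecond per tick but grows linearly with uptime, reaching roughly a third of a second after 100 hours, which is why a periodic reboot kept it within tolerance.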
2
[+14] [2011-01-27 16:17:17] John Bode

The sendmail DEBUG hole and the fingerd buffer overflow that allowed the Morris worm to propagate and introduced a generation of hackers to the wonderful world of buffer overflow (brought to you, more often than not, by the standard C library function gets(), which is finally being removed from the 201X standard).


I would hardly call that a disaster. Instead I would say it pointed out the danger of buffer overflows and the poor design of the C language in general. Unfortunately people refuse to learn from this and still have buffer overflows. - community_owned
Exactly; it shone a spotlight on an easy exploit. - John Bode
@Pickle: I'd call continuing to use the C language, (or its equally-unsafe derivatives such as C++ or Objective-C,) after the time of the Morris Worm for any software in which security requirements exist (operating systems, network software, browsers, etc.) a disaster, and an act of criminal negligence. - Mason Wheeler
(1) @Mason Agreed. That's why I only write in LOLCODE now: HAI; CAN HAS STDIO?; VISIBLE "HAI WORLD!"; KTHXBYE - community_owned
3
[+9] [2011-01-27 13:16:16] hiena

Excel 2007 Multiplication Bug [1]

"Simply when you try to multiply 850 by 77.1 excel display the result to be 100000 !!!"

[1] http://groups.google.com/group/microsoft.public.excel/browse_thread/thread/2bcad1a1a4861879/2f8806d5400dfe22?hl=en#2f8806d5400dfe22

(7) that's hardly the worst in history... (it is pretty funny) - Matt Ellen
(1) Does this answer have anything to do with this: blogs.msdn.com/b/oldnewthing/archive/2011/01/06/10112270.aspx - Ben
(5) @Matt Ellen considering the tens of millions of people who rely on Excel for accurate calculations, it's pretty terrible. - yahelc
(2) But the actual value underneath was still accurate, it was just the displayed value that was incorrect. So you could still use the result for further calculations. - Daemin
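Daemin's point can be illustrated in any language that uses IEEE 754 doubles. The sketch below is Python, not Excel, and assumes only that both store numbers as double-precision values: 77.1 has no exact binary representation, so the stored product is a double at or extremely close to 65535, and arithmetic that builds on it behaves as expected. The faulty step in Excel 2007 was turning such a value into text for display.

```python
import math

product = 850 * 77.1   # mathematically 65535; stored as the nearest double
print(repr(product))   # the value actually held in memory

# The stored value is within rounding error of the true result...
assert math.isclose(product, 65535, rel_tol=1e-12)

# ...so calculations that build on it still come out right.
assert round(product + 1) == 65536
```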
4
[+9] [2011-01-27 12:45:13] Ian

To make this the most expensive you would have to index the cost and convert it into today's prices.

The bug struck on 22 July 1962 (the day before I was born!) and led to the loss of the Mariner 1 [1] space probe, which was due to fly to Venus.

If you believe that this meets your definition of a software bug, it must stand as one of the most serious ever. It also shows that there is nothing new about bugs in space/missile software.

[1] http://en.wikipedia.org/wiki/Mariner_1#An_Infamous_Bug

5
[+6] [2011-01-27 14:34:39] sglantz

It is still debated whether it was actually caused by a bug or not, but the Flash Crash of the US stock market from May 2010 has to be the worst from a purely financial standpoint. The crash apparently wiped out over a trillion dollars worth of equity in a few minutes. While a large percentage of that was regained as the stock market quickly rebounded, there was certainly a lot of money lost (and made) that day.


What's the "Flash Crash"? - ChrisF
(2) If you would assign the value of human life somewhere between $1 mln and $10 mln (as is done in many cases), a trillion dollars would be equivalent to losing 100,000 - 1 mln lives... That's quite a bug... - Jaap
(6) If most of that money was lost by filthy rich bankers who already have more than they will ever need, I’d say it’s not a bug, it’s a feature. - Timwi
(5) @Timwi it was our investment money lost by filthy rich bankers (who will charge you a fee if it goes up, but not suffer a loss if it goes down). - dbkk
6
[+6] [2011-01-28 02:43:46] Justin Ethier

A defect in the control software for the Soviet Urengoy-Surgut-Chelyabinsk natural gas pipeline resulted in "the most monumental non-nuclear explosion and fire ever seen from space [1]".

But, although this is an interesting story, it is not a bug per se as the CIA allegedly sabotaged the software which was subsequently stolen by the KGB for use in the project. Crazy stuff...

[1] http://en.wikipedia.org/wiki/Siberian_pipeline_sabotage

7
[+5] [2011-01-27 14:59:09] Fanatic23
If I understand correctly, this Pathfinder bug only caused the loss of some but not all scientific data. However the Mars Climate Orbiter broke up in the Martian atmosphere due to a metric/imperial bug. - Hugo
8
[+4] [2011-01-27 20:50:56] ja72

Honorable mention to the Zune software bug that froze all the devices on Dec 31, 2008, before midnight. Zune Bug Explained [1]

[1] http://www.crunchgear.com/2008/12/31/zune-bug-explained-in-detail/
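The widely circulated explanation of this freeze is a loop that converts a day count into a year and never terminates on day 366 of a leap year. The sketch below is a Python paraphrase of that logic, not the device's actual source; the function names, the 1980 starting year as a default, and the `max_steps` guard (which stands in for the real hang) are assumptions made for illustration.

```python
from datetime import date


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_from_days_buggy(days, year=1980, max_steps=10_000):
    """Walk forward year by year; returns None where the original would hang."""
    steps = 0
    while days > 365:
        if is_leap(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366 in a leap year: nothing changes, so the loop spins
        else:
            days -= 365
            year += 1
        steps += 1
        if steps > max_steps:
            return None
    return year, days


def year_from_days_fixed(days, year=1980):
    """Same walk, but the loop always either returns or makes progress."""
    while True:
        length = 366 if is_leap(year) else 365
        if days <= length:
            return year, days
        days -= length
        year += 1


# Day number of 2008-12-31, counting 1980-01-01 as day 1.
dec_31_2008 = (date(2008, 12, 31) - date(1980, 1, 1)).days + 1
print(year_from_days_buggy(dec_31_2008))  # None: the loop never finishes
print(year_from_days_fixed(dec_31_2008))  # (2008, 366)
```

The bug only shows on the last day of a leap year, which is consistent with devices recovering by themselves once the date rolled over.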

9
[+4] [2011-01-28 12:42:59] WarrenFaith

It's not really a bad software bug, but I guess the pilot was pretty surprised when it happened the first time:

F16 autopilot flipped plane upside down whenever it crossed the equator

Source: http://www5.in.tum.de/persons/huckle/horrorn.pdf


I'd love to see a more accountable citation for this one. It's a hilarious idea, but it does seem to be an urban myth. - Chris Burt-Brown
10
[+3] [2011-01-27 21:16:05] aking1012

I wonder how many trade secret housing and government computers were penetrated by the SSH unauthenticated session bug that allowed ssh access without a password. Still looking for the reference, but some people might know what I'm talking about.


11
[+3] [2011-01-27 18:55:58] sal

The program trading defects associated with the 1987 stock market crash [1] come to mind. None of the firms using program trading thought through the implications of how things would change when the majority of trading was automated.

[1] http://en.wikipedia.org/wiki/Black_Monday_%281987%29#Causes

12
[+3] [2011-01-27 18:24:16] uvita

The British destroyer H.M.S. Sheffield was sunk in the Falkland Islands war. According to one report, the ship's radar warning systems were programmed to identify the Exocet missile as "friendly" because the British arsenal includes the Exocet's homing device, and so they allowed the missile to reach its target, namely the Sheffield.


(3) Do you have any evidence to back this up? - ChrisF
Have a look at cs.tau.ac.il/~nachumd/horror.html - uvita
(1) Hmmm I seem to recall that the Sheffield's powerful radar was interfering with communications on nearby British ships, so they'd turn it off for a while. And that's when the Exocet was fired at it - never saw it coming. - MetalMikester
(1) "Officers dismissed radar warning of Exocet attack on HMS Sheffield" guardian.co.uk/uk/2000/sep/26/falklands.world "Her own radar was jammed because officers were making a satellite phone call to fleet headquarters in Northwood, London. They ended the call and spotted the Etendards on their radar 20 miles away." - John Breakwell
13
[+3] [2011-01-27 13:00:07] Ranger

The Wired Magazine article [1] has some useful and interesting information on software bugs.

[1] http://www.wired.com/software/coolapps/news/2005/11/69355

(1) Could you please be more specific ? - Pierre 303
(41) Inline the bugs. Answers should be as self standing as possible. - Thorbjørn Ravn Andersen
(2) That sounds more like an ad than an answer - Pierre-Alain Vigeant
(7) Downvotes for a completely relevant link? The likelihood that this guy is trying to promote Wired is close to zero. Meanwhile the guy (Karthik) who rips off this Wired article (almost verbatim) gets 24 upvotes. Hmmm. - Greg
(2) @Greg - there needs to be more than just a link in an answer for it to be useful (as per the tooltip for the up-vote). - ChrisF
(1) @Greg, the downvotes are - most likely - because the answer isn't self-standing, but deliberately refers to external sources. - Thorbjørn Ravn Andersen
@ChrisF @Thorbjørn Ravn Andersen, I totally agree of course. But it is possible to comment without downvoting, and given that he is new (11 days) and contributing (7 answers, 0 questions), that would seem more appropriate. Plus, I admit I was reacting to the irony of this in the context of Karthik's upvotes. - Greg
(1) @Greg: I agree with you 100% SHAME on this community for rewarding theft and punishing those who are trying to be considerate. (This is the same community that downvotes people who ask about running MacOS in a VM because "it's not legal"). Amazingly inconsistent. - ראובן
(3) It's about the worst worded answer I have ever seen. It should have read something like: "[This][1] article in wired magazine has some great examples". [1]: wired.com/software/coolapps/news/2005/11/69355 - rjmunro
(1) It's a terrible answer because he could do both: provide a link and post some of the bugs. As it is I'm not clicking on anything because it looks like comment spam like here: some_spam_link_with_a_billion_pop-ups.com - community_owned
+1 for useful link which some other guy summarized and got little too many upvotes..but word it better as others have already stated and don't just write check it out or incomplete answers.. - Misnomer
@ Everyone : Why are we crying for the words, we always have an EDIT option. I am a responsible member of the community and i understand my responsibilities, so why will i give any link to SOME_SPAM_LINK_WITH_A_BILLION_POPUPS. Those who really wanted the answer they have seen the link others well they are just CRYING without any reason. - Ranger
@Greg, I didn't downvote. - Thorbjørn Ravn Andersen
@Thorbjørn Ravn Andersen no problem. That comment wasn't directed at you in particular. At the time I made it this answer had a ton of downvotes. And FWIW, I agreed with your comment. - Greg
14
[+1] [2011-01-27 22:25:38] regularfry

The Patriot Bug [1] always comes up in questions like this. The thing to remember is that it wasn't actually a software bug - it was user error. The software required a reset every couple of days to keep timers in sync, and this was according to spec, in part to keep the cost of the components down, and in part because the components were designed a long time ago.

What they didn't take into account was that the Patriot crews didn't want to turn off their rigs for even the few minutes it would take to reboot, in case a missile happened to fly over while they were down. They didn't know, or had forgotten, that by keeping the rig active, they were making it less accurate over time, until after a couple of days without a reset it literally couldn't hit its target.

Maybe calling this "user error" is a little harsh. If the users' training had been adequate, this would not have been an issue.

[1] http://sydney.edu.au/engineering/it/~alum/patriot_bug.html

(2) This was a bug. The recommendation for reboots was because the clock drift was known but a patch hadn't been rolled out yet. The day after this incident the maker supplied the patch. - Jim
I agree with it being a bug - "you must reboot regularly for this to work correctly" is not what I would expect from a modern weapon. - Thorbjørn Ravn Andersen
It's a little disingenuous to say that the maker supplied the patch the day after the incident. The maker had supplied the patch 9 days earlier, but it took 10 days to get into theater. I still don't consider this a bug: the weapon was performing as originally designed, but was being used on targets outside its specified scope, and that is because it's not a "modern weapon." - regularfry
15
[+1] [2011-01-27 21:44:29] knb

Coming soon: The Year 2038 problem [1], i.e. the fact that a signed 32-bit POSIX time_t, which counts seconds from 1970-01-01 00:00:00 GMT and can reach back as far as 1901-12-13 20:45:52 GMT, will wrap around after 2038-01-19 03:14:07 GMT.

This will cause some computer software to fail at some point before, during, or after the year 2038 (from the Wikipedia article).

[1] http://en.wikipedia.org/wiki/Year_2038_problem

That's hardly "soon". It's more than 25 years into the future, and by that time they'll have long since switched to a 64-bit representation. - Mason Wheeler
(1) 32-bit counters are typical of 32-bit operating systems. The 64-bit ones have fixed this, and with the current speed of adoption there is a reasonable chance that the problem mostly goes away within a decade or two. In 1995 everybody used DOS programs and emulation was very important. Today people live with the fact that 64-bit Windows 7 cannot run Win16 anymore. - Thorbjørn Ravn Andersen
(3) yes well-spotted indeed, it's not a completely serious answer, but the problem will affect lots of programs, and your prediction is like someone from 1973 saying y2k problem will not occur because everyone will have switched to 32bit representation. - knb
(2) This won't be a big problem in the same way that Y2k wasn't a big problem. Mortgages, life insurance etc. deal in 25+ year time scales. Their maturity dates are going to hit 2038 very soon (if they haven't already). Even if the systems haven't been changed already, they will be changed long before 2038 rolls round. - ChrisF
Um...if we haven't all transitioned to x64 by then we should be summarily executed. - aking1012
@knb, there is a vast difference between a system call data type and an internal variable in a program. - Thorbjørn Ravn Andersen
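The limits being debated here follow directly from the width of the counter. A minimal sketch, assuming a signed 32-bit count of seconds since 1970-01-01 00:00:00 UTC; the helper names are hypothetical, and the wrap is simulated with modular arithmetic because Python integers do not overflow.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_utc(seconds):
    """Interpret a count of seconds since the epoch as a UTC timestamp."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as two's-complement overflow would."""
    return (value + 2**31) % 2**32 - 2**31


last = to_utc(INT32_MAX)                   # last representable second
after = to_utc(wrap_int32(INT32_MAX + 1))  # one tick later, after the wrap

print(last.isoformat())   # 2038-01-19T03:14:07+00:00
print(after.isoformat())  # 1901-12-13T20:45:52+00:00
```

A 64-bit counter pushes the same limit far beyond any practical date, which is the basis for the optimism in the comments above; the catch knb raises is stored data and file formats that keep the 32-bit field.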
16
[0] [2011-01-28 03:54:40] stefan

Winnuke, Denial of service attack on Windows 9x was brutal.


17
[0] [2011-01-28 13:33:33] JoelFan

I remember hearing about a missile that crashed on launch due to a typo in a Fortran program, wherein a period was typed instead of a comma, resulting in a program that was syntactically correct but semantically very wrong.


(1) That was Ariane 5 (en.wikipedia.org/wiki/Ariane_5_Flight_501), has been mentioned already - Simon
That article does not seem to be the same bug I'm remembering... it's Ada, not Fortran and is not related to a typo - JoelFan
18
[-1] [2011-01-27 21:52:56] Daniel A. White

My vote is Y2K. With all the preparation. (Entered from my phone)


19
[-1] [2011-01-27 18:50:29] Alex

Visual Basic Script (VBS)! Or you can blame the ingenious feature of hidden extensions that is turned on by default (thank you Microsoft) which fooled people into thinking LOVE-LETTER-FOR-YOU.TXT.vbs was a text file.

That bug allowed a virus called 'ILoveYou' to spread faster than rabbits in Australia, causing an estimated $5.5 billion in damage.

You say it's not a bug!? Apple and Oracle are now watching you...


(2) Chill a bit. I don't know why you think Apple and Oracle are suddenly Microsoft's big brother, but OS X hides file extensions by default too. - Rei Miyasaka
Have the rabbits caught up by now? - Thorbjørn Ravn Andersen
I don't see that as a bug. The hidden extensions feature is working as it was designed to. Just because someone used it as a way of getting an exploit through doesn't make it a bug. - John Breakwell
(1) @JohnB.: imo, it is a bug. even designed bugs are bugs. hiding things is a way of showing the users/children a simpler world. to not explain and teach important things that are hidden but openly accessible is either a bug or a con. it could still be a con, if someone did intentionally design this as a backdoor. - comonad
Exactly @JohnB, it's hard to define what really a 'bug' is. Missing feature sometimes can be thought as a bug too. I've had this discussion with some clients of mine several times. Go ask for those companies losing billions if they lost money on a feature or a bug... - Alex
(1) @Alex Sure, I agree. Companies don't care what the problem is - they just want it fixed. From working with product groups at Microsoft, though, I feel there is a distinct difference between a bug and a misused feature. @Comonad The push to make UIs easier to navigate does have the side-effect of hiding things it may be better that people understood but that's a discussion of pros and cons - no approach is going to work for everybody. - John Breakwell
20