We have heard the term "software glitch" mentioned in the press a lot these last few weeks. Glitches seem to have a very wide range of impact on people and businesses. At one end of the scale is a single gas station in Texas that installed new software, which set the price of a gallon of gas to $1.01. The damage was localized to the poor station owner, who sold gas until the tanks were empty, initially costing the owner thousands of dollars. But I am sure the company that developed and installed the software will help the owner out.
At the other end of the scale are the Facebook IPO and Knight Capital glitches. Each of these totaled losses in the hundreds of millions of dollars. The glitch hit Knight Capital so hard that the company was on the brink of being shut down. Knight Capital had also installed new software, and the issue became apparent within the first 30 minutes.
The Facebook IPO ran on the existing software systems, without any new or upgraded software as far as I know. The change here was the new workload generated by the traders, coupled with the outrageous volume of it. The test plan might not have covered such a large workload, or the system as a whole may not have been built to expect this volume.
Other examples of glitches:
- Southwest Airlines – A large number of small transactions crashed systems and cascaded (web site, billing systems, call centers). This was an extreme workload: the airline emailed all its Facebook friends that a sale was going on because it had reached the milestone of 3 million likes.
- Local Conoco gas station – A new computer system set the price of gas to $1.01 at midnight. This caused a frenzy.
- Six Flags roller coaster – Suddenly stopped at the top of its ride. They ruled out mechanical issues and are looking into the programming; it may be a computer glitch.
- Tokyo Stock Exchange – A computer error halted derivatives trading for 95 minutes. This was the second glitch in seven months.
Two themes emerge from these glitches. The first is that new software introduces a large amount of risk and must be reviewed, tested for performance and scalability, and tested antagonistically. The second is that the impact and uncertainty of new and extreme workloads must not be underestimated.
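To make the first theme a little more concrete, here is a minimal sketch of the kind of sanity check that antagonistic testing would exercise. It is not taken from any of the systems above; the function name, the 25% threshold, and the $3.50 "normal" price are all assumptions I made up for illustration.

```python
def is_plausible_price(new_price, last_price, max_change_fraction=0.25):
    """Return True if new_price is a believable update to last_price.

    Hypothetical guard: max_change_fraction (25% here) is an assumed
    threshold, not a value from any real pricing system.
    """
    # Reject non-positive prices outright; they are never valid.
    if last_price <= 0 or new_price <= 0:
        return False
    # Relative size of the change compared with the last known price.
    change = abs(new_price - last_price) / last_price
    return change <= max_change_fraction


# Antagonistic-style checks: feed the guard the values we hope never to see.
# The 3.50 starting price is an assumed example, not a reported figure.
print(is_plausible_price(3.45, 3.50))  # small, ordinary adjustment
print(is_plausible_price(1.01, 3.50))  # the kind of drop seen in the gas station glitch
print(is_plausible_price(0.0, 3.50))   # nonsense input
```

A check like this would not prevent every glitch, but it turns a silent bad value into something the software can refuse or flag before the tanks run dry.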
Maybe we can start using the terms small, medium, large, extra large, and business glitch.
See www.collaborative.com for a paper on Application Performance Risk, in which I define a list of criteria that can help determine the degree of risk in your application and workload.