Watch out for systems that are too complex

Today I ran into a problem at a bank while trying to withdraw more cash than an ATM would ordinarily allow.  It has a lot in common with today’s failure of Delta Air Lines’ entire software system, and with a similar failure at Southwest last week.  The Delta problem was believed to have started with an electrical power failure in Atlanta, which shut the entire system down.  It was widely reported that much of the flight information available online or posted on airport monitors was incorrect for many hours.  The Southwest failure was also system-wide.  How much of the problem still persists in the form of incorrect billing for changed flights, corrupted credit card information, and other security issues?  In each case, tech support staff worked hard to get the systems up and running.  Much of my research over the last 30+ years has focused on the engineering of complex software systems, so none of the technical issues surprise me.

(If you want to learn more about the technical details of how such complex things can be engineered, buy the second edition of my Introduction to Software Engineering book.  Both my publisher, Chapman and Hall, and I will appreciate the $99 you spend.  Thank you.)

Here’s what happened at my bank.  None of the tellers at the branch could get access to the bank’s computer system.  It took about five minutes for the system to come up.  The ATMs outside the bank also seemed to have problems.

So, what’s the issue?  The problem from a consumer perspective is that any consumer transaction or data might have been corrupted, or even intercepted.  In this case, the tellers simply waited, and their system became available.  Not much tech support seemed to be needed at this bank branch today.

Did the Delta or Southwest tech support completely solve the problems?  Perhaps.  Did the bank’s software problem solve itself?  Perhaps.  We will never know.

What we do know is that, in each case, a system did not behave the way it should have and, in the case of the airlines, the problems cascaded.  Not surprisingly, the problems of complex software systems tend to increase greatly when those systems are under heavy load.

What this means for you is that you should avoid being at the mercy of such failures, so that your data stays correct and you have evidence of the transactions you made.  Keep in mind that complex systems fail often, and you should avoid unsafe practices.  Here are some unsafe practices.

  1. Deposit cash through an ATM?  Do you have any recourse if a problem occurs?
  2. Deposit a check through your smartphone?  Only if you keep the physical check and have a nearby location where you can talk to a person who can fix any problem that occurs.
  3. Rely only on data stored on a smartphone for airline reservation information, without either a paper copy or a way to print a hard copy? Remember, your copy of electronic reservation data may be out of date if a system failure occurs.
  4. Use free wifi to log in at an airport?  The wifi may be overloaded, so whatever limited protection of your private data is sometimes available may not be present at all.