Monday, January 16, 2023

Can Someone Give Me a Call!


It is a familiar aphorism in intellectual circles that history does not repeat itself, but it often rhymes. Last week, I talked about how technical debt and poor maintenance helped ground Southwest Airlines during one of the year's busiest travel periods. This week, FAA software that helps maintain air traffic safety crashed, halting flights for the better part of the morning.

The system in question is called NOTAM, which stands for "Notice to Air Missions." It is a general-purpose system that tells pilots about hazards they might encounter while moving commercial flights from place to place. An air show could fill up the airspace around Wichita, or a flock of birds could clog up a flight path to Cleveland. Air traffic controllers and pilots count on this system to avoid accidents. When the system went down, air traffic came to a crawl, and passengers were grounded.

As the crisis unfolded, it became clear that a software update caused the crash and upset the system. Someone uploaded a corrupt file, killing the central NOTAM system and the backup. The responsibility for uploading that file was with an outside contractor, not an FAA employee. My heart sank when I learned this because I made a mistake like this on a much smaller scale. I broke 56 credit card transactions out of 1800 for TOMS Shoes during the holiday shopping season in 2008. It was a fraction of a single day's business, but it was enough of a hassle that TOMS Shoes threatened to stop paying the consulting company for the trouble caused. TOMS withheld payment from our company for December even though we fixed the problem within twelve hours. A week before Christmas, my company fired me, and to this day, I still harbor some ill will toward TOMS and its brand.

I imagine a consultant is looking for a new job after making a similar mistake. Unfortunately, I think that mishaps like this happen because the FAA and the contractor created a NOTAM system that was fragile and easy to break. According to John Cox, an aviation safety expert, an outage like this has not happened in 53 years. Fifty-three years without an outage is a mighty impressive record in the IT industry. Reliability like this gets CIOs sizable bonus checks. Still, it looks terrible that air traffic controllers and pilots are grounded because a software update did not go well.

Delta Air Lines CEO Ed Bastian called the shutdown unacceptable but said that the incident was not the fault of the FAA but instead the result of a lack of funding. Senator Maria Cantwell of Washington said that Congress would hold a hearing on the subject. In my last blog, I said moments like this were necessary to point out system problems and call attention to maintenance and technical debt. A moment of clarity like this one should focus the minds of members of Congress and the executive branch on improving the system.

So far, the FAA website has one update on January 11th about the outage, and the news has moved on to other subjects. Transportation Secretary Pete Buttigieg has said that they have made some changes to prevent this from happening again. Still, I suspect it will take more than changing the procedures to ensure something like this does not happen again. I suspect that decades of technical debt and outdated servers are at the heart of the NOTAM system. It looks like the perfect project for an Agile coach and change agent to make a difference. Secretary Buttigieg, give me a call, and we can talk about how CAPCO and I can help update your systems.

I must confess that I am a little glib about the subject, but technical debt is a big deal. We have spent so much time and energy making systems efficient that they are not resilient when bad things happen. Making systems faster, better, and cheaper is essential, but our trust in those systems erodes if they are not resilient. Eventually, that lack of confidence will hurt the airline industry and the country. That is something that no one wants.

Until next time. 

