Showing posts with label technical debt.

Monday, January 16, 2023

Can Someone Give me a Call!


It is a familiar aphorism in intellectual circles that history does not repeat, but it often rhymes. Last week, I talked about how technical debt and poor maintenance helped ground Southwest Airlines during one of the year's busiest travel periods. This week, the FAA software, which helps maintain air traffic safety, crashed, causing planes to halt for the better part of the morning. 

The system in question is called NOTAM, which stands for "Notice to Air Missions." It is a general-purpose system that tells pilots about hazards they might encounter moving commercial flights from place to place. An air show could fill up the airspace around Wichita, or a flock of birds could clog up a flight path to Cleveland. Air traffic controllers and pilots count on this system to avoid accidents. When the system went down, air traffic came to a crawl, and passengers were grounded.

As the crisis unfolded, it became clear that a software update caused the crash and upset the system. Someone uploaded a corrupt file, killing the central NOTAM system and the backup. The responsibility for uploading that file was with an outside contractor, not an FAA employee. My heart sank when I learned this because I made a mistake like this on a much smaller scale. I broke 56 credit card transactions out of 1800 for TOMS Shoes during the holiday shopping season in 2008. It was a fraction of a single day's business, but it was enough of a hassle that TOMS threatened to stop paying the consulting company for the trouble caused. TOMS withheld payment from our company for December even though we fixed the problem within twelve hours. A week before Christmas, my company fired me, and to this day, I still harbor some ill will toward TOMS and its brand.

I imagine a consultant is looking for a new job after making a similar mistake. Unfortunately, I think that a mishap like this happened because the FAA and the contractor created a NOTAM system that was fragile and easy to break. According to John Cox, an aviation safety expert, an outage like this has not happened in 53 years. Fifty-three years without an outage is a mighty impressive record in the IT industry. Reliability like this gets CIOs generous bonus checks. Still, it looks terrible that flights are grounded because a software update did not go well.

Delta Air Lines CEO Ed Bastian called the shutdown unacceptable but said that the incident was not the fault of the FAA but instead the result of a lack of funding. Senator Maria Cantwell of Washington said that Congress would hold a hearing on the subject. In my last blog post, I said moments like this were necessary to point out system problems and call attention to maintenance and technical debt. A moment of clarity should focus the minds of members of Congress and members of the executive branch toward improving the system.

So far, the FAA website has one update on January 11th about the outage, and the news has moved on to other subjects. Transportation Secretary Pete Buttigieg has said that they have made some changes to prevent this from happening again. Still, I suspect it will take more than changing the procedures to ensure something like this does not happen again. I suspect that decades of technical debt and outdated servers are at the heart of the NOTAM system. It looks like the perfect project for an Agile coach and change agent to make a difference. Secretary Buttigieg, give me a call, and we can talk about how I and CAPCO can help update your systems.

I must confess that I am a little glib about the subject, but technical debt is a big deal. We have spent so much time and energy making systems efficient that they are not resilient when bad things happen. Making systems faster, better, and cheaper is essential, but it undermines our trust in those systems if they are not resilient. Eventually, that lack of confidence will hurt the airline industry and the country. That is something that no one wants.

Until next time. 


Monday, January 9, 2023

Southwest and Lessons of Technical Debt


Kicking over an ant hill resembles an Irwin Allen disaster movie. It is a frantic and desperate scene. Insects scurry in all directions attempting to make sense of the catastrophe and repair the damage. Some ants try to bite or sting the offender who upset their home. It constantly happens in nature, and the scene is repeated in business as unforeseen events conspire with neglect to create desperate situations for professionals. The latest example is the recent trouble at Southwest Airlines as it attempts to recover from bad weather and worse software. Today, we kick over a metaphorical ant hill and look at how to avoid this catastrophe. 

News stories circulated over the holidays about Southwest Airlines and countless passengers stranded in the middle of holiday travel. The press was so bad that Southwest offered 25,000 free air miles to its passengers as a sign of goodwill. How on Earth did this happen, and why? 

While I was working on my MBA, Southwest Airlines was a textbook case study in how to run a business properly. Flight attendants worked with management to understand scheduling. The organization and its labor union had a collaborative relationship. A reality TV show featured Southwest gate managers highlighting customer service challenges. Seeing headlines featuring incompetence and dissent within the Southwest organization was discouraging.

How did such a great business fall so quickly? The answer became clear when Business Insider shared an open letter from the head of the pilots union, which pointed out that the former CEO, Gary Kelly, did not invest in updated software for pilot scheduling. Kelly also packed the organization with cronies with a common background from the accounting program at the University of Texas. Discussion about how to run the business moved from the pilots and other employees to the executive leadership surrounding Kelly. It was great for the organization's profitability, which made money when other airlines faltered, but it was a situation where the qualities that make Southwest unique atrophied. Most damning was the deliberate neglect of technical debt in the company's pilot scheduling software.

I talk about technical debt as an agile coach and consultant. I have written about the subject on several occasions. Software and data are as vital to customer service as adequately trained people and well-maintained aircraft. As software continues to eat the world, it is incumbent on business leaders to pay attention to the operation of their software systems because, if they do not, they will experience expensive and embarrassing episodes of business interruption. It appears that Kelly, who is still the chair of the board of Southwest, is being exposed to that costly lesson.

Technical debt in a business organization points to deep organizational failures. If a business fails to update its systems, it shows it does not care about the proper operation of the company. Somewhere, someone calculates that updating systems does not have an immediate financial benefit, so they neglect it. Over a series of years, the systems become more brittle and unable to deal with changing conditions. The result is ill will from customers, lawsuits for poor service, and a large financial hit to the organization, all of which could have been avoided if leadership had paid attention to the technical debt sooner. The East Coast snowstorm around Buffalo was the final straw, and it took Southwest longer to recover.

I am not surprised by this story. Many organizations have this problem, and it takes an event like this to make it relevant to business leaders. In truth, having your organization experience something like this is an excellent way to avoid inertia and complacency. By having your metaphorical ant hill kicked over, you pay attention to the operational issues that matter. Southwest will pay for this; I hope they have learned their lesson.

Until next time. 


Monday, October 18, 2021

Southwest Airlines and the Gremlins of Technical Debt


It is Halloween season, so I indulge in a few monster movies when I have downtime.  I am partial to the old Universal monster movies with Bela Lugosi and Boris Karloff.  I also enjoy anything with Vincent Price, and I consider his film “The Abominable Dr. Phibes” one of the most frightening things I have ever seen.  There is something about monsters lurking in the shadows which always gives me a great scare.  One of my favorite monster movies is director Joe Dante's “Gremlins,” which is a fantastic popcorn movie and a parody of entertainment culture at the same time.  Today, on the blog, I want to discuss a different kind of gremlin lurking in the shadows and how it has been fouling up air travel.

The term gremlin was invented by the British.  In the early days of aviation, airplanes were not mechanically reliable; engines would jam, flight controls would snap, and canvas would tear without explanation.  Mechanics and pilots often blamed these mishaps on “gremlins,” nasty elf-like creatures who liked to cause mischief on an aircraft in flight.  By the Second World War, pilots from the United States and Royal Air Force had stories about gremlins.  If anyone has stories about these creatures from the German, Russian or Japanese Air Forces, please share them in the comments.  Suffice to say, gremlins were an excellent alibi for poor maintenance, bad design, or dumb luck.  The gremlin became a part of aviation culture. 

I keep thinking about these critters the more I work in technology.  I wish I could invoke them during a debrief of a poorly executed project or use them to explain a server outage.  Unfortunately, gremlins are mythical creatures, and if I use them to present a technical problem, the CIO of my client would laugh at me and then ask me to pack my desk and leave the building.  

Gremlins are comforting, compared to the problems technical professionals face with increasingly complex systems.  Earlier this month, Southwest Airlines could have invoked the little monsters during a three-day weekend when it faced a severe shortage of flights.  Some pundits on the internet spread the false rumor that the slowdown was a strike created by pilots who refused to receive the COVID-19 vaccine.  The reality is less about the civil disobedience of pilots than the negligence of Southwest Airlines and its Information Technology systems.

According to the Southwest Airlines Pilots Association spokesperson, “I point to how they (Southwest) manage the network and how I.T. supports that network.”  It seems the union has been complaining about the reliability of I.T. systems for over four years.  Company officials have not commented on the claims, but based on the events of the long holiday weekend, it is easy to see how an outage can ground an entire fleet of planes.

I understand why something like this could happen at a large organization.  The internal system which schedules flights is buggy or unreliable.  Debate within the organization happens, and a decision is made not to fix the system because the cost and inconvenience are greater than dealing with the flawed system.  The conscious choice to do this is called technical debt in the agile community.  It sits in the organization like a time bomb waiting to explode the business at the least convenient time.  I suspect that is what happened to Southwest Airlines. 

Having technical debt in your organization is like having a box of gremlins and tossing them into a swimming pool.  Bad things are going to happen.  It is why everyone in an organization needs to regularly look at technical debt and give it a serious evaluation.  Otherwise, your organization will get grounded.  To avoid a horror movie, corral the gremlins of technical debt.  You will be glad you did.

Until next time. 



Monday, January 21, 2019

Transform at the speed of the Team

Coaching is more than presentations.
Software development is not rocket science; it is a branch of engineering, but it is not rocket science.  I say that because rocket science depends on the laws of chemistry and physics, which have not changed since the big bang.  Software development is changing daily.  JavaScript libraries are constantly being updated and going in and out of fashion.  Versions of PHP change, and open source code is in constant flux.  Finally, software development is dependent on the fickle demands of the consumers who use it.  The level of chaos and change is staggering.  It is why software development is such a challenging profession.  As a scrum master and coach, you must understand those challenges and guide development teams through the process.

One of my favorite pieces of journalism is Bloomberg’s weighty essay entitled “What is Code?” It talks about the person in the taupe blazer and the frustrations of software developers.  It also does a great job talking about the headaches that executives who manage software developers face.  The essay captures perfectly how smart people struggle daily to get dumb machines to act intelligently.

The world of software has tremendous power, but that power belongs to a small subset of the world's population.  I calculated that less than 0.05% of the global population of 7.4 billion could maintain software and computer networks.  Many of these individuals work in the quiet recesses of government and business keeping things running.  They go home to families and friends.  They pay bills and try to live their lives as best they can.
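For readers who want to check that figure, here is a back-of-the-envelope sketch in Python.  The 0.05% share and the 7.4 billion population come from the paragraph above; the function name and the choice to express the share as a count per 10,000 people are illustrative assumptions on my part, not a demographic model.

```python
# Rough ceiling on how many people could maintain software and networks.
# The inputs are the blog's own estimates; all names are illustrative.

GLOBAL_POPULATION = 7_400_000_000   # 7.4 billion people
SHARE_PER_10_000 = 5                # 0.05% expressed as 5 per 10,000


def estimate_maintainers(population: int, share_per_10_000: int) -> int:
    """Return population * share, using integer math to avoid float rounding."""
    return population * share_per_10_000 // 10_000


if __name__ == "__main__":
    headcount = estimate_maintainers(GLOBAL_POPULATION, SHARE_PER_10_000)
    print(f"At most about {headcount:,} people")
```

Under those assumptions, the ceiling works out to roughly 3.7 million people worldwide.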

Because of the laws of supply and demand, computer professionals receive large compensation, but the compensation comes with a trade-off.  The trade-off is long hours of uncompensated overtime and business leaders expecting them to perform magic.  It creates conditions which lead to poor quality and burnout.  I have experienced this situation as a developer and as a manager.  As a customer, I have stumbled on numerous situations where fatigue, complexity, and unrealistic expectations have combined into a poor product.  The history of the internet contains plenty of companies which had a few pixels and an unhealthy dose of hype.

Technology professionals have lived in that world since the early 1990s, and you can excuse them for being suspicious of new approaches to doing things.  For every Amazon.com there are hundreds of companies like Pets.com.  So bringing in ideas like Test Driven Development, S.O.L.I.D. programming, and Agile is going to face resistance.  As a scrum master or coach, I recommend you begin by slowly introducing concepts and letting people test out ideas to get comfortable with them.  It also helps if you understand and recognize the pressures the team faces.  Are they distracted by requests which are urgent but not important?  Do you have a healthy cadre of product owners, or is the role being performed by a manager?  Finally, are they working with a brittle technology stack?  Answering those questions will determine how fast you can go during your agile transformation.

Software development is not rocket science.  It is a challenging field prone to error and burn-out.  Only by paying attention to individual challenges each software development team faces can they be coached into an agile way of doing things.

Monday, January 15, 2018

Smoke detectors explain technical debt

Should have checked the smoke detectors.
Since the holidays, I have made a point to spend time with people outside the technology field.  This experience has been beneficial because I spend my time explaining what a scrum master does and how we do it.  This review of the basics is allowing me to reflect on work and how to make it better.  It is a fresh perspective which has allowed me to look at old concepts in a different light.  This week I want to revisit technical debt.

I own my own home.  Since I am a homeowner, I have smoke detectors.  These little battery powered devices warn me when there is a fire or when I am burning a roast on the stove.  So smoke detectors offer protection to a homeowner so they can escape the house quickly and call the fire department.  Smoke detectors are so useful you receive a discount on your home insurance if you have one, and, in some municipalities, you are required to have at least one in your home.

Smoke detectors have one significant drawback; they are battery operated.  When the batteries run out and a fire breaks out, you are helpless.  The smoke detector companies fix this by forcing the alarms to “chirp,” which is a friendly reminder to change the batteries.  This week, I awoke to my smoke detector “chirping” at 2 AM.  Like many men my age, I attempted to roll over in bed and ignore the situation.  Ninety minutes of insomnia later, I wandered the house searching for batteries to replace the faulty one in the “chirping” detector.

The next morning over an extra cup of coffee, it occurred to me that I treated my smoke detector like many organizations treat technical debt.  I do not change batteries until I have to and usually it is at an inconvenient time.  Fortunately, being a former boy scout, I was prepared with batteries in an easy to find location.  I swapped out the batteries and went back to bed. 

If you are a homeowner, you have four strategies to deal with smoke detectors.

  1. Change all the batteries at once, typically when daylight saving time changes.
  2. Change individual batteries when they run out of charge and begin chirping.
  3. Ignore chirping smoke detectors until you get fed up and change the battery.
  4. Remove and disconnect all the smoke detectors and hope you never have to deal with a fire.

As a homeowner, I use strategies two and three.  I know others who use the other two approaches.  Swap out smoke detectors and batteries for systems and updates, and you have the four classic strategies companies use to address technical debt.

The most efficient way to deal with technical debt is to follow the first strategy by changing batteries and updating systems on a regular basis.  By doing this, you reduce unexpected outages.  Agile and scrum encourage this approach.

Many CIOs and managers I know consider this madness because there is not enough time, money, or people to keep updating systems.  It means they rely on strategies two and three.  That may be suitable for a chirping smoke detector on a cold night, but it is lunacy for a multi-billion dollar enterprise.  It creates situations where firms could lose millions of dollars while they wait to bring systems back.

So the next time someone looks at you funny when you talk about technical debt, just explain it to them like changing the batteries in a smoke detector.

Until next time.

Monday, October 16, 2017

Product Owners Need to do the Damn Job.

A product owner and scrum master should be equal professionals committed to the same goal.
I have spent the last two weeks talking about technical debt.  It is an important topic to me and has been a significant issue during my career.  My agile journey has had three themes: mitigation of technical debt, organizational change, and teaching product owners to be more successful.  This week I want to talk about product ownership.

The most significant frustration of my career as a scrum master and agile coach is how I have been unable to work with a real product owner.  I spend most of my time training former business analysts on how to do the job or working with someone who is doing the job in a “part-time” fashion.  I even had a product owner say they were not responsible for software delivery, only for requirements.  This kind of experience causes me to consume alcohol and have an unhealthy relationship with food.

I learned that this was not just my grievance.  At the Agile Coaches Symposium in Chicago, a common point of discussion was the state of product ownership.  On the last day of the conference, I hosted an open space where I asked other coaches what I could do to improve the performance of my product owners.  It was a good discussion, but we centered on theory rather than practical approaches. 

I hit my emotional wall when I asked product owners to prepare a list of stories for the developers, so they do not go into sprint planning unprepared; the product owners greeted me with blank stares and then complaints I was creating “busy work.”  It was at that moment when I realized agile and scrum could not change lazy or ambivalent business partners.  I wanted to scream.  Since that moment, I have reviewed my blog post about the difficulty of being a product owner.  I also took time out to reread Roman Pichler’s book on product ownership.  Being a product owner is the most challenging job in Agile, but it is still a job.  As a professional, you should aspire to do it well.

An agile team with a lousy product owner is like an airplane with a weak pilot.  You might reach your destination, but there is no guarantee you will arrive safely and in comfort.  An inferior product owner is not going to deliver business value, and they are going to miss numerous deadlines.  It is in the interest of your organization to make product owners competent and capable.  If the people tasked with the responsibility are not interested, then someone else needs to fulfill the role.  If not, your organization deserves a spectacular crash.

Until next time.


Monday, October 2, 2017

Fix Technical Debt NOW!

Technical debt is a lot like leaky pipes
I am very fortunate to spend time with smart people.  The day goes by faster when you spend it with intelligent and capable people.  One of those talented people is a former colleague of mine, Larry Gassik.  He was asking me a few questions about technical debt, and it occurred to me that I have not shared many of my thoughts about it on the blog.  Technical debt is becoming a growing concern in the agile community as more teams expand into enterprise systems and confront legacy code.  This week a brief conversation on technical debt.

When I think about technical debt, I use the metaphor of plumbing.  Indoor plumbing has existed since Roman times, but its innovations only became global in the 20th century.  Thanks to plumbing, the spread of cholera has been curbed wherever modern water systems exist.  Indoor plumbing has given millions of people clean drinking water and helped reduce pollution.  Plumbing is so ubiquitous that the only time we notice it is when it is not working.  When a toilet backs up or a pipe bursts, we become very aware of the effects of plumbing on our lives.

The “if it isn’t broke, don’t fix it” attitude we have about plumbing is prevalent in the business world.  Many business professionals in the corporate world are focused on shareholder value and profitability.  When business professionals think about technology, it is either an expense or an inconvenience.  It is why many organizations have not made the switch to cloud-based systems and use old versions of Microsoft Office.  To them, the investment of money is not worth the rate of return.  The reality is that not paying attention to older technology systems is just as negligent as ignoring the maintenance of your home; you risk broken pipes and greater expenses caused by water damage.

The technology of pipes and plumbing has changed over the centuries to be safer and less expensive.  In Roman times, pipes were lead.  The contaminated drinking water caused outbreaks of “Saturnism,” which was a polite term for lead poisoning.  Terracotta pipes followed, but those broke down over time thanks to tree roots and natural decay.  Iron pipes came along, but they were brittle and caused water to be rusty.  Copper pipes came along and have been a good solution, but they are expensive and require soldering, which makes maintenance a problem.  Today, most new construction relies on PVC pipes because they do not corrode, are inexpensive, and are easy to maintain.  If the materials of plumbing can change so radically, imagine what is happening with technology moving at the speed of the internet.

Forty years ago, while the Sex Pistols were singing “Anarchy in the U.K.,” the personal computer market in the United States was in its infancy.  Microsoft was a tiny startup, commercial cellular phone service did not exist, and a modem was fast if its speed was 300 bits per second.  Mainframes dominated computing, and most business transactions were done over the phone or in person.  Compare that business environment to the smartphones, personal computers, and gigabit-speed internet we have today.  There is no credible way the technology of 1977 could support the needs of business today.  The difference between the needs of the firm and the ability of the technology to support the business is something agile professionals call technical debt.

Technical debt is a cancer threatening to metastasize and kill the business.  Here is how it happens.  Slow or ineffective systems undermine customer confidence.  Weak confidence means less use, and less use guarantees less money for the company to maintain the system.  Less money translates into slower time to market for new features and system updates.  It means employees and IT professionals will take shortcuts to bypass the pokey system.  With the system jury-rigged to address business problems, it becomes more expensive to maintain, and improvements take longer to roll out.  Finally, you create a situation where the system fails, and it does not provide benefit to the business.  If you pay attention to the technical debt, you can avoid this kind of failure.

A business with significant technical debt will have trouble attracting talent.  Computer professionals, being smart, know what technologies the market supports, and they are terrified of having skills which are obsolete.  It means they will gravitate to businesses and projects which carry less technical debt.  It also means that college graduates will avoid working for companies with old technologies.

Technical debt is the difference between what the business needs and what the technology systems support.  If you do not address technical debt, it is a threat to the success of the firm.  Finally, the mitigation of technical debt is no different than routine household maintenance.  Do the right thing and focus on technical debt before the pipes burst in your business.

Until next time.