Bad Luck or Bad Management?
What a mess. But at least it's a familiar mess.
Hundreds of airline passengers were stranded in dozens of airports throughout the eastern United States Tuesday, after a Federal Aviation Administration computer system in Atlanta abruptly failed.
Travelers in some of the nation's largest hubs, including the Atlanta and Chicago airports, were hit hardest. Some planes were delayed several hours after the National Airspace Data Interchange Network (or Nadin), which manages and coordinates flight plans in and out of U.S. airports, suffered a software glitch around 1:30 p.m. E.S.T. The F.A.A. said bad weather triggered the problem.
All of the flight plans normally handled in Atlanta, the F.A.A. said, were rerouted to a backup Nadin system in Salt Lake City. But pilots and dispatch centers kept "repeatedly refiling" flight-plan data into the F.A.A.'s system, overloading the Salt Lake facility, said Diane Spitalieri, an F.A.A. spokeswoman.
"This was a failure mode we've not seen before," said Hank Krakowski, chief operating officer of the F.A.A.'s Air Traffic Organization division. "It looks like an internal software processing failure. But we're going to have to do some forensics on it to figure out exactly what the failure mode was."
Tough luck, you might say. Except it echoes a long history of tech mishaps at the F.A.A. Indeed, a similar snafu happened just last Thursday.
An F.A.A. bulletin distributed to employees said "a Nadin failure last evening caused more than 100 delays after flight plans were rejected. The legacy Nadin in Atlanta crashed. Salt Lake City took over but had problems with the high queue level."
Even worse, the same dynamic happened 14 months ago, leaving plenty of time for forensics. The Atlanta computer system went down in June 2007, and before the sister facility in Salt Lake could pick up the slack, it was overwhelmed. Six months later, the F.A.A. still had no clue what hit it.
With twenty-twenty hindsight, the F.A.A. should be able to respond quickly, right?
"They didn't even acknowledge to us there was an outage today," said Doug Church, communications director of the National Air Traffic Controllers Association. "They were still telling our controllers everything was fine today, when we knew it wasn't."
On the hotline that the F.A.A. set up for reporters, officials tried to explain what was happening and how the agency was responding. One claimed that delays in Atlanta were running as long as 90 minutes Tuesday afternoon, while the agency's own website showed they were running more than two hours.
Asked whether the software failure in Atlanta, or the subsequent failure of the Salt Lake facility to take on all of the rerouted flight plans, was due to the software programming of an outside vendor, a server problem, or even human error, F.A.A. spokeswoman Laura Brown would only say that the agency would be looking into the situation.
Brown said that the software snafu happened while the F.A.A. was upgrading Nadin. Its mission-critical system appears to rely on Microsoft Windows, and who among us has never suffered a computer outage while upgrading to a new version of Windows?
"This is an agency that is pretty much out of control right now," said Church. "They've failed on big stuff like H.R. staffing, so how can they handle stuff like informing controllers what's going on?"
For its part, the F.A.A. was clear about which employees were bearing the brunt of the delays. One reporter asked: "How many people were getting skewered by all this?"
The F.A.A.'s Brown responded wryly, "I think it's primarily public affairs."






