

The many human errors that brought down the Boeing 737 Max


The first sign of trouble appeared just after takeoff.

Inside the cockpit of PK-LQP, a brand-new Boeing 737 Max belonging to Lion Air, the stick shaker on the captain’s side began to vibrate. Stick shakers are designed to warn pilots of an impending stall, which can cause a dangerous loss of control. They’re unmistakably loud for that reason.

But the airplane was flying normally, nowhere near a stall. The captain ignored it.

About 30 seconds later, he noticed an alert on his flight display, IAS DISAGREE, which meant that the flight computer had detected a mismatch between the airspeed readings on the two sides of the airplane: a sign of a sensor malfunction. This required a bit more attention.

A modern-day passenger airplane is less like a racecar and more like a temperamental printer: you spend more time monitoring and checking systems than you do actually driving the thing. So the captain passed control of the aircraft to the first officer and began the troubleshooting process from memory.

Like all commercial aircraft, the Boeing 737 Max has multiple levels of redundancy for its important systems. In the cockpit, there are three flight computers and digital instrument panels operating in parallel: two primary systems and one backup. Each system is fed by an independent set of sensors. In this case, the captain checked both instrument panels against the backup, and he found that the instruments on his side — the left side — were getting bad data. So with the turn of a dial, the captain switched the primary displays to only use data from the working sensors on the right side of the airplane. Easy.
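The cross-check the captain ran by hand amounts to a simple vote among three sources: whichever primary display disagrees with the standby is the one to distrust. The Python sketch below illustrates that idea under stated assumptions. The function name `find_faulty_side`, the tolerance, and the airspeed readings are all hypothetical, and real avionics monitoring is far more involved.

```python
from typing import Optional


def find_faulty_side(left: float, right: float, standby: float,
                     tolerance: float = 5.0) -> Optional[str]:
    """Return 'left' or 'right' if exactly one primary disagrees with standby.

    Hypothetical illustration only: the tolerance (in knots) is made up.
    """
    left_ok = abs(left - standby) <= tolerance
    right_ok = abs(right - standby) <= tolerance
    if right_ok and not left_ok:
        return "left"
    if left_ok and not right_ok:
        return "right"
    return None  # all agree, or both disagree: no single culprit


# Hypothetical airspeeds in knots, with the left-side sensor reading bad data.
print(find_faulty_side(left=190.0, right=230.0, standby=228.0))  # left
```

If both primaries disagree with the standby, the sketch returns `None`: one vote against two can't tell whether the standby itself is the bad source.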

All of this took under a minute, and everything appeared to be back to normal.

At 1,500 feet of altitude, the takeoff portion of the flight was officially complete, and the first officer began the initial climb. He adjusted the throttle, set the aircraft on its optimal climb slope, and retracted the flaps.

Except the airplane didn’t climb. It lurched downward, its nose pointed toward the ground.

The first officer reacted instinctively. He flicked a switch on his control column to counteract the dive. The airplane responded right away, pitching its nose back up. Five seconds later, it dove once again.

The first officer brought the airplane’s nose up a third time. It pitched back down.

There was no memorized checklist that seemed to apply to this situation, so the captain reached for the airplane’s Quick Reference Handbook (QRH). The QRH is a series of simple checklists that are designed to help pilots rapidly assess and manage “non-normal” situations. The idea is that Boeing has thought of every conceivable thing that might happen to one of its airplanes, and it has included all of them in the QRH. Basically, it’s more troubleshooting.

But nothing in the QRH seemed to apply, either.

Over the next six minutes, as the first officer struggled to control the airplane and the captain searched for the right checklist, PK-LQP climbed and dove over a dozen times. At one point, the airplane pulled out of a 900-foot dive at an airspeed of almost 375 mph, which is uncomfortably close to the 737’s “redline” of 390 mph.

The flight crew had to figure something out fast before they lost control of the airplane.

Then the third person in the cockpit, who was technically off-duty, “dead-heading” to his next assignment, reportedly spoke up.

What about the runaway stabilizer checklist?

It was a shot in the dark: yet another checklist. “Runaway trim” occurs when some kind of failure causes an airplane’s horizontal stabilizer to move, or “trim,” when it shouldn’t be moving at all. Usually, this creates a constant upward or downward force that the flight crew has to counteract for the remainder of the flight. It’s kind of like trying to drive when your wheels are out of alignment.

Runaway stabilizer checklist for the Boeing 737 Max.
Image: Preliminary Aircraft Accident Investigation Report, Lion Air flight 610

PK-LQP’s problem was a little different. It was intermittent, temporarily reversible, and it wasn’t even clear if the horizontal stabilizer was causing the problem. But they were running out of options. They followed the checklist and flipped the STAB TRIM switches to CUT OUT on the center console.

The airplane stopped pitching down. Five seconds passed. Then five minutes. Once again, PK-LQP was under their control and out of danger.

An hour later, Lion Air flight 043 landed in Jakarta, Indonesia, only a few minutes delayed. Following standard procedure, the captain reported the episode to the airline, and the airline’s maintenance team checked for serious equipment failures, finding none.

The following morning, PK-LQP, operating as Lion Air flight 610, took off at 6:20AM local time on its way to Pangkal Pinang, Indonesia. Its stick shaker activated just after takeoff. It threw multiple errors on the flight display. It dove just after the flight crew retracted the flaps. And it relentlessly activated its automatic pitch trim in the nose-down direction 28 times over the course of eight minutes.

This time, there was no third pilot to help the flight crew.

PK-LQP may have reached 600 mph, faster than a Tomahawk missile, as it plunged into the water. It was the first 737 Max accident in its 18 months of service.

Lion Air Flight Telemetry

To industry outsiders, it was a shock. What could have brought down one of Boeing’s newest, most technologically sophisticated airplanes? But those closer to the airplane’s development knew better: there had been warning signs from the start.

The Verge spoke to a dozen pilots, instructors, engineers, and experts about the 737 Max and its development, rollout, and the two crashes that have claimed the lives of 346 people. What emerged was a story of cascading failure — the many small human errors at every phase of the airplane’s design, certification, and operation process. Those errors came to a terrible and deadly climax in the skies above the Java Sea in October 2018 and above the Ethiopian countryside five months later.

The story of the Max is ultimately the story of the Darwinian business cycle where mature companies like Boeing face constant threats from new products, new competitors, and the search for new growth. Sometimes this motivates them to new heights of innovation and progress. Other times, it prompts them to pull everything back in the name of cost-cutting.

The events that led to these two fatal crashes were set in motion nearly a decade ago, and they started not with Boeing, but with the company’s European archrival, Airbus.

Boeing’s 737 and Airbus’ A320 are the two main players in the massive, and massively profitable, market for narrow-body passenger jets. Together, the two airplanes make up nearly half of the world’s 28,000 commercial airliners. Chances are that if you’ve ever flown anywhere at all, you’ve flown on one of them.

Both manufacturers are locked in a race to make their airplanes cheaper for airlines to operate, especially when it comes to fuel.

In 2018, for instance, Southwest Airlines’ fleet of 751 Boeing 737s burned through 2.1 billion gallons of fuel at an average cost of $2.20 per gallon for a total of $4.6 billion. A 1 percent increase in fuel efficiency would save $46 million. That’s nothing to sneeze at, even for a company that earned $2.5 billion in net profit.
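The arithmetic is easy to check. This short Python snippet simply reproduces the figures quoted above; like the paragraph, it assumes savings scale linearly with the fuel bill.

```python
# Figures quoted in the article for Southwest Airlines in 2018.
gallons = 2.1e9          # gallons of fuel burned
price_per_gallon = 2.20  # average cost in dollars

fuel_bill = gallons * price_per_gallon
savings_1pct = 0.01 * fuel_bill  # a 1 percent efficiency gain

print(f"Fuel bill: ${fuel_bill / 1e9:.2f} billion")      # Fuel bill: $4.62 billion
print(f"1% savings: ${savings_1pct / 1e6:.0f} million")  # 1% savings: $46 million
```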

So Airbus and Boeing constantly tweak their airplanes to squeeze single percentage-point gains out of them. But complete overhauls are rare: the 737 last received one in 1997, with the debut of the third-generation 737NG, while the A320 hadn’t been refreshed since its launch in 1988.

Then, on December 1st, 2010, Airbus stunned the aviation community. In secret, it had developed a more efficient version of the A320 called the A320neo (which stands for “new engine option”). It would burn about 6 percent less fuel than the 737NG. That was a remarkable leap in fuel efficiency, delivered at a time when the price of jet fuel was at a near-record $2.50 per gallon.

Airlines loved it. The following summer at the 2011 Paris Air Show, the aerospace industry’s equivalent of Black Friday, Airbus sold a record-setting 667 A320neos in the span of a week. That was more orders than the 737 had received in all of 2010.

Boeing was caught flat-footed. It had spent four years debating the future of its narrow-body jet program, and it still did not have an answer to its most basic question: whether Boeing should make a brand-new design or revamp the 737 yet again.

In the face of the existential threat from the A320neo, Boeing’s execs made up their minds in a matter of weeks. The company would launch a fourth-generation 737, and it would do it in record time.

The 737 Max was, plain and simple, a stopgap measure.

Boeing could save billions of dollars in engineering costs by basing the Max on the 737 platform. That gave the company a head start on design and engineering work, enough, Boeing hoped, to allow the Max to enter service just months after the A320neo.

But the project’s engineers would have to overcome some monumental challenges in order to deliver on time. The first was the 737 platform itself. It would take a considerable amount of work to update a 46-year-old design with all of the technology it needed to be just as efficient as the competition.

“The 737 was conceived in the 1960s as what today we would call a regional jet, and with every variant, they’ve pushed and pushed the thing to the end of its envelope,” says Patrick Smith, an airline pilot and blogger at Ask the Pilot. “It makes you wonder if the platform that they’re working with is just so outdated at this point.”

At the same time, the designers couldn’t update it too much. In practice, airline pilots are generally qualified to fly only one type of airplane at a time. However, the Federal Aviation Administration allows different models of airplanes with similar design characteristics to share a common “type certificate.” For instance, the 737’s three previous generations all share a common type certificate. When you get qualified on one model, you can fly all of them.

This allows airlines with common-type fleets to more easily substitute pilots and airplanes, making their operations more flexible. As a result, many airlines stick to aircraft from a single manufacturer. Some, like Ryanair and Southwest, operate only a single type of plane for maximum operational efficiency.

A Boeing employee works outside of the cockpit of a Boeing 737 Max 8 airplane in the company’s factory.
Photo by Stephen Brashear / Getty Images

It also incentivizes manufacturers to design aircraft that will earn these common type certifications. But a type certificate is so detailed and comprehensive — covering everything from airplane dimensions to the configuration of the passenger cabin to the way the jet moves and feels in flight — that it can limit the amount of leeway designers have when trying to add a new model to an existing certificate.

The Max, for instance, not only had to be similar to the previous-generation 737NG, which first launched in 1993, but it also had to be similar enough to the 737 Classic from 1980 and the original 737 from 1964. In essence, it had to be a cutting-edge, 21st century airplane that still felt and flew like ones designed when The Beatles were still together.

Boeing gave itself six years to do all of this — a year less than it took to develop the 777, and 18 months less than the 787. To beat Airbus, it would have to break the one unbreakable law of project management: that a development cycle can’t be fast, cheap, and good. If it failed, Airbus could corner a $35 billion market for single-aisle airplanes for a decade or longer.

So Boeing could not afford to fail.

Early signs were encouraging. Two years into development, Boeing promised the Max would be 8 percent more fuel-efficient than the A320neo. Five and a half years in, the FAA granted the Max its Amended Type Certification. Just months later, the program’s chief pilot, Ed Wilson, boasted that pilots rated on previous versions of the 737 could switch to the Max with just “2 ½ hours of computer-based training.”

This was another key selling point for airlines: no expensive classroom time, no costly simulator time. In theory, pilots could read about the Max at home, take a self-administered computer course in the morning, and be ready to fly in the afternoon.

So between its fuel and training efficiency, the Max seemed like a winning prospect for everyone — especially Boeing, which sold a record-breaking $200 billion worth of Maxes before the first prototype took to the skies.

The slick PR campaign masked a design and production process that was stretched to the breaking point.

Designers pushed out blueprints at double their normal pace, often sending incorrect or incomplete schematics to the factory floor. Software engineers had to settle for re-creating 40-year-old analog instruments in digital formats, rather than innovating and improving upon them. This was all done for the sake of keeping the Max within the constraints of its common type certificate.

And many pilots felt that, for the first new 737 in over 20 years, Boeing was oddly reluctant to prepare them for it.

Captain Laura Einsetler, who’s flown for over 30 years, including on 737s, considers an all-computer-based course to be completely inadequate as an introduction to a new airplane.

“I don’t have the schematics. I don’t have the cockpit panels. I don’t have an instructor that I can ask questions to,” she says. “You’re hoping that the first time you see the Max is on a nice clear day. But sometimes it’s not, and you’re showing up at night or in bad weather into an airplane that has all these changes.”

There was something else Boeing hadn’t mentioned about the 737 Max. Eight days after the Lion Air crash, a bulletin appeared on MyBoeingFleet, the company’s online portal for pilots and airlines. It read:

“Boeing would like to call attention to an [Angle of Attack] failure condition that can occur during manual flight only.”

Boeing’s first public acknowledgment of MCAS, via a technical bulletin released after the Lion Air crash.
Image: Preliminary Aircraft Accident Investigation Report, Lion Air flight 610

In bland technical jargon, Boeing described the exact series of events that brought down PK-LQP. The confusing series of alerts. The sudden dives. The fact that this “failure condition” would keep occurring until and unless the crew flipped the STAB TRIM switches to CUT OUT — just like the crew on PK-LQP’s penultimate flight had correctly guessed.

The presence of this system, lurking somewhere in the Max’s software suite, was shocking enough. Even more frightening, Boeing gave airlines and pilots only the bare minimum of information. The bulletin didn’t give the system a name or explain what it was designed to do in normal operation. It only said that the system sometimes malfunctions, and that this can crash your airplane.

“It was a little bit like, ‘Ok pilots, good luck with that, figure it out,’” Einsetler says.

For four days, angry pilots and airline officials bombarded Boeing with demands for more information. Finally, on November 10th, another message appeared on MyBoeingFleet:

“Boeing has received many requests for the same information from 737 fleet operators,” it read.

At last, Boeing admitted what the world had feared: something was fundamentally wrong with the brand-new 737 Max.

The culprit was the Maneuvering Characteristics Augmentation System (MCAS). Like the 737 Max, MCAS was made to be a stopgap.

The Max was designed around a new set of engines called LEAP-1Bs. These are much more efficient than the engines on the 737NG, but they are also much heavier and larger.

This created a design problem. The engines on the NG sit only 18 inches off the ground, and mounting the LEAP-1Bs in the same spot gave them too little clearance during takeoff. So Boeing placed them further forward and slightly higher on the wing of the Max.

That solution created an aerodynamics problem. Because of their size and position, the engines on the Max generate lift when the airplane enters a steep climb (or, in aviation parlance, at high angles of attack). Since the engines sit ahead of the wing, that extra lift tends to push the nose up even further, so the Max handles differently than previous versions of the 737, but only when it’s climbing steeply.

That, in turn, created a regulatory problem. In order for different airplane models to share a type certificate, the FAA requires that they all handle the same way. A model of airplane with sensitive controls, like a sports car, can’t share a type certificate with a model whose controls are much more sluggish, like a semi truck. Boeing worried that the FAA might consider the difference large enough to require a separate type rating for the Max, undermining one of its promised selling points.

The right fix was not obvious, says Alex Fisher, a retired British Airways pilot who writes about flight safety. Because the problem only occurred in specific circumstances, Boeing couldn’t just slap an extra set of fins on the airplane and call it a day. Aerodynamic changes affect how an airplane flies all the time, not just in the rare conditions where the problem appears, and they require a lot of design and testing to get just right. Boeing needed something precisely targeted, carefully calibrated, and nonlinear in effect. It needed software.

So MCAS was designed to compensate. It would use an angle of attack (AoA) sensor to detect when the airplane entered a steep climb. It would activate the airplane’s pitch trim system, which is routinely used to help stabilize the airplane and make it easier to control, especially during climb and descent. And it would trim the airplane in modest increments for up to nine seconds at a time until it detected that the airplane had returned to a normal AoA and ended its steep climb. It seems simple enough — on paper, that is.
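To make that description concrete, here is a deliberately simplified Python model of the logic as the paragraph describes it: one AoA reading per second from a single sensor, activation only with the flaps up and in manual flight (the conditions under which both crews saw the dives), and nose-down trim that stops after nine seconds or as soon as AoA returns to normal. This is a sketch, not Boeing's software. The function name and the activation threshold are hypothetical placeholders, and the per-activation trim defaults to the 0.6 degrees the design specification described.

```python
from typing import Sequence


def mcas_activation(aoa_readings: Sequence[float], flaps_up: bool,
                    manual_flight: bool, threshold_deg: float = 12.0,
                    max_trim_deg: float = 0.6, max_seconds: int = 9) -> float:
    """Return nose-down stabilizer trim (degrees) commanded in one activation.

    aoa_readings: one reading per second from a SINGLE angle-of-attack sensor.
    threshold_deg is a hypothetical placeholder, not Boeing's actual value.
    """
    if not (flaps_up and manual_flight):
        return 0.0  # inactive with flaps extended or autopilot engaged
    rate = max_trim_deg / max_seconds  # trim spread evenly over the window
    trim = 0.0
    for second, aoa in enumerate(aoa_readings):
        if second >= max_seconds or aoa <= threshold_deg:
            break  # window expired, or AoA back to normal
        trim += rate
    return trim


# A sensor stuck at a hypothetical 20 degrees never looks "normal" again,
# so the activation runs the full nine-second window.
stuck = [20.0] * 30
print(round(mcas_activation(stuck, flaps_up=True, manual_flight=True), 2))  # 0.6
```

Fed a stuck sensor, the loop never sees AoA return to normal, so every activation applies the maximum trim. Nothing in the sketch questions whether the reading is plausible, which is the weakness the rest of this story turns on.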

Engine placement on the third-generation 737 NG (left) versus the MAX (right).

Boeing, meanwhile, defended its previous silence about MCAS.

“Since it operates in situations where the aircraft is under relatively high g load and near stall, a pilot should never see the operation of MCAS,” read a Q&A distributed to Southwest Airlines.

The subtext: pilots were on a need-to-know basis about MCAS, and until the Lion Air crash, Boeing felt that they hadn’t needed to know.

Einsetler strongly disagrees. “We need to have the understanding and knowledge of how everything works on the jet, so that we can command the jet to do what we need it to do, not just be along for the ride,” she says.

“Not a lot of information got out there in a timely fashion,” concurs Juan Browne, a 777 pilot with over 40 years of flying experience. “It almost makes me wonder, did Boeing engineers themselves really understand how much power and authority they built into this system?”

As Boeing burned its bridges with pilots, it sought to repair ties with its primary customers: the airlines.

Within days of the Lion Air crash, Boeing deployed account reps around the world to shore up confidence in the Max. They succeeded: between November 2018 and March 2019, Boeing announced new orders from multiple airlines, and it even managed to talk Lion Air out of canceling its $5 billion order.

The Max continued to fly.

Then, on March 10th, 2019, disaster struck again. ET-AVJ, another 737 Max 8, this one owned by Ethiopian Airlines and operating as flight 302, took off from Addis Ababa, Ethiopia, bound for Nairobi, Kenya. In command was Yared Getachew, the airline’s youngest-ever captain. On his right was Ahmed Nur Mohammed, a fairly new first officer.

The stick shaker on the left control column activated just after takeoff. The altitude and AoA indicators on one side of the airplane malfunctioned. About 90 seconds after takeoff, and immediately after the first officer retracted the flaps, the airplane dove unexpectedly.

The Ground Proximity Warning System sounded in the cockpit: “DON’T SINK. DON’T SINK.”

Instinctively, Captain Getachew pulled his control column back to point the nose skyward, then flicked the electric trim switch on his yoke. First Officer Mohammed, meanwhile, radioed air traffic control.

“Break, break, break,” he said. “Request back to home. Request vector for landing.”

Five seconds later, MCAS activated again.

“DON’T SINK. DON’T SINK.”

Captain Getachew again pulled up and again flicked the trim switch. But every time the pilots gained a few hundred feet of altitude, MCAS pushed the airplane right back down again.

It was Mohammed — the pilot whose experience was called “absurdly low” by Chesley “Sully” Sullenberger himself — who correctly diagnosed the problem.

Emergency services work at the crash site of Ethiopian Airlines flight 302.
Photo by Jemal Countess / Getty Images

“Stab trim cut-out, stab trim cut-out,” he called. Getachew concurred, and Mohammed flipped the switches to disable MCAS.

At over 400 mph of airspeed, the airplane was already past its redline. The crew had just a few hundred feet of altitude to work with, and at that speed and altitude, the aerodynamic forces on the airplane would have been immense, making it difficult to control.

“Pull up! Pull up!” said Getachew, which they did, in unison, dozens of times over the next two minutes. The airplane barely responded. Mohammed tried to adjust trim with the manual crank located on the center console. That didn’t work either.

Almost three minutes after turning the electric trim system off to disable MCAS, the crew reactivated it. They must have believed that it was the only way to get the airplane back into a climb.

The pilots trimmed up twice using their thumb switches, and then MCAS activated one final time. Fifteen seconds later, the airplane crashed at over 500 knots of airspeed into a field near the town of Bishoftu, Ethiopia. None of the 157 people aboard survived.

The reckoning had come for the 737 Max. By the next day, regulators around the world began to ground the airplane.

The United States didn’t follow suit, however. Boeing CEO Dennis Muilenburg reportedly called President Trump to assure him that the 737 Max was safe to fly.

On March 13th, the FAA grounded the airplane anyway. Muilenburg admitted that MCAS was directly responsible for both crashes and promised that Boeing would fix its broken system. “It’s our responsibility to eliminate this risk,” he said. “We own it and we know how to do it.”

But why had nobody caught it in the first place? The answer might be infuriatingly simple: nobody read the paperwork.

Although the FAA is responsible for the safety of any airplane manufactured in the United States, it delegates much of the certification to the manufacturers themselves.

It has to in order to get anything certified at all, says Jon Ostrower, editor-in-chief of The Air Current and a former aviation reporter for The Wall Street Journal. Boeing already has the people and the expertise, it pays better, and it isn’t susceptible to government shutdowns. The FAA, meanwhile, says it would need 10,000 more employees and an additional $1.8 billion of taxpayer money each year to bring certification entirely in-house.

But there’s a difference between delegation and total submission.

During the Max’s certification process, FAA managers pressured their teams to delegate as much as possible back to Boeing. When Boeing looped the FAA back in for review, “there wasn’t a complete and proper review of the documents … review was rushed to reach certain certification dates,” according to one FAA certification engineer.

The results of that rushed review are clear.

The FAA’s general process for identifying and mitigating risk.
Image: FAA Order 8040.4B, “Safety Risk Management Policy”

Whenever it adds a new airplane to a type certificate, the FAA lists where that airplane does or does not differ from other models in the same type. In the case of the 737 Max, the FAA’s list extends to 30 pages, reviewing everything from engine noise to de-icing systems, aluminum fatigue to security doors.

Yet this document dedicated to minutiae does not mention MCAS once — not by name, not by description — which is kind of astonishing when you consider that even the seat belts get a mention.

The FAA overlooked MCAS in other places, too.

As part of its certification review, the FAA assigns a “failure condition” to each system, which is basically a guess as to what would happen if that system were to break. The lowest-severity systems should only cause “some inconvenience” to passengers, while the more serious “hazardous” and “catastrophic” failure conditions can endanger the aircraft and its passengers. The more severe the failure condition, the more redundancies that system is supposed to have.

At least, that’s the theory. MCAS received a “hazardous failure” designation. This meant that, in the FAA’s judgment, any kind of MCAS malfunction would result in, at worst, “a large reduction in safety margins” or “serious or fatal injury to a relatively small number of the occupants.” Such systems, therefore, need at least two levels of redundancy, with a chance of failure less than 1 in 10 million per flight-hour.

MCAS, however, does not meet any of these standards.

It has no redundancy: it takes input from just one AoA sensor at a time. That makes MCAS completely unable to cope with a sensor malfunction. It can’t “sanity check” its data against a second sensor or switch to a backup if the original source fails. It just believes whatever data it’s given, even if that data is bad, which is what happened on Lion Air flight 610 and Ethiopian Airlines flight 302.
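The missing safeguard is easy to picture. The Python sketch below contrasts trusting a single sensor, as the paragraph says MCAS did, with a hypothetical cross-check that compares both AoA sensors and inhibits the system when they disagree. The function names, the 5-degree disagreement limit, and the averaging step are illustrative assumptions, not a description of Boeing's software.

```python
from typing import Optional


def single_sensor_aoa(left: float, right: float, use_left: bool) -> float:
    """Trust whichever one sensor is selected, whatever it reports."""
    return left if use_left else right


def cross_checked_aoa(left: float, right: float,
                      max_disagreement: float = 5.0) -> Optional[float]:
    """Hypothetical sanity check: return None (inhibit) if sensors disagree.

    The disagreement limit is an illustrative assumption.
    """
    if abs(left - right) > max_disagreement:
        return None  # can't tell which sensor is right, so don't act
    return (left + right) / 2  # sensors agree: use their average


# Hypothetical readings in degrees: left sensor failed high, right one healthy.
print(single_sensor_aoa(22.0, 3.0, use_left=True))  # 22.0
print(cross_checked_aoa(22.0, 3.0))                 # None
```

Averaging two agreeing sensors is only one option; a real design would also have to decide what to tell the pilots when the check fails.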

It gets worse: over the last five years, 50 flights on US commercial airplanes experienced AoA sensor issues, or about one failure for every 1.7 million commercial flight-hours. Sure, that’s a low rate, but it’s still nearly six times higher than what the FAA allows for “hazardous” systems: they’re only supposed to fail once every 10 million flight-hours.
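The comparison in that paragraph is a straight ratio of rates. This Python snippet reproduces it from the article's own figures; like the paragraph, it treats every reported sensor issue as a failure and compares that rate directly with the hazardous-system limit, which is a simplification.

```python
# Figures quoted in the article.
aoa_failures = 50
hours_per_failure = 1.7e6       # one failure per 1.7 million flight-hours
hazardous_limit = 1 / 10e6      # at most one failure per 10 million flight-hours

observed_rate = 1 / hours_per_failure          # failures per flight-hour
implied_hours = aoa_failures * hours_per_failure

print(f"Implied exposure: {implied_hours / 1e6:.0f} million flight-hours")  # 85
print(f"Observed vs allowed: {observed_rate / hazardous_limit:.1f}x")       # 5.9x
```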

The FAA’s definition of “acceptable” versus “unacceptable” risk, and where angle of attack sensor failure falls on that spectrum.
Image: FAA Advisory Circular No. 25.1309-1A, “System Design and Analysis”

Worse still: the FAA did not catch the fact that the version of MCAS actually installed on the 737 Max was much more powerful than the version described in the design specifications. On paper, MCAS was only supposed to move the horizontal stabilizer 0.6 degrees at a time. In reality, it could move the stabilizer as much as 2.5 degrees at a time, making it significantly more powerful when forcing the nose of the airplane down.
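The gap between those two figures compounds, because MCAS could activate again and again, as it did on both flights. This Python snippet compares the design and as-installed per-activation limits over a few activations. It deliberately ignores the pilots' counter-trim and the stabilizer's physical travel limits, so the cumulative numbers are illustrative only.

```python
designed_deg = 0.6   # per activation, per the design specification
installed_deg = 2.5  # per activation, as actually installed

print(f"{installed_deg / designed_deg:.1f}x the authority per activation")  # 4.2x

for n in (1, 2, 3):
    # Cumulative nose-down trim, ignoring counter-trim and travel limits.
    print(f"{n} activation(s): {n * designed_deg:.1f} vs "
          f"{n * installed_deg:.1f} degrees")
```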

“Although officials were aware of the changes,” The New York Times reported, “none were fully examined by the FAA.”

Had anyone checked, they might have flagged MCAS for any of several reasons: its lack of redundancy, its unacceptably high risk of failure, or a jump in power large enough that it was no longer just a “hazardous failure” kind of system.

When asked for comment, the agency said, “The FAA’s aircraft certification processes are well established and have consistently produced safe aircraft designs.”

Boeing defended the process as well. “The system of authorized representatives — delegated authority — is a robust and effective way for the FAA to execute its oversight of safety,” a spokesperson told The Verge.

But that system only works when someone actually reads the paperwork.

In a strange way, the 737 Max’s story is less about what did happen and more about what didn’t. Nobody did anything criminal. Nobody did anything malicious. Nobody did anything wrong, in a strictly technical sense.

In fact, when viewed in business terms, Boeing did everything right. Between 2011, when the Max was first announced, and 2018, Boeing’s total annual revenue grew almost 50 percent to $101 billion, its annual profits nearly doubled, and its stock price quadrupled. Its executives personally made tens of millions of dollars in bonuses for hitting their corporate performance targets, thanks in large part to the record-setting pace of 737 Max sales.

It’s a perfect example of the cross purposes at which business, technology, and safety often find themselves. With its bottom line threatened, Boeing focused on speed instead of rigor, cost-control instead of innovation, and efficiency instead of transparency. The FAA got caught up in Boeing’s rush to get the Max into production, arguably failing to enforce its own safety regulations and missing a clear opportunity to prevent these two crashes.

Boeing’s bet on the 737 Max now seems to have been badly calculated. Since the two crashes, the company has lost over $25 billion in market cap. It may have to pay billions more to its suppliers and airline customers for costs related to the grounding, and that’s not including the nearly $30 billion in orders that airlines have threatened to cancel. All this for an airplane whose initial development was supposed to be a great value at only $3 billion.

In Ethiopia, meanwhile, the consequences of Boeing’s bet are much less abstract. A week of mourning followed the crash in Addis Ababa. Relatives of the victims flew in from Kenya, Canada, and China. Others who lived in Ethiopia were bused to the capital by the airline.

Three days after the crash, hundreds of mourners and 17 empty caskets proceeded through the streets of Addis Ababa, ending at the Holy Trinity Cathedral.

At the crash site itself, relatives set up an arch wreathed in flowers as a memorial, under which they placed photographs of their loved ones. The airplane had struck the ground with so much force that there were no identifiable remains. Instead, families received bags of soil from the surrounding fields.

Correction: We’ve updated our discussion of the failure rate of angle of attack sensors. It now refers to failures per total flight-hours rather than total flights, and only considers commercial passenger flights (not cargo or private flights) in its calculation.