Why Facebook keeps going down

More highlights from leaked audio inside Facebook

Photo by Amelia Holowaty Krales / The Verge

The 2010 film The Social Network was almost entirely made up out of whole cloth, but one point made early in the film continues to ring true. Facebook has long been obsessed with keeping the site up no matter how much strain its servers were under, out of sheer competitive fear that users who couldn't access the site would turn elsewhere and never return.

That focus on keeping the site up helped Facebook to be competitive in its early years, when rivals like Twitter were routinely sidelined by melting servers. But this year, well into Facebook’s decade of dominance, things have taken a turn for the worse. In July, the company experienced a day-long outage across Facebook, Instagram, and WhatsApp. That followed the company’s worst-ever outage in March, which lasted more than 24 hours.

As I wrote here in July, outages like these are becoming more serious. As Facebook increasingly positions itself as a core part of the world’s communications infrastructure, a day-long outage can have serious consequences — especially if one were to take place during a catastrophe.

So what’s going on?

In July, Facebook's official explanation for that outage was that routine maintenance had “triggered an issue.” The full story is more interesting, and Mark Zuckerberg shared it with employees in some of the leaked audio that we began sharing here last week. So today, here's the full explanation of why Facebook keeps going down. Of note: the basic answer is that Facebook's massive size means that even small changes can have hugely unpredictable effects and bring down the entire network.
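The claim that at Facebook's scale a small change can take down the whole network is easier to see with a toy model. The sketch below is a hypothetical illustration, not a description of Facebook's systems. It assumes that load is spread evenly across a pool of servers, and that an overloaded server drops out and hands its share to the servers that remain. The function name `simulate_cascade` and the capacity numbers are invented for the example.

```python
def simulate_cascade(capacities, total_load, failed=()):
    # Toy model (an assumption, not Facebook's real architecture): the load is
    # split evenly across healthy servers; any server whose share exceeds its
    # capacity falls over, and its share moves to the survivors. Repeat until
    # no server is overloaded, then return the indices of servers still up.
    healthy = set(range(len(capacities))) - set(failed)
    while healthy:
        share = total_load / len(healthy)
        # A share exactly equal to capacity is treated as survivable.
        overloaded = {i for i in healthy if share > capacities[i]}
        if not overloaded:
            return healthy
        healthy -= overloaded
    return healthy  # empty set: every server is down


# Hypothetical pool: eight servers rated for 100 units, plus two weaker ones.
capacities = [100] * 8 + [95, 90]

print(len(simulate_cascade(capacities, 900)))               # 10 servers up
print(len(simulate_cascade(capacities, 900, failed=(0,))))  # 0 servers up
```

In this model, draining a single server out of ten (for maintenance or a test, say) pushes the two weakest machines over. Their load then swamps everyone else. Real systems add retries, caching, and load shedding, all of which this sketch deliberately leaves out.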

So here's Zuckerberg's answer in full. He's joined on stage partway through by Santosh Janardhan, vice president of engineering. The answer is highly technical, involving terms like “storm testing,” “traffic drains,” and “slope testing” for which Google offers little explanation. But the basic answer is clear: Facebook ran some tests, and the tests knocked the system over. (The transcript has been lightly edited for clarity.)

Question: We had several major outages this half. Is our reliability becoming a problem? What is the overarching root cause and how can we fix that?

Mark Zuckerberg: I’m glad that this is the top question, because it’s something I’ve been thinking about a bunch. We’ve had more downtime this year than the last few years combined. And it is an issue, and especially as we move towards more services in the private social platform area around messaging, that’s such a core utility to people that it’s really important that these services are reliable. Even from just a competition standpoint, what we see is that when we have downtimes in WhatsApp or Instagram Direct, there are people who just don’t come back. They may move their messaging behavior over to iMessage or Telegram or whatever the service is and that’s kind of it. 

And then it takes months to fight and earn back people’s trust and usage of our services. So yes, it’s a big deal. We’re doing worse on this now than we were before. We need to focus more on this ... There have been a few different outages recently ... But at a high level they come from different areas. So it’s not that there’s one technical being, except that just the complexity of the systems is growing. So things that previously would have just been a blip are now things that are causing systems to fall over, and we’re going to need to change the way that we react to that and change, focus a little bit more on reliability in the systems that we’re engineering. So this is going to be more of a focus. We have to get this right. It’s not that it’s currently in a very bad place, but it’s certainly trending worse than it should be. And we need to make sure we do better on this.

Santosh Janardhan: I’ll add some color to what Mark was saying. First off, I do want to make sure everybody relaxes. We take this very, very seriously. Now if you think about how we run the site, one of the things we end up doing frequently is that we do a lot of testing. ... Things like this help us understand the limits of the system and help us make the service a lot more resilient.

... One of the risks that we run when we run these tests is that we risk pushing our system just a little over the edge so that it fails in ways that we didn’t anticipate or plan for. Now this is exactly what happened last week. We were running a load test on … one of our biggest data centers. And we just pushed it over the edge, in our store which is where we store our photos, our videos, our Messenger attachments, your stickers, things of that nature. And it went into a series of cascading failures.

Now when this happens the recovery becomes long, complex, and the mitigation takes time, which is what ended up happening. I want to touch a little bit on the reliability theme that Mark was alluding to. We will do the short-term things here, do the logs, errors, graphs, put together an error or two and fix the technical issues in the short term. What we are grappling at this point is, are you coming to an inflection point in complexity that ... We’re still thinking through how to approach things.

For example, in the outage that happened last week, some of our tools and monitors that are designed to help us exactly deal with this actually failed us. They prolonged the outage. ... So we are dealing with a little different beast at this point. So what are we going to do about this? Two different work streams.

One is that we’re going to do something to literally tackle complexity. We’re going to create and augment new tools. We’re going to do failure testing so that we identify the dependency graphs and run a bunch of bugs. Second is, actually, we want our teams to focus more on what I call fast and graceful recovery. This is something that we have not focused on before. And the last thing here is that this is going to take a little bit of time. We arguably, if you look at across our family of apps, are probably running the busiest online destination on the planet right now. So busy. And many have to tackle complexity that at the same time keep the site humming along. This is going to take some orchestration. We’ll get there. Just bear with us.
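Janardhan's plan to use failure testing to “identify the dependency graphs” amounts to asking which services break when one component goes down. The following is a minimal sketch of that question. It assumes a made-up service graph, and none of the service names come from Facebook. The function walks the graph in reverse from the failed component and collects everything that depends on it, directly or indirectly. The name `blast_radius` is hypothetical.

```python
from collections import deque


def blast_radius(depends_on, failed):
    # depends_on maps each service to the set of services it calls.
    # Invert the edges so that we can ask "who depends on this service?"
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    # Breadth-first search outward from the failed component.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    impacted.discard(failed)  # a dependency cycle could lead back to it
    return impacted


# Hypothetical dependency graph, for illustration only.
depends_on = {
    "news_feed": {"photo_store", "ranking"},
    "messenger": {"attachment_store"},
    "photo_store": {"blob_storage"},
    "attachment_store": {"blob_storage"},
    "monitoring_dashboard": {"blob_storage"},
    "ranking": set(),
}

print(sorted(blast_radius(depends_on, "blob_storage")))
```

In this toy graph, the monitoring dashboard falls inside the blast radius of the storage layer it is supposed to watch. That echoes Janardhan's point that some of the tools meant to help during the outage failed along with the system.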

The Ratio

Today in news that could affect public perception of the big tech platforms.

⬆️ Trending up: Microsoft successfully thwarted an effort by the government of Iran to hack into the email accounts of the Trump campaign.

⬇️ Trending down: Facebook's Libra initiative appears to be on the rocks after PayPal pulled out. The announcement hit especially hard given that David Marcus, who is running Libra for Facebook, was the top executive at PayPal before coming to Facebook.

Governing

Facebook shut down 200 accounts associated with coordinated inauthentic behavior targeting Iran and Qatar. The accounts appeared to be professionally run by PR firms based in the Middle East and Africa. The company released a statement about the operations, noting they were focused on spreading propaganda and political news. Jane Lytvynenko and Logan McDonald report at BuzzFeed:

The UAE–Nigeria network spent close to $150,000 promoting its content on Facebook, and attracted close to 1.4 million followers for the associated pages, according to the Facebook announcement. The Instagram profiles were followed by nearly 70,000 people.

The action by Facebook today reinforces how malicious actors work across different platforms, run by different companies, to create coordinated disinformation operations, and can remain active even after major platforms take action against them. The UAE–Egypt–Nigeria network also demonstrates how outsourcing of digital trolling to marketing and PR firms is an increasingly popular way to conceal who’s behind information operations.

Microsoft said hackers backed by the Iranian government tried to gain access to the email accounts of people involved in the Trump campaign, to influence the 2020 election. The company released a statement about the hacks, noting only four accounts were successfully compromised. (Nicole Perlroth and David E. Sanger / The New York Times)

The debate over net neutrality (whether internet providers should have to treat traffic from all sites equally) was shaped by a groundswell of online comments — many of which turned out to be fake. A BuzzFeed investigation revealed the decision to scrap the Obama-era rule was shaped in part by a massive manipulation campaign involving political impersonation. (Jeremy Singer-Vine and Kevin Collier / BuzzFeed)

Rudy Giuliani was briefly kicked off Twitter for tweeting a Ukrainian official’s phone number, essentially doxing him. Twitter said the tweet violated its policy on sharing private information. (Makena Kelly / The Verge)

Someone tried to hack into Voatz, an app that lets West Virginia soldiers stationed overseas vote. An FBI investigation is ongoing, but it looks like the culprit might've been a student researching security vulnerabilities in the app rather than someone trying to change votes. (Kevin Collier / CNN)

A fascinating look at the 2020 election campaign through the lens of how much candidates are spending on Facebook ads, and how often people are searching the word “impeachment.” (DemCast USA)

Elizabeth Warren is fundraising off our leaked audio from Facebook’s internal meetings this summer. In a series of Facebook ads showing Zuckerberg’s face, Warren’s campaign wrote “Mark Zuckerberg considers me an ‘existential’ threat to Facebook.” (Taylor Hatmaker / The Daily Beast)

A new study shows 62% of Americans think social media companies have too much control over the news, and 55% think their efforts result in worse news offerings for users. Unsurprisingly, Republicans are the most skeptical, according to Pew Research Center. (Eliza Shearer and Elizabeth Grieco / Pew Research Center)

The Chinese government is successfully using pop culture and social media channels to spread propaganda. (Li Yuan / The New York Times)

Contrary to popular belief, the vast majority of deepfakes are pornographic, not political, according to a study from cybersecurity firm Deeptrace. Almost all of them explicitly target women. (Joseph Cox / Vice)

California Gov. Gavin Newsom is trying to fix both those problems with two new laws aimed at deepfakes. The first, AB 730, makes it illegal to distribute political deepfakes within 60 days of an election. The second, AB 602, gives Californians the right to sue someone who creates deepfakes that place them in pornographic material without consent. (Carrie Mihalcik / CNET)

We should stop using free speech as an excuse not to do something about people who promote bigotry and violence, argues Andrew Marantz. He has a new book out about online extremism in which he argues for more public media and content moderation. (Andrew Marantz / The New York Times)

Industry

PayPal pulled out of the Libra Association — the nonprofit association that governs Facebook’s planned cryptocurrency. The news comes amid reports that Mastercard, Visa, and Stripe are also considering withdrawing their support. Nick Statt has a spicy quote from Libra’s policy and communications head at The Verge:

Later on in the evening, Facebook’s communications team sent along another statement from Disparte, in which the blockchain project’s policy chief appears to criticize PayPal for not having the “fortitude” to stick with something as difficult and demanding as Libra.

“It requires a certain boldness and fortitude to take on an endeavor as ambitious as Libra — a generational opportunity to get things right and improve financial inclusion,” Disparte writes. “The journey will be long and challenging. The type of change that will reconfigure the financial system to be tilted towards people, not the institutions serving them, will be hard. Commitment to that mission is more important to us than anything else. We’re better off knowing about this lack of commitment now, rather than later.”

Meanwhile, Apple CEO Tim Cook said Facebook should never have developed Libra at all. “A private company shouldn’t be looking to gain power this way,” Cook told a French newspaper. (Nick Statt / The Verge)

Elsewhere, Cook has been cozying up to Trump to score policy victories. Recently, the Apple CEO was able to get iPhones exempted from steep tariffs. He's also serving as an adviser to the administration's workforce policy board, but hasn't faced the backlash for it that other executives have. (Tripp Mickle / The Wall Street Journal)

Snap CEO Evan Spiegel says that widespread adoption of augmented-reality smart glasses is still 10 years out. In August, Snap announced the $380 Spectacles 3, the latest version of the company’s camera-equipped glasses. (Salvador Rodriguez / CNBC)

Google is considering buying Firework, an app for users to create and share 30-second videos with strangers. The move could help Google compete with TikTok, though Firework is aimed at a slightly older audience. (Georgia Wells and Rob Copeland / The Wall Street Journal)

TikTok has been paying some influencers to create posts and share them on other social networks. The Indian government is investigating whether the company's approach violated Indian laws around editorial influence on works for hire. (Venkat Ananth and Patanjali Pahwa / The Economic Times)

Ex-HQ Trivia host Scott Rogowsky has a new job co-hosting ChangeUp, a live show from MLB’s New Jersey studios, for the streaming platform DAZN. I think it’s fair to say that Quiz Daddy’s new show doesn’t have quite the same buzz about it. But then, neither does HQ. (Jacob Feldman / Sports Illustrated)

Lele Pons is credited with coining the phrase “do it for the Vine” on that now-defunct product. Now, she’s a successful influencer on Instagram and YouTube, exemplifying a specific type of social media personality that’s bubbly but also generic. (Sarah Ellison / The Washington Post)

Tyler “Ninja” Blevins says he left Twitch because the Amazon-owned streaming service placed too many restrictions on the brand deals he could cut elsewhere. Blevins moved to Microsoft-owned Mixer in August. He’s since scored deals with companies like Adidas and has a small role in Ryan Reynolds’ new movie, Free Guy. (Julia Alexander / The Verge)

And finally...

The FBI is running Facebook ads targeting Russians in Washington

Here are just two perfect sentences from Donie O’Sullivan and David Shortell at CNN:

The FBI is running ads on Facebook in the Washington DC area seemingly designed to target and recruit Russian spies as well as those who know about their work, CNN has learned.

One ad seen by CNN features a stock photo of a young woman at her graduation with her family. Russian text overlaid on the image reads, “For your future, for the future of your family.”

Talk about turnabout being fair play! Hats off to the FBI here. The ads are targeted at people in Washington, DC, and some can be seen through a Facebook ad tracking tool. Enjoy!

Talk to us

Send us tips, comments, questions, and your Facebook outage stories: casey@theverge.com and zoe@theverge.com.