Facebook is back online after a massive outage that also took down Instagram, WhatsApp, Messenger, and Oculus

Illustration by Alex Castro / The Verge

Just as Facebook’s Antigone Davis was live on CNBC defending the company over a whistleblower’s accusations and its handling of research data suggesting Instagram is harmful to teens, its entire network of services suddenly went offline.

The outage started just before noon ET and took nearly six hours to resolve. It is the worst outage for Facebook since a 2019 incident took its site offline for more than 24 hours, and the downtime hit hardest the small businesses and creators who rely on these services for their income.

Facebook issued an explanation for the outage on Monday evening, saying that it was due to a configuration issue. On Tuesday afternoon, Facebook engineers offered more detail, explaining that the company’s backbone connection between data centers shut down during routine maintenance, which caused the DNS servers to go offline. These two factors combined to make the problem more difficult to fix, and they help explain why services were offline for so long.

Instagram.com was flashing a 5xx Server Error message, while the Facebook site merely told us that something went wrong. The problem also affected its virtual reality arm, Oculus. Users could load games they already had installed, and the browser worked, but social features and new game installs did not.

After failing all tests for most of Monday, a test of ISP DNS servers via DNSchecker.org showed most of them successfully finding a route to Facebook.com at 5:30PM ET. A few minutes later, we were able to start using Facebook and Instagram normally; however, it may take time for the DNS fixes to reach everyone.
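For readers who want to run a similar check by hand, here is a minimal Python sketch. It is not the method DNSchecker.org uses; it simply builds a standard DNS query for an A record, sends it over UDP to a resolver, and reads the response code and answer count from the reply header. The function names (`build_query`, `parse_rcode_and_answers`, `ask`) are our own, and the resolver address in the note below is only an example.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for the A record of `name` (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_rcode_and_answers(response):
    """Return (RCODE, ANCOUNT) taken from the 12-byte DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def ask(resolver_ip, name, timeout=3.0):
    """Query one resolver over UDP; return (rcode, answers) or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        try:
            response, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    return parse_rcode_and_answers(response)
```

Calling `ask("8.8.8.8", "facebook.com")` returns a `(rcode, answer_count)` pair, or `None` if nothing arrives before the timeout. A response code of 2 (SERVFAIL) with zero answers generally means the resolver itself could not get an answer from the domain’s name servers.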

On Twitter, Facebook communications exec Andy Stone says, “We’re aware that some people are having trouble accessing our apps and products. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.” Mike Schroepfer, who will step down from his post as CTO next year, tweeted, “We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible.”

Inside Facebook, the outage broke nearly all of the internal systems employees use to communicate and work. Several employees told The Verge they resorted to talking through their work-provided Outlook email accounts, though employees can’t receive emails from external addresses. Employees who were logged into work tools such as Google Docs and Zoom before the outage could still use those, but any employee who needed to log in with their work email was blocked.

On Monday we learned that Facebook engineers were sent to the company’s US data centers to try and fix the problem, according to two people familiar with the situation.

A peek at Down Detector (or your Twitter feed) reveals the problems were widespread. While it’s unclear exactly why the platforms were unreachable for so many people, their DNS records show that, like last week’s Slack outage, the problem is apparently DNS (it’s always DNS).

Cloudflare senior vice president Dane Knecht notes that Facebook’s border gateway protocol routes — BGP helps networks pick the best path to deliver internet traffic — were suddenly “withdrawn from the internet.” While some have speculated about hackers, or an internal protest over the whistleblower testifying before Congress, Facebook has blamed the problem on a bug that occurred during routine maintenance.
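To see why withdrawn routes make a network unreachable, consider this toy Python model of a router’s table. It is a simplified sketch, not how BGP itself is implemented: real BGP path selection weighs many attributes, while this example only shows announcing a prefix, withdrawing it, and picking the most specific matching route. The `RouteTable` class, the peer labels, and the prefixes (taken from address ranges reserved for documentation) are all hypothetical and are not Facebook’s.

```python
import ipaddress


class RouteTable:
    """Toy routing table: maps announced prefixes to a next-hop label."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        # A neighbor tells us it can deliver traffic for this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # The neighbor retracts the announcement; the route disappears.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Choose the most specific (longest) prefix that contains the address.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: packets to this address cannot be delivered
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
table.announce("192.0.2.0/24", "peer-a")
table.announce("192.0.2.128/25", "peer-b")
before = table.lookup("192.0.2.200")

table.withdraw("192.0.2.0/24")
table.withdraw("192.0.2.128/25")
after = table.lookup("192.0.2.200")
```

Here `before` is `"peer-b"`, the more specific of the two routes, and `after` is `None`. Once every prefix covering an address has been withdrawn, other networks have nowhere to send its traffic, no matter which DNS resolver a user picks.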

Update October 4th, 3:37PM ET: Added additional information about the outage.

Update October 4th, 4:15PM ET: Added statement from Facebook CTO Mike Schroepfer, along with internal Facebook updates.

Update October 4th, 5PM ET: Noted outage is still ongoing, added information about the 2019 outage.

Update October 4th, 5:35PM ET: DNS updates suggest Facebook is closing in on a solution.

Update October 4th, 6:08PM ET: Facebook.com is back online.

Update October 4th, 10:29PM ET: Added information about Facebook’s explanation.

Update October 5th, 2:29PM ET: Added more background details on the backbone network problem that caused the outage.


Comments

Just change it to 8.8.8.8

Or 1.1.1.1

Tried both. Neither is showing records for Facebook.

Or 1-800-PP5-1-DOODOO

That’s not how it works.

Pointing your DNS queries to Google (8.8.8.8) or Cloudflare (1.1.1.1) isn’t a solution, as the article clearly states that Facebook’s BGP routes were withdrawn from the internet. It may have worked temporarily thanks to cached records, but it isn’t a fix: with the routes gone, no public resolver can reach Facebook’s name servers.

If a phone number is erased from all phonebooks, getting another phonebook isn’t going to help. The routes have to be refreshed/reloaded from the top down.

It’s kind of nice really. Businesses everywhere are likely to see their employee productivity levels increase dramatically during this outage.

UNLESS your business uses Workplace by Facebook

So… Facebook.

…or literally any advertising, PR, social media, or branding company anywhere in the developed world.

…or any company, small business, or social cause that relies on FB advertising, or any services these feeder companies use.

So no one doing anything useful… I think we can do without these companies doing work indefinitely, and humanity would benefit.

Twitter and Reddit are still up so probably not

Really? Like I use facebook daily, but I never really look at the feed that much.

Employee productivity expectations are already too high in our late-stage capitalist nightmare. Let’s not bring them up even higher, ok?

more likely scenario is that myspace servers are on the brink of crashing, or perhaps people are using the time to buy more halloween decorations.

It is nice, minus the damage it’s doing for FB’s small business advertising clients.
My hobby business (custom-made home decor items) site gets on average 4k hits an hour from FB referrers. That went to zero quite quickly. Bye bye today’s impressions and sales.
But yes, globally the world is a better place for a few minutes while the domains continue to not have traffic routed to them.
The latest update sounds like if they are sending engineers to datacenters, they don’t have much of a grasp as to the issue. Just from personal experience, it sounds like either they, or a piece of equipment that automates L3 routing changes, from L4, borked a BGP update. Borked updates would cause the bad routes to be deleted, not just re-modified. If all BGP broadcasts for FB owned properties vanished, that’s a good sign it was a bad update. On the other side, one borked BGP update shouldn’t have this effect on so many services, unless some dummy coded that all data from all services route to a single public, or service domain. That’s just crappy design.

Nah, everyone just stopped working to talk about facebook being down and checking to see if it’s back every minute.

Would be nice if it stayed down forever.

… perchance to dream.

Agree with your sentiment. What’s with your username tho

That is one tasty fart…

EE UK Mobile network also down

I’m UK and EE/BT too and had similar issues with just random websites not loading or loading very slowly. Annoyingly isitdownrightnow.com also seems to be down
