Facebook Inc. suffered a devastating outage that shut out many of its 2.7 billion global users, idled some of the company’s employees and prompted a public apology from the chief technology officer.
“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centres caused issues that interrupted this communication,” said Facebook vice-president of infrastructure Santosh Janardhan.
“This disruption to network traffic had a cascading effect on the way our data centres communicate, bringing our services to a halt.”
Janardhan confirmed that the outage also impacted many of the internal tools and systems Facebook uses in its operations, complicating its attempts to quickly diagnose and resolve the problem.
“Our services are now back online and we’re actively working to fully return them to regular operations,” stated Janardhan.
“We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.”
Users around the world were unable for hours Monday to access Facebook’s family of social-media apps, including the main social network, photo-sharing app Instagram and messaging service WhatsApp, in one of the longest failures in recent memory.
Downdetector, which monitors internet problems, said the Facebook outage is the largest it has seen, with more than 10.6 million reports worldwide.
Some internal services used by Facebook employees, including the company’s Workplace tool for communicating among teams, were also down for some staff, according to a spokesperson. Some workers were even struggling to use Facebook’s badge system at offices, according to a source familiar with the issues.
While it’s not uncommon for Facebook’s apps to have occasional glitches, technical issues that last more than a few minutes are rare.
“*Sincere* apologies to everyone impacted by outages of Facebook powered services right now,” tweeted Chief Technology Officer Mike Schroepfer. “We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible.”
The outage is the latest in a series of difficult events for Facebook. A former employee turned whistle-blower appeared Sunday on CBS’s “60 Minutes” to accuse the company of prioritizing profits over user safety.
The former employee, Frances Haugen, also handed over thousands of damning documents to U.S. lawmakers and the Wall Street Journal, which wrote a series of articles last month on Facebook’s struggles with content moderation and Instagram’s negative psychological impact on teenagers. The whistle-blower is also set to testify before a Senate subcommittee on Tuesday.
Facebook shares dropped 4.9% to $326.23 at the close in New York. They had declined before the outage was reported, hurt by the whistle-blower’s “60 Minutes” appearance.
Facebook has had to physically reset some of the company’s servers in an effort to fix the problem.
The cause of the issue is “probably a bad configuration or code push to the network management system,” said Alex Stamos, former chief security officer at Facebook who is now director of Stanford University’s Internet Observatory. “This isn’t supposed to happen.”
While the scale of the outage was unusual, Facebook’s internal apps stopped working for a time in 2019 following a dispute with Apple Inc., which halted some of the apps’ functionality on the iPhone maker’s platform.
After a user on Twitter suggested that Instagram should “stay offline forever,” Instagram boss Adam Mosseri jokingly replied, “Them fighting words… but it does feel like a snow day.”
The outages on Monday at Facebook, WhatsApp and Instagram likely occurred because of a problem in the company’s domain name system, an obscure but crucial component of the internet.
Commonly known as DNS, it’s like a phone book for the internet. It’s the tool that converts a web domain, like Facebook.com, into the actual internet protocol, or IP, address where the site resides. Think of Facebook.com as the person one might look up in the white pages, and the IP address as the physical address they’ll find.
On Monday, a technical problem related to Facebook Inc.’s DNS records generated at least six hours of outages. When a DNS error occurs, a user’s web browser or smartphone apps can no longer navigate to Facebook services.
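To make the phone-book comparison concrete, here is a minimal sketch in Python of a name lookup and of what a failed lookup looks like to an app. It is a toy model, not Facebook's setup: the record table, the addresses (taken from the 192.0.2.0/24 range reserved for documentation) and the function names are all assumptions made for illustration.

```python
# Toy "phone book": hypothetical records, not real DNS data.
# The addresses come from the documentation-only range 192.0.2.0/24.
RECORDS = {
    "example.com": "192.0.2.10",
    "social.example": "192.0.2.20",
}


class LookupFailed(Exception):
    """Raised when the phone book has no entry for the requested name."""


def resolve(name, records):
    """Return the IP address listed for a domain name (case-insensitive)."""
    key = name.lower().rstrip(".")
    if key not in records:
        raise LookupFailed(f"no address found for {name}")
    return records[key]


def describe_lookup(name, records):
    """Report what an app would see: an address, or a failed lookup."""
    try:
        return f"{name} -> {resolve(name, records)}"
    except LookupFailed as err:
        return f"lookup failed: {err}"


if __name__ == "__main__":
    # Normal day: the name maps to an address.
    print(describe_lookup("Example.com", RECORDS))
    # Outage day (modelled here as an empty phone book): nothing to connect to.
    print(describe_lookup("Example.com", {}))
```

The servers behind the name may be perfectly healthy in the second case. The app simply has no address to dial, which is why a DNS failure looks like a total outage from the outside.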
Not only did Facebook’s primary platforms go down, but so too did some of its internal applications, including the company’s own email system.
Users on Twitter and Reddit have also said that employees at the company’s Menlo Park, California, campus were unable to access offices and conference rooms that required a security badge. That could happen if the system that grants access is also connected to the same domain — Facebook.com.
The problem at Facebook appears to have its origins in the Border Gateway Protocol, or BGP. If DNS is the internet’s phone book, BGP is its postal service. When a user sends data over the internet, BGP determines the best available paths that data could travel.
Minutes before Facebook’s platforms stopped loading, public records show that a large number of changes were made to Facebook’s BGP routes, according to Cloudflare Inc.’s chief technology officer, John Graham-Cumming, in a tweet. Facebook hasn’t commented on whether or why those changes were made.
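The postal-service idea can be sketched in a few lines of Python. This is a heavily simplified illustration, not a description of what happened inside Facebook's network: real BGP weighs several attributes, while this toy version only prefers the route with the fewest network hops. The prefix, the network numbers (from the range reserved for documentation) and the function names are assumptions, and the withdrawal step is a hypothetical scenario, since the article only says that route changes were observed.

```python
# Hypothetical route announcements. Each one says: "to reach this block of
# addresses, pass through these networks in order". The last number in
# as_path is the network that owns the addresses.
ROUTES = [
    {"prefix": "192.0.2.0/24", "as_path": [64500, 64501, 64496]},
    {"prefix": "192.0.2.0/24", "as_path": [64502, 64496]},
]


def best_path(routes, prefix):
    """Pick the announced route with the fewest hops, or None if there is none."""
    candidates = [r for r in routes if r["prefix"] == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda r: len(r["as_path"]))


def withdraw(routes, prefix, origin_as):
    """Return the routes left after the owning network withdraws a prefix."""
    return [
        r for r in routes
        if not (r["prefix"] == prefix and r["as_path"][-1] == origin_as)
    ]


if __name__ == "__main__":
    # With both announcements present, the shorter path wins.
    print(best_path(ROUTES, "192.0.2.0/24"))
    # After a withdrawal, no path is left and the addresses are unreachable.
    remaining = withdraw(ROUTES, "192.0.2.0/24", origin_as=64496)
    print(best_path(remaining, "192.0.2.0/24"))
```

If the addresses that become unreachable happen to host a company's DNS servers, lookups for its domain start failing as well, which is one way a routing change can surface as a DNS problem.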
Information Sourced from My Broadband.