September 22, 2022

Amazon Internal Communications Report a Problem with AWS Network Devices

  • An Amazon Web Services outage on Tuesday slowed down many other apps and services.
  • An AWS analysis traced the problem to traffic congestion on network devices in the Northern Virginia region.
  • Employees are still looking for the root cause; the issue affected all AWS operations to some extent.

As Amazon Web Services experienced one of the largest outages in company history on Tuesday, more than 600 employees participated in an emergency conference call to assess the cause of the outage.

The main culprit: A sudden surge in traffic that caused congestion on several network devices in Northern Virginia, the largest region for AWS data centers.

The company initially identified the “root cause” of the outage as “an issue with multiple network devices within the internal AWS network,” according to a screenshot of an internal AWS update sent Tuesday morning and obtained by Insider. “Specifically, these devices receive more traffic than they can handle, resulting in high latency and packet loss for the traffic passing through them.”

The problems persisted into Tuesday afternoon and caused hours of web service disruption, leading to widespread glitches and slowdowns at some of the world’s largest online services, including Disney+, Netflix, and even Amazon’s own e-commerce store. Companies that experienced outages on Tuesday included Spotify, Zoom, and Airbnb, to name a few.

Although the outage originated in Northern Virginia, it affected all parts of AWS’s global operations to some extent. Amazon’s retail and delivery networks, which rely on AWS tools, were in some cases brought to a screeching halt.

The outage took a toll on Amazon’s internal warehousing and logistics operations in the middle of the holiday shopping season. Some warehouse workers and drivers were sent home as the company’s internal communication, delivery routing, and monitoring systems stalled.

The network issue “specifically impacted” Amazon’s internal DNS servers. As of 2:04 p.m. Seattle time, the company had no estimate of when the system would be fully operational, according to a message on AWS’s public status page.

A separate internal memo stated that “firewalls are overwhelmed by as yet unknown source,” adding that AWS networking teams were working to “block traffic from key offending talkers/hosts at the firewall level.”

Amazon’s real-time digital advertising auction activity may have been responsible for much of the traffic overwhelming the firewalls, according to internal posts seen by Insider.

In an email to Insider, an Amazon representative said, “There is an AWS service event in the US East (Virginia) region affecting Amazon operations and other customers with resources operating from this region. The AWS team is working to resolve the issue as quickly as possible.”

Even inside AWS, however, information about the outage remains sketchy. As engineers and executives scrambled to diagnose the problem on a 600-person conference call led by AWS vice president of infrastructure Peter DeSantis, rumors spread among staff. One AWS employee speculated that the outage was caused by an “orchestrated DNS attack,” while another played down those concerns, saying it was more of an “internal thing” related to network and firewall saturation.

“It’s the fog of war,” an AWS official said.

In a message sent just before 2 p.m. PT, the company’s internal communications team told employees that it was “starting to see a significant upturn in the availability of AWS services in the US-EAST-1 region.” The division’s “most senior engineers” were continuing to monitor the problem, including “identifying the specific traffic flows that were causing traffic jams within these devices,” the note said.

Do you work at Amazon? Contact reporter Eugene Kim through the Signal or Telegram encrypted messaging apps (+1-650-942-3061) or email ([email protected]).

Contact reporter Katherine Long through the Signal or Telegram encrypted messaging apps (+1-206-375-9280) or email ([email protected]).

Reach out using a nonwork device. See Insider’s source guide for more tips on sharing information securely.

