Last week, millions of people were displeased when they turned to Netflix to watch their traditional Christmas Eve movies and specials, only to find that the service was down. Twitter was soon abuzz with customers tweeting their complaints about the outage. The folks at Netflix read the complaints, quickly took action, and traced the problem back to the cloud service of Seattle-based Amazon. Staff at Amazon's Virginia cloud site acknowledged they were at fault, made the needed repairs, and quickly began investigating what had happened. Today they finally released their report, which found that the problem with their cloud was caused by human error.
It seems that, unlike previous outages of Amazon Web Services (AWS), this time there was no storm to blame, no hackers flooding servers with denial-of-service programs, and no cascading software failure. Instead, it turned out that during routine maintenance some critical data was accidentally deleted from the control plane of the Elastic Load Balancer (ELB), the service that distributes incoming traffic across multiple backend servers to provide extra stability.
“The service disruption began at 12:24 PM PST on December 24th when a portion of the ELB state data was logically deleted. This data is used and maintained by the ELB control plane to manage the configuration of the ELB load balancers in the region (for example tracking all the backend hosts to which traffic should be routed by each load balancer). The data was deleted by a maintenance process that was inadvertently run against the production ELB state data,” Amazon spokespeople wrote in a statement. Later in the statement they added, “We have made a number of changes to protect the ELB service from this sort of disruption in the future.”