Episode 10 - Trouble in the Cloud

Welcome to Episode 10 

News
  https://www.bloomberg.com/news/articles/2017-03-08/microsoft-pledges-to-...  
  https://mspoweruser.com/firefox-52-will-last-version-firefox-windows-xp-...
  https://www.cnet.com/news/look-out-windows-android-is-catching-up/
  https://www.wired.com/2017/03/atari-chip-set-off-bitter-war-among-neuros...
  http://www.nature.com/nature/journal/v543/n7644/full/nature21371.html
  https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-wha...
    https://xkcd.com/936/

Announcements
  Feedback
    @Gangrif and @Xenophage make a great pair that will titillate one's ears! They cover things in the ops and
    infosec news categories and topics that are relatable or at least interesting to discuss. It's not your typical
    format of a podcast, but that is what makes it refreshing.

    Keep up the great content guys! 

  Patreon, you guys are awesome
    $10 tier. 
  The face! 

  Youtube stream for this episode! https://youtu.be/EeD5y34oKNY

Chat

Main topic
  Trouble in the cloud: the 2/28/2017 US-EAST-1 S3 outage
    https://aws.amazon.com/message/41926/
    An Amazon employee was troubleshooting a problem with their S3 billing mechanisms.
    A mistake made while running an established playbook took down systems that were not intended to be taken down.
      The downtime, which was intended only for billing systems, took down systems essential to both reads and writes through the S3 API.
    This required that some systems be rebooted.
      Reboots of the Index and Placement subsystems (two of the systems mentioned as accidentally taken down) had not been performed for years.
      Due to the dependencies between these systems, the restarts took quite some time.
    The downtime caused a backlog of requests, which needed to be processed once the systems were operational again.
    
  Remediation
    The core issues here were the number of systems unintentionally taken offline, and the fact that systems that depended on each other were taken down at the same time.
    Amazon has made changes to their tools to help prevent systems from dropping below service-affecting thresholds.
    They are also working to remove some of the inter-dependencies.
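    To make the tooling change concrete, here is a minimal sketch of a capacity-removal guard. It is not Amazon's tool: the Subsystem class, the remove_capacity function, the max_per_step limit, and all of the numbers are hypothetical, chosen only to illustrate refusing a removal that would drop a subsystem below a minimum threshold.

```python
from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal request is refused by the guard."""


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # hypothetical service-affecting threshold


def remove_capacity(subsystem: Subsystem, count: int, max_per_step: int = 5) -> int:
    """Remove `count` servers, refusing unsafe requests.

    Returns the number of servers still active after the removal.
    """
    if count <= 0:
        raise ValueError("count must be a positive number of servers")
    # Guard 1 (assumed policy): remove capacity slowly, a few servers at a time.
    if count > max_per_step:
        raise CapacityError(
            f"refusing to remove {count} servers at once (limit {max_per_step})"
        )
    # Guard 2: never drop below the minimum the subsystem needs to serve traffic.
    remaining = subsystem.active_servers - count
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"{subsystem.name} would drop to {remaining}, below minimum "
            f"{subsystem.min_required}"
        )
    subsystem.active_servers = remaining
    return remaining


if __name__ == "__main__":
    index = Subsystem(name="index", active_servers=100, min_required=93)
    print(remove_capacity(index, 5))  # allowed: 95 servers remain
    try:
        remove_capacity(index, 5)  # would leave 90, below the minimum of 93
    except CapacityError as err:
        print("blocked:", err)
```

    With a guard like this, a mistyped input fails with an error and leaves the server count unchanged, rather than removing more capacity than intended.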

  On top of everything, the S3 status page depended on the health of the S3 service in order to operate.
    This made it difficult for customers to view the status of S3. 
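    One way to catch this kind of hidden dependency ahead of time is to walk a service dependency map and check whether the status page relies, directly or indirectly, on the service it reports on. The sketch below is illustrative only: the depends_on function and the service names in the map are made up for the example and do not describe Amazon's actual architecture.

```python
def depends_on(graph: dict, start: str, target: str) -> bool:
    """Return True if `start` depends on `target`, directly or transitively.

    `graph` maps each service name to the list of services it depends on.
    """
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep == target:
                return True
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return False


if __name__ == "__main__":
    # Hypothetical dependency map, for illustration only.
    services = {
        "status_page": ["asset_store"],
        "asset_store": ["s3"],
        "s3": ["index", "placement"],
    }
    if depends_on(services, "status_page", "s3"):
        print("warning: the status page cannot report an S3 outage reliably")
```

    A check like this could run whenever the dependency map changes, so that a status page tied to the service it monitors is flagged before an outage rather than during one.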

 
Intro and Outro music credit: Tri Tachyon, Digital MK 2
http://freemusicarchive.org/music/Tri-Tachyon/