Ep 18 - Handling Incidents and Outages
TIMESTAMPS:
00:00 Start
01:28 The Stand-Up
06:48 Social Engineering
08:25 This Week's Epic
25:44 The Wash-Up
What do we tell people when things go wrong in our organisations? This week, there have been a couple of write-ups of recent high-profile outages at Roblox and Mozilla which, when paired with the well-documented outage at Facebook that we discussed last season, give us a fascinating glimpse into other companies' incident processes, on-call rotas and war rooms. Sanj, Gwen and Neil share their surprising love of being knee-deep in an incident, bringing some of their own recent experiences to the podcast.
In our workplace updates, there's lots of hiring, lots of shipping new features, everybody tries to coax Sanj into management, and Neil totally isn't doing any money laundering.
LINKS DISCUSSED THIS WEEK:
YouTube: Ozark Season 1 Trailer
Facebook Engineering: Update about the October 4th outage
Roblox Return to Service 10/28-10/31 2021
Mozilla Hacks: Retrospective and Technical Details on the recent Firefox Outage
Vox: Pokémon Go launched in 26 countries, and then its servers crashed