Managing a massive scale incident
Dec 7, 2023, 4:05 PM - 4:40 PM
Main Stage
Description
On March 8, 2023, Datadog experienced a massive global outage. In this talk, we will share what triggered the incident and why recovering from it took such a massive effort. We'll cover the lessons we learned from this event and how we ran the incident response itself, successfully coordinating more than 500 engineers over 2+ days of continual response. We'll also explain how we built an engineering organization capable of that feat (with minimal heroism).