Datadog is a monitoring and analytics platform that ingests trillions of data points per day from more than 8,000 customers. With a complex distributed architecture and hundreds of deployments per day, things sometimes don't go as planned. Over the years, our teams have improved the way incidents are managed at Datadog, and they are using that knowledge to help Datadog customers manage their own incidents.
In this session, Technical Evangelist Ara Pulido will chat with Léo Cavaillé, SRE Manager, and Matt Hardwick, an engineer working on Datadog’s incident application. They will discuss how incident management evolved at Datadog, how we handle incidents today, and how the SRE team is working alongside the engineers building Datadog’s incident application to make Datadog the best place to organize, investigate, manage, and resolve your infrastructure and application incidents.