Datadog on...

Upcoming Episode


Datadog on Building an Event Storage System

December 13, 2022

Ara Pulido, Guillaume Duranceau and Ryan Worl

When Datadog introduced its Log Management product, it required a new event data storage platform, because storing logs and events is a completely different problem from storing metrics, which were the basis of the first Datadog product.


Past Episodes


Datadog on gRPC

September 29, 2022

Ara Pulido, Anthonin Bonnefoy and Antoine Tollenaere

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, making it a critical component for Datadog’s reliability.


Datadog on Data Informed Product Development

July 26, 2022

Ara Pulido, Miranda Kapin and Derek Howles

Datadog is an observability and security platform. That means that our users may be in a high-stress situation: debugging an issue in production, managing an incident, or responding to a security threat. Having a good UX is particularly critical in those cases.


Datadog on Detecting Threats using Network Traffic Flows

June 11, 2022

Theo Guidoux, Andrew Krug and Anna Pauxberger

At Datadog’s scale, with over 18,000 customers sending trillions of data points per day, analyzing the volume of data coming in can be challenging. One of the largest internal log sources at Datadog is networking logs. Being able to analyze and make sense of them is critical to keeping Datadog secure. To help with the task, we have bui...


Datadog on Web Security Standards

May 19, 2022

Jean-Baptiste Aviat, Ayaz Badouraly and Andrew Krug

Modern web applications are incredibly complex. Frameworks, JavaScript, and dependency management have made understanding and maintaining a baseline security standard extremely difficult. With attack vectors like those listed in the OWASP Top 10, it can be incredibly difficult to know where to start and what the metrics for success are. ...


Datadog on Rust

February 23, 2022

Duarte Nunes, Ara Pulido and Brian Troutwine

Rust is a programming language that has been gaining popularity over the past few years, with its adopters claiming that it helps them write faster, more memory-efficient, and more reliable software.


Datadog on Profiling in Production

January 28, 2022

Julien Danjou and Kirk Kaiser

Depending on your chosen programming language and stack, you may have never used a profiler in production. The very idea of using a profiler in production for a web service may seem unrealistic, due to the amount of overhead involved. After all, aren’t profilers extremely computationally expensive to run?


Datadog on Data Visualization

December 14, 2021

Mark Hintz, Ara Pulido and Kemper Smith

Datadog customers send trillions of data points per day. These data points are processed by Datadog and used to debug production issues in real time. But, in order to reason about all this data, we humans need visual representations. Visualizations can help us discover connections and problem points.


Datadog on Building Responsive UX

September 30, 2021

Amy Luo, Edwin Morris and Ara Pulido

Datadog product designers and frontend developers have been working together to create a new, better UX for creating dashboards, which is one of the most important parts of using Datadog. A central part of this effort was building a new layout engine. Working on this project was a bit different from the usual feature work, so the colla...


Datadog on Gamedays

August 31, 2021

Elijah Andrews, Mike Petruzelli and Ara Pulido

As engineers scaling our applications and infrastructure, we accept that failure can and will happen. But how can we get ahead of those potential failures? Gamedays are events that aim to test the resilience of a system when facing abnormal and turbulent situations, checking whether our expectations on how it will fail (or not) ...


Datadog on Chaos Engineering

June 1, 2021

Joris Bonnefoy, Tay Nishimura and Ara Pulido

As you scale your applications, remaining resilient to underlying network failures, resource constraints introduced by other applications, or spikes in traffic can become exponentially more complex, even with very thorough testing and processes. Chaos engineering is a discipline that encourages experimenting in production and injecting...


Datadog on Security and Compliance

March 31, 2021

Kirk Kaiser and Andrew Spangler

At Datadog, customer trust and data security are of the utmost importance.


Datadog on Agent Integration Development

March 23, 2021

Christine Chen, Ara Pulido and Julia Simon

To make sure that customers are getting the most out of the platform in the least amount of time, Datadog maintains more than 400 built-in integrations. These integrations collect metrics, events, and logs from a diverse set of sources: databases, source control, bug tracking tools, cloud providers, automation tools, and more.


Datadog on eBPF

January 26, 2021

Lee Avital, Guillaume Fournier and Ara Pulido

eBPF (extended Berkeley Packet Filter) is a Linux technology that can run sandboxed programs in the kernel without changing kernel source code or loading kernel modules. While the kernel is an ideal place to implement monitoring/observability, networking, and security, it wasn't until the recent broad adoption of eBPF that it was feasib...


Datadog on Serverless

December 10, 2020

David Huie, Kirk Kaiser and Andrew Krug

The Datadog Security Platform team leverages Serverless to ingest security events across many different cloud providers, deployment platforms, and devices. These security events are then transformed and shipped to a data lake to help defend and protect the platform as a whole. Once there, these ingested events are used to drive interna...


Datadog on Kubernetes Monitoring

November 16, 2020

Celene Chang, Charly Fontaine and Ara Pulido

With many blog posts published and talks given on the topic, it’s no secret that Datadog is running Kubernetes at scale. We currently run dozens of clusters, some of them with thousands of nodes. Additionally, we have clusters running in multiple clouds. How are we monitoring all of that, ensuring we can scale up quickly and safely?


Datadog on Software Delivery

September 30, 2020

Jacob LeGrone, Ara Pulido and Benjamin Smith

Over 800 engineers at Datadog do thousands of deployments per day, to hundreds of services in different environments, regions, and cloud providers. How can we manage all those deployments in a common way and have a reliable paper trail to audit any changes?


Datadog on Incident Management

August 27, 2020

Leo Cavaille, Matt Hardwick and Ara Pulido

Datadog is a monitoring and analytics platform that ingests trillions of data points per day, coming from more than 8,000 customers. With a complex distributed architecture and hundreds of deployments per day, needless to say, sometimes things don't go as planned. Our teams have been improving the way incidents are managed at Datadog ov...


Datadog on RocksDB

June 30, 2020

James Bibby, Kenny House and Ara Pulido

Datadog is a monitoring and analytics platform that ingests trillions of data points per day, coming from more than 8,000 customers. Each of those is associated with metadata, mostly in the form of tags, and it can also be part of streams of related data points, which can then be explored, queried, or aggregated. RocksDB is used by man...


Datadog on Kafka

May 27, 2020

Jamie Alquiza, Kirk Kaiser and Balthazar Rouberol

In this session, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They'll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters.


Datadog on Kubernetes

May 27, 2020

Laurent Bernaille and Ara Pulido

When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn’t expect to find so many roadblocks, but ingesting trillions of data points per day in a reliable fashion requires pushing the limits of cloud computing.
