Datadog on...

Past Episodes

Datadog on LLMs: From Chatbots to Autonomous Agents

June 26, 2024

Conor Branagan, Othmane Abou-Amal and Jason Hand

As companies rapidly adopt Large Language Models (LLMs), understanding their unique challenges becomes crucial. Join us for a special episode of 'Datadog On LLMs: From Chatbots to Autonomous Agents,' streaming live from DASH 2024 on Wednesday, June 26th, to discuss this important topic.


Datadog on Stateful Workloads on Kubernetes

March 26, 2024

Martin Dickson, Edward Dale and Ara Pulido

Container orchestration platforms, like Kubernetes, were, from the beginning, an ideal solution for microservice architectures running a lot of stateless services. This was also the case for Datadog, which is run on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to hundreds of thousands of pods. But ...


Datadog on Data Science

February 20, 2024

Anne-Marie Tousch, Clement Tiennot and Jason Hand

In this episode we'll visit the world of predictive analytics and machine learning and uncover how these cutting-edge technologies are transforming the way Datadog monitors and improves its services.


Datadog on Kubernetes Autoscaling

January 25, 2024

Corentin Chary, Charly Fontaine and Ara Pulido

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes, or hundreds of thousands of pods. Also, this infrastructure is used by a wide variety of engineering teams at Datadog, with different features and ...


Datadog on Design Systems

December 14, 2023

Brandon West, Derek Howles and Vincent Volckaert

Over the last five years, the Datadog platform has grown. We added Application Performance Monitoring to complement our core infrastructure monitoring product, Log Management, Synthetic and Real User Monitoring, and more. For an enterprise software platform to be successful, the whole has to be greater than the sum of its parts. In Dat...


Datadog on AWS Identity Management

November 15, 2023

Andrew Krug, Leonid Vasilyev and Ben Donohue

For many engineers, Identity Management can elicit a broad range of emotions, from confusion during setup and configuration, to complete disinterest as it disappears into the background during day-to-day work, to frustration when they encounter erroneously blocked access, and sometimes to terror when misconfigurations lead to a breach. In th...


Datadog on Kubernetes Node Management

October 10, 2023

Adrien Trouillaud, David Benque and Ara Pulido

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes, or hundreds of thousands of pods. This infrastructure is used by a wide variety of engineering teams at Datadog, with different feature and capacit...


Datadog on Maintaining eBPF at Scale

September 27, 2023

Valeri Pliskin, Guy Arbitman and Andrew Krug

The extended Berkeley Packet Filter (eBPF) has resulted in an ecosystem of new tooling that allows running programs in the Linux kernel without loading kernel modules. eBPF seeks to do this safely within a secure sandbox environment and has been a boon to observability and security.


Datadog on Mobile Software Development

August 22, 2023

Xavier Gouchet, Maciek Grzybowski and Ara Pulido

Understanding the health and user experience of your mobile application is critical in order to avoid user frustration, understand application crashes, and reduce the mean time to resolution for bugs. To help with that task, Datadog has a mobile monitoring solution that allows developers to better understand and improve their application. But ...


Datadog on WebRTC

May 31, 2023

Brandon West, Jason Thomas and Brad Carter

WebRTC is a standard for real-time digital communications that enables video, audio, and data streaming. Although WebRTC was originally created for web browsers, Datadog uses it to create streaming applications across a variety of platforms, from Electron-based native applications to mobile applications.


Datadog on Caching

April 27, 2023

Jessica Cordonnier, Mitch Ward and Ara Pulido

Caching (and cache invalidation!) is often mentioned as one of the hardest problems in computer science. While caching can bring substantial performance improvements, reasoning about cached data can be extremely difficult as caching fundamentally means that you are no longer reading from your source of truth. With that in mind, many te...


Datadog on Data Engineering Pipelines: Apache Spark at Scale

March 23, 2023

Alodie Boissonnet, Anton Ippolitov and Ara Pulido

Datadog is an observability and security platform that ingests and processes tens of trillions of data points per day, coming from more than 22,000 customers. Processing that amount of data in a reasonable time stretches the limits of well-known data engines like Apache Spark.


Datadog on Site Reliability Engineering

February 22, 2023

Brandon West, Laura de Vesine and Rick Mangi

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly-growing public company, we’ve seen our own SRE practice evolve. Wit...


Datadog on the Lifecycle of Threats and Vulnerabilities

January 12, 2023

Nick Frichette, Adam Stevko and Andrew Krug

The security industry is full of complex terminology like threat, vulnerability, and mitigations. Definitions matter as we design processes that scale. At Datadog, the Security Research functions are focused on detection and response to specific types of threats and vulnerabilities. Workload vulnerabilities, cloud control plane vuln...


Datadog on Building an Event Storage System

December 13, 2022

Ara Pulido, Guillaume Duranceau and Ryan Worl

When Datadog introduced its Log Management product, it required a new event data storage platform, as storing logs and events is a completely different problem from storing metrics, which was the first Datadog product.


Datadog on gRPC

September 29, 2022

Ara Pulido, Anthonin Bonnefoy and Antoine Tollenaere

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, making it a critical component for Datadog’s reliability.


Datadog on Data Informed Product Development

July 26, 2022

Ara Pulido, Miranda Kapin and Derek Howles

Datadog is an observability and security platform. That means that our users may be in a high-stress situation: debugging an issue in production, managing an incident, or responding to a security threat. Having a good UX is particularly critical in those cases.


Datadog on Detecting Threats using Network Traffic Flows

June 11, 2022

Theo Guidoux, Andrew Krug and Anna Pauxberger

At Datadog’s scale, with over 18,000 customers sending trillions of data points per day, analyzing the volume of data coming in can be challenging. Networking logs are one of the largest internal log sources at Datadog. Being able to analyze and make sense of them is critical to keeping Datadog secure. To help with the task, we have bui...


Datadog on Web Security Standards

May 19, 2022

Jean-Baptiste Aviat, Ayaz Badouraly and Andrew Krug

Modern web applications are incredibly complex. Frameworks, JavaScript, and dependency management have made understanding and maintaining a baseline security standard extremely difficult. With attack vectors like those listed in the OWASP Top 10, it can be incredibly difficult to know where to start and what the metrics for success are. ...


Datadog on Rust

February 23, 2022

Duarte Nunes, Ara Pulido and Brian Troutwine

Rust is a programming language that has been gaining popularity over the past few years, with its adopters claiming that it helps them write faster, more memory-efficient, and more reliable software.


Datadog on Profiling in Production

January 28, 2022

Julien Danjou and Kirk Kaiser

Depending on your chosen programming language and stack, you may have never used a profiler in production. The very idea of using a profiler in production for a web service may seem unrealistic, due to the amount of overhead involved. After all, aren’t profilers extremely computationally expensive to run?


Datadog on Data Visualization

December 14, 2021

Mark Hintz, Ara Pulido and Kemper Smith

Datadog customers send trillions of data points per day. These data points are processed by Datadog and used to debug production issues in real time. But, in order to reason about all this data, we humans need visual representations. Visualizations can help us discover connections and problem points.


Datadog on Building Responsive UX

September 30, 2021

Amy Luo, Edwin Morris and Ara Pulido

Datadog product designers and frontend developers have been working together to create a new, better UX for creating dashboards, which is one of the most important parts of using Datadog. A central part of this effort was building a new layout engine. Working on this project was a bit different from the usual feature work, so the colla...


Datadog on Gamedays

August 31, 2021

Elijah Andrews, Mike Petruzelli and Ara Pulido

As engineers, as we scale our applications and infrastructure, we accept that failure can and will happen. But, how can we get ahead of those potential failures? Gamedays are events which aim to test the resilience of a system when facing abnormal and turbulent situations, checking whether our expectations on how it will fail (or not) ...


Datadog on Chaos Engineering

June 1, 2021

Joris Bonnefoy, Tay Nishimura and Ara Pulido

As you scale your applications, remaining resilient to underlying network failures, resource constraints introduced by other applications, or spikes in traffic can become exponentially more complex, even with very thorough testing and processes. Chaos engineering is a discipline that encourages experimenting in production and injecting...


Datadog on Security and Compliance

March 31, 2021

Kirk Kaiser and Andrew Spangler

At Datadog, customer trust and data security are of the utmost importance.


Datadog on Agent Integration Development

March 23, 2021

Christine Chen, Ara Pulido and Julia Simon

To make sure that customers are getting the most out of the platform in the least amount of time, Datadog maintains more than 400 built-in integrations. These integrations collect metrics, events, and logs from a diverse set of sources: databases, source control, bug tracking tools, cloud providers, automation tools, and more.


Datadog on eBPF

January 26, 2021

Lee Avital, Guillaume Fournier and Ara Pulido

eBPF (extended Berkeley Packet Filter) is a Linux technology that can run sandboxed programs in the kernel without changing kernel source code or loading kernel modules. While the kernel is an ideal place to implement monitoring/observability, networking, and security, it wasn't until the recent broad adoption of eBPF that it was feasib...


Datadog on Serverless

December 10, 2020

David Huie, Kirk Kaiser and Andrew Krug

The Datadog Security Platform team leverages Serverless to ingest security events across many different cloud providers, deployment platforms, and devices. These security events are then transformed and shipped to a data lake to help defend and protect the platform as a whole. Once there, these ingested events are used to drive interna...


Datadog on Kubernetes Monitoring

November 16, 2020

Celene Chang, Charly Fontaine and Ara Pulido

With many blog posts published and talks given on the topic, it’s no secret that Datadog is running Kubernetes at scale. We currently run dozens of clusters, some of them with thousands of nodes. Additionally, we have clusters running in multiple clouds. How are we monitoring all of that, ensuring we can scale up quickly and safely?


Datadog on Software Delivery

September 30, 2020

Jacob LeGrone, Ara Pulido and Benjamin Smith

Over 800 engineers at Datadog perform thousands of deployments per day, to hundreds of services in different environments, regions, and cloud providers. How can we manage all those deployments in a common way and keep a reliable paper trail to audit any changes?


Datadog on Incident Management

August 27, 2020

Leo Cavaille, Matt Hardwick and Ara Pulido

Datadog is a monitoring and analytics platform that ingests trillions of data points per day, coming from more than 8,000 customers. With a complex distributed architecture and hundreds of deployments per day, needless to say, things sometimes don't go as planned. Our teams have been improving the way incidents are managed at Datadog ov...


Datadog on RocksDB

June 30, 2020

James Bibby, Kenny House and Ara Pulido

Datadog is a monitoring and analytics platform that ingests trillions of data points per day, coming from more than 8,000 customers. Each of those is associated with metadata, mostly in the form of tags, and it can also be part of streams of related data points, which can then be explored, queried, or aggregated. RocksDB is used by man...


Datadog on Kafka

May 27, 2020

Jamie Alquiza, Kirk Kaiser and Balthazar Rouberol

In this session, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They'll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters.


Datadog on Kubernetes

May 27, 2020

Laurent Bernaille and Ara Pulido

When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn’t expect to find so many roadblocks, but ingesting trillions of data points per day in a reliable fashion requires pushing the limits of cloud computing.