Datadog on Data Engineering Pipelines: Apache Spark at Scale

March 23, 2023

Ara Pulido

Anton Ippolitov

Alodie Boissonnet

Datadog is an observability and security platform that ingests and processes tens of trillions of data points per day from more than 22,000 customers. Processing that much data in a reasonable time stretches the limits of well-known data engines like Apache Spark.

Beyond scale, Datadog's infrastructure runs on Kubernetes across multiple clouds, and the data engineering platform is shared by different engineering teams, so a good set of abstractions that makes running Spark jobs easier is critical.

In this session, Ara Pulido, Staff Developer Advocate, will chat with Anton Ippolitov, Senior Software Engineer on the Data Engineering Infrastructure team, and Alodie Boissonnet, Software Engineer on the Historical Metrics Query team. They will share their journey building and maintaining their infrastructure and data engineering pipelines, and explain how they run and optimize Spark batch jobs, with real-world examples.

By the end of the talk, you will have a better understanding of the value Spark can bring to your organization, why Spark continues to be one of the most popular open source data engines, and how to use it at scale.