January 25, 2024
Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes and hundreds of thousands of pods. This infrastructure serves a wide variety of engineering teams at Datadog, each with different features and capacity needs that may also change over time.
How do we make sure our applications have the compute resources they need at any given time? How do we ensure that our cloud costs reflect that compute need and that we are not wasting resources? What metrics do we use to drive those autoscaling events?
In this session, Ara Pulido, Staff Developer Advocate, chats with Charly Fontaine, Engineering Manager, and Corentin Chary, Senior Staff Software Engineer, about Datadog's autoscaling strategies in Kubernetes, both vertical and horizontal, including which metrics teams use to drive their autoscaling events.
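For readers new to the topic, here is a minimal sketch of what a metric-driven horizontal autoscaling policy looks like in Kubernetes: a HorizontalPodAutoscaler that scales a Deployment based on average CPU utilization. The workload name and thresholds are hypothetical illustrations, not Datadog's actual configuration; the session covers the real strategies and metrics Datadog teams use.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app        # hypothetical workload, for illustration only
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app      # the Deployment this HPA scales
  minReplicas: 3           # floor: never scale below this
  maxReplicas: 50          # ceiling: never scale above this
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # add/remove pods to keep average CPU near 60%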
Datadog on Building Reliable Distributed Applications Using Temporal →
Datadog on OpenTelemetry →
Datadog on Secure Remote Updates →
Datadog on Stateful Workloads on Kubernetes →
Datadog on Data Science →
Datadog on Kubernetes Node Management →
Datadog on Caching →
Datadog on Data Engineering Pipelines: Apache Spark at Scale →
Datadog on Site Reliability Engineering →
Datadog on Building an Event Storage System →
Datadog on gRPC →
Datadog on Gamedays →
Datadog on Chaos Engineering →
Datadog on Serverless →
Datadog on Kubernetes Monitoring →
Datadog on Software Delivery →
Datadog on Incident Management →
Datadog on Kubernetes →