June 1, 2021
As you scale your applications, staying resilient to underlying network failures, resource constraints introduced by other applications, and spikes in traffic becomes significantly harder, even with thorough testing and processes. Chaos engineering is a discipline that encourages experimenting in production and injecting controlled failures into a system, both to understand how the system reacts under those conditions and to improve its reliability.
In this session, Ara Pulido, Technical Evangelist, will talk with Tay Nishimura and Joris Bonnefoy, both site reliability engineers on the Chaos Engineering team at Datadog, about how chaos engineering is done at Datadog and the in-house tooling the team has built over the past few years to enable more robust testing of Datadog's rapidly growing enterprise systems.
By the end of the session, you will have a better understanding of what chaos engineering is, how it can help your organization, and what you need to get started.