September 29, 2022
Anthonin Bonnefoy
Antoine Tollenaere
Ara Pulido
Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework. That makes gRPC a critical component of Datadog’s reliability.
As teams investigated incidents affecting their services, they discovered that some of those incidents were gRPC-related. But were there common patterns across them? Could we use those patterns to learn more about gRPC and how to use it better?
Over the past year, an engineering squad with members from different teams was formed to study gRPC-related incidents and share the lessons learned. The squad wrote a set of best practices for all engineering teams to follow, along with common libraries that implement them.
In this session, Ara Pulido, Staff Developer Advocate, will chat with Anthonin Bonnefoy, Senior Software Engineer on the Core Resilience team, and Antoine Tollenaere, Team Lead on the Networking team, both of whom were part of this squad. They will share their investigation of the incidents and the gRPC best practices they came up with to avoid similar incidents in the future.
By the end of the session, you will have a better understanding of the internals of gRPC and of how to use it more effectively at your organization.