The deconstructed database at Datadog

Data Eng & Databases

Datadog has grown from a startup focused on infrastructure monitoring into a platform processing over a hundred trillion events daily.

Over the years, we expanded beyond metrics to include traces, logs, profiling, real user monitoring, and security. Our user base has broadened from operations to developers, analysts, and business users. More recently, automated agents have also become key consumers of our platform.

Our bottom-up culture encourages teams to take initiative. Consequently, we developed multiple specialized ingestion pipelines and query engines, built to meet strict real-time requirements and deliver an interactive experience that gives users the insights they need to succeed.

This focus on efficiency led to custom-built, proprietary solutions designed for our unique constraints. Today, however, the evolving landscape allows us to reconcile these specialized engines with open standards, blending the versatility of the open-source ecosystem with our purpose-built designs.

In recent years, we have refactored the interfaces of these query engines to create a composable data system. This allows us to better leverage shared capabilities, enabling cross-dataset querying, advanced analytics, and more versatile access patterns.

Our goal is to scale our bottom-up culture. By defining clear contracts and high-level components, we enable decentralized decision-making. This improves performance, efficiency, and flexibility across the platform while reducing silos.

By adopting a deconstructed stack, we combine the efficiency of the open-source ecosystem with our internal capabilities to build a truly composable system. This architecture gives us the flexibility to adapt to immediate and future demands for scale, velocity, and operational resilience, and prepares us for growing data-intensive workloads such as AI.

In this talk, we will discuss how we rely on and contribute to key projects in the data ecosystem: Arrow for data interchange, Substrait for plans, Calcite as an optimizer, DataFusion as an execution core, and Parquet for columnar storage.

Principal Engineer

Julien Le Dem

Datadog

Julien Le Dem is a Principal Engineer at Datadog, an officer of the ASF, and a member of the LF AI & Data Technical Advisory Council. He co-created the Parquet, Arrow, and OpenLineage open source projects and is involved in several others. His career leading data platforms began at Yahoo!, where he received his Hadoop initiation, and continued at Twitter, Dremio, and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve data observability. His French accent makes his talks particularly attractive.

Staff Engineer

Pierre Lacave

Datadog

Pierre Lacave is a Staff Engineer at Datadog with over 15 years of experience building and operating large-scale data systems. His career spans high-stakes domains, including FinTech, AdTech, and Observability, where he has specialized in building massive real-time analytics systems and managing their performance and reliability.

Pierre is a member of the Apache DataSketches PMC, contributing to the development of probabilistic algorithms for big data analysis. He currently focuses on Datadog’s query infrastructure, where he is committed to building open, scalable, and resilient distributed systems.

2026 Talks

The AI Conference for Humans Who Ship