May 12-14 ■ SF ■ MARRIOTT MARQUIS
The AI Conference for Humans Who Ship
Meet the world's top AI infrastructure minds as the architects of AI share what actually works. Three days of high-quality technical talks and meaningful interactions.
Real Industry Intelligence
For 13 years, AI Council has brought together the engineers behind breakthrough AI systems: researchers solving training at scale, teams optimizing inference, and practitioners shipping models to millions.
Technical Attendees
Speakers
Years Running
Featured 2026 Speakers
Learn from AI heroes at top companies as they explain their apps, architectures, and best practices in detail. Stay tuned for more speakers coming soon!
- Lightning Talks
- Model Systems
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
Practical lessons for teams training their own MoEs or scaling up sparse architectures
- Model Systems
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
- Lightning Talks
Vibe coding is fast, but it often skips the safety rails: features look fine in a demo and then break in real user flows, especially when you iterate on them. This talk shows how to make vibe-coded web apps reliable by adding end-to-end tests with Playwright that are quick to write, stable in CI, and focused on what actually matters.
A big shift is that modern coding assistants like Cursor and Claude Code can run commands and iterate on real failures. What's missing is the glue: for example, letting Claude Code run a test and get its state back. I will show practical workflows for writing tests faster using MCP Skills and Playwright MCP, both in an editor and inside Claude Code environments.
Based on lessons from building multiple websites over the last months, I will share a repeatable approach for growing a small, high-signal test suite that keeps up with rapid development and gives you the confidence to ship more changes without fear.
- Lightning Talks
While a typical LLM chatbot can be seen as a more sophisticated Google search, we're beginning to expect more from agents: don't just give an answer, actually perform the task. Security, bias, and privacy suddenly become non-negotiable. Only after we handle these complexities can we unblock real use cases in healthcare, finance, legal, and beyond.
This talk tackles why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool-use side effects, and more. We'll look at how to build evaluation datasets that actually reflect production scenarios, not just cherry-picked examples. We'll cover automated evaluation pipelines using LLM-as-judge patterns, and when you cannot avoid a human in the loop. The session addresses detecting regressions before users do: setting up continuous evaluation that catches model degradation. Finally, we'll cover the tricky cases where an agent aces public evals but fails in production, and how to build evaluations that predict real-world performance.
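The LLM-as-judge pattern mentioned above can be sketched in a few lines. This is an illustrative shape only, not the speaker's pipeline: the judge call is a stub standing in for a real LLM API, and the names (`judge`, `evaluate`, the rubric format) are hypothetical.

```python
# Minimal LLM-as-judge evaluation loop (pattern sketch, stubbed model).

def call_judge_model(prompt: str) -> str:
    # Hypothetical stub: a real pipeline would send `prompt` to an LLM.
    return "PASS" if "refund issued" in prompt else "FAIL"

def judge(task: str, agent_output: str, rubric: str) -> bool:
    # Build a grading prompt and parse the judge's verdict.
    prompt = (
        f"Task: {task}\n"
        f"Agent output: {agent_output}\n"
        f"Rubric: {rubric}\n"
        "Answer PASS or FAIL."
    )
    return call_judge_model(prompt).strip().upper() == "PASS"

def evaluate(dataset):
    # Aggregate per-case verdicts into a pass rate.
    results = [judge(c["task"], c["output"], c["rubric"]) for c in dataset]
    return sum(results) / len(results)

dataset = [
    {"task": "Process a refund", "output": "refund issued to card",
     "rubric": "Money was returned to the user"},
    {"task": "Process a refund", "output": "asked user to retry",
     "rubric": "Money was returned to the user"},
]
# pass rate over the two cases: 0.5
```

In a real setup the interesting work is in the rubric design and in calibrating the judge against human labels, which is exactly where the human-in-the-loop question from the abstract comes in.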
- Lightning Talks
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
- Applied AI
- Agent Infrastructure
The same way computing for agents is evolving into sandboxes, data infrastructure for agents will necessitate the provisioning and maintenance of trillions of databases. From agent memory and session data to the mind-boggling number of databases vibe-coded applications will need, it is clear we need databases that come online instantly and are individually cheap. SQLite is widely acknowledged to have the right shape for this, but it lacks the extended feature set that modern applications need. In this talk we will present Turso, an open-source rewrite of SQLite in Rust that keeps full compatibility with its file-based nature while expanding what it can do.
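To make the "instant, individually cheap database" shape concrete, here is a minimal sketch using Python's stdlib `sqlite3` rather than Turso itself: one database per agent session, created on demand. The schema and helper names are illustrative, not from the talk.

```python
import sqlite3

def open_session_db(path: str = ":memory:") -> sqlite3.Connection:
    # Each agent session gets its own database; creation is effectively
    # instant. ":memory:" keeps this demo self-contained; a real deployment
    # would use one file (or one Turso database) per session.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn

def remember(conn, key, value):
    # Upsert a single memory entry for this agent.
    conn.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))

def recall(conn, key):
    row = conn.execute(
        "SELECT value FROM memory WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

db = open_session_db()
remember(db, "user_name", "Ada")
# recall(db, "user_name") -> "Ada"
```

The per-session-database pattern is what makes the file-based nature of SQLite attractive here: provisioning is just opening a file, and teardown is deleting it.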
Glauber Costa is a Canadian software engineer recognized for his contributions to high-performance systems software, including the Linux kernel, the Kernel-based Virtual Machine (KVM), the OSv unikernel, ScyllaDB, and open-source Rust projects. Currently Glauber serves as the founder and CEO of Turso, a full rewrite of SQLite in Rust that is redefining what local databases mean in the age of agents.
- Coding Agents & Autonomous Dev
Emilie’s work emphasizes the unglamorous but critical side of AI adoption: building frameworks, workflows, and internal tooling that enable engineers to take ownership of outcomes, experiment effectively, and measure success without relying on traditional project management. She is passionate about autonomy, accountability, and creating systems where teams can move quickly while maintaining quality and trust.
- Data Eng & Databases
Datadog has grown from a startup focused on infrastructure monitoring into a platform processing over a hundred trillion events daily.
Over the years, we expanded beyond metrics to include traces, logs, profiling, real user monitoring, and security. Our user base has broadened from operations to developers, analysts, and business users. More recently, automated agents have also become key consumers of our platform.
Our bottom-up culture encourages teams to take initiative. Consequently, we developed multiple specialized ingestion pipelines and query engines. These were built to satisfy strict real-time requirements and deliver an interactive experience, providing users with the insights necessary for success.
This focus on efficiency led to custom-built, proprietary solutions designed for our unique constraints. Today, however, the evolving landscape allows us to reconcile these specialized engines with open standards, blending the versatility of the ecosystem with our purpose-built designs.
In recent years, we have refactored the interfaces of these query engines to create a composable data system. This allows us to better leverage shared capabilities, enabling cross-dataset querying, advanced analytics, and more versatile access patterns.
Our goal is to scale our bottom-up culture. By defining clear contracts and high-level components, we enable decentralized decision-making. This improves performance, efficiency, and flexibility across the platform, while reducing silos.
By adopting a deconstructed stack, we combine the efficiency of the open-source ecosystem with our internal capabilities to build a truly composable system. This architecture provides the flexibility to adapt to immediate and future demands, specifically addressing requirements for scale, velocity, and operational resilience, while ensuring readiness for growing challenges such as data-intensive workloads like AI.
In this talk, we will discuss how we rely on and contribute to key projects in the data ecosystem: Arrow for data interchange, Substrait for plans, Calcite as an optimizer, DataFusion as an execution core, and Parquet for columnar storage.
Pierre Lacave is a Staff Engineer at Datadog with over 15 years of experience building and operating large-scale data systems. His career spans high-stakes domains, including FinTech, AdTech, and Observability, where he specialized in developing and managing the performance and reliability of massive, real-time analytics systems.
Pierre is a member of the Apache DataSketches PMC, contributing to the development of probabilistic algorithms for big data analysis. Currently, he focuses on Datadog’s query infrastructure, committed to building open, scalable, and resilient distributed systems.
- Agent Infrastructure
Most AI agents work in demos. Few survive in production. LLMs are stateless. Infrastructure fails. Context windows reset. Real-world objectives span hours or days. Building long-running autonomous agents requires durability engineered across the entire system.
This talk compares and contrasts dominant approaches to durability for agents and presents three pillars of durable agentic systems.
- Durable Execution
Agents must survive crashes, retries, and partial task completion. Durable execution engines like Temporal persist workflow state and enable deterministic replay. Graph-based orchestrators such as LangGraph model control flow as explicit state machines. These approaches reflect different assumptions about recovery, replayability, and operational resilience, and directly shape how agents behave under failure.
- Durable Autonomy
Autonomous systems inevitably encounter ambiguity and incomplete information. Durable autonomy means designing agents that recognize uncertainty, escalate intelligently to humans when necessary, and resume coherently without losing progress. We’ll examine architectural patterns for human-in-the-loop integration that preserve control while maintaining forward momentum.
- Durable Statefulness
Long-running agents cannot rely on ever-growing prompts. Some systems serialize state into resumable bursts using patterns like Anthropic’s Git-Commit approach. Others externalize cognition into layered memory architectures, separating working, episodic, semantic, or procedural memory through memory virtualization. Different workloads and time horizons demand different state strategies.
Attendees will leave with a deeper understanding of agent durability and a practical architectural framework for building resilient agents, systems designed not just to respond, but to endure.
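The durable-execution idea above (persisted step results plus deterministic replay) can be illustrated with a toy journal. This sketches the concept only; it is not Temporal's or LangGraph's actual API, and all names are hypothetical.

```python
# Sketch of deterministic replay: completed step results are journaled,
# so a restarted workflow skips re-executing them instead of redoing
# (possibly non-idempotent) work.
class DurableRun:
    def __init__(self, journal=None):
        # In a real engine the journal would live in durable storage.
        self.journal = journal if journal is not None else {}

    def step(self, name, fn):
        if name in self.journal:
            return self.journal[name]   # replay: step ran before the crash
        result = fn()                   # first execution: run and persist
        self.journal[name] = result
        return result

calls = []
def fetch():
    calls.append("fetch")               # track real executions
    return 42

run = DurableRun()
run.step("fetch", fetch)                # executes fn once

# Simulate a crash and restart with the persisted journal:
resumed = DurableRun(run.journal)
value = resumed.step("fetch", fetch)    # replayed; fn is NOT called again
# value == 42, calls == ["fetch"]
```

The key property is that, after a restart, the workflow code re-runs from the top but every already-journaled step returns instantly with its recorded result, which is what makes multi-hour agent runs survivable.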
Parminder Singh is a Silicon Valley–based entrepreneur, AI systems thinker, and engineering leader building the next generation of agentic software.
He is the Co-Founder of Redscope.ai, where he is pioneering AI agents that qualify and convert website visitors into high-intent, consented leads. He leads engineering and go-to-market, building full-stack AI systems that combine large language models, real-time web context, and autonomous workflows.
Previously, Parminder co-founded Hansel.io, scaled teams across India and the U.S., and led the company through acquisition by NetcoreCloud.com. Earlier in his career, he helped build mobile products at Flipkart.com used by millions.
Parminder writes and speaks on the future of AI architecture, multi-agent systems, RAG infrastructure, and the shifting power dynamics of the AI stack. His work has been published in VentureBeat, The AI Journal, and Inc42. He regularly presents on agentic AI systems and enterprise automation.
He holds a B.Tech in Computer Science from IIIT Hyderabad.
- Agent Infrastructure
- Lightning Talks
- Model Systems
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially under bounded actions, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
- Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
- Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
- How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
- Detecting constraint violations and failure modes that do not appear in aggregate return metrics
- Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where violations are unacceptable.
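Of the structural mechanisms listed in the abstract, action masking is the simplest to sketch: infeasible actions have their logits set to negative infinity before the softmax, so they receive exactly zero probability by construction rather than being discouraged by a reward penalty. A minimal stdlib-only sketch:

```python
import math

def masked_policy(logits, feasible):
    # Set infeasible logits to -inf; after softmax their probability is
    # exactly 0.0, so the policy cannot violate the constraint even under
    # heavy optimization pressure.
    masked = [l if ok else float("-inf") for l, ok in zip(logits, feasible)]
    m = max(masked)                              # stabilize the softmax
    exps = [math.exp(l - m) for l in masked]     # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_policy([2.0, 1.0, 0.5], feasible=[True, False, True])
# probs[1] == 0.0: the masked action can never be sampled
```

This is the "by construction" point from the talk summary: a penalty makes a violation expensive, while a mask makes it impossible, which matters precisely under the distribution shift and scale conditions where penalties fail.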
- Data Eng & Databases
Working on ClickHouse’s managed Postgres effort, I have a front-row seat to this shift. Before ClickHouse, I co-founded PeerDB, where we saw data movement from Postgres to ClickHouse accelerate by orders of magnitude over the last couple of years, and it is still growing. That growth is not just about ETL; it signals what users actually want: transactional simplicity with analytics-grade performance, without stitching together a dozen systems.
In this talk, I will explain the pattern we are seeing, why it is accelerating now, and what it implies for the next generation of database platforms. I will then walk through the approach ClickHouse is taking, including managed Postgres, tighter Postgres and ClickHouse integration, and new primitives like pg_clickhouse and pg_stat_ch. We will also cover the replication story (including new “logical replication v2” style ideas), and the set of levers required to get closer to sub-second freshness and low operational overhead.
Finally, I will zoom out to the bigger picture: a unified database is not just “Postgres + OLAP”. It requires re-architecting parts of the stack so applications do not have to carry the abstraction burden. I will share what “world-class” looks like here, the remaining technical challenges, and a realistic path to making unified OLTP + OLAP the default for fast-growing AI workloads.
- Applied AI
Most organizations are experimenting with AI at work. Proof of concepts are thriving in small-scale silos, but how can you scale those wins across an enterprise? This talk discusses the symptoms that trap teams in the cycle of perpetual prototyping and provides a practical framework for breaking through common technical, organizational, and cultural barriers to scale. Learn how to plan for production from day one and turn your AI experiments into solutions that stick.
- Data Eng & Infrastructure
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
We'll also cover:
- How using a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms that are fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments
- Observability, reliability, and performance lessons learned from production incidents.
Attendees will leave with a concrete understanding of how separating storage from compute and treating object storage as the primary database changes not only the cost structure, but the entire operational model of large-scale AI search.
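The abstract does not name its rate-limiting technique; a per-tenant token bucket is one common, simple mechanism for the kind of performance isolation described, sketched here with illustrative names only.

```python
class TokenBucket:
    """Per-tenant token bucket: each request spends tokens that refill at a
    fixed rate, so one noisy tenant exhausts its own budget without
    starving the others."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0             # timestamp of the previous check

    def allow(self, now, cost=1.0):
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # request is throttled

bucket = TokenBucket(rate=1.0, capacity=2.0)
results = [bucket.allow(0.0) for _ in range(3)]  # burst of 3 at t=0
# results == [True, True, False]: third request is throttled
# after one second a token has refilled, so bucket.allow(1.0) passes
```

In a multi-tenant search engine the same structure would be keyed by tenant ID, with rates sized to each tenant's quota.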
- Lightning Talks
- Data Eng & Databases
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
- AI Engineering
In this session, Kshitij Grover explains how to build pricing infrastructure with extreme accuracy, high transparency, and the intelligence to make fast, iterative pricing changes. The talk covers why precise cost attribution matters even when AI features are not directly monetized yet, how transparent usage and controls build customer trust, and why flexibility is essential as workloads, margins, and buying processes evolve.
The session is grounded in real-world examples from high-growth AI and SaaS companies. It shows how teams are pricing AI agents in practice using usage-based models, prepaid credits, outcome-aligned pricing, and guardrails that protect margins as scale increases. Attendees will leave with concrete guidance on how to design pricing as a system that aligns incentives, adapts over time, and turns AI pricing into a durable competitive advantage.
- Inference Systems
- Agent Infrastructure
As models become more capable and reliable, it’s more important than ever to design tools and context thoughtfully. When models had short context windows and low agency, basic document retrieval into the model’s context window made a dramatic difference; but as frontier models boast millions of tokens of context and run for minutes and hours through tool calls and cycles of compaction, there is an ever-growing list of concerns all vying for our models’ scarce attention. Extending context windows is an expensive and incomplete workaround.
In this talk, we will share some of the principles and techniques we found useful in navigating these problems ourselves as we worked on the goal of improving our assistant, Puck, from a simple retrieval-based chatbot to a deeply knowledgeable general assistant. In particular, we will touch on two prevailing challenges. First, we’ll share how we’ve sought to raise the signal-to-noise ratio in Puck’s context by thinking of context engineering itself as a search problem, involving every step of the pipeline from indexing to subagents. Second, we’ll share how Puck approaches blending faithful, up-to-date structured data queries with the richness, breadth, incompleteness, and frequent conflicts latent in unstructured data in production. We’ll walk through a few concrete tactics in detail within both of these pillars, demonstrating that creative and useful approaches often come when we stop thinking of databases, search indexes, tools, and subagents as separate components, but as different solutions to the same underlying, age-old problem: searching for signal in a confusing and noisy world.
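The "context engineering as search" framing can be made concrete with a toy packer: rank candidate snippets by signal per token and greedily fill a budget. This is a hedged sketch, not Puck's pipeline; the scoring function stands in for whatever retrieval signal (BM25, embeddings, subagent summaries) a real system uses.

```python
# Toy illustration of raising signal-to-noise under a token budget:
# rank candidates by score density, then greedily pack.
def pack_context(candidates, budget_tokens):
    """candidates: list of (snippet, score, token_len); returns packed snippets."""
    ranked = sorted(candidates, key=lambda c: c[1] / max(c[2], 1), reverse=True)
    packed, used = [], 0
    for snippet, score, tokens in ranked:
        if used + tokens <= budget_tokens:
            packed.append(snippet)
            used += tokens
    return packed

candidates = [("schema docs", 0.9, 400),
              ("old changelog", 0.2, 300),
              ("runbook", 0.8, 500)]
print(pack_context(candidates, budget_tokens=900))  # ['schema docs', 'runbook']
```

The point of the sketch is the framing: once context assembly is a ranking problem, every upstream component (indexing, tools, subagents) becomes a way to improve the scores.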
- Data Eng & Infrastructure
- Data Eng & Databases
- AI Engineering
The views, opinions, and content shared in this presentation are my own and do not represent those of any employer, past or present. I’d also like to note that my book, Data Engineering for Multimodal AI (O’Reilly), is authored independently and is neutral in perspective. It is not sponsored by Salesforce and does not represent Salesforce’s viewpoints.
- Data Eng & Databases
Datadog has grown from a startup focused on infrastructure monitoring into a platform processing over a hundred trillion events daily.
Over the years, we expanded beyond metrics to include traces, logs, profiling, real user monitoring, and security. Our user base has broadened from operations to developers, analysts, and business users. More recently, automated agents have also become key consumers of our platform.
Our bottom-up culture encourages teams to take initiative. Consequently, we developed multiple specialized ingestion pipelines and query engines. These were built to satisfy strict real-time requirements and deliver an interactive experience, providing users with the insights necessary for success.
This focus on efficiency led to custom-built, proprietary solutions designed for our unique constraints. Today, however, the evolving landscape allows us to reconcile these specialized engines with open standards, blending the versatility of the ecosystem with our purpose-built designs.
In recent years, we have refactored the interfaces of these query engines to create a composable data system. This allows us to better leverage shared capabilities, enabling cross-dataset querying, advanced analytics, and more versatile access patterns.
Our goal is to scale our bottom-up culture. By defining clear contracts and high-level components, we enable decentralized decision-making. This improves performance, efficiency, and flexibility across the platform, while reducing silos.
By adopting a deconstructed stack, we combine the efficiency of the open-source ecosystem with our internal capabilities to build a truly composable system. This architecture provides the flexibility to adapt to immediate and future demands, specifically addressing requirements for scale, velocity, and operational resilience, while ensuring readiness for growing challenges such as data-intensive operations like AI.
In this talk, we will discuss how we rely on and contribute to key projects in the data ecosystem: Arrow for data interchange, Substrait for plans, Calcite as an optimizer, DataFusion as an execution core, and Parquet for columnar storage.
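The "clear contracts" idea can be illustrated with a toy engine-neutral plan, in the spirit of Substrait: queries are data, and any engine that honors the plan contract is interchangeable. This is an invented miniature, not Datadog's actual plan IR.

```python
# Toy plan IR: {'filter': (field, value), 'project': [fields]}.
# Any engine implementing execute(plan, rows) satisfies the contract.
def execute(plan, rows):
    field_, value = plan["filter"]
    matched = [r for r in rows if r.get(field_) == value]
    return [{k: r[k] for k in plan["project"]} for r in matched]

rows = [
    {"service": "api", "status": 500, "latency_ms": 120},
    {"service": "api", "status": 200, "latency_ms": 35},
]
plan = {"filter": ("status", 500), "project": ["service", "latency_ms"]}
print(execute(plan, rows))  # [{'service': 'api', 'latency_ms': 120}]
```

Real systems express such plans in Substrait and execute them with engines like DataFusion, with Arrow as the interchange format; the sketch only shows why a shared plan contract decouples the two sides.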
Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF and is a member of the LFAI&Data Technical Advisory Council. He co-created the Parquet, Arrow and OpenLineage open source projects and is involved in several others. His career leadership began in Data Platforms at Yahoo! - where he received his Hadoop initiation - then continued at Twitter, Dremio and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve Data Observability. His French accent makes his talks particularly attractive.
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
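The skeleton of such a harness is simple, even if the hard part is the environments and checkers. Below is a deliberately simplified sketch (not the actual ADE-bench code, which lives at the repo linked above): each task pairs an environment with a checker, and the agent is any callable.

```python
# Minimal benchmark-harness shape: run the agent on each task's
# environment and score the answer with that task's checker.
def run_benchmark(tasks, agent):
    results = {}
    for name, (env, checker) in tasks.items():
        try:
            answer = agent(env)
            results[name] = bool(checker(answer))
        except Exception:
            results[name] = False  # agent crashes count as failures
    return results

tasks = {
    "row_count": ({"orders": [1, 2, 3]}, lambda a: a == 3),
    "sum":       ({"orders": [1, 2, 3]}, lambda a: a == 6),
}
naive_agent = lambda env: len(env["orders"])
print(run_benchmark(tasks, naive_agent))  # {'row_count': True, 'sum': False}
```

The ambiguity the talk emphasizes lives inside `env` and `checker` in a real harness: analytical environments with messy data, and graders that accept multiple valid solutions.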
Benn Stancil is a cofounder of Mode, an analytics and BI company that was bought by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team on the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
- Lightning Talks
- Model Systems
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
Practical lessons for teams training their own MoEs or scaling up sparse architectures
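SMEBU's exact update rule isn't spelled out in the abstract, so the following is only a hedged sketch of the general family it belongs to: aux-loss-free balancing adjusts a per-expert router bias from observed load, and a momentum term smooths that update across steps instead of reacting to single-batch noise. Names and hyperparameters here are invented.

```python
# Sketch of momentum-smoothed, aux-loss-free expert load balancing:
# overloaded experts get their routing bias pushed down, underloaded
# experts pushed up, with momentum damping batch-to-batch noise.
def update_biases(bias, velocity, load, lr=0.01, momentum=0.9):
    n = len(load)
    target = sum(load) / n                       # ideal uniform load per expert
    for i in range(n):
        grad = load[i] - target                  # positive => overloaded
        velocity[i] = momentum * velocity[i] + (1 - momentum) * grad
        bias[i] -= lr * velocity[i]
    return bias, velocity

bias, vel = [0.0] * 4, [0.0] * 4
bias, vel = update_biases(bias, vel, load=[400, 100, 250, 250])
print(bias)  # expert 0's bias decreases, expert 1's increases
```

The appeal of this family of methods is that balancing pressure never enters the loss, so it cannot trade off against modeling quality the way an auxiliary loss can.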
- Model Systems
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
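On the tokenization point: SuperBPE's defining move (per the published SuperBPE work; the custom tokenizer's actual vocabulary is not shown here) is allowing merges across whitespace, so frequent multi-word strings become single "superword" tokens. A greedy longest-match toy shows why that shortens sequences, which matters directly for inference latency:

```python
# Toy greedy longest-match tokenizer. Adding one cross-whitespace
# "superword" token collapses a three-token phrase into one.
def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j]); i = j
                break
        else:
            tokens.append(text[i]); i += 1  # fall back to single characters
    return tokens

plain = {"by", " the", " way"}
superbpe = plain | {"by the way"}           # one cross-whitespace token
text = "by the way"
print(len(tokenize(text, plain)), len(tokenize(text, superbpe)))  # 3 1
```

Real BPE applies learned merge rules rather than longest-match lookup, but the sequence-length effect is the same: fewer tokens per request means fewer decode steps.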
- Lightning Talks
Vibe coding is fast, but it often skips the safety rails: features look fine in a demo and then break in real user flows, especially when you iterate on them. This talk shows how to make vibe-coded web apps reliable by adding end-to-end tests with Playwright that are quick to write, stable in CI, and focused on what actually matters.
A big shift is that modern coding assistants like Cursor and Claude Code can run commands and iterate on real failures. What's missing is the glue: for example, letting Claude Code run a test and read its resulting state. I will show practical workflows for writing tests faster using MCP Skills and Playwright MCP, both in an editor and inside Claude Code environments.
Based on lessons from building multiple websites over the last few months, I will share a repeatable approach for growing a small, high-signal test suite that keeps up with rapid development and gives you the confidence to ship more changes without fear.
- Lightning Talks
While a typical LLM chatbot can be seen as a more sophisticated Google search, we're beginning to expect more from agents: don't just give an answer, actually perform the task. Security, bias, and privacy suddenly become non-negotiable. Only after we handle these complexities can we unblock real use cases in healthcare, finance, and legal.
This talk tackles why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool-use side effects, and more. We'll cover how to build evaluation datasets that actually reflect production scenarios, not just cherry-picked examples; automated evaluation pipelines using LLM-as-judge patterns; and the cases where you cannot avoid a human in the loop. The session addresses detecting regressions before users do by setting up continuous evaluation that catches model degradation, as well as the tricky case where an agent aces public evals but fails in production, and how to build evaluations that predict real-world performance.
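The shape of an LLM-as-judge pipeline can be sketched with the judge stubbed out. This is an invented miniature: a real pipeline calls a model with a rubric, but the structure (verdict per case, aggregated pass rate, regression threshold for CI) is the point.

```python
# Minimal LLM-as-judge pipeline shape with a stubbed judge.
def judge_stub(case, output):
    # Stand-in for an LLM call: checks that a required fact appears.
    return case["must_mention"].lower() in output.lower()

def evaluate(cases, judge, regression_threshold=0.9):
    passed = sum(judge(c, c["agent_output"]) for c in cases)
    rate = passed / len(cases)
    return {"pass_rate": rate, "regression": rate < regression_threshold}

cases = [
    {"must_mention": "refund", "agent_output": "A refund was issued."},
    {"must_mention": "HIPAA",  "agent_output": "Your data is safe."},
]
print(evaluate(cases, judge_stub))  # {'pass_rate': 0.5, 'regression': True}
```

Wiring `evaluate` into CI with a fixed threshold is one way to catch degradation before users do; the hard work the talk describes is making `cases` reflect production rather than cherry-picked examples.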
- Lightning Talks
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
- Applied AI
- Agent Infrastructure
The same way computing for agents is evolving into sandboxes, data infrastructure for agents will necessitate the provisioning and maintenance of trillions of databases. From agent memory and session data to the mind-boggling number of databases vibe-coded applications will need, it is clear we need databases that come online instantly and are individually cheap. SQLite is widely acknowledged to have the right shape for this, but at the same time it lacks the extended feature set that modern applications need. In this talk we will present Turso, an open-source rewrite of SQLite in Rust, that keeps full compatibility with its file-based nature while expanding what it can do.
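The "comes online instantly, individually cheap" property already holds for stock SQLite's file-based model, which Turso preserves. A sketch with the standard library, one database file per agent session, created on first use (the schema and helper names are invented for illustration):

```python
# One cheap, instantly-provisioned database per agent session.
import os
import sqlite3
import tempfile

def open_agent_db(root, agent_id):
    path = os.path.join(root, f"{agent_id}.db")
    conn = sqlite3.connect(path)  # the file (and the database) springs into existence here
    conn.execute("CREATE TABLE IF NOT EXISTS memory (k TEXT PRIMARY KEY, v TEXT)")
    return conn

root = tempfile.mkdtemp()
conn = open_agent_db(root, "agent-42")
conn.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", ("goal", "book flight"))
conn.commit()
print(conn.execute("SELECT v FROM memory WHERE k='goal'").fetchone()[0])  # book flight
```

There is no server to provision and the marginal cost of another database is one file, which is exactly the shape the talk argues agent workloads need at trillion-database scale.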
Glauber Costa is a Canadian software engineer recognized for his contributions to high-performance systems software, including Linux kernel, Kernel-based Virtual Machine (KVM), the OSv unikernel, ScyllaDB, and Rust open source projects. Currently Glauber serves as the founder and CEO of Turso, a full rewrite of SQLite in Rust that is redefining what local databases mean in the age of agents.
- Coding Agents & Autonomous Dev
Emilie’s work emphasizes the unglamorous but critical side of AI adoption: building frameworks, workflows, and internal tooling that enable engineers to take ownership of outcomes, experiment effectively, and measure success without relying on traditional project management. She is passionate about autonomy, accountability, and creating systems where teams can move quickly while maintaining quality and trust.
- Data Eng & Databases
Datadog has grown from a startup focused on infrastructure monitoring into a platform processing over a hundred trillion events daily.
Over the years, we expanded beyond metrics to include traces, logs, profiling, real user monitoring, and security. Our user base has broadened from operations to developers, analysts, and business users. More recently, automated agents have also become key consumers of our platform.
Our bottom-up culture encourages teams to take initiative. Consequently, we developed multiple specialized ingestion pipelines and query engines. These were built to satisfy strict real-time requirements and deliver an interactive experience, providing users with the insights necessary for success.
This focus on efficiency led to custom-built, proprietary solutions designed for our unique constraints. Today, however, the evolving landscape allows us to reconcile these specialized engines with open standards, blending the versatility of the ecosystem with our purpose-built designs.
In recent years, we have refactored the interfaces of these query engines to create a composable data system. This allows us to better leverage shared capabilities, enabling cross-dataset querying, advanced analytics, and more versatile access patterns.
Our goal is to scale our bottom-up culture. By defining clear contracts and high-level components, we enable decentralized decision-making. This improves performance, efficiency, and flexibility across the platform, while reducing silos.
By adopting a deconstructed stack, we combine the efficiency of the open-source ecosystem with our internal capabilities to build a truly composable system. This architecture provides the flexibility to adapt to immediate and future demands, specifically addressing requirements for scale, velocity, and operational resilience, while ensuring readiness for growing challenges such as data-intensive operations like AI.
In this talk, we will discuss how we rely on and contribute to key projects in the data ecosystem: Arrow for data interchange, Substrait for plans, Calcite as an optimizer, DataFusion as an execution core, and Parquet for columnar storage.
Pierre Lacave is a Staff Engineer at Datadog with over 15 years of experience building and operating large-scale data systems. His career spans high-stakes domains, including FinTech, AdTech, and Observability, where he specialized in developing and managing the performance and reliability of massive, real-time analytics systems.
Pierre is a member of the Apache DataSketches PMC, contributing to the development of probabilistic algorithms for big data analysis. Currently, he focuses on Datadog’s query infrastructure, committed to building open, scalable, and resilient distributed systems.
- Agent Infrastructure
Most AI agents work in demos. Few survive in production. LLMs are stateless. Infrastructure fails. Context windows reset. Real-world objectives span hours or days. Building long-running autonomous agents requires durability engineered across the entire system.
This talk compares and contrasts dominant approaches to durability for agents and presents three pillars of durable agentic systems.
- Durable Execution: Agents must survive crashes, retries, and partial task completion. Durable execution engines like Temporal persist workflow state and enable deterministic replay. Graph-based orchestrators such as LangGraph model control flow as explicit state machines. These approaches reflect different assumptions about recovery, replayability, and operational resilience, and directly shape how agents behave under failure.
- Durable Autonomy: Autonomous systems inevitably encounter ambiguity and incomplete information. Durable autonomy means designing agents that recognize uncertainty, escalate intelligently to humans when necessary, and resume coherently without losing progress. We’ll examine architectural patterns for human-in-the-loop integration that preserve control while maintaining forward momentum.
- Durable Statefulness: Long-running agents cannot rely on ever-growing prompts. Some systems serialize state into resumable bursts using patterns like Anthropic’s Git-Commit approach. Others externalize cognition into layered memory architectures, separating working, episodic, semantic, or procedural memory through memory virtualization. Different workloads and time horizons demand different state strategies.
Attendees will leave with a deeper understanding of agent durability and a practical architectural framework for building resilient agents, systems designed not just to respond, but to endure.
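The durable-execution pillar can be illustrated with a toy checkpoint loop: persist each step's result to disk so a crashed run resumes where it left off. Real engines like Temporal do this with event histories and deterministic replay; the JSON checkpoint below (an invented miniature) only shows the shape of the idea.

```python
# Toy durable execution: checkpoint after every step, skip completed
# steps on resume, so a crash never repeats finished work.
import json
import os
import tempfile

def run_workflow(steps, checkpoint_path):
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)              # resume: load completed steps
    for name, fn in steps:
        if name in done:
            continue                         # already finished before the crash
        done[name] = fn()
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)               # checkpoint after every step
    return done

path = os.path.join(tempfile.mkdtemp(), "wf.json")
first = run_workflow([("fetch", lambda: "data"),
                      ("summarize", lambda: "summary")], path)
# A "restarted" run skips completed steps rather than re-executing them:
resumed = run_workflow([("fetch", lambda: "SHOULD NOT RERUN"),
                        ("summarize", lambda: "x")], path)
print(resumed)
```

The same skip-if-done discipline is why durable engines insist on deterministic, idempotent steps: replay must converge on the recorded history rather than diverge from it.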
Parminder Singh is a Silicon Valley–based entrepreneur, AI systems thinker, and engineering leader building the next generation of agentic software.
He is the Co-Founder of Redscope.ai, where he is pioneering AI agents that qualify and convert website visitors into high-intent, consented leads. He leads engineering and go-to-market, building full-stack AI systems that combine large language models, real-time web context, and autonomous workflows.
Previously, Parminder co-founded Hansel.io, scaled teams across India and the U.S., and led the company through acquisition by NetcoreCloud.com. Earlier in his career, he helped build mobile products at Flipkart.com used by millions.
Parminder writes and speaks on the future of AI architecture, multi-agent systems, RAG infrastructure, and the shifting power dynamics of the AI stack. His work has been published in VentureBeat, The AI Journal, and Inc42. He regularly presents on agentic AI systems and enterprise automation.
He holds a B.Tech in Computer Science from IIIT Hyderabad.
- Agent Infrastructure
- Lightning Talks
- Model Systems
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially under bounded actions, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
- Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
- Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
- How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
- Detecting constraint violations and failure modes that do not appear in aggregate return metrics
- Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them

The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where constraint violations are unacceptable.
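Action masking, the first of the structural mechanisms listed, is easy to show in miniature: infeasible actions are removed before selection, so the policy cannot violate the constraint no matter what it has learned. This sketch uses invented names and a greedy argmax for clarity; the same mask applies to sampled policies by zeroing forbidden probabilities.

```python
# Structural constraint by construction: mask infeasible actions
# before selection instead of penalizing them after the fact.
def select_action(q_values, feasible):
    """q_values: per-action scores; feasible: per-action bools (the mask)."""
    masked = [q if ok else float("-inf") for q, ok in zip(q_values, feasible)]
    best = max(range(len(masked)), key=lambda i: masked[i])
    if masked[best] == float("-inf"):
        raise RuntimeError("no feasible action; escalate instead of acting")
    return best

# The policy prefers action 0, but the feasibility filter forbids it:
print(select_action([2.0, 1.5, 0.3], [False, True, True]))  # 1
```

Note the failure branch: when nothing is feasible, the system escalates rather than acts, which is the difference between a guardrail and a soft preference.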
- Data Eng & Databases
Working on ClickHouse’s managed Postgres effort, I have a front-row seat to this shift. Before ClickHouse, I co-founded PeerDB, where we saw data movement from Postgres to ClickHouse accelerate by orders of magnitude over the last couple of years, and it is still growing. That growth is not just about ETL; it signals what users actually want: transactional simplicity with analytics-grade performance, without stitching together a dozen systems.
In this talk, I will explain the pattern we are seeing, why it is accelerating now, and what it implies for the next generation of database platforms. I will then walk through the approach ClickHouse is taking, including managed Postgres, tighter Postgres and ClickHouse integration, and new primitives like pg_clickhouse and pg_stat_ch. We will also cover the replication story (including new “logical replication v2” style ideas), and the set of levers required to get closer to sub-second freshness and low operational overhead.
Finally, I will zoom out to the bigger picture: a unified database is not just “Postgres + OLAP”. It requires re-architecting parts of the stack so applications do not have to carry the abstraction burden. I will share what “world-class” looks like here, the remaining technical challenges, and a realistic path to making unified OLTP + OLAP the default for fast-growing AI workloads.
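To make the freshness discussion concrete, here is a hedged, stdlib-only sketch of the core tradeoff behind "sub-second freshness": a CDC consumer that flushes replicated Postgres changes to the analytics store either when a batch fills or when the oldest buffered change approaches a freshness deadline. The `apply_batch` callback and all names are hypothetical, not ClickHouse's actual replication machinery:

```python
import time

class CDCBatcher:
    """Buffer change events and flush by size or by age.

    apply_batch: callable taking a list of events (hypothetical sink,
    e.g. a bulk insert into the analytics store). max_batch / max_age_s
    trade throughput against end-to-end freshness.
    """

    def __init__(self, apply_batch, max_batch=1000, max_age_s=0.5):
        self.apply_batch = apply_batch
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self.buf = []
        self.oldest = None

    def on_change(self, event, now=None):
        now = time.monotonic() if now is None else now
        if not self.buf:
            self.oldest = now
        self.buf.append(event)
        # Flush when the batch is full or the oldest event is getting stale.
        if len(self.buf) >= self.max_batch or now - self.oldest >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buf:
            self.apply_batch(self.buf)
            self.buf = []
            self.oldest = None
```

Smaller age limits bound staleness at the cost of more, smaller inserts; this is one of the levers the talk discusses.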
- Applied AI
Most organizations are experimenting with AI at work. Proof of concepts are thriving in small-scale silos, but how can you scale those wins across an enterprise? This talk discusses the symptoms that trap teams in the cycle of perpetual prototyping and provides a practical framework for breaking through common technical, organizational, and cultural barriers to scale. Learn how to plan for production from day one and turn your AI experiments into solutions that stick.
- Data Eng & Infrastructure
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
We'll also cover:
- How a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments
- Observability, reliability, and performance lessons learned from production incidents
Attendees will leave with a concrete understanding of how separating storage from compute, and treating object storage as the primary database, changes not only the cost structure but the entire operational model of large-scale AI search.
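The abstract doesn't spell out the rate-limiting technique, but a common shape for strong multi-tenant isolation is a per-tenant token bucket: each tenant spends from its own budget, so one noisy tenant cannot consume another's capacity. A stdlib-only sketch under that assumption:

```python
import time
from collections import defaultdict

class TenantRateLimiter:
    """Per-tenant token bucket: `rate` tokens/s, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = defaultdict(lambda: burst)  # current tokens per tenant
        self.last = {}                            # last refill time per tenant

    def allow(self, tenant, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at the burst size.
        elapsed = now - self.last.get(tenant, now)
        self.tokens[tenant] = min(self.burst, self.tokens[tenant] + elapsed * self.rate)
        self.last[tenant] = now
        if self.tokens[tenant] >= cost:
            self.tokens[tenant] -= cost
            return True
        return False  # tenant over budget; other tenants are unaffected
```

Because budgets are tracked per tenant, exhausting one bucket has no effect on any other tenant's throughput.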
- Lightning Talks
- Data Eng & Databases
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
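For a sense of what such a harness does, here is a minimal, hypothetical sketch of the core loop: run each agent against each task's environment, grade the outcome, and aggregate a pass rate. It is illustrative only; ADE-bench's actual harness lives at the repo linked above.

```python
def run_benchmark(agents, tasks):
    """Score each agent on each task; return pass rates per agent.

    agents: dict name -> callable(env, prompt) returning an answer
    tasks:  list of dicts with "env", "prompt", and a "check" callable
            that grades the answer. All shapes here are hypothetical.
    """
    results = {}
    for name, agent in agents.items():
        passed = 0
        for task in tasks:
            try:
                answer = agent(task["env"], task["prompt"])
                if task["check"](answer):
                    passed += 1
            except Exception:
                pass  # a crashing agent simply fails the task
        results[name] = passed / len(tasks)
    return results
```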
- AI Engineering
In this session, Kshitij Grover explains how to build pricing infrastructure with extreme accuracy, high transparency, and the intelligence to make fast, iterative pricing changes. The talk covers why precise cost attribution matters even when AI features are not directly monetized yet, how transparent usage and controls build customer trust, and why flexibility is essential as workloads, margins, and buying processes evolve.
The session is grounded in real-world examples from high-growth AI and SaaS companies. It shows how teams are pricing AI agents in practice using usage-based models, prepaid credits, outcome-aligned pricing, and guardrails that protect margins as scale increases. Attendees will leave with concrete guidance on how to design pricing as a system that aligns incentives, adapts over time, and turns AI pricing into a durable competitive advantage.
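As a concrete illustration of one model mentioned above, prepaid credits reduce to a small ledger: meter usage events, convert them to credits at a per-SKU rate, and refuse (or flag) work once the balance is exhausted. A simplified, hypothetical sketch, not any particular vendor's billing API:

```python
class CreditLedger:
    """Prepaid-credit metering: debit usage, block when balance runs out."""

    def __init__(self, credits, rates):
        self.balance = credits
        self.rates = rates          # e.g. {"tokens": 0.001, "tool_call": 0.5}
        self.events = []            # audit trail for transparent billing

    def charge(self, sku, quantity):
        cost = self.rates[sku] * quantity
        if cost > self.balance:
            return False            # out of credits: deny or trigger a top-up
        self.balance -= cost
        self.events.append((sku, quantity, cost))
        return True
```

The audit trail is what makes usage transparent to customers; the per-SKU rates are the lever for protecting margins as workloads shift.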
- Inference Systems
- Agent Infrastructure
As models become more capable and reliable, it’s more important than ever to design tools and context thoughtfully. When models had short context windows and low agency, basic document retrieval into the model’s context window made a dramatic difference; but as frontier models boast millions of tokens of context and run for minutes and hours through tool calls and cycles of compaction, there is an ever-growing list of concerns all vying for our models’ scarce attention. Extending context windows is an expensive and incomplete workaround.
In this talk, we will share some of the principles and techniques we found useful in navigating these problems ourselves as we worked on the goal of improving our assistant, Puck, from a simple retrieval-based chatbot to a deeply knowledgeable general assistant. In particular, we will touch on two prevailing challenges. First, we’ll share how we’ve sought to raise the signal-to-noise ratio in Puck’s context by thinking of context engineering itself as a search problem, involving every step of the pipeline from indexing to subagents. Second, we’ll share how Puck approaches blending faithful, up-to-date structured data queries with the richness, breadth, incompleteness, and frequent conflicts latent in unstructured data in production. We’ll walk through a few concrete tactics in detail within both of these pillars, demonstrating that creative and useful approaches often come when we stop thinking of databases, search indexes, tools, and subagents as separate components, but as different solutions to the same underlying, age-old problem: searching for signal in a confusing and noisy world.
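Treating context engineering as a search problem can be made concrete with a small sketch: given candidate snippets with relevance scores from any mix of retrievers or subagents, deduplicate them and greedily pack the highest-signal ones into a fixed token budget. This is an illustrative pattern, not Puck's implementation:

```python
def pack_context(candidates, budget_tokens):
    """Greedily fill a token budget with the highest-scoring unique snippets.

    candidates: list of (score, text) pairs from any retriever or subagent.
    Returns the selected texts in descending score order.
    """
    seen, picked, used = set(), [], 0
    for score, text in sorted(candidates, key=lambda c: -c[0]):
        key = text.strip().lower()
        if key in seen:
            continue                   # drop duplicated signal
        tokens = len(text.split())     # crude token estimate
        if used + tokens > budget_tokens:
            continue                   # skip what doesn't fit, keep looking
        seen.add(key)
        picked.append(text)
        used += tokens
    return picked
```

Even this toy version shows the shape of the problem: raising signal-to-noise is a ranking-and-budgeting decision, not just a retrieval decision.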
- Data Eng & Infrastructure
- Data Eng & Databases
- AI Engineering
The views, opinions, and content shared in this presentation are my own and do not represent those of any employer, past or present. I’d also like to note that my book, Data Engineering for Multimodal AI (O’Reilly), is authored independently and is neutral in perspective. It is not sponsored by Salesforce and does not represent Salesforce’s viewpoints.
- Data Eng & Databases
Datadog has grown from a startup focused on infrastructure monitoring into a platform processing over a hundred trillion events daily.
Over the years, we expanded beyond metrics to include traces, logs, profiling, real user monitoring, and security. Our user base has broadened from operations to developers, analysts, and business users. More recently, automated agents have also become key consumers of our platform.
Our bottom-up culture encourages teams to take initiative. Consequently, we developed multiple specialized ingestion pipelines and query engines. These were built to satisfy strict real-time requirements and interactive experience, providing users with the insights necessary for success.
This focus on efficiency led to custom-built, proprietary solutions designed for our unique constraints. Today, however, the evolving landscape allows us to reconcile these specialized engines with open standards, blending the versatility of the ecosystem with our purpose-built designs.
In recent years, we have refactored the interfaces of these query engines to create a composable data system. This allows us to better leverage shared capabilities, enabling cross-dataset querying, advanced analytics, and more versatile access patterns.
Our goal is to scale our bottom-up culture. By defining clear contracts and high-level components, we enable decentralized decision-making. This improves performance, efficiency, and flexibility across the platform, while reducing silos.
By adopting a deconstructed stack, we combine the efficiency of the open-source ecosystem with our internal capabilities to build a truly composable system. This architecture provides the flexibility to adapt to immediate and future demands, specifically addressing requirements for scale, velocity, and operational resilience, while ensuring readiness for growing challenges such as data-intensive AI workloads.
In this talk, we will discuss how we rely on and contribute to key projects in the data ecosystem: Arrow for data interchange, Substrait for plans, Calcite as an optimizer, DataFusion as an execution core, and Parquet for columnar storage.
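The "clear contracts" idea above is essentially what Substrait provides at industrial scale: a shared, engine-neutral plan representation that any backend can consume. A toy stdlib-only sketch of the pattern (hypothetical node names, not Datadog's internals):

```python
from dataclasses import dataclass

# A tiny engine-neutral logical plan, in the spirit of Substrait.
@dataclass
class Scan:
    dataset: str

@dataclass
class Filter:
    child: object
    predicate: callable

def execute(plan, catalogs):
    """Any engine honoring this contract can run any plan over any dataset."""
    if isinstance(plan, Scan):
        return list(catalogs[plan.dataset])
    if isinstance(plan, Filter):
        return [row for row in execute(plan.child, catalogs) if plan.predicate(row)]
    raise TypeError(f"unknown plan node: {plan!r}")
```

Once plans are data rather than code, specialized engines can be swapped behind the same interface, which is what enables cross-dataset querying without silos.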
Julien Le Dem is a Principal Engineer at Datadog, serves as an officer of the ASF and is a member of the LFAI&Data Technical Advisory Council. He co-created the Parquet, Arrow and OpenLineage open source projects and is involved in several others. His career leadership began in Data Platforms at Yahoo! - where he received his Hadoop initiation - then continued at Twitter, Dremio and WeWork. He then co-founded Datakin (acquired by Astronomer) to solve Data Observability. His French accent makes his talks particularly attractive.
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
Benn Stancil is a cofounder of Mode, an analytics and BI company that was bought by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team on the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
Why Attend?
- Technical Deep-Dives
- Engineering Office Hours
- Hands-On Workshops
- Events & Networking
Technical Deep-Dives
Get direct insights on production systems and architectural decisions from technical leaders. Our hand-selected speakers don't just present slides. They pull back the curtain on real implementations, complete with performance metrics and hard-learned lessons.
Engineering Office Hours
An AI Council exclusive! Our signature office hours get you dedicated time with speakers for in-depth discussions in a small group setting. Meet your heroes face-to-face, debug your architecture challenges, expand on strategies and discuss the future of AI with the leaders building it.
Hands-On Workshops
Build alongside the maintainers of production AI systems. These aren't just tutorials—they're intensive technical sessions where you'll implement real solutions with guidance from the people who architected them.
Events & Networking
Get access to dozens of exclusive community-curated events where engineering discussions continue in fun, low-pressure settings and the real connections happen. From our Community Drinks & Demos night to founder dinners to firesides, you won't want to miss out!
Past AI Council Talks
Learn from the engineers setting industry standards.
Billion-Scale Vector Search on Object Storage
Simon Hørup Eskildsen, Co-Founder, Turbopuffer
Mickey Liu, Software Engineer, Notion
The Future of Data Engineering in a Post-AI World
Michelle Ufford Winters, Distinguished MTS - Data & Analytics, eBay (ex- Netflix, GoDaddy, Noteable)
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
Naveen Rao, VP of AI, Databricks
George Mathew, Managing Director, Insight Partners
What Every Data Scientist Needs To Know About GPUs
Charles Frye, Developer Advocate, Modal Labs
Meet your hosts
Our expert track hosts hand-select the best talks each year from hundreds of community submissions, ensuring that our content resonates with the interests and needs of real-world data practitioners like you.
Daniel Francisco
Director of Product
Meta
Dhruv Singh
Co-founder & CTO
HoneyHive AI
Sai Srirampur
Principal Engineer
ClickHouse
Scott Breitenother
Founder
Brooklyn Data
Tristan Zajonc
CEO & Co-Founder
Continual
AI Council 2026 - SAN FRANCISCO
Our Venue
San Francisco Marriott Marquis
Downtown SF in SOMA
780 Mission St
Reserve a room at the Marriott Marquis with our special rate.
What Builders Say
Pedram Navid, Developer Education
“AI Council is better than any conference I’ve ever been at because the talks are a higher caliber than anything I’ve ever experienced, and the people here are just second to none.”
Charles Frye, Developer Advocate
“The people who work on the tools that you use every day, the people you admire, they’re there. They want to share what they’ve been working on.”
Ryan Boyd, Co-Founder
“AI Council provides an intimate setting for interacting with other folks in the industry, whereas other conferences you may not know anyone you meet in the hallways.”
Where AI Actually Gets Built
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.