MAY 12-14, 2026 | SAN FRANCISCO, CA
The AI Conference for Humans Who Ship
Meet the world's top AI infrastructure minds at AI Council, where the architects of production AI systems share what actually works. Experience THREE DAYS of high-quality technical talks and meaningful interactions with the engineers and teams building our AI-driven future.
Speakers Who Ship Code, Not Slides
AI Council features hand-selected engineers building production systems, teams shipping billion-parameter inference, and experts solving AI's hardest challenges.
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
Benn Stancil is a cofounder of Mode, an analytics and BI company that was acquired by ThoughtSpot in 2023. While at Mode, Benn held roles leading its data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team for the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
- Model Systems
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
We'll close with practical lessons for teams training their own MoEs or scaling up sparse architectures.
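For context, here is a minimal, hypothetical sketch of the family of techniques this abstract alludes to: aux-loss-free load balancing that adjusts per-expert routing biases using a momentum-smoothed estimate of load error. The class name, learning rate, and update rule below are illustrative assumptions; the speakers' actual SMEBU method is not described here.

```python
# Hypothetical illustration of momentum-based expert load balancing
# (aux-loss-free style: nudge per-expert routing biases instead of
# adding an auxiliary loss). Not the speakers' actual SMEBU method.
import numpy as np

class MomentumExpertBalancer:
    def __init__(self, num_experts, lr=1e-3, momentum=0.9):
        self.bias = np.zeros(num_experts)      # added to router logits for selection only
        self.velocity = np.zeros(num_experts)  # momentum buffer over load error
        self.lr = lr
        self.momentum = momentum

    def route(self, logits, top_k=2):
        # Bias affects which experts are *selected*, not the mixture weights.
        biased = logits + self.bias
        return np.argsort(-biased, axis=-1)[:, :top_k]

    def update(self, topk, num_experts):
        # Load error: fraction of tokens each expert received vs. a uniform share.
        counts = np.bincount(topk.ravel(), minlength=num_experts)
        error = counts / counts.sum() - 1.0 / num_experts
        # Momentum-smoothed bias update: push tokens away from hot experts.
        self.velocity = self.momentum * self.velocity + (1 - self.momentum) * error
        self.bias -= self.lr * np.sign(self.velocity)

# Toy usage: 8 experts, a batch of 512 tokens with random router logits.
balancer = MomentumExpertBalancer(num_experts=8)
logits = np.random.randn(512, 8)
topk = balancer.route(logits)
balancer.update(topk, num_experts=8)
```

In this sketch the bias only influences expert selection, so the balancing pressure does not distort the model's output mixture; the momentum term is what smooths the per-batch noise in the load estimate.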
- Model Systems
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
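As a point of reference for the upcycling step, here is a generic sketch of how a dense MLP block can be turned into a small MoE layer: each expert starts as a copy of the pretrained dense weights and a freshly initialized router learns to distribute tokens. PyTorch, the expert count, and the routing strategy below are illustrative assumptions, not details of the speaker's model.

```python
# Hypothetical sketch of "upcycling" a dense MLP into a small MoE layer.
# Every expert is initialized from the dense checkpoint; only the router is new.
import copy
import torch
import torch.nn as nn

class UpcycledMoE(nn.Module):
    def __init__(self, dense_mlp: nn.Module, d_model: int, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList([copy.deepcopy(dense_mlp) for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)   # (tokens, num_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)   # routing decisions per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(dim=-1)              # tokens routed to expert e
            if mask.any():
                w = weights[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

# Toy usage: upcycle a 2-layer MLP for a model dimension of 64.
dense = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
moe = UpcycledMoE(dense, d_model=64, num_experts=4, top_k=1)
y = moe(torch.randn(10, 64))
```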
- Data Eng & Infrastructure
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
We'll also cover:
- How a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms that are fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments (a generic sketch of one such technique appears below)
- Observability, reliability, and performance lessons learned from production incidents
Attendees will leave with a concrete understanding of how separating storage from compute, and treating object storage as the primary database, changes not only the cost structure but the entire operational model of large-scale AI search.
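The abstract does not say which rate-limiting technique the team uses. As a generic illustration of how per-tenant performance isolation can work, here is a minimal token-bucket sketch; all names and parameters below are hypothetical.

```python
# Minimal sketch of per-tenant rate limiting with token buckets.
# Generic illustration only, not the technique described in the talk.
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float                 # tokens refilled per second
    capacity: float             # maximum burst size
    tokens: float = 0.0
    last_refill: float = field(default_factory=time.monotonic)

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class TenantLimiter:
    """One bucket per tenant, so a noisy tenant cannot starve the others."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.setdefault(
            tenant_id, TokenBucket(self.rate, self.capacity, tokens=self.capacity)
        )
        return bucket.allow(cost)

# Toy usage: 100 requests/second per tenant with bursts up to 200.
limiter = TenantLimiter(rate=100, capacity=200)
if not limiter.allow("tenant-42"):
    raise RuntimeError("rate limited; retry with backoff")
```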
- Model Systems
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
- Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
- Structural constraint mechanisms such as action masking (see the sketch after this list), feasibility filters, and sandboxed execution
- How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
- Detecting constraint violations and failure modes that do not appear in aggregate return metrics
- Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where constraint violations are unacceptable.
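To make the "structural constraint" idea concrete, here is a minimal, generic sketch of action masking: disallowed actions get zero probability before sampling, so the policy cannot violate the constraint by construction. This is an illustration of the technique named in the abstract, not the speakers' implementation; PyTorch and all names below are assumptions.

```python
# Minimal illustration of action masking: invalid actions get -inf logits,
# so the sampled policy can never select them. Generic sketch only.
import torch

def masked_policy_sample(logits: torch.Tensor, valid_mask: torch.Tensor):
    """logits: (batch, num_actions); valid_mask: boolean, True = allowed."""
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked_logits)
    actions = dist.sample()
    # Taking log_prob under the *masked* distribution keeps policy-gradient
    # updates consistent with what the agent could actually do.
    return actions, dist.log_prob(actions)

# Toy usage: 4 actions, the last one forbidden for every state in the batch.
logits = torch.randn(8, 4)
valid = torch.tensor([True, True, True, False]).expand(8, 4)
actions, logps = masked_policy_sample(logits, valid)
assert (actions < 3).all()  # the forbidden action is never selected
```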
- Lightning Talks
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
Real Industry Intelligence
For 13 years, AI Council has brought together the engineers behind breakthrough AI systems: researchers solving training at scale, teams optimizing inference, and practitioners shipping models to millions.
Why Attend?
- Technical Deep-Dives
- Engineering Office Hours
- Hands-On Workshops
- Events & Networking
Technical Deep-Dives
Get direct insights on production systems and architectural decisions from technical leaders. Our hand-selected speakers don't just present slides. They pull back the curtain on real implementations, complete with performance metrics and hard-learned lessons.
Engineering Office Hours
An AI Council exclusive! Our signature office hours get you dedicated time with speakers for in-depth discussions in a small-group setting. Meet your heroes face-to-face, debug your architecture challenges, dig into strategies, and discuss the future of AI with the leaders building it.
Hands-On Workshops
Build alongside the maintainers of production AI systems. These aren't just tutorials—they're intensive technical sessions where you'll implement real solutions with guidance from the people who architected them.
Events & Networking
Get access to dozens of exclusive community-curated events where engineering discussions continue in fun, low-pressure settings and the real connections happen. From our Community Drinks & Demos night to founder dinners to firesides, you won't want to miss out!
Past AI Council Talks
Learn from the engineers setting industry standards.
Billion-Scale Vector Search on Object Storage
Simon Hørup Eskildsen, Co-Founder, Turbopuffer
Mickey Liu, Software Engineer, Notion
The Future of Data Engineering in a Post-AI World
Michelle Ufford Winters, Distinguished MTS - Data & Analytics, eBay (ex-Netflix, GoDaddy, Noteable)
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
Naveen Rao, VP of AI, Databricks
George Mathew, Managing Director, Insight Partners
What Every Data Scientist Needs To Know About GPUs
Charles Frye, Developer Advocate, Modal Labs
AI Council 2026 - SAN FRANCISCO
About the Venue
San Francisco Marriott Marquis
780 Mission St, San Francisco | May 12-14, 2026
Reserve a room at the Marriott Marquis with our special rate.
What Builders Say
Pedram Navid, Developer Education
“AI Council is better than any conference I’ve ever been at because the talks are a higher caliber than anything I’ve ever experienced, and the people here are just second to none.”
Charles Frye, Developer Advocate
“The people who work on the tools that you use every day, the people you admire, they’re there. They want to share what they’ve been working on.”
Ryan Boyd, Co-Founder
“AI Council provides an intimate setting for interacting with other folks in the industry, whereas other conferences you may not know anyone you meet in the hallways.”
Where AI Actually Gets Built
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.