MAY 12-14, 2026 | SAN FRANCISCO, CA
The AI Conference for Humans Who Ship
Meet the world's top AI infrastructure minds at AI Council, where the architects of AI infrastructure share what actually works. Experience THREE DAYS of high-quality technical talks and meaningful interactions with the engineers and teams building our AI-driven future.
Featured 2026 Speakers
Learn from AI heroes at top companies who explain their apps, architectures, and best practices in detail. Stay tuned for more speakers coming soon!
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
Benn Stancil is a cofounder of Mode, an analytics and BI company that was bought by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team on the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
- AI Engineering
- Lightning Talks
- Model Systems
- Lightning Talks
- Model Systems
- Model Systems
- Applied AI
- Model Systems
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
Practical lessons for teams training their own MoEs or scaling up sparse architectures.
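The momentum-based balancing approach (SMEBU) mentioned above isn't publicly documented, but the general shape of aux-loss-free balancing can be sketched: keep a per-expert bias on router scores and update it with a momentum (EMA) estimate of load error, so overloaded experts are gradually de-prioritized. A toy sketch under those assumptions — all names and constants here are illustrative, not the speaker's implementation:

```python
import random

def top_k_route(scores, biases, k=2):
    """Pick the k experts with the highest biased score for one token.
    Biases steer load balance; they are not part of the training loss."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e] + biases[e], reverse=True)
    return ranked[:k]

def update_biases(biases, counts, velocity, lr=0.001, momentum=0.9):
    """Momentum (EMA) update: push down biases of overloaded experts,
    push up biases of underloaded ones."""
    mean = sum(counts) / len(counts)
    for e in range(len(biases)):
        error = counts[e] - mean                      # positive => overloaded
        velocity[e] = momentum * velocity[e] + (1 - momentum) * error
        biases[e] -= lr * velocity[e]
    return biases, velocity

# Toy run: the router persistently favors expert 0.
random.seed(0)
n_experts, k = 8, 2
biases = [0.0] * n_experts
velocity = [0.0] * n_experts
for step in range(300):
    counts = [0] * n_experts
    for _ in range(256):  # one "batch" of tokens
        scores = [random.gauss(0, 1) for _ in range(n_experts)]
        scores[0] += 1.5  # persistent router skew toward expert 0
        for e in top_k_route(scores, biases, k):
            counts[e] += 1
    biases, velocity = update_biases(biases, counts, velocity)
print("final biases:", [round(b, 2) for b in biases])
```

Because the correction lives in the routing decision rather than the loss, balancing pressure never competes with the language-modeling objective, which is the usual motivation for aux-loss-free schemes.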
- Model Systems
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
- Data Eng & Infrastructure
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
We'll also cover:
- How a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments
- Observability, reliability, and performance lessons learned from production incidents
Attendees will leave with a concrete understanding of how separating storage from compute, and treating object storage as the primary database, changes not only the cost structure but the entire operational model of large-scale AI search.
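The talk doesn't spell out its rate-limiting technique, but a common baseline for per-tenant performance isolation is a token bucket: each tenant gets a sustained request rate plus a bounded burst, so a noisy neighbor can only exhaust its own budget. A minimal sketch under that assumption (tenant names and limits are made up for illustration):

```python
import time

class TokenBucket:
    """Per-tenant token bucket: `rate` requests/sec of sustained
    throughput, with bursts of up to `capacity` requests."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per tenant: tenant-a's burst cannot starve tenant-b.
buckets = {"tenant-a": TokenBucket(rate=100, capacity=20),
           "tenant-b": TokenBucket(rate=100, capacity=20)}
t0 = time.monotonic()
burst = [buckets["tenant-a"].allow(t0) for _ in range(50)]
print(sum(burst))                      # tenant-a is capped at its burst capacity
print(buckets["tenant-b"].allow(t0))   # tenant-b is unaffected
```

The same structure works whether the "token" is a request, a byte scanned, or an object-storage GET, which is often the resource that actually needs isolating.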
- Model Systems
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially under bounded actions, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
- Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
- Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
- How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
- Detecting constraint violations and failure modes that do not appear in aggregate return metrics
- Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where constraint violations are unacceptable.
- Lightning Talks
Vibe coding is fast, but it often skips the safety rails: features look fine in a demo and then break in real user flows, especially when you iterate on them. This talk shows how to make vibe-coded web apps reliable by adding end-to-end tests with Playwright that are quick to write, stable in CI, and focused on what actually matters.
A big shift is that modern coding assistants like Cursor and Claude Code can run commands and iterate on real failures. What's missing is the glue: for example, letting Claude Code run a test and read back its state. I will show practical workflows for writing tests faster using MCP Skills and Playwright MCP, both in an editor and inside Claude Code environments.
Based on lessons from building multiple websites over the last few months, I will share a repeatable approach for growing a small, high-signal test suite that keeps up with rapid development and gives you the confidence to ship more changes without fear.
- Lightning Talks
While a typical LLM chatbot can be seen as a more sophisticated Google search, we're beginning to expect more from agents: don't just give an answer, actually perform the task. Security, bias, and privacy suddenly become non-negotiable. Only after we handle these complexities can we unblock real use cases in healthcare, finance, legal, and beyond.
This talk tackles why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool-use side effects, and more. How to build evaluation datasets that actually reflect production scenarios, not just cherry-picked examples. We'll cover automated evaluation pipelines using LLM-as-judge patterns, and when you cannot avoid a human in the loop. The session addresses detecting regressions before users do: setting up continuous evaluation that catches model degradation. And the tricky cases where an agent aces public evals but fails in production, and how to build evaluations that predict real-world performance.
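The LLM-as-judge pattern mentioned above has a simple skeleton: run the agent over a fixed set of cases, ask a judge model to grade each transcript against a rubric, and gate deployment on the aggregate pass rate. A stub sketch with the judge call faked out — a real pipeline would call an actual model here, and every name below is illustrative:

```python
from dataclasses import dataclass

@dataclass
class Case:
    task: str
    transcript: str   # what the agent actually did

def judge(case, rubric):
    """Stand-in for an LLM judge call. A real pipeline would send the
    rubric and transcript to a model and parse a structured verdict."""
    return {"pass": rubric in case.transcript, "reason": "stub"}

def evaluate(cases, rubric, threshold=0.9):
    verdicts = [judge(c, rubric) for c in cases]
    pass_rate = sum(v["pass"] for v in verdicts) / len(verdicts)
    # Gate deployment: regressions surface as a pass-rate drop, before users see them.
    return {"pass_rate": pass_rate, "ship": pass_rate >= threshold}

cases = [
    Case("refund order #12", "looked up order; issued refund"),
    Case("refund order #13", "looked up order; asked user to call support"),
]
report = evaluate(cases, rubric="refund")
print(report)
```

Running this on every model or prompt change turns "the agent feels worse" into a number you can alert on; the hard part the talk addresses is making the cases and rubric reflect production, not demos.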
- Lightning Talks
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
- Applied AI
- AI Engineering
In this session, Kshitij Grover explains how to build pricing infrastructure with extreme accuracy, high transparency, and the intelligence to make fast, iterative pricing changes. The talk covers why precise cost attribution matters even when AI features are not directly monetized yet, how transparent usage and controls build customer trust, and why flexibility is essential as workloads, margins, and buying processes evolve.
The session is grounded in real-world examples from high-growth AI and SaaS companies. It shows how teams are pricing AI agents in practice using usage-based models, prepaid credits, outcome-aligned pricing, and guardrails that protect margins as scale increases. Attendees will leave with concrete guidance on how to design pricing as a system that aligns incentives, adapts over time, and turns AI pricing into a durable competitive advantage.
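As a sketch of the prepaid-credit model described here: meter each agent action, debit a credit balance, and enforce a guardrail when the balance is exhausted. This is a toy illustration only — a production billing system would also need idempotency, audit trails, and currency handling:

```python
class CreditLedger:
    """Prepaid credits: customers buy credits up front, usage debits them,
    and a hard floor acts as a margin guardrail."""
    def __init__(self):
        self.balance = 0
        self.events = []   # append-only usage log for transparent cost attribution

    def top_up(self, credits):
        self.balance += credits

    def charge(self, feature, credits):
        if credits > self.balance:
            return False                 # guardrail: block rather than overspend
        self.balance -= credits
        self.events.append((feature, credits))
        return True

ledger = CreditLedger()
ledger.top_up(100)
ok1 = ledger.charge("agent_run", 60)
ok2 = ledger.charge("agent_run", 60)    # would exceed the balance: rejected
print(ok1, ok2, ledger.balance)
```

The append-only event log is what enables the transparency the talk emphasizes: every debit is attributable to a feature, so customers can see exactly where their credits went.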
In this session, Kshitij Grover explains how to build pricing infrastructure with extreme accuracy, high transparency, and the intelligence to make fast, iterative pricing changes. The talk covers why precise cost attribution matters even when AI features are not directly monetized yet, how transparent usage and controls build customer trust, and why flexibility is essential as workloads, margins, and buying processes evolve.
The session is grounded in real-world examples from high-growth AI and SaaS companies. It shows how teams are pricing AI agents in practice using usage-based models, prepaid credits, outcome-aligned pricing, and guardrails that protect margins as scale increases. Attendees will leave with concrete guidance on how to design pricing as a system that aligns incentives, adapts over time, and turns AI pricing into a durable competitive advantage.
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
Benn Stancil is a cofounder of Mode, an analytics and BI company that was bought by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team on the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
- AI Engineering
- Lightning Talks
- Model Systems
- Lightning Talks
- Model Systems
- Model Systems
- Applied AI
- Model Systems
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
Practical lessons for teams training their own MoEs or scaling up sparse architectures
We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale.
I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.
Practical lessons for teams training their own MoEs or scaling up sparse architectures
- Model Systems
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
I'll start with architecture: upcycling from dense to MoE, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.
Finally, inference. Real-time VLM isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
- Data Eng & Infrastructure
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
- How using a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms that are fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments
- Observability, reliability, and performance lessons learned from production incidents.
Attendees will leave with a concrete understanding of how separating storage from compute-and treating object storage as the primary database changes not only the cost structure, but the entire operational model of large-scale AI search.
This talk goes beyond architecture diagrams to share what actually happens when you operate an agentic search engine on trillions of documents.
We'll dig into how an object storage-native design allows a small team of engineers to manage an AI search engine that scales to:
- Peak load of 1M+ writes per second and 30k+ searches per second
- 1+ trillion documents
- 5+ PB of logical data
- 400+ tenants
- p90 query latency <100 ms
- How using a modern storage architecture decreases COGS by 10x or more
- Optimizing traditional vector and FTS indexes for the high latency of object storage
- Building search algorithms that are fine-tuned for LLM-initiated searches
- A simple rate-limiting technique that provides strong performance isolation in multi-tenant environments
- Observability, reliability, and performance lessons learned from production incidents.
Attendees will leave with a concrete understanding of how separating storage from compute-and treating object storage as the primary database changes not only the cost structure, but the entire operational model of large-scale AI search.
- Coding Agents & Autonomous Dev
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
We built a new benchmark—ADE-bench—that aspires to do exactly that. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
- Model Systems
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially under bounded actions, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
-
Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
-
Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
-
How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
-
Detecting constraint violations and failure modes that do not appear in aggregate return metrics
-
Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
-
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where violations are suboptimal.
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially under bounded actions, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
-
Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
-
Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
-
How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
-
Detecting constraint violations and failure modes that do not appear in aggregate return metrics
-
Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
-
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where violations are suboptimal.
- Lightning Talks
Vibe coding is fast, but it often skips the safety rails: features look fine in a demo and then break in real user flows. Especially when you iterate on them. This talk shows how to make vibe-coded web apps reliable by adding end-to-end tests with Playwright that are quick to write, stable in CI, and focused on what actually matters.
A big shift is that modern coding assistants like Cursor and Claude Code can run commands and iterate on real failures. Whats missing is the glue, between e.g. Claude Code to run a test and gets its state. I will show practical workflows for writing tests faster using MCP Skills and Playwright MCP, both in an editor and inside Claude Code environments.
Based on lessons from building multiple websites over the last months, I will share a repeatable approach for growing a small, high-signal test suite that keeps up with rapid development and gives you the confidence to ship more changes without fear.
Vibe coding is fast, but it often skips the safety rails: features look fine in a demo and then break in real user flows. Especially when you iterate on them. This talk shows how to make vibe-coded web apps reliable by adding end-to-end tests with Playwright that are quick to write, stable in CI, and focused on what actually matters.
A big shift is that modern coding assistants like Cursor and Claude Code can run commands and iterate on real failures. Whats missing is the glue, between e.g. Claude Code to run a test and gets its state. I will show practical workflows for writing tests faster using MCP Skills and Playwright MCP, both in an editor and inside Claude Code environments.
Based on lessons from building multiple websites over the last months, I will share a repeatable approach for growing a small, high-signal test suite that keeps up with rapid development and gives you the confidence to ship more changes without fear.
- Lightning Talks
While typical LLM chatbot can be seen as a more sophisticated Google search, we're beginning to expect more from Agents: Don't just give an answer but actually perform the task. Security, bias, privacy etc suddenly become non-negotiable. Only after we handle these complexities, we can unblock real usecases in healthcare, finance, legal etc.
This talk tackles why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool use side effects & more. How to build evaluation datasets that actually reflect production scenarios, not just cherry-picked examples. We'll cover automated evaluation pipelines using LLM-as-judge patterns, and when you can not avoid human in the loop. The session addresses detecting regressions before users do: setting up continuous evaluation that catches model degradation. Tricky cases when agent aces public evals but fails in production, and how to build evaluations that predict real-world performance.
While typical LLM chatbot can be seen as a more sophisticated Google search, we're beginning to expect more from Agents: Don't just give an answer but actually perform the task. Security, bias, privacy etc suddenly become non-negotiable. Only after we handle these complexities, we can unblock real usecases in healthcare, finance, legal etc.
This talk tackles why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool use side effects & more. How to build evaluation datasets that actually reflect production scenarios, not just cherry-picked examples. We'll cover automated evaluation pipelines using LLM-as-judge patterns, and when you can not avoid human in the loop. The session addresses detecting regressions before users do: setting up continuous evaluation that catches model degradation. Tricky cases when agent aces public evals but fails in production, and how to build evaluations that predict real-world performance.
- Lightning Talks
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
- Why does AI struggle with quantitative data, and where are there divergences from human experts?
- What data representations are most meaningful for today's frontier AI models?
- How can better data representation let AI become a partner in data science?
- What are the best practices for deploying AI to teams to maximize acceleration without compromising quality or security?
- Applied AI
- AI Engineering
In this session, Kshitij Grover explains how to build pricing infrastructure with extreme accuracy, high transparency, and the intelligence to make fast, iterative pricing changes. The talk covers why precise cost attribution matters even when AI features are not directly monetized yet, how transparent usage and controls build customer trust, and why flexibility is essential as workloads, margins, and buying processes evolve.
The session is grounded in real-world examples from high-growth AI and SaaS companies. It shows how teams are pricing AI agents in practice using usage-based models, prepaid credits, outcome-aligned pricing, and guardrails that protect margins as scale increases. Attendees will leave with concrete guidance on how to design pricing as a system that aligns incentives, adapts over time, and turns AI pricing into a durable competitive advantage.
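As a toy illustration of one pattern the session covers, here is a hedged sketch of usage-based billing with a prepaid-credit drawdown. The function name, rates, and units are all assumptions for illustration, not anything from the talk itself.

```python
# Illustrative sketch: usage-based billing with prepaid credits.
# All numbers and names are hypothetical.

def bill(usage_units: int, unit_price: float, prepaid_credits: float) -> tuple[float, float]:
    """Return (amount_due, remaining_credits) for one billing period.

    Prepaid credits absorb cost first; anything beyond them is invoiced.
    """
    cost = usage_units * unit_price
    drawdown = min(cost, prepaid_credits)  # credits cover cost up to their balance
    return cost - drawdown, prepaid_credits - drawdown

due, remaining = bill(usage_units=1200, unit_price=0.05, prepaid_credits=50.0)
# cost = 60.0; credits cover 50.0, so 10.0 is due and 0.0 credits remain
```

Even a sketch this small shows why precise cost attribution matters: the amount due is only as trustworthy as the usage metering feeding it.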
Real Industry Intelligence
For 13 years, AI Council has brought together the engineers behind breakthrough AI systems: researchers solving training at scale, teams optimizing inference, and practitioners shipping models to millions.
Technical Attendees
Speakers
Years Running
Why Attend?
- Technical Deep-Dives
- Engineering Office Hours
- Hands-On Workshops
- Events & Networking
Technical Deep-Dives
Get direct insights on production systems and architectural decisions from technical leaders. Our hand-selected speakers don't just present slides. They pull back the curtain on real implementations, complete with performance metrics and hard-won lessons.
Engineering Office Hours
An AI Council exclusive! Our signature office hours get you dedicated time with speakers for in-depth discussions in a small group setting. Meet your heroes face-to-face, debug your architecture challenges, expand on strategies and discuss the future of AI with the leaders building it.
Hands-On Workshops
Build alongside the maintainers of production AI systems. These aren't just tutorials—they're intensive technical sessions where you'll implement real solutions with guidance from the people who architected them.
Events & Networking
Get access to dozens of exclusive community-curated events where engineering discussions continue in fun, low-pressure settings and the real connections happen. From our Community Drinks & Demos night to founder dinners to firesides, you won't want to miss out!
Past AI Council Talks
Learn from the engineers setting industry standards.
Billion-Scale Vector Search on Object Storage
Simon Hørup Eskildsen, Co-Founder, Turbopuffer
Mickey Liu, Software Engineer, Notion
The Future of Data Engineering in a Post-AI World
Michelle Ufford Winters, Distinguished MTS - Data & Analytics, eBay (ex- Netflix, GoDaddy, Noteable)
Data Meets Intelligence: Where the Data Infra & AI Stack Converge
Naveen Rao, VP of AI, Databricks
George Mathew, Managing Director, Insight Partners
What Every Data Scientist Needs To Know About GPUs
Charles Frye, Developer Advocate, Modal Labs
Meet your hosts
Our expert track hosts hand-select the best talks each year from hundreds of community submissions, ensuring that our content resonates with the interests and needs of real-world data practitioners like you.
Daniel Francisco
Director of Product
Meta
Dhruv Singh
Co-founder & CTO
HoneyHive AI
Sai Srirampur
Principal Engineer
Clickhouse
Scott Breitenother
Founder
Brooklyn Data
Tristan Zajonc
CEO & Co-Founder
Continual
AI Council 2026 - SAN FRANCISCO
About the Venue
San Francisco Marriott Marquis
780 Mission St, San Francisco | May 12-14, 2026
Reserve a room at the Marriott Marquis with our special rate.
What Builders Say
Pedram Navid, Developer Education
“AI Council is better than any conference I’ve ever been at because the talks are a higher caliber than anything I’ve ever experienced, and the people here are just second to none.”
Charles Frye, Developer Advocate
“The people who work on the tools that you use every day, the people you admire, they’re there. They want to share what they’ve been working on.”
Ryan Boyd, Co-Founder
“AI Council provides an intimate setting for interacting with other folks in the industry, whereas other conferences you may not know anyone you meet in the hallways.”
Where AI Actually Gets Built
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.