Technical Talks
Benchmarking AI Agents Against Realistic Analytical Tasks with ADE-bench
- Coding Agents & Autonomous Dev
There are many benchmarks that attempt to measure how well LLMs and AI agents can write SQL queries or do complicated statistical analysis. But as most practitioners know, this is only a small part of our job. Before we can write a query, we have to figure out the business context behind the question. We have to figure out which tables to use in a messy database. We have to make subjective decisions about vaguely defined problems. All of this makes benchmarking analytical agents difficult.
We built a new benchmark, ADE-bench, that aspires to measure exactly this kind of work. It gives agents complex analytical environments to work in and ambiguous tasks to solve, and measures how well they perform.
In this talk, we'll share how we built the benchmark, the results of our tests, a bunch of things we learned along the way, and what we think is coming next. The benchmark harness is open source, and can be found here: https://github.com/dbt-labs/ade-bench
Founder
Benn Stancil
Mode
Benn Stancil is a cofounder of Mode, an analytics and BI company that was bought by ThoughtSpot in 2023. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams; at ThoughtSpot, he was the Field CTO. More recently, Benn worked on the analytics team on the Harris for President campaign. He regularly writes about data and technology at benn.substack.com.
Director, DX + AI
Jason Ganz
dbt Labs
Jason Ganz used to call himself a futurist but frankly isn't certain what one can do with that word these days. He is the Director of Developer Experience and AI at dbt Labs. You can find him across the internet thinking about how to build resilience and navigate the AI transition while retaining our humanity, treating each other well and having fun.