2026 Talks
Welcome to Our Data Benchmark, Where Everything's Made Up and the Points Don't Matter
Video will be populated after the conference
- Analytics & Data Science
SPIDER, DSBench, and every other analytics benchmark treat data work like a pub quiz: here's a question, here's the answer, did you match? But real analytics is arguing about whether "revenue" means bookings or collections, discovering that Stripe amounts are in cents while your platform stores dollars, and figuring out why the numbers don't tie to last quarter's deck. Current benchmarks can't express any of that; they just check whether you got 47.3. Worse: smarter models keep making the same mistakes. Opus is clearly more intelligent than Sonnet, but it falls into the same traps: it takes the path of least resistance, accepts the first answer, and doesn't ask clarifying questions. We'll show specific examples where industry-standard benchmarks fail (including our own) and share some ideas for evals that test what analysts actually do: learn a messy warehouse over time, not answer a frozen question on day zero.
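The cents-vs-dollars trap above can be made concrete with a minimal sketch (not from the talk; the grader function, column names, and numbers are all hypothetical). An exact-match grader marks a model "wrong" for reporting the raw Stripe value verbatim, even though the only gap is a one-line unit conversion the benchmark has no way to express:

```python
def exact_match(model_answer: float, gold: float) -> bool:
    """The pub-quiz grader: did you reproduce the one frozen number?"""
    return abs(model_answer - gold) < 1e-9

# Stripe stores amounts in cents; suppose the warehouse stores dollars.
stripe_amount_cents = 4730      # what the raw events column holds
gold_revenue_dollars = 47.30    # the benchmark's frozen answer

# A model that reads the raw column and answers verbatim fails ...
assert not exact_match(stripe_amount_cents, gold_revenue_dollars)

# ... while dividing by 100 reconciles the two exactly.
assert exact_match(stripe_amount_cents / 100, gold_revenue_dollars)
```

The grader sees only a scalar mismatch; it cannot distinguish a genuinely wrong analysis from a correct one reported in the source system's units, which is the kind of ambiguity the abstract argues real analyst evals need to capture.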
AI Engineer
Izzy Miller
HEX
The AI Conference for Humans Who Ship
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.