2026 Talks

Izzy Miller
AI Engineer | HEX

Welcome to Our Data Benchmark, Where Everything's Made Up and the Points Don't Matter

  • Analytics & Data Sci

SPIDER, DSBench, and every other analytics benchmark treat data work like a pub quiz: here's a question, here's the answer, did you match? But real analytics is arguing about whether "revenue" means bookings or collections, discovering that Stripe amounts are in cents while your platform stores dollars, and figuring out why the numbers don't tie to last quarter's deck. Current benchmarks can't express any of that; they just check whether you got 47.3. Worse, smarter models keep making the same mistakes. Opus is clearly more intelligent than Sonnet, but it falls into the same traps: it takes the path of least resistance, accepts the first answer, and doesn't ask clarifying questions. We'll show specific examples where industry-standard benchmarks fail (including our own) and share some ideas for evals that test what analysts actually do: learn a messy warehouse over time, not answer a frozen question on day zero.
