2026 Talks
Testing for Bias in Production ML Services
Video will be populated after the conference
Testing deployed ML services is different from testing traditional software. Unlike traditional software, which produces deterministic outcomes, ML systems are probabilistic and can return different results for the same input over time.
At TinyData, we make tools that help ensure the safety of production ML systems. To demonstrate this, I will showcase the approach we took to testing four commercial ML systems for gender bias. Because we could easily generate datasets for black-box testing, we were able to find large categories of images that produce gender labelling errors. We will then discuss the workflow required to turn these error-producing datasets into training data for improving the systems.
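The core idea of the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the speaker's actual tooling: `predict` stands in for a call to a commercial ML service, and the sample data, category names, and threshold are all invented for the example. The sketch queries a black-box labeller over a generated dataset, groups mistakes by image category, and surfaces categories with high error rates:

```python
from collections import defaultdict

def find_error_categories(predict, samples, threshold=0.2):
    """Group black-box prediction errors by image category.

    predict: callable mapping one sample to a predicted label
             (in practice, a remote call to the ML service under test)
    samples: iterable of (sample, category, true_label) tuples
    Returns categories whose error rate exceeds `threshold`.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for sample, category, true_label in samples:
        totals[category] += 1
        if predict(sample) != true_label:
            errors[category] += 1
    return {c: errors[c] / totals[c]
            for c in totals if errors[c] / totals[c] > threshold}

# Hypothetical stand-in for the commercial API: pretend the service
# mislabels any image whose name indicates partial occlusion.
def mock_predict(sample):
    return "female" if "occluded" in sample else "male"

samples = [
    ("img_plain_1", "plain", "male"),
    ("img_plain_2", "plain", "male"),
    ("img_occluded_1", "occluded", "male"),
    ("img_occluded_2", "occluded", "male"),
]
print(find_error_categories(mock_predict, samples))
# → {'occluded': 1.0}
```

Categories returned by a function like this become candidates for the second half of the workflow: the error-producing images are labelled and folded back into the training data.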
The AI Conference for Humans Who Ship
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.