Buy Early Bird Tickets for AI Council 2026!

Technical Talks

Ezi Ozoani
CTO | Aethon.fund

Lessons From RL Systems That Looked Fine Until They Didn't

Video will be populated after the conference

  • Model Systems

Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially in settings with bounded actions, safety requirements, resource budgets, and long-term user impact.

This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.

Topics include:

  • Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
  • Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution
  • How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
  • Detecting constraint violations and failure modes that do not appear in aggregate return metrics
  • Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
The goal is to share concrete algorithmic and system-level design strategies for deploying reinforcement learning in settings where constraint violations are unacceptable.
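One of the structural mechanisms listed above, action masking, can be illustrated with a minimal sketch. This is a hypothetical, stdlib-only example (the function and variable names are mine, not from the talk): infeasible actions are excluded from the policy's softmax entirely, so the constraint holds by construction rather than through a reward penalty.

```python
import math
import random

def masked_action_sample(logits, feasible, rng):
    # Exclude infeasible actions from the softmax entirely: their
    # probability is exactly zero, so the constraint cannot be violated
    # by sampling, no matter how the policy's logits drift.
    feasible_idx = [i for i, ok in enumerate(feasible) if ok]
    m = max(logits[i] for i in feasible_idx)  # subtract max for numerical stability
    weights = [math.exp(logits[i] - m) for i in feasible_idx]
    return rng.choices(feasible_idx, weights=weights, k=1)[0]

rng = random.Random(0)
logits = [2.0, 0.5, 1.0, -1.0]
feasible = [True, False, True, False]  # actions 1 and 3 would violate a constraint
samples = [masked_action_sample(logits, feasible, rng) for _ in range(1000)]
assert set(samples) <= {0, 2}  # masked actions are never chosen
```

Unlike a reward penalty, which only discourages violations and can be overwhelmed by optimization pressure at scale, the mask makes the violating actions unreachable.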

Ezi Ozoani
CTO | Aethon.fund

Bio Coming Soon