Technical Talks
Lessons From RL Systems That Looked Fine Until They Didn't
Video will be available after the conference
- Model Systems
Reinforcement learning systems often fail not because rewards are wrong, but because optimization pressure is unbounded. Policies exploit edge cases, drift over time, and converge to brittle strategies that look fine in training but break in deployment, especially in settings with bounded action spaces, safety requirements, resource budgets, and long-term user impact.
This talk focuses on controlling optimization directly: practical techniques for training RL agents that remain stable and predictable under hard constraints. Rather than modifying rewards, we explore structural and system-level approaches that shape behavior by construction.
Topics include:
- Why reward penalties alone fail to enforce hard constraints under scale and distribution shift
- Structural constraint mechanisms such as action masking, feasibility filters, and sandboxed execution (a minimal masking sketch follows this list)
- How training inside hard boundaries changes policy behavior and improves long-horizon stability, including across retraining cycles
- Detecting constraint violations and failure modes that do not appear in aggregate return metrics (see the logging sketch at the end of the abstract)
- Lessons from applying constrained RL in production-like systems, including failures only discovered after deployment and what ultimately stopped them
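To make the contrast with reward penalties concrete, here is a minimal action-masking sketch in Python, assuming a discrete action space and a policy that emits raw logits; the function and variable names are illustrative, not from the talk. Infeasible actions receive exactly zero probability by construction, so there is no penalty weight to tune and no violation is possible during exploration.

```python
# Minimal action-masking sketch (illustrative names, not from the talk).
# Assumes a discrete action space with at least one feasible action.
import numpy as np

def masked_action_distribution(logits: np.ndarray, feasible: np.ndarray) -> np.ndarray:
    """Return a policy distribution with infeasible actions removed by construction.

    logits:   raw policy scores, shape (n_actions,)
    feasible: boolean mask, True where the action satisfies the hard constraint
    """
    masked = np.where(feasible, logits, -np.inf)   # infeasible logits -> -inf
    exp = np.exp(masked - masked[feasible].max())  # stable softmax over feasible actions
    return exp / exp.sum()                         # infeasible actions get probability 0.0

# Example: action 2 would exceed a resource budget, so it is masked out.
logits = np.array([1.0, 0.5, 3.0, -0.2])
feasible = np.array([True, True, False, True])
print(masked_action_distribution(logits, feasible))  # action 2 has probability exactly 0
```

Unlike a reward penalty, which only discourages violations in expectation, the mask removes them from the support of the policy, so distribution shift cannot resurface them.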
The goal is to share concrete algorithmic and system design strategies for deploying reinforcement learning in settings where constraint violations are unacceptable, not merely suboptimal.
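As a companion to the point about aggregate metrics, the toy logging sketch below (synthetic numbers, illustrative only) shows how a policy can report a healthy mean return while a small tail of episodes breaches a hard constraint; tracking a per-episode violation flag surfaces what the return curve hides.

```python
# Synthetic illustration: aggregate return hides rare hard-constraint breaches.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=100.0, scale=5.0, size=10_000)  # healthy-looking episode returns
violations = rng.random(10_000) < 0.003                  # rare constraint breaches per episode

print(f"mean return:    {returns.mean():.1f}")     # looks fine on a dashboard
print(f"violation rate: {violations.mean():.2%}")  # invisible in return alone
print(f"episodes with violations: {violations.sum()}")
```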
Ezi Ozoani
CTO, Aethon.fund
Bio Coming Soon