RLVR in Practice: From Synthetic Data to GRPO

Model Systems

Reinforcement Learning from Verifiable Rewards (RLVR) is increasingly common in post-training pipelines, but the practical details are often glossed over. How do you design reward functions that programmatically verify model outputs? What makes synthetic training data effective? How do you build a custom RL environment that doesn't silently break your training?
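As a rough illustration of the two ideas this abstract pairs, here is a minimal Python sketch; it is not material from the talk. It shows a programmatically verifiable reward and a GRPO-style group-relative advantage. The `Answer: <number>` output format, the function names `verifiable_reward` and `group_relative_advantages`, and the exact-match rule are all hypothetical choices made for illustration. The advantage uses the population standard deviation, and real implementations differ on that detail.

```python
import re
import statistics

# Hypothetical output contract: the model must end its completion with "Answer: <number>".
ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)\s*$")


def verifiable_reward(completion: str, expected: float) -> float:
    """Return 1.0 if the extracted final answer matches the reference, else 0.0."""
    match = ANSWER_RE.search(completion.strip())
    if match is None:
        # Unparseable output earns zero reward instead of raising and halting training.
        return 0.0
    return 1.0 if abs(float(match.group(1)) - expected) < 1e-6 else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: score each sampled completion against its own group.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Several completions sampled for the same prompt, whose reference answer is 42.
    completions = ["... Answer: 42", "... Answer: 41", "no final answer", "Answer: 42.0"]
    rewards = [verifiable_reward(c, 42) for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

One property of this normalization worth noticing: if every completion in a group earns the same reward, all advantages become zero and that prompt contributes no learning signal. A reward check that silently accepts or rejects everything, for example because of a format mismatch, can therefore stall training without raising any error.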

Product Research Engineer

Chris Alexiuk

NVIDIA

Chris Alexiuk is a deep learning developer advocate at NVIDIA, where he creates technical assets that help developers use the incredible suite of AI tools NVIDIA offers. Chris comes from a machine learning and data science background, and he is obsessed with anything and everything about large language models.

2026 Talks

The AI Conference for Humans Who Ship