Optimizing Model Training End-to-End: A Tiny MoE Case Study

Model Systems

Cloud compute is expensive, and wasting runs under the guise of "scale will fix any problems" leaves you with less time to fix errors and less compute to train the model you want. In this talk, I will discuss the easy optimizations you might miss at small scale (minimizing communication, using the most effective algorithms, ensuring you're getting the most FLOPs possible), so that nothing goes to waste when you do scale up. I'll focus on what worked at home and then let me scale further onto the cloud.

Head of Dev Rel

Zach Mueller

Lambda

Machine learning practitioner who began in 2019 through the fast.ai program, with experience building computer vision and regression models for academic research. A recognized expert and leading contributor to the fastai ecosystem, including fastai inference, with endorsement from Jeremy Howard. Passionate about applying ML to environmentally focused projects and community-driven development.

The AI Conference for Humans Who Ship