Technical Talks

Vik Korrapati

CTO | M87 Labs (Moondream)

No Dropped Frames: Designing a VLM around a Latency Budget

Model Systems

Moondream is a vision language model (VLM) that runs in real time on video streams. This talk covers the model-side work behind it.

I'll start with architecture: upcycling a dense model into a mixture-of-experts (MoE) model, and the tradeoffs when you're optimizing for latency rather than just parameter count. Then tokenization: why we built a custom SuperBPE tokenizer and what it bought us. The goal throughout was to avoid modeling decisions that would hurt us at inference time.
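Sparse upcycling, as it is commonly described, builds an MoE layer by copying a dense model's feed-forward block into every expert and adding a freshly initialized router. What follows is a minimal pure-Python sketch of that general idea, not Moondream's implementation. The names (`upcycle`, `moe_ffn`, `dense_ffn`), the ReLU feed-forward block, the router initialization scale, and top-k routing with gate weights renormalized over the selected experts are all assumptions made for illustration.

```python
import copy
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def dense_ffn(params, x):
    """Hypothetical two-layer feed-forward block: W_out @ relu(W_in @ x)."""
    return matvec(params["w_out"], relu(matvec(params["w_in"], x)))


def upcycle(dense_params, num_experts, d_model, seed=0):
    """Copy the dense FFN into every expert and add a freshly initialized router."""
    rng = random.Random(seed)
    # Each expert starts as an independent copy of the dense weights.
    experts = [copy.deepcopy(dense_params) for _ in range(num_experts)]
    # Router: one small random logit vector per expert (assumed init scale 0.02).
    router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(num_experts)]
    return {"experts": experts, "router": router}


def moe_ffn(moe, x, top_k=2):
    """Top-k routing; softmax is renormalized over the selected experts only."""
    logits = matvec(moe["router"], x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * len(x)  # assumes the FFN maps d_model -> d_model
    for i, w in zip(top, weights):
        y = dense_ffn(moe["experts"][i], x)  # only the selected experts run
        out = [o + (w / total) * yi for o, yi in zip(out, y)]
    return out


# Usage: random toy weights, d_model=4, d_ff=8, 4 experts.
rng = random.Random(1)
d_model, d_ff = 4, 8
dense = {
    "w_in": [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)],
    "w_out": [[rng.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)],
}
moe = upcycle(dense, num_experts=4, d_model=d_model)
x = [0.5, -1.0, 0.25, 2.0]
max_diff = max(abs(a - b) for a, b in zip(dense_ffn(dense, x), moe_ffn(moe, x)))
print(max_diff)  # tiny: the upcycled layer starts out matching the dense one
```

Because every expert starts identical and the selected gate weights sum to one, the upcycled layer initially computes the same function as the dense layer, and training then lets the experts diverge. Per token only `top_k` experts run, so feed-forward compute grows with `top_k` rather than `num_experts`, while all experts' weights still have to be resident in memory; that is one reason parameter count alone doesn't predict latency.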

I'll also cover training infrastructure. We wrote custom training engines and RL systems because existing open-source projects were pushing us toward design decisions that didn't fit. I'll talk about where we diverged and what we got out of it.

Finally, inference. Real-time VLM inference isn't just a serving problem or a modeling problem. We built a custom inference engine alongside the model, and I'll cover how the two informed each other.
