Large language models can write poetry and debug code, but ask them to reason about physical systems at scale and they collapse. Why? Because natural language is fundamentally the wrong representation for how the world actually works. The future of AI isn't better language models: it's joint-embedding architectures that bridge natural language with formal domain-specific languages where physics, causality, and constraints live.
This talk introduces a framework for world models that operate across multiple representation spaces simultaneously. Instead of forcing everything through the bottleneck of natural language tokens, we learn joint embeddings that align:
- Natural language descriptions with formal specifications
- High-level goals with executable domain-specific languages (DSLs)
- Physical constraints with learned dynamics
- Human intent with machine-verifiable semantics
The key insight: scalability doesn't come from bigger transformers but from the right representations. DSLs give us composability, verifiability, and orders-of-magnitude efficiency gains that natural language simply cannot provide. But humans think in natural language. The breakthrough is alignment mechanisms that map fluidly between natural language and formal languages while preserving semantic structure.
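To make the alignment idea concrete, here is a minimal sketch of one standard mechanism it could build on: a symmetric contrastive (InfoNCE-style, CLIP-like) objective that pulls matched (natural-language description, DSL program) embedding pairs together in a shared space. The encoders, dimensions, and temperature are hypothetical stand-ins, not the talk's actual architecture; random vectors substitute for real encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def logsumexp_rows(m):
    # Numerically stable row-wise log-sum-exp.
    mx = m.max(axis=1, keepdims=True)
    return mx + np.log(np.exp(m - mx).sum(axis=1, keepdims=True))

def info_nce(nl_emb, dsl_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched (NL, DSL) pairs sit on the diagonal."""
    logits = normalize(nl_emb) @ normalize(dsl_emb).T / temperature
    n = logits.shape[0]
    # Cross-entropy with the diagonal as the positive class, in both directions.
    log_probs_nl = logits - logsumexp_rows(logits)
    log_probs_dsl = logits.T - logsumexp_rows(logits.T)
    diag = np.arange(n)
    return -(log_probs_nl[diag, diag].mean() +
             log_probs_dsl[diag, diag].mean()) / 2

# Four matched (NL description, DSL program) pairs in a shared 16-d space.
nl = rng.normal(size=(4, 16))
dsl = nl + 0.1 * rng.normal(size=(4, 16))  # DSL embeddings near their NL partners
loss_aligned = info_nce(nl, dsl)
loss_random = info_nce(nl, rng.normal(size=(4, 16)))
print(loss_aligned < loss_random)  # aligned pairs yield a lower contrastive loss
```

Trained at scale with real encoders on paired corpora, this kind of objective is what lets a single embedding space serve both human-readable descriptions and machine-verifiable programs.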
I'll demonstrate how this enables world models that scale to complex physical systems, generalize beyond their training distributions, and, critically, fail gracefully with interpretable error modes. We'll see applications spanning healthcare, networking, and manufacturing, domains where traditional end-to-end learning hits fundamental walls.
The next generation of AI won't just chat: it will build, verify, and scale.