Sail builds efficient inference systems for background agents. In this talk, I’ll describe how we optimize for throughput over latency at every level of the stack, from silicon to API. I’ll also share what we’ve learned from customers about designing effective background agents: how to build an async harness, handle very long contexts, and run agent sandboxes at scale.