Most CDC pipelines work fine when you're building an MVP. Ours did too, until they didn't. Artie is a real-time data replication platform that processes 20-30 billion events per day across thousands of pipelines, with sub-minute latency on 90% of them. Three years ago, we were running a forked version of Debezium with Kafka, processing millions of rows. Along the way, many of the assumptions we started with broke.
This talk is a post-mortem of what failed, what we rebuilt, and the decisions that matter at scale:
Robin Tang is the Co-Founder and CTO of Artie, a real-time data replication platform that moves data across databases, warehouses, and lakes. Before founding Artie, Robin built data infrastructure at scale and saw firsthand how brittle existing CDC tooling became under production load. At Artie, he leads the engineering team that replaced Debezium with a fully custom streaming architecture, now processing trillions of rows for customers including Substack, ClickUp, and Alloy. Robin writes and speaks about the practical challenges of data replication: schema evolution, transactional integrity, and the edge cases that only surface at scale.