2026 Talks
Scaling CDC to Trillions of Rows: What Broke, What We Rebuilt, and What AI Demands Next
- Data Eng & Databases
Most CDC pipelines work fine when you're building an MVP. Ours did too - until they didn't. Artie is a real-time data replication platform that processes 20-30 billion events per day across thousands of pipelines, with sub-minute latency on 90% of them. Three years ago, we were running a forked version of Debezium with Kafka and processing millions of rows. As we scaled, many of the assumptions we started with broke.
This talk is a post-mortem of what failed, what we rebuilt, and the decisions that matter at scale:
- Why we replaced Debezium - single-threaded capture, limited extensibility, and no built-in recovery forced us to build a proprietary Reader from scratch to increase fault tolerance
- Parallel backfills without data loss - running historical loads alongside live CDC using primary-key range chunking and exactly-once merge semantics, following Netflix's DBLog pattern
- Fan-in from thousands of single-tenant databases - consolidating sharded or single-tenant sources into unified destination schemas without bespoke ETL per tenant
- Edge cases at scale - five-digit-year timestamps, negative years, non-JSON in JSONB, non-UTF8 encodings, and why we chose to fail hard rather than silently skip data (and the recovery mechanisms that make that practical in production)
- Schema evolution - automatic column adds, type changes, drops, and notifications so teams know what changed
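To make the backfill bullet concrete, here is a minimal sketch of primary-key range chunking combined with a DBLog-style merge step. It is an illustration under stated assumptions, not Artie's Reader: the function names `pk_chunks` and `merge_chunk`, the integer primary keys, and the in-memory dictionaries are all hypothetical simplifications. The idea shown is that a backfill chunk is read while live CDC keeps flowing, and any backfilled row whose key was also touched by a live event during the read window is dropped, because the live event is newer.

```python
from typing import Dict, Iterator, List, Tuple


def pk_chunks(min_pk: int, max_pk: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) primary-key ranges covering [min_pk, max_pk]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_pk
    while start <= max_pk:
        end = min(start + chunk_size - 1, max_pk)
        yield (start, end)
        start = end + 1


def merge_chunk(
    backfill_rows: Dict[int, dict],
    live_events: List[Tuple[int, str, dict]],
) -> Dict[int, dict]:
    """Return the backfill rows that are still safe to emit.

    live_events holds (pk, operation, row) tuples seen in the change log
    while the chunk was being read (hypothetical shape). Rows whose key
    appears there are dropped so a stale snapshot never overwrites a
    newer live change.
    """
    touched = {pk for pk, _op, _row in live_events}
    return {pk: row for pk, row in backfill_rows.items() if pk not in touched}


if __name__ == "__main__":
    print(list(pk_chunks(1, 10, 4)))  # [(1, 4), (5, 8), (9, 10)]

    snapshot = {1: {"v": "a"}, 2: {"v": "b"}}
    log = [(2, "update", {"v": "c"})]
    print(merge_chunk(snapshot, log))  # {1: {'v': 'a'}}
```

Because each range is independent, chunks can be read in parallel, and the merge step is what keeps a historical load from overwriting fresher CDC data for the same key.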
Finally: AI workloads have the same freshness problem databases have always had, but the sources are no longer just databases - they are filesystems, object stores, git repos, and documents. We will share how Artie is extending its core primitives beyond databases to become the sync layer for any data AI systems depend on.
Attendees will leave with concrete architectural patterns for building CDC systems that survive at scale, a checklist of failure modes, and a framework for thinking about real-time data as AI infrastructure.
Co-founder & CTO
Robin Tang
Artie
Robin Tang is the Co-Founder and CTO of Artie, a real-time data replication platform that moves data across databases, warehouses, and lakes. Before founding Artie, Robin built data infrastructure at scale and saw firsthand how brittle existing CDC tooling became under production load. At Artie, he leads the engineering team that replaced Debezium with a fully custom streaming architecture now processing trillions of rows for customers including Substack, ClickUp, and Alloy. Robin writes and speaks about the practical challenges of data replication - schema evolution, transactional integrity, and the edge cases that only surface at scale.
The AI Conference for Humans Who Ship
While other conferences theorize, AI Council features the engineers shipping tomorrow's breakthroughs today.