Technical Talks

Building a Self-Service Platform for Continuous, Real-Time Feature Generation for Machine Learning

Sherin Thomas | Software Engineer | Lyft

At Lyft, all our systems, including client applications, generate many millions of events per second. These events are ingested by the event ingestion pipeline, streamed through Kinesis and Kafka, and also made available in persistent stores such as Hive for offline consumption.

This data can be used to generate features for ML models as well as for other kinds of real-time decision making. Our research scientists and data scientists have developed algorithms to derive features from this data. The challenge, however, lies in doing this quickly, correctly, effectively, and reliably. To address it, we have built a self-service platform using Flink, Beam, and Kubernetes that can be used to write, prototype, and deploy stateful computations on high-throughput streaming data.

With this platform, we have tried to abstract away the challenges of provisioning, data discovery, bootstrapping, skew, late-arriving and out-of-order events, downtime, and so on, so that our experts can focus on what they do best without having to worry about what goes on behind the scenes.

In this talk, I will discuss the architecture, key takeaways, lessons learned, and wins.

Sherin Thomas
Software Engineer | Lyft

Sherin Thomas is a Software Engineer at Lyft. She is currently building a self-serve, real-time feature generation platform for machine learning use cases using Apache Flink, Beam, and Kafka.