Data Pipeline at Tapad

Tapad's data pipeline is an elastic combination of technologies (Kafka, Hadoop, Avro, Scalding) that forms a reliable system for analytics, real-time and batch graph-building, and logging. In this talk, Tapad Senior Software Developer Toby Matejovsky speaks about the creation and evolution of the pipeline. He walks through a concrete example: a day in the life of an event-tracking pixel. Toby also covers common challenges his team has overcome, such as integrating the different pieces of the system, schema evolution, queuing, and data retention policies.



This talk was given at the NYC Data Engineering meetup, hosted at Spotify HQ in NYC.