Rafe Colburn

Three years ago, Etsy's analytics data pipeline was built around a pixel hosted on Akamai, FTP uploads, and Amazon EMR. Rafe Colburn, manager of the data engineering team at Etsy, talks about their migration to a data ingestion pipeline based on Kafka. He gives an overview of how they rebuilt their data pipeline without disrupting ongoing analytics work, as well as the tradeoffs made in building these systems.

Jay Kreps

Apache Kafka committer Jay Kreps, from LinkedIn, walks through a brief production timeline for Kafka. Jay goes over what's new in 0.8.2 and how to get the most out of new features such as log compaction and the new Java producer. He also gives an overview of what to expect from 0.9: a new consumer, better security, and operational improvements.
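
Log compaction changes retention from "keep everything for N days" to "keep at least the latest value for every key". The sketch below is a simplified, in-memory model of that guarantee, not Kafka's log cleaner. It assumes a single log of (offset, key, value) records in which a None value acts as a tombstone (delete marker). The function name `compact` and the `drop_tombstones` flag are hypothetical and exist only for illustration.

```python
from typing import Optional

# A record in this toy log: (offset, key, value). A value of None is a tombstone.
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record], drop_tombstones: bool = True) -> list[Record]:
    """Toy model of Kafka-style log compaction (hypothetical helper).

    Keeps only the most recent record for each key, preserving the original
    offsets and their relative order. Simplification: the real cleaner works
    segment by segment and retains tombstones for a configurable period
    before removing them; here they are dropped immediately if requested.
    """
    # First pass: remember the offset of the latest record for every key.
    latest: dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset

    # Second pass: keep only records that are the latest for their key.
    compacted: list[Record] = []
    for offset, key, value in log:
        if latest[key] != offset:
            continue  # superseded by a later record with the same key
        if value is None and drop_tombstones:
            continue  # key was deleted; its tombstone is cleaned up too
        compacted.append((offset, key, value))
    return compacted


# Hypothetical example data: a changelog of user email addresses.
log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example"),
    (4, "user-2", None),  # tombstone: user-2 was deleted
]
print(compact(log))
```

Note that offsets are never renumbered: compaction only removes superseded records, so a consumer that tracks offsets still sees a monotonically increasing sequence.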

Todd Palino

LinkedIn runs one of the largest installations of Kafka in the world. In this talk, Todd Palino and Clark Haskins (Site Reliability, LinkedIn) discuss Kafka from an operations point of view. You'll learn the use cases for Kafka and the tools LinkedIn has been developing to improve the management of deployed clusters. They also talk about some of the challenges of managing a multi-tenant data service and how to avoid getting woken up at 3 AM.

Neville Li

This is the first time that a Spotify engineer has spoken publicly about their deployment and use cases for Storm! In this talk, Software Engineer Neville Li describes:

  • Real-time features developed using Storm and Kafka including recommendations, social features, data visualization and ad targeting

  • Architecture

  • Production integration

  • Best practices for deployment


Joe Stein

In this talk, Joe Stein, Apache Kafka committer, PMC member, and Founder and Principal Architect at Big Data Open Source Security, talks about Apache Kafka, an open-source, distributed publish-subscribe messaging system. Joe focuses on how to get started with Apache Kafka, how replication works, and more. Storm is a great system for real-time analytics and stream processing, but to get data into Storm you first need to collect your data streams with consistency and availability at high loads and large volumes. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. This talk was recorded at the NYC Storm User Group meetup at WebMD Health.
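
To make "publish-subscribe messaging rethought as a commit log" concrete, here is a minimal sketch, assuming a single in-memory partition with no replication or persistence. Producers append to the end of the log. Each consumer group keeps its own offset, so independent subscribers (for example a Storm topology and a batch loader) read the same messages, and reading does not delete anything. The class `CommitLog`, its methods, and the group names are hypothetical; they loosely echo Kafka's concepts but are not the Kafka client API.

```python
from collections import defaultdict


class CommitLog:
    """Toy single-partition commit log (hypothetical, for illustration only).

    Messages are retained after being read; each consumer group only tracks
    the offset of the next message it wants, so groups never interfere.
    """

    def __init__(self) -> None:
        self._messages: list[str] = []
        # Consumer group name -> offset of the next message to read.
        self._offsets: dict[str, int] = defaultdict(int)

    def append(self, message: str) -> int:
        """Append a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group: str, max_messages: int = 10) -> list[tuple[int, str]]:
        """Return up to max_messages unread (offset, message) pairs for a group
        and advance that group's offset past them (an auto-commit model)."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return list(enumerate(batch, start=start))

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position; replay works because messages are kept."""
        self._offsets[group] = offset


log = CommitLog()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

print(log.poll("storm-topology"))   # reads all three events
print(log.poll("hdfs-loader", 2))   # independently reads the first two
log.seek("storm-topology", 1)
print(log.poll("storm-topology"))   # replays from offset 1
```

The design choice this illustrates is that the broker does not track per-message delivery state; it only stores an ordered log, and each subscriber's position is a single integer. That is what makes replay and many independent subscribers cheap.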