Anna Smith from Rent the Runway talks about how the team has evolved its data pipeline over time to cope with infrastructure constraints, disparate data sources, and changes in those sources and their quality, all while still serving reports and data back to the website with minimal downtime. Anna also covers how they used Luigi to ensure robust reporting without forcing non-technical analysts to learn Python.


This video was recorded at the NYC Data Engineering meetup, hosted at Spotify in New York City.

In this talk, Joe Crobak, formerly of Foursquare, will give a brief overview of how a workflow engine fits into a standard Hadoop-based analytics stack. He will also give an architectural overview of Azkaban, Luigi, and Oozie, elaborating on some features, tools, and practices that can help you build a Hadoop workflow system from scratch or improve an existing one. This talk was recorded at the NYC Data Engineering meetup at eBay.