Building a Data Pipeline from Scratch

Big data processing with Apache Hadoop, Spark, Storm and friends is all the rage right now. But getting started with one of these systems requires an enormous amount of infrastructure and an overwhelming number of decisions. Often you don't even know what kinds of questions you can or should be answering with your data.

As a first step, Joe Crobak (Software Engineer, Project Florida) describes the types of problems that people typically solve with a data pipeline, such as A/B testing and data warehousing. Then, drawing on his experience building data tools at Foursquare and a from-scratch data pipeline at a new startup, he highlights the key questions to ask and the best practices to implement.

57:54

This talk was presented at the Axial Lyceum in NYC.