Big data processing with Apache Hadoop, Spark, Storm and friends is all the rage right now. But getting started with one of these systems requires an enormous amount of infrastructure, and there are an overwhelming number of decisions to be made. Oftentimes you don't even know what kinds of questions you can or should be answering with your data.
As a first step, Joe Crobak (Software Engineer, Project Florida) describes the types of problems that people typically solve with a data pipeline—things like A/B testing and data warehousing. Then, drawing from his personal experience of building data tools at Foursquare and a from-scratch data pipeline at a new startup, he'll highlight the key questions to ask and best practices you should implement to encourage success.
This talk was presented at the Axial Lyceum in NYC.