Neville Li

Learn about Scio, a Scala API for Google Cloud Dataflow (incubated as Apache Beam). Apache Beam offers a simple, unified programming model for both batch and streaming data processing, while Scio brings it much closer to the high-level APIs many data engineers are familiar with, e.g. Spark and Scalding. Neville will cover the design and implementation of the framework, including features like type-safe BigQuery macros, the REPL, and serialization. There will also be a live coding demo.

Neville is a software engineer at Spotify who works mainly on data infrastructure and tools for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm, and Parquet. Before that he worked on search quality at Yahoo! and on old-school distributed systems like MPI.

This talk was given at the NYC Data Engineering meetup in June 2016.

Reuven Lax

Reuven will cover the Beam programming model, and the advantages of hosted Google Cloud Dataflow.

Reuven has been a Google engineer since 2006. In that time, he's been instrumental in building Google's streaming data-processing systems, from MillWheel to Cloud Dataflow.

This talk was given at the NYC Data Engineering meetup in June 2016.

Sadayuki Furuhashi

In production environments, it usually takes several applications and team members working together to move data from one place to another. This problem can surface in companies of any size but is especially acute at scale, because collected data can come from different sources, and likely in different formats, which adds complexity. Even when data is collected correctly, moving it at scale presents other challenges that need proper handling: duplicates, multiple destinations, exceptions, and more.

Calvin French-Owen

Data is critical to building great apps. Engineers and analysts can understand how customers interact with their brand at any time of the day, from any place they go, from any device they're using - and use that information to build a product they love. But there are countless ways to track, manage, transform, and analyze that data. And when companies are also trying to understand experiences across devices and the effect of mobile marketing campaigns, data engineering can be even trickier. What’s the right way to use data to help customers better engage with your app?

In this all-star panel, hear from mobile experts at Instacart, Branch Metrics, Pandora, Invoice2Go, Gametime, and Segment on the best practices they use for tracking mobile data and powering their analytics.

Che Horder is the Director of Analytics at Instacart, and previously led a data science and engineering team at Netflix as Director of Marketing Analytics.

Gautam Joshi is the Engineering Program Manager of Analytics at Pandora and formerly worked at CNET/CBSi and Rdio. He helped create sustainable solutions for deriving meaning from large datasets. He’s a huge fan of music and technology, a California native and a proud Aggie.

Mada Seghete is the co-founder of Branch Metrics, a powerful tool that helps mobile app developers use data to grow and optimize their apps.

Beth Jubera is Senior Software Engineer at Invoice2Go, and was previously a Systems Engineer at IBM.

John Hession is VP of Growth at Gametime, and was previously Director of Mobile Operations and Client Strategy at Conversant.

Joey Echeverria

Real-time stream analysis starts with ingesting raw data and extracting structured records. While stream-processing frameworks such as Apache Spark and Apache Storm provide primitives for processing individual records, processing windows of records, and grouping/joining records, common actions such as filtering, applying regular expressions to extract data, and converting records from one schema to another are left to developers writing business logic.
