Scio, a new Scala API for Google Cloud Dataflow
Learn about Scio, a Scala API for Google Cloud Dataflow (incubated as Apache Beam). Apache Beam offers a simple, unified programming model for both batch and streaming data processing while Scio brings it much closer to the high level API many data engineers are familiar with, e.g. Spark and Scalding. Neville will cover design and implementation of the framework, including features like typesafe BigQuery macros, REPL, and serialization. There will also be a live coding demo.
Neville is a software engineer at Spotify who works mainly on data infrastructure and tools for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and Parquet. Before that he worked on search quality at Yahoo! and old school distributed systems like MPI.
This talk was given at the NYC Data Engineering meetup in June 2016.