Collecting and Moving Data at Scale

In production environments, moving data from one place to another usually takes several applications and team members working together. This problem can surface in companies of any size but is especially acute at scale, because data is collected from different sources and often in different formats, which adds complexity. Even when data is collected correctly, moving it at scale presents other challenges that need proper handling: duplicates, multiple destinations, exceptions and more.

In this presentation, Sadayuki will dissect these challenges and share his experience developing two open source solutions that address them: Fluentd and Embulk.

Sadayuki Furuhashi is an open-source hacker who wrote the original code of the MessagePack, Fluentd and Embulk projects. He is also a founder and architect of Treasure Data, Inc., where he works on distributed storage and query engines.

This talk was given at SF DataEngConf in April 2016.