Intro to Apache Kudu

Big Data applications need to both ingest streaming data and analyze it. HBase is great at ingesting streaming data but not so good at analytics. HDFS, on the other hand, is great at analytics but not at ingesting streaming data. As a result, applications frequently ingest data into HBase and then move it to HDFS for analytics.

What if you could use a single system for both use cases? This could dramatically simplify your data pipeline architecture.

Enter Apache Kudu. Kudu is a storage system that sits between HDFS and HBase in what it does well. It is good at both ingesting streaming data and analyzing it using Spark, MapReduce, and SQL.

This talk was given at a joint event hosted by SF Data Engineering and SF Data Science. The speaker is Asim Jalis (Lead Instructor, Data Engineering Immersive, Galvanize SF).