Running Real-Time Queries with Spark and Shark on Top of Cassandra Data

Evan Chan (Software Engineer, Ooyala) describes his experience using the Spark and Shark frameworks to run real-time queries on top of Cassandra data. He starts by surveying the Cassandra analytics landscape, including Hadoop and Hive, and touches on the use of custom input formats to extract data from Cassandra. He then dives into Spark and Shark, two in-memory cluster computing frameworks, and explains how they can often deliver dramatic improvements in query speed and productivity.


This talk was given at Cassandra Day Silicon Valley 2014. If you enjoyed this post, be sure to check out Patrick McFadin's talk about CQL Data Models.