Two Hulu Engineers Explain Their Approach to Scalability

Matt Jurik's talk about the architecture of Hulu's video progress service piqued my interest in Hulu's approach to scalability. In a Planet Cassandra interview, Matt's teammate Andreas Rangel (Senior Software Engineer, Hulu) explains how they've scaled their data model. Here's what we learned:

1. Hulu's Cassandra deployment is a beast:

  • The primary C* (Cassandra) cluster runs version 1.2.12

  • 16 nodes split between 2 datacenters

  • The watch-history keyspace contains several billion CQL3 rows, with approximately 1TB of data per datacenter

  • Individual nodes are 12-core machines with 48GB of RAM and multiple SSDs in a RAID5 configuration
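To put those numbers in perspective, here is a back-of-envelope sketch of how much data each node carries. It assumes (the interview does not say this) that the 16 nodes are split evenly between the two datacenters, that data is spread evenly across the nodes in each datacenter, and that "approximately 1TB" means 1000GB.

```python
# Back-of-envelope sizing from the figures quoted in the interview.
# ASSUMPTIONS (not stated in the source): an even 8/8 node split between
# the two datacenters, an even data spread within each datacenter, and
# "approximately 1TB" taken as 1000 GB.
TOTAL_NODES = 16
DATACENTERS = 2
DATA_PER_DC_GB = 1000
RAM_PER_NODE_GB = 48

nodes_per_dc = TOTAL_NODES // DATACENTERS          # nodes in each datacenter
data_per_node_gb = DATA_PER_DC_GB / nodes_per_dc   # average on-disk data per node
data_to_ram_ratio = data_per_node_gb / RAM_PER_NODE_GB

print(f"{nodes_per_dc} nodes per datacenter")
print(f"~{data_per_node_gb:.0f} GB of data per node")
print(f"~{data_to_ram_ratio:.1f}x the RAM on each node")
```

Under these assumptions each node holds about 125GB, roughly 2.6 times its RAM, so the working set cannot simply live in memory. Real per-node figures depend on token assignment and on whether the 1TB figure already counts replicas, which the interview does not specify.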

2. At any given time, there are hundreds of engineers logged in to the #Cassandra IRC channel

3. Now's the time to start learning about Cassandra's internals:

"...Having a high-level understanding of some of the internals such as how deletions are implemented, how secondary indices operate, and when to use the row cache can go a long way in designing a strong application built atop Cassandra." - Andreas Rangel
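Deletions are a good first example of why internals matter. In Cassandra a delete does not remove data in place: it writes a tombstone with a timestamp, reads reconcile competing versions by timestamp, and compaction may only drop the tombstone once it is older than `gc_grace_seconds` (864000 seconds, or 10 days, by default). The Python sketch below models that lifecycle for a single cell. The `Cell`, `reconcile`, `read`, and `compact` names are hypothetical, and the model is deliberately simplified: real compaction also checks other conditions, such as whether older data for the same key still exists in SSTables that are not part of the compaction.

```python
from dataclasses import dataclass
from typing import List, Optional

GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


@dataclass(frozen=True)
class Cell:
    value: Optional[str]              # None marks a tombstone (a deletion)
    write_ts: int                     # write timestamp used for last-write-wins
    deleted_at: Optional[int] = None  # hypothetical: local deletion time (s), tombstones only

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins merge of two versions of the same cell."""
    if a.write_ts != b.write_ts:
        return a if a.write_ts > b.write_ts else b
    # On a timestamp tie this sketch lets the deletion win; ties between
    # two live values simply keep the second version (simplified).
    return a if a.is_tombstone else b


def merge(versions: List[Cell]) -> Cell:
    """Fold every stored version down to the single winning version."""
    winner = versions[0]
    for v in versions[1:]:
        winner = reconcile(winner, v)
    return winner


def read(versions: List[Cell]) -> Optional[str]:
    """A read sees nothing if the winning version is a tombstone."""
    winner = merge(versions)
    return None if winner.is_tombstone else winner.value


def compact(versions: List[Cell], now: int) -> List[Cell]:
    """Simplified compaction: keep only the winner, and purge a winning
    tombstone once it is at least GC_GRACE_SECONDS old."""
    winner = merge(versions)
    if winner.is_tombstone and now - winner.deleted_at >= GC_GRACE_SECONDS:
        return []
    return [winner]


# Usage: a hypothetical watch-progress value that is later deleted.
versions = [
    Cell("episode-3 @ 12:04", write_ts=1_000),
    Cell(None, write_ts=2_000, deleted_at=100),
]
print(read(versions))                                   # None: the delete shadows the write
print(compact(versions, now=100 + GC_GRACE_SECONDS - 1))  # tombstone still retained
print(compact(versions, now=100 + GC_GRACE_SECONDS))      # [] : tombstone purged
```

The point of the model is that a delete adds data before it removes any: until compaction runs after the grace period, the tombstone occupies space and must be read and reconciled, which is why delete-heavy data models deserve care.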

Check out these #CassandraWeek resources to start learning: