Christos Kalantzis

In this talk, Christos Kalantzis (Cloud Persistence Engineering Manager, Netflix) explains why he believes companies should adopt an optimistic software design model similar to the one used at Netflix. This is a practical talk that covers three things: strategies for implementing the design model, testing for eventual consistency, and convincing your organization that it's the right move.
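When testing for eventual consistency, it helps to know when a read is guaranteed to see the latest write. The standard rule is that the replicas answering a read (R) must overlap the replicas that acknowledged the write (W), which holds whenever R + W > RF. The minimal Python sketch below checks that rule by brute force. It is not taken from the talk. The helper names are hypothetical, and the model ignores failures, hinted handoff, and read repair.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Replicas needed for QUORUM within one data center: floor(rf / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, w: int, r: int) -> bool:
    """Exhaustively check whether every possible set of `r` replicas answering
    a read shares at least one replica with every possible set of `w`
    replicas that acknowledged the latest write."""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# RF = 3: QUORUM writes + QUORUM reads always overlap; ONE + ONE may not.
print(reads_see_latest_write(3, quorum(3), quorum(3)))  # True
print(reads_see_latest_write(3, 1, 1))                  # False
```

With ONE/ONE, a read can land on a replica the write never reached. That stale read is the window an optimistic design has to tolerate until replicas converge.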

Michael Kjellman

Making and implementing a C* Migration Plan
Migrating to a new database is hard: really hard. It’s almost impossible to do it perfectly. I’d like to break the migration problem into two parts: 1) maintaining the integrity of your data during import and migration, and 2) operationally planning and coding a migration from MySQL to C* without downtime (fingers crossed!).

Evan Chan

Evan Chan (Software Engineer, Ooyala) describes his experience using the Spark and Shark frameworks to run real-time queries on Cassandra data. He starts by surveying the Cassandra analytics landscape, including Hadoop and Hive, and touches on using custom input formats to extract data from Cassandra. He then dives into Spark and Shark (two memory-based cluster computing frameworks) and explains how they enable often dramatic improvements in query speed and productivity.

Les Hazlewood

Need to scale user session loads? Les Hazlewood (Co-Founder and CTO of Stormpath) explains Apache Shiro's enterprise session management capabilities and how to use Cassandra as Shiro's session store. This enables a distributed session cluster that supports hundreds of thousands or even millions of concurrent sessions. As a working example, Les shows how to set up a session cluster using Cassandra in under 10 minutes.

Jimmy Mårdell

Spotify users have generated more than 1 billion playlists. At peak usage there are over 40,000 requests per second, and the system also has to support "offline mode" and concurrent changes. In this excellent talk from C* Summit EU, Jimmy Mårdell (Developer, Spotify) gives an overview of Spotify's playlist architecture, its Cassandra data model, and lessons learned working with Cassandra.

Michael Kjellman

I won’t lie (or conveniently fail to mention it): I have lost many nights of sleep because of Cassandra. I’ve certainly reflected and asked myself, “Was it really worth it?” Some of the sleepless nights came from previously unknown bugs, which have since been fixed. Others were caused by bad, misinformed decisions that my co-workers and I made while performing various C* operations. Implemented correctly, distributed computing brings a lot of potential to your application. You can improve performance by distributing work across many physical (and inexpensive!) machines. In addition, a database like Cassandra was designed from the beginning with replication in mind. Keeping multiple copies of a dataset across multiple nodes and data centers in distant geographical regions is not an afterthought (unlike MySQL replication). However, the many advantages of distributed computing come with a tradeoff: increased complexity.
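The minimal Python sketch below shows how "multiple copies across multiple nodes" can follow directly from the ring layout. It models only single-data-center placement in the style of Cassandra's SimpleStrategy: start at the first node whose token is at or after the key's token, then walk clockwise collecting distinct nodes. The function name, the toy token values, and the node names are hypothetical. Multi-data-center, rack-aware placement (NetworkTopologyStrategy) is not modeled.

```python
from __future__ import annotations

import bisect


def replicas_for(token: int, ring: dict[int, str], rf: int) -> list[str]:
    """Walk the ring clockwise from the first node token >= `token`,
    collecting `rf` distinct nodes (SimpleStrategy-style placement)."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token) % len(tokens)  # wrap past the end
    replicas: list[str] = []
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]
        if node not in replicas:  # a node holds at most one copy
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Toy ring with small, hypothetical tokens so the walk is easy to follow.
toy_ring = {-6000: "A", 0: "B", 6000: "C"}
print(replicas_for(100, toy_ring, 2))  # ['C', 'A']
```

Token 100 falls in C's range (0, 6000]. The second copy goes to the next node clockwise, which wraps around to A.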

Patrick McFadin

Three years ago, I was stuck trying to fit a use case into my Oracle database. It was getting expensive fast, and I was running out of budget. A friend suggested I try Apache Cassandra, and it turned out to be a perfect fit for that time-series use case. It's not a perfect database: it was really hard to get my head around the data model, and driver support was scattered. There were a few points where I was ready to give up and just pay Oracle, but I stuck with it. Cassandra was the solution that fit my problem, and after a long uphill climb, it worked better than I'd expected.

Michael Kjellman

So far, I've explained why you shouldn't migrate to C*, as well as its origins and key terms. Now I'm going to turn my attention to how Cassandra stores data.

Cassandra nodes, clusters, rings


At a very high level, Cassandra operates by spreading data evenly across a cluster of nodes, which can be visualized as a ring. Nodes generally run on commodity hardware. Each C* node in the cluster is assigned, and is responsible for, a token range: a range of hash values produced by a partitioner. Since C* v1.2 the default partitioner is Murmur3Partitioner, whose tokens range from -2^63 to 2^63-1. The range 0 to 2^127-1 belongs to the older RandomPartitioner.
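To make the ring concrete, here is a minimal Python sketch of token ownership on a four-node ring with one token per node. Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive). Tokens past the last node wrap around to the first. The sketch uses a stand-in hash, the first 8 bytes of MD5 read as a signed 64-bit integer, in place of Murmur3. Its tokens therefore fall in the same -2^63 to 2^63-1 space but will not match a real cluster. `token_for`, `evenly_spaced_tokens`, and `owner` are hypothetical helpers, not Cassandra APIs.

```python
import bisect
import hashlib

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token_for(key: bytes) -> int:
    """Map a partition key to a signed 64-bit token.
    Stand-in hash (first 8 bytes of MD5), NOT Cassandra's Murmur3, so real
    Cassandra would assign a different token to the same key."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)


def evenly_spaced_tokens(n: int) -> list:
    """One token per node, splitting the full token space into n equal slices."""
    step = 2**64 // n
    return [MIN_TOKEN + i * step for i in range(n)]


def owner(token: int, node_tokens: list, nodes: list) -> str:
    """Return the node owning `token`: the first node whose token is >= it,
    wrapping around to the first node past the highest token."""
    i = bisect.bisect_left(node_tokens, token)
    return nodes[i % len(nodes)]


nodes = ["node1", "node2", "node3", "node4"]
node_tokens = evenly_spaced_tokens(len(nodes))
print(owner(token_for(b"user:42"), node_tokens, nodes))
```

Because ownership depends only on the sorted token list, adding a node means inserting one token and taking over part of its clockwise neighbor's range. The rest of the ring is untouched.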

Unknown author

Matt Jurik (Software Developer, Hulu) gave an excellent talk at Cassandra Day Silicon Valley about Hulu's migration to Cassandra. The talk features awesome diagrams of Hulu's architecture with a focus on the Hugetop service. Hugetop tracks users' progress in content. Hulu has been able to scale this service to accommodate over 400 million monthly plays. Here are my favorite snapshots from the talk.

Matt Jurik

Hulu users view 400 million videos and 2 billion advertisements each month. Hugetop is the service that allows users to track their progress in video content. The Hulu engineering team switched to a Cassandra-based architecture in the wake of unbounded data growth, MySQL servers that were running out of space, and the horrors of manual resharding.

Al Tobey

As we move into the world of big data, the system architectures and data models we've relied on for decades are hindering growth. At the core of the problem is the read-modify-write (RMW) cycle. In this talk, Al Tobey (Open Source Mechanic, DataStax) explains how to build systems that don't rely on RMW, with a focus on Cassandra. For the times when RMW is unavoidable, he covers how and when to use Cassandra's lightweight transactions and collections.
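The minimal Python sketch below shows why a plain read-modify-write cycle is fragile. It uses a toy in-memory store, not a Cassandra client. When two clients read the same value before either writes, the later blind write silently discards the earlier increment. A conditional write fixes this: it applies only if the value is still what the client read. This is analogous in spirit to a CQL lightweight transaction such as `UPDATE ... IF col = ?` reporting whether it was applied. Real lightweight transactions run a Paxos round across replicas, which this sketch does not model. `Store`, `compare_and_set`, and the demo functions are hypothetical.

```python
class Store:
    """Toy single-process key-value store; NOT a Cassandra client."""

    def __init__(self) -> None:
        self.data = {}

    def read(self, key: str) -> int:
        return self.data.get(key, 0)

    def write(self, key: str, value: int) -> None:
        self.data[key] = value  # blind write: last writer wins

    def compare_and_set(self, key: str, expected: int, new: int) -> bool:
        """Write only if the current value still equals `expected`."""
        if self.read(key) != expected:
            return False
        self.data[key] = new
        return True


def lost_update_demo() -> int:
    store = Store()
    a = store.read("likes")      # client A reads 0
    b = store.read("likes")      # client B also reads 0 before A writes
    store.write("likes", a + 1)  # A writes 1
    store.write("likes", b + 1)  # B writes 1 too: A's increment is lost
    return store.read("likes")


def increment_with_cas(store: Store, key: str, last_read: int) -> None:
    current = last_read
    while not store.compare_and_set(key, current, current + 1):
        current = store.read(key)  # stale read rejected: re-read and retry


def cas_demo() -> int:
    store = Store()
    a = store.read("likes")
    b = store.read("likes")
    increment_with_cas(store, "likes", a)  # succeeds on the first try
    increment_with_cas(store, "likes", b)  # rejected once, then retried
    return store.read("likes")


print(lost_update_demo())  # 1
print(cas_demo())          # 2
```

The retry loop is the cost of staying correct under contention. That cost is why the talk steers designs away from RMW where possible and reserves lightweight transactions for cases where RMW is unavoidable.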

Patrick McFadin

In this talk, Patrick McFadin (Chief Evangelist for Apache Cassandra, DataStax) explains how to work with data throughout the application life cycle. You'll learn how to store objects, index for fast retrieval, and select a data model for Cassandra apps.

Michael Kjellman

A new class of databases (sometimes referred to as “NoSQL”) has been developed and designed with 18+ years’ worth of lessons learned from traditional relational databases such as MySQL. Cassandra and other distributed or “NoSQL” databases aim to make the “right” tradeoffs to deliver the scalability, redundancy, and performance that today’s applications need. Although MySQL may have performed well for you in the past, new business requirements, or the need to both scale and improve the reliability of your application, might mean that MySQL is no longer the right fit.

Tadas Vilkeliskis

At Chartbeat we are thinking about adding probabilistic counters to our infrastructure, HyperLogLog (HLL) in particular. One of the challenges with something like this is making it redundant while keeping performance reasonably good. Since HyperLogLog is a relatively new approach to cardinality approximation, there are not many off-the-shelf solutions, so why not try implementing HLL in Cassandra?
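For readers new to HyperLogLog, the minimal in-memory Python sketch below shows how the counter itself works. It does not show how Chartbeat stores it in Cassandra, which this teaser doesn't describe. A 64-bit hash is split into two parts. The top `p` bits pick one of 2^p registers. Each register keeps the longest run of leading zeros seen, plus one. The cardinality estimate is a bias-corrected harmonic mean of the registers, with linear counting for small cardinalities. Using SHA-1 as the hash and p = 12 are arbitrary choices for illustration. The class is hypothetical, not a library API.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (p >= 7) and a 64-bit hash."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: bytes) -> None:
        h = int.from_bytes(hashlib.sha1(item).digest()[:8], "big")
        idx = h >> (64 - self.p)               # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        """Union of two sketches: element-wise max of their registers."""
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m**2 / sum(2.0**-r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:      # small-range correction
            return self.m * math.log(self.m / zeros)  # linear counting
        return raw


hll = HyperLogLog(p=12)
for visitor in ["alice", "bob", "alice", "carol"]:
    hll.add(visitor.encode())
print(round(hll.estimate()))
```

Adding the same item twice never changes a register, and two sketches combine with an element-wise max. So the whole counter is a fixed-size array of small integers, about 4 KB of registers for p = 12, whatever the number of distinct items.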

Evan Chan

In this talk, Evan Chan, Software Engineer at Ooyala, presents real-time analytics at Ooyala using Cassandra, Spark & Shark. He reviews the Cassandra analytics landscape (Hadoop & Hive), goes over custom input formats for extracting data from Cassandra, and shows how Spark & Shark increase query speed and productivity over standard solutions. This talk was recorded at the DataStax Cassandra South Bay Users meetup at Ooyala.

Al Tobey

Two exciting talks on Cassandra and Go in this video! In the first talk, Kyle Kingsbury shares his surprising test results. As part of the Jepsen project, which aims to educate users about distributed consensus, he has tested how Cassandra behaves with respect to consistency, isolation, and transactions. In the second talk, Al Tobey, Open Source Mechanic at DataStax, gives a brief introduction to Go and Cassandra, using code samples and a live demo to explain why they are a great fit for each other. These talks were recorded at the DataStax Cassandra SF Users meetup at Disqus.

Patrick McFadin

In this introduction to Cassandra, Patrick McFadin, Chief Evangelist for Apache Cassandra at DataStax, explains why Cassandra is a key player in database technologies. Large and small companies alike choose Apache Cassandra as their database, and Patrick explains why they made that choice. He also discusses Cassandra's architecture, including data modeling, time-series storage, and replication strategies. The result is a holistic overview of how Cassandra works and the best way to get started. This talk was recorded at the Big Data Gurus meetup at Samsung R&D.

Tim Moreton

"Understanding and Managing Cassandra's Vnodes + Under the Hood: Acunu Analytics": In this talk, Tim Moreton, Founder and CTO at Acunu Analytics, and Nicolas Favre-Felix, Software Engineer at Acunu Analytics, share the concept, implementation, and benefits of virtual nodes in Apache Cassandra 1.2 and 2.0. They also explain why virtual nodes replace manual token management. In the second half, they show how to use Acunu Analytics on top of Cassandra to collect event data, build OLAP-style cubes, and ask SQL-like queries via a RESTful API. This talk was recorded at the DataStax Cassandra SF users group meetup.
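The minimal Python sketch below illustrates the idea behind virtual nodes. Each physical node takes many random tokens instead of one hand-assigned token, so it owns many small, scattered ranges, and ownership evens out without manual token management. The sketch mimics random token allocation, controlled in Cassandra by the `num_tokens` setting in cassandra.yaml (256 by default in the 1.2 and 2.0 era). The helper names and the fixed seed are hypothetical, and the ownership computation assumes the Murmur3 token space.

```python
import random

MIN_TOKEN = -(2**63)
TOKEN_SPACE = 2**64  # size of the Murmur3 token range


def assign_vnodes(nodes, num_tokens, seed=0):
    """Give every physical node `num_tokens` random tokens."""
    rng = random.Random(seed)
    ring = {}
    for node in nodes:
        added = 0
        while added < num_tokens:
            token = rng.randrange(MIN_TOKEN, MIN_TOKEN + TOKEN_SPACE)
            if token not in ring:  # skip (astronomically unlikely) collisions
                ring[token] = node
                added += 1
    return ring


def ownership(ring):
    """Fraction of the token space each physical node owns. Each token owns
    (previous token, token]; the lowest token wraps around to the highest."""
    tokens = sorted(ring)
    owned = {node: 0 for node in ring.values()}
    for i, token in enumerate(tokens):
        prev = tokens[i - 1]  # i == 0 picks the highest token (wrap-around)
        size = (token - prev) % TOKEN_SPACE or TOKEN_SPACE  # lone token owns all
        owned[ring[token]] += size
    return {node: size / TOKEN_SPACE for node, size in owned.items()}


nodes = ["a", "b", "c", "d"]
for n in (1, 256):
    shares = ownership(assign_vnodes(nodes, n, seed=1))
    print(n, {node: round(share, 3) for node, share in shares.items()})
```

With one random token per node, shares typically vary widely. With 256, each node's share sits close to 25%. When a node joins or leaves, its many small ranges are also spread across many peers, so streaming work is shared across the cluster instead of falling on one or two neighbors.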

Ben Engber

Ben Engber, CEO and founder of Thumbtack Technology, discusses how to perform tuned benchmarking across a number of NoSQL solutions. He describes a NoSQL database comparison across Couchbase, Aerospike, MongoDB, Cassandra, HBase, and others, done in a way that does not artificially distort the data in favor of a particular database or storage paradigm. This covers hardware and software configurations, as well as measurement methods that ensure repeatable results. This talk was recorded at the Scale Warriors of NYC meetup at adMarketplace.
