Understanding the billions of data points we ingest each month is no easy task. In developing models that let us do so, we’ve noticed some commonalities in the process of converting raw data into real-world understanding. Although you can get pretty good results with simple models and algorithms, digging beyond the obvious abstractions and using more sophisticated methods takes a lot of effort. In school we often learn techniques and algorithms in isolation, studying their properties on neatly fitted input sets. In the real world, however, and especially in the world of location data, we often need to combine these approaches in novel ways to get usable results.
In this article we look at how we work out the importance of different locations to consumers, a process that takes us from simple joins in Hadoop to a sophisticated time-series technique, the Viterbi algorithm.