Digging into the Dirichlet Distribution

In recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example, if there are two possible user-generated ratings (like and dislike), then each item is represented as a vector of two counts. In a higher-dimensional case, each document may be expressed as a vector of word counts, with the vector large enough to cover all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data. In this talk, Max Sklar, from Foursquare, takes a closer look at the Dirichlet distribution and its properties, as well as some of the ways it can be computed efficiently. This talk was recorded at the NYC Machine Learning meetup at Pivotal Labs.
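To make the like/dislike example concrete, here is a minimal sketch of how a Dirichlet prior combines with a vector of counts. It is not taken from the talk: the counts, the uniform prior, and the helper names `posterior_mean` and `sample_dirichlet` are all assumptions chosen for illustration. It relies on the standard fact that a Dirichlet prior with parameters alpha, updated with multinomial counts, gives a Dirichlet posterior with parameters alpha plus the counts.

```python
import random

def posterior_mean(counts, alpha):
    """Mean of the Dirichlet posterior given observed counts and prior alpha."""
    posterior = [c + a for c, a in zip(counts, alpha)]
    total = sum(posterior)
    return [p / total for p in posterior]

def sample_dirichlet(alpha, rng=random):
    """Draw one probability vector from Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical item with 8 likes and 2 dislikes (illustrative numbers only).
counts = [8, 2]
# Assumed uniform prior: one pseudo-count per rating category.
alpha = [1.0, 1.0]

# Posterior parameters are (9, 3), so the mean is (9/12, 3/12).
mean = posterior_mean(counts, alpha)

# One plausible (like, dislike) probability vector drawn from the posterior.
sample = sample_dirichlet([c + a for c, a in zip(counts, alpha)])
```

The same two functions work unchanged for the higher-dimensional document case, where `counts` would hold one entry per word in the vocabulary.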


The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.

Bio: Max Sklar is an engineer and a machine learning specialist. At Foursquare, his continuing objective is to make the app smarter and more interesting. Over the last two years, Max has spearheaded the effort to apply Natural Language Processing technology to Foursquare’s user-generated text corpus. He has spoken at a variety of conferences and meetups in New York’s tech scene, and has been an adjunct instructor for NYU’s data structures course for four semesters. He holds an M.S. in Information Systems from NYU, and a B.S. in Computer Science from Yale, and can be found on Twitter @maxsklar.

Max Sklar is a Machine Learning Engineer at Foursquare.

Foursquare is a small but highly ambitious company that aims to change the way people keep up with friends and discover what's nearby. We have 85 engineers distributed across New York and San Francisco, working to turn nearly 5 billion check-ins into automatic, personalized recommendations that ping your phone. We're not afraid to move fast and break things as we release, launch, iterate, update and announce -- sometimes all in the same day. We're a close-knit team and, especially over beers at the end of a long day, we feel like we're inventing the future together.