Digging into the Dirichlet Distribution

In recommendation systems and natural language processing, data that can be modeled as a multinomial, or as a vector of counts, is ubiquitous. For example, if there are two possible user-generated ratings (like and dislike), then each item is represented as a vector of two counts. In a higher-dimensional case, each document may be expressed as a vector of word counts, with the vector large enough to cover all the important words in the corpus. The Dirichlet distribution is one of the basic probability distributions for describing this type of data. In this talk, Max Sklar of Foursquare takes a closer look at the Dirichlet distribution and its properties, as well as some of the ways it can be computed efficiently. This talk was recorded at the NYC Machine Learning meetup at Pivotal Labs.


The Dirichlet distribution is surprisingly expressive on its own, but it can also serve as a building block for more powerful models such as mixtures and topic models.
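
As a rough illustration of the like/dislike case (a sketch, not taken from the talk), the snippet below treats an item's two counts as multinomial data with a Dirichlet prior. The function names, the uniform prior (1, 1), and the counts of 8 likes and 2 dislikes are all hypothetical choices made for this example. It relies on two standard facts: adding observed counts to the prior parameters gives the posterior parameters, and a Dirichlet draw can be obtained by normalizing independent Gamma draws.

```python
import random

def dirichlet_posterior_mean(alpha, counts):
    """Posterior mean of the category probabilities.

    With a Dirichlet(alpha) prior and multinomial counts, the posterior
    is Dirichlet(alpha + counts), whose mean is the normalized parameters.
    """
    posterior = [a + n for a, n in zip(alpha, counts)]
    total = sum(posterior)
    return [p / total for p in posterior]

def sample_dirichlet(alpha, rng=random):
    """Draw one probability vector by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical item: 8 likes, 2 dislikes, uniform prior alpha = (1, 1).
alpha = [1.0, 1.0]
counts = [8, 2]

# Posterior parameters are (9, 3), so the mean is (9/12, 3/12).
mean = dirichlet_posterior_mean(alpha, counts)

# One plausible (like, dislike) probability vector under the posterior.
sample = sample_dirichlet([a + n for a, n in zip(alpha, counts)])
```

The same two functions work unchanged for longer vectors, such as word counts over a vocabulary, since nothing in them depends on there being only two categories.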

Bio: Max Sklar is an engineer and a machine learning specialist. At Foursquare, his continuing objective is to make the app smarter and more interesting. Over the last two years, Max has spearheaded the effort to apply Natural Language Processing technology to Foursquare’s user-generated text corpus. He has spoken at a variety of conferences and meetups in New York’s tech scene, and has been an adjunct instructor for NYU’s data structures course for four semesters. He holds an M.S. in Information Systems from NYU and a B.S. in Computer Science from Yale, and can be found on Twitter @maxsklar.