Transcript: Machine Learning with Large Networks of People and Places

(Original post with audio and slides is here )

Blake Shaw:  Thank you all for coming. As was mentioned, my name is Blake, and today I’m going to be talking about machine learning with large networks of people and places. So, here at Foursquare, we think there’s a great opportunity to leverage massive amounts of location data to help people better understand and connect with places all over the world. So, just in case you don’t know, and I’m sure most of you guys already do, Foursquare is a couple of things. First of all, it’s an app that helps you explore your city and connect with friends. And it’s also a platform for people to share and connect with different kinds of data and location services. So, people use Foursquare for a variety of things. They’ll show up at a place. They’ll pull out their phone, and they’ll check-in to, you know, share their location with friends or to get more information about the place they’re at. There are a lot of reasons people will check-in. They’ll get tips and deals. People earn points and badges. They like to keep track of their visits. Most importantly, they want to discover new places, and they want to share those experiences with friends.

So we’re in an interesting part of the industry. It’s sort of at the intersection of these three different areas of mobile, social, and local. And Foursquare generates just a ton of data. We’re now at 20,000,000 people in our database and over 2,000,000,000 check-ins, around 5,000,000 check-ins per day, and our API generates over 1,500 actions every second. I think it’s around 80 check-ins per second on average.

And so, we sort of feel that this data offers an unprecedented view into the behavior of millions of people as they move around cities worldwide. And so, to give you a sense of what our data looks like, I just want to share with you this great visualization.

So, here we see a week’s worth of check-in activity worldwide. Whenever a person checks in on Foursquare, their location is plotted on this map as a colored dot. And the color corresponds to the category of place they’re checking in to. So, you know, light blue is travel spots. Green is food. Blue is night life. And what I find particularly striking about this visualization is how you can see the sort of habitual patterns in human behavior. So, you can see, as nighttime rolls around, as it rolls across the globe, the lights become dim. It becomes more blue as people start checking in to more night life spots. And also you can see towards the end of the week, certain places become more popular. For instance, people check-in to more shopping places or great outdoors places.

So, you can clearly see from this visualization that this data sort of pulls out these aggregate daily rhythms in people’s lives, but I also think that this data is sort of the key to building a unique recommendation engine that really understands not only what people want to do in the physical world, but when they want to do it. And so, this is just a brief outline of what I’m going to talk about today.

I’m going to continue to give you a little introduction to Foursquare data and the kinds of things that we like to build out of check-ins and how we apply machine learning to do that.  And then I’m going to focus on sort of two data structures, the place graph and the social graph.  You might have heard the concept of the social graph before. It’s the set of all people and all the connections between those people.

And so we think there’s a similar graph of places, for all the places and how they’re all connected to each other. And then I’m going to talk about how we sort of leverage all this data and these machine learning algorithms to build a unique recommendation engine, which we call Explore. And then I’ll conclude.

Just to get us started, I really like to show this slide at the beginning because it really, I think, gives a good feel for the kinds of things you can build out of check-ins. So, I think the check-ins give us an amazing picture of what places actually are in the real world. Can anyone guess what these two places are? These are just pictures of check-ins plotted on a map, and the map has been hidden. Any guesses about what New York places these are?

Blake Shaw:  Well, Midtown, anyone else?

Male Audience Member:  Central Park.

Blake Shaw:  Central Park, yes, that’s Central Park on the left. On the right? This is JFK airport. And so just from people using Foursquare, just pulling out their phones, saying they’re at JFK, we get this latitude and longitude, this signal about where they actually were when they said they were at JFK. And you can see that it actually traces out all the terminals. It traces out the runways. It traces out the AirTrain. And similarly for Central Park, you can see that there are all sorts of different areas of Central Park. People call all these different areas Central Park. And we didn’t have to have a mapmaker come and say exactly what this is. This just came naturally out of people using Foursquare to check-in to JFK and to Central Park. So, if you think about it, these check-ins, these millions of check-ins, actually form a precise definition of what it means to be at JFK.

So, this data not only gives us a picture of places in space, but it also gives us a unique picture of places in time. So, consider these three places. The blue line, and this is showing when places are busy during the week. The blue line is Gorilla Coffee.  It’s a nice coffee place.  The red line is showing Gray’s Papaya.  I’m sure most New Yorkers are familiar with that. And the green line is Amarone, which is like a nice restaurant.  So, you can see there’s this unique signature for these different types of places based on when they’re busy.

So, like on a weekday morning, coffee places are busy, and in the evenings, it’s busier at the restaurant. You can see that Amarone serves brunch on Sunday, right? So, they have this strange double peak. There’s all sorts of fun stuff. So, Gray’s Papaya is busy in the middle of the day for lunch, but it also has these funny peaks late at night on Friday and Saturday as people go out drinking and looking for food.

So, you can really tell that these time signatures for places that #[06:33] check-ins sort of really define what these places are. And this is a really unique data set that we can build into our recommendation engine. There are more examples. Certain kinds of places are just so much more popular on the weekend. Idioms, meat markets, and some restaurants, pool halls, et cetera. These are simple statistics that we want to build into Explore.
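
A minimal sketch of such a time signature, assuming each check-in carries a timestamp: bucket check-ins into the 168 hours of the week and normalize. The function name and the sample timestamps are hypothetical, not Foursquare data or code.

```python
from collections import Counter
from datetime import datetime


def weekly_signature(checkin_times):
    """Return 168 values: the share of check-ins in each hour of the week.

    Slot 0 is Monday 00:00-00:59, slot 167 is Sunday 23:00-23:59.
    """
    counts = Counter(t.weekday() * 24 + t.hour for t in checkin_times)
    total = sum(counts.values())
    return [counts.get(slot, 0) / total if total else 0.0 for slot in range(168)]


# Hypothetical check-ins at a coffee shop: 8:30 on five weekday mornings.
coffee = [datetime(2012, 3, day, 8, 30) for day in (5, 6, 7, 8, 9)]
signature = weekly_signature(coffee)

# Find the busiest hour of the week (ties resolve to the earliest slot).
busiest = max(range(168), key=signature.__getitem__)
print("busiest weekday index:", busiest // 24, "hour:", busiest % 24)
```

Two places can then be compared by comparing their 168-value signatures, which is one simple way to separate a coffee shop from a late-night spot.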

It turns out that check-ins are also correlated with the weather, among other external factors. So, here on this plot, we can see in blue the temperature for New York City over the last year, and in green we can see the number of check-ins at ice cream shops. You can see that not only are there interesting seasonal trends in the data, but these strong spikes in temperature actually trigger strong spikes in ice cream consumption in New York. I bet there are lots of different kinds of places that are correlated with the weather. I guess this is kind of intuitive, but it’s amazing that this kind of stuff just pops out of our data, right? When it’s warm, people like to go to ice cream shops and roof decks and harbors. And when it’s cold, people like to go to noodle shops and art galleries.
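
One simple way to quantify this kind of relationship is a Pearson correlation between daily temperature and daily check-in counts per category. The sketch below assumes that approach; the numbers are made up for illustration and are not Foursquare data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily values: temperature (F) and check-in counts by category.
temps = [30, 45, 60, 75, 90]
ice_cream = [12, 20, 41, 70, 95]
noodles = [80, 66, 50, 31, 22]

r_ice_cream = pearson(temps, ice_cream)
r_noodles = pearson(temps, noodles)
print(round(r_ice_cream, 2), round(r_noodles, 2))
```

A strongly positive value suggests a warm-weather category and a strongly negative value a cold-weather one; correlation alone does not separate the seasonal trend from the short-term spikes.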

Here’s another interesting example of how we sort of pull out interesting insights from our Foursquare data. So, on this map we take every single check-in. We’ve got 2,000,000,000 of these things. We take all the shouts that are associated with them, all the text, and people can express a lot of interesting things. This is a social network, so they say all sorts of different stuff. We can actually measure the sentiment of the words. So, ranging from blue, which is awesome, happy, okay, to orange, which is blah, sucks, WTF, we can actually measure sort of where that sentiment came from in the city. I think you can make an argument that people in Manhattan are happier than people in Brooklyn when looking at this map. I’m not as familiar with Hong Kong or London, but it’s interesting to see that there are strange regional areas which have different kinds of sentiment than other areas.
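
A rough sketch of a word-list sentiment score averaged per area, assuming a tiny lexicon built from the example words in the talk. The area labels, shouts, and scoring rule are hypothetical; the actual method behind the map is not described.

```python
from collections import defaultdict

# Tiny illustrative lexicon using the example words mentioned in the talk.
POSITIVE = {"awesome", "happy", "okay"}
NEGATIVE = {"blah", "sucks", "wtf"}


def shout_score(text):
    """Count positive words minus negative words in one shout."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mean_sentiment_by_area(shouts):
    """Average shout score for each area, given (area, text) pairs."""
    totals = defaultdict(lambda: [0, 0])  # area -> [score sum, shout count]
    for area, text in shouts:
        totals[area][0] += shout_score(text)
        totals[area][1] += 1
    return {area: score / count for area, (score, count) in totals.items()}


# Hypothetical shouts tagged with a hypothetical map cell.
shouts = [
    ("area_a", "Awesome brunch, so happy!"),
    ("area_a", "okay coffee"),
    ("area_b", "WTF, this line sucks"),
    ("area_b", "blah"),
]
sentiment = mean_sentiment_by_area(shouts)
print(sentiment)
```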

Okay, so, that’s just a nice little mishmash of some of the interesting data facts. But I’d really like to focus on an interesting machine learning problem. So, there’s this network that we observe here. So, it’s the fourth #[09:02] places that we know about in the world and all the possible ways that they’re interconnected. You can think about this like a graph. The nodes are the places, and the edges are all these different kinds of connections. There are lots of different ways to connect places. Places are connected if there’s a lot of flow between them, if people often go from one place to another. Similarly, if a lot of people visit two places, there is a lot of co-visitation. These edges could measure that sort of connection. Categories, menus, tips, and shouts, all this data in a way actually connects places, right? If both places serve burgers, you can consider there to be a connection between these two places.

So, I want to talk a little bit about what we can do now that we’ve built this network and we think about it like that. So, here’s a visualization of the flow network. So, this is pretty much a couple of years’ worth of activity compressed into one day. And now, instead of showing a single dot for a check-in, I’m showing you a dot moving between two places when that flow existed, and this is looping through one day continuously. And again, the check-ins are color coded by their category, where blue is professional and work and shops and services, yellow is food, red is residential.

So, just from observing people moving around a city, you can see that it actually, naturally forms this network.  Look at these different hubs of the city where people gather and disperse to many different areas.  Again, you can see this sort of habitual pattern in people’s behaviors as people wake up in the morning.  Go to work.  Eat their lunch.  Go back to work.  And then you can see it change to be a night time kind of thing.  We think there’s an incredible amount of value in understanding the dynamics of this network, about how people move between places.

Here’s a sort of static picture of that same flow network.  Now I’ve actually represented each place as a node, the size proportional to how many places it connects to and the edges are thicker if there’s a lot more flow between those places.  Even without showing you a map here, you can really tell this is Manhattan.  You can pick out interesting spots like Central Park, LaGuardia Airport, different areas in I guess Williamsburg, Brooklyn, and downtown Brooklyn.

So, one of the first things you can do after you build this network is you can just generate very simple statistics to understand very basic things. Like, okay, well, after people go to the Museum of Modern Art, where do they go? Well, they go to the Design Store. They go to the Modern. They go to all these other places that are related to the MoMA. Similarly, after people go to the Statue of Liberty, they go to Ellis Island. They go to the 9/11 Memorial. They go to the Empire State Building. You can use this network to predict where people will go next. So, for instance, if we roll these things up by categories, we can see that after people check-in to bars, they’re much more likely to check-in to American restaurants, night clubs, pubs, and cafes. After people check-in to coffee shops, they’re much more likely to go and check-in at the office or a café or a grocery store.

So, if you look at this visualization here on the left, this is showing this probability matrix. This is sort of a simple machine learning model which says, given you’re at some category over here, what’s the probability that you’re going to check-in at another category. These are our 400 categories, and the intensity of each dot #[12:49] says what that probability is. You can see that I’ve actually arranged these categories by applying a very simple clustering algorithm so that you can actually visualize these patterns along the diagonal.

So, you can see, barely, that there is this block structure on the diagonal. That means that there exist these groups of places where people are much more likely to check-in to places inside that group. And they correspond to very intuitive #[13:17]. Once you check-in to a cultural landmark, #[13:21] like lots of airport related places, lots of places where they do college, the night life, they have this behavior where once you get stuck in that area of the network, you’re much more likely to stay in it. Also note that there are these strong vertical bands for home and for the office, which are very common.
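
The category-to-category probability matrix described here can be sketched as a first-order transition model: count consecutive category pairs and normalize each row. The sessions below are hypothetical, and the clustering step used to order the categories is not shown.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next category | current category) from check-in sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(nxt.values()) for dst, c in nxt.items()}
        for src, nxt in counts.items()
    }


# Hypothetical per-user check-in sequences, already rolled up to categories.
sessions = [
    ["coffee shop", "office", "bar", "nightclub"],
    ["coffee shop", "office", "american restaurant"],
    ["bar", "american restaurant"],
    ["coffee shop", "grocery store"],
]
probs = transition_probabilities(sessions)
print(probs["coffee shop"])
```

Each row of `probs` is one row of the matrix; plotting all rows as dot intensities gives the kind of picture described above.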

So, there’s another interesting network problem here. Now, instead of thinking about this only in terms of places, if we think about it also in terms of people, this is sort of a bipartite network. Given examples of people connecting to places, can we find new places that people will like? So, this is a classic machine learning problem. It’s called collaborative filtering. The idea is to use these different styles of collaborative filtering to predict new places you’ll like.

So, one of them is item-by-item similarity. You’re probably familiar with this if you’ve used Amazon.com. You go on the website and they say people who bought this also bought that. It’s a very classic idea for doing collaborative filtering, and we use it extensively here. So, the idea is that we know some of the places that you like and the places you’ve been to. We’ll find other places that are similar, and we’ll recommend those.
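
A minimal sketch of item-by-item similarity, assuming cosine similarity over the sets of people who visited each place. The similarity measure and the visitor ids are assumptions for illustration; the place names come from the talk, but the data is invented.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two visitor sets (binary vectors)."""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0


def similar_places(visitors_by_place, place, k=3):
    """Return up to k other places ranked by visitor overlap with `place`."""
    target = visitors_by_place[place]
    scored = [
        (cosine(target, visitors), other)
        for other, visitors in visitors_by_place.items()
        if other != place
    ]
    return [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]


# Hypothetical visitor ids per place.
visitors_by_place = {
    "Joe's Pizza": {1, 2, 3, 4},
    "Lombardi's": {2, 3, 4, 5},
    "Gorilla Coffee": {1, 6},
    "Amarone": {7, 8},
}
recommendations = similar_places(visitors_by_place, "Joe's Pizza")
print(recommendations)
```

Because the similarities depend only on places, a new user who likes a couple of places can be served immediately by looking up the neighbors of those places.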

So, the other sort of classic collaborative filtering paradigm is user to user similarity. In this case, we know people who are like you. We know who your friends are. We can find the places that they like, and we can use those for recommendations for a user.
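
As a sketch of the friend-based variant, one simple assumption is to let each friend vote for the places they like that the user has not visited yet. The names, the data, and the voting rule are hypothetical.

```python
from collections import Counter


def friend_recommendations(user, friends, liked, k=3):
    """Rank unvisited places by how many of the user's friends like them."""
    seen = liked.get(user, set())
    votes = Counter()
    for friend in friends.get(user, ()):
        for place in liked.get(friend, set()) - seen:
            votes[place] += 1
    # Sort by vote count (descending), then by name for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [place for place, _ in ranked[:k]]


# Hypothetical social graph and liked places.
friends = {"ana": ["ben", "cy"]}
liked = {
    "ana": {"pizza_place"},
    "ben": {"pizza_place", "ramen_bar", "roof_deck"},
    "cy": {"ramen_bar"},
}
picks = friend_recommendations("ana", friends, liked)
print(picks)
```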

And then finally, there’s techniques based on low rank matrix factorization.  You can think about these as trying to come up with sort of these latent features to describe both people and places.  And using those to calculate these distances, and things that are close in this latent space are good recommendations.
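
A small sketch of low rank matrix factorization trained with stochastic gradient descent, assuming observed (user, place, rating) triples. The hyperparameters, the toy ratings, and the use of plain SGD are assumptions; the talk does not say which factorization method is used.

```python
import random


def predict(P, Q, u, i):
    """Predicted affinity: dot product of user and place latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def mse(P, Q, ratings):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_items, rank=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Learn latent features for users (P) and places (Q) by SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical observations: 1.0 = liked, 0.0 = visited but not liked.
ratings = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 2, 0.0),
    (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0),
]
P0, Q0 = factorize(ratings, n_users=3, n_items=4, steps=0)  # untrained baseline
P, Q = factorize(ratings, n_users=3, n_items=4)
print("error before:", mse(P0, Q0, ratings), "after:", mse(P, Q, ratings))
```

After training, `predict(P, Q, u, i)` for an unobserved pair gives a score that can be used to rank candidate places for that user.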

Actually, there’s lots of different pros and cons for these different kinds of methods. So, one of the pros for item-by-item similarity is that you can easily update this data for a new user.  A new user comes onboard, and they like a couple places. We can easily find the places that are similar and present those as recommendations. Yeah, so for example, like in our system we have something very similar to this where people who like Joe’s Pizza will also like Lombardi’s.

Now unfortunately, this isn’t one of the most performant collaborative filtering models, but that’s why we sort of combine it with ideas from user to user similarity. So, unlike a lot of other collaborative filtering methods out there, for our user to user similarity we rely heavily on this social graph. We believe that the items that one’s friends like are good predictions about things that you might like as well. All right. So, that was the place graph.

Now I’d like to talk about the social graph. So, Foursquare’s a social network. We have over 20,000,000 people and they’re all connected via different kinds of interactions. So, obviously, people can be friends with each other. They can follow each other. So, some of the nodes in this network are celebrities or brands, and we allow a one-way connection in that case, which is a follow. We also have an interesting signal which is a “done,” also known as a like, where people will suggest things to do, like a tip, and other people can follow those tips. So, that gives us another interaction type. Also, comments: people will leave comments on each other’s check-ins and they’ll communicate. So, the frequency of their communication also defines edges in this network.

Then finally, unlike many other social networks, we have a very powerful signal telling us about connections between people. So, Foursquare doesn’t exist purely online, unlike many other social networks. We believe that people form connections to each other by being in the same physical space. And we call this connection or this interaction type co-location. We think it’s a really powerful social signal.

So, to talk about the social graph, I want us to sort of come at it from a different perspective. I want to come at it from the perspective of what happens when a new place opens in a city that you like. So, when a new coffee shop opens in the East Village, what does the social graph look like from that perspective? This is a coffee place called La Colombe. It’s only a few blocks away. It’s actually right next to the old Foursquare office. They make really great coffee. So, of course, when we were working nearby, the minute this coffee shop opened, right around the corner, pretty much the entire office sort of swarmed this place. We’re all obsessed with new places and coffee, and pretty much we have every single coffee shop and bar near the office meticulously tracked.

So, when people started to come to this place I started to wonder, what does it look like from the perspective of the place, seeing all these new people come to their business? Not just in terms of the individual people, but what does the network look like? What does the network look like of people who go to La Colombe?

So, this is sort of one view of what happens when a coffee shop opens. You can see that the number of check-ins per day is rising. So, every day that goes by, as you can see along the X axis, you can see the number of check-ins go up. The coffee shop is doing really well. You can see that it’s growing like crazy and taking off. So, this is sort of one way of looking at what happens when a coffee shop opens. But a place isn’t just interacting with individuals one at a time. It’s actually interacting with this entity, this social graph.

So, consider trying to quantify all the different social aspects of every single one of those check-ins.  So, on the right, like before, we see number of check-ins growing over time. Now, on the left, we’re seeing a visualization of the social graph, right?  So, now as every single new person comes to La Colombe, they’re represented as a row and as a column. A dot in this matrix here represents a friendship between those people.  So you can see that out of the first 100 or so people that went to La Colombe, that it formed a very tight social network.  Then as time goes on, you can see that they started to attract different people.  People who are outside of this core group of people who first discovered it.

So, the plot in the middle, actually, is trying to quantify how quickly this place is spreading on this network versus just people coming at random who have no connection. I think it’s sort of like a measure of virality. It’s like the average number of people who’ve been to this place who are friends with a person who just checked in. People often talk about memes, these things that spread on the internet. I think that there is an analogy here, right? As a new place opens, it spreads amongst this social graph as people discover it. They check-in. Their friends see those check-ins, and you can really see how places spread like a meme on the internet. Almost like a virus.
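
The measure described here, the average number of earlier visitors who are friends with the person who just checked in, can be sketched as follows. The function name and the toy friendship data are hypothetical, and the plot in the talk may be computed differently (for example, over a sliding window).

```python
def social_exposure(arrivals, friends):
    """Average, over first-time visitors, of how many earlier visitors are their friends."""
    seen = set()
    counts = []
    for person in arrivals:
        if person in seen:
            continue  # only a person's first check-in counts as an arrival
        counts.append(len(friends.get(person, set()) & seen))
        seen.add(person)
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical friendships: a, b, c are mutual friends; d knows nobody here.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": set(),
}
tight_group = social_exposure(["a", "b", "c"], friends)
with_stranger = social_exposure(["a", "b", "c", "d"], friends)
print(tight_group, with_stranger)
```

A high value suggests the place is spreading along friendships; a value near zero suggests visitors are arriving independently of one another.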

I think that it’s sort of a different perspective for a merchant, right? You can imagine, back in the day, all a merchant really needed to do was to make sure that people were spatially exposed to their shop.  But now we’re sort of revealing an entire other dimension, this social exposure.  And I think that it’s going to become very important for merchants to understand how their businesses are sort of being socially exposed to the people who visit.

That said, I still feel like we haven’t quite revealed the real underlying dynamics of this system. So, I showed you a social graph as sort of a matrix, but this is a prime opportunity to use a machine learning algorithm to visualize this network. So, here’s another view of what’s going on when people start checking in to La Colombe.

So, this is the social network, right?  These are the 10,000 or so people that have visited La Colombe.  Each person is represented as a node and each friendship is represented as a link.  The size of the node represents how many friends they have, sort of a measure of influence.  I believe Dennis is this big one right here, this gigantic #[22:36], Dennis Crowley the founder of Foursquare. He knows pretty much like 5% of the people who have been to La Colombe.

Anyway, so the way I made this visualization is by applying a machine learning algorithm called minimum volume embedding, which basically says take this group of 10,000 nodes and all the connections and try to find a way to lay it out in 2-D space, such that people who are close to each other in this space are more likely to be friends or more likely to be close in that network. So, we can sort of see what happens as every new person checks in to La Colombe.

So, each check-in that happens is going to take a node. It’s going to highlight it. It’s going to go from gray to light blue. And it’s going to be highlighted orange. And you can see how the different areas of this network are going to become sort of infected with this idea of La Colombe.

It’s going to spread from this sort of lower right hand corner, lower left hand corner and eventually #[23:37] blur the whole network.  I’m not going to wait for it to finish, but I think that it’s very interesting to think about this idea of a social network that’s sort of like a substrate, which an item can spread on.

I’ll take some questions, yeah. Yes?

Audience Member speaks.

Blake Shaw:  So, actually, this object really exists in more than 2 dimensions. It probably exists in like 5 or 10 dimensions. We picked 2 here, but you could sort of imagine spikes like this going into the page and out of the page and into a fourth dimension and a fifth dimension. Generally, high dimensional networks look sort of like this when you plot them. With that said, you can see some really interesting structure, like this is definitely a clique of people here who are much more connected to each other than they are to the rest of the group. Similarly, this is a small clique. You can get a sense for what the distribution of node sizes is.

Audience Member speaks.

Blake Shaw:  Sure. Actually this was made by an algorithm called structure preserving embedding.

Audience Member speaks.

Blake Shaw:  Sure. So, unlike many graph embedding techniques, which are spring based or force based, this is actually done via a semi-definite optimization. It’s trying to sort of match the distances in this two dimensional space to distances that are computed along that graph. It doesn’t really use any forces.
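
The idea of matching 2-D distances to distances computed along the graph can be illustrated with a much simpler stand-in. The sketch below is not structure preserving embedding and does not solve a semidefinite program; it assumes hop counts as graph distances and minimizes the squared mismatch ("stress") by plain gradient descent. The toy graph, step size, and function names are hypothetical.

```python
import random
from collections import deque
from math import hypot


def hop_distances(adj):
    """All-pairs shortest path lengths (in hops) via breadth-first search."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def stress(pos, dist):
    """Sum over node pairs of (2-D distance - graph distance) squared."""
    return sum(
        (hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]) - d) ** 2
        for u in dist for v, d in dist[u].items() if u < v
    )


def embed(adj, steps=300, lr=0.05, seed=0):
    """Lay out a connected graph in 2-D by gradient descent on the stress."""
    rng = random.Random(seed)
    dist = hop_distances(adj)
    pos = {u: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for u in adj}
    for _ in range(steps):
        for u in adj:
            gx = gy = 0.0
            for v, d in dist[u].items():
                if v == u:
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                length = hypot(dx, dy) or 1e-9  # avoid division by zero
                coeff = (length - d) / length
                gx += coeff * dx
                gy += coeff * dy
            pos[u][0] -= lr * gx
            pos[u][1] -= lr * gy
    return pos


# Hypothetical friendship graph: two triangles joined by one bridge edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
dist = hop_distances(adj)
before = stress(embed(adj, steps=0), dist)
after = stress(embed(adj), dist)
print("stress before:", before, "after:", after)
```

Unlike a spring layout, nothing here simulates forces over time; the layout is whatever point configuration best matches the target distances, which is the flavor of objective described above, solved far more crudely.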

Audience Member speaks.

Blake Shaw:  Okay. I’m going to move on a little bit, but I’d be happy to take more questions soon. So, you’re probably asking yourselves, what does this low dimensional structure mean? It’s not absolutely obvious, but I think that it has something to do with this notion of homophily.

So, homophily is the concept that similar things bond together. And we observe really strong homophily of many different types in the Foursquare social network, based on demographics, location, interests, et cetera. And so, I think that, oops, sorry. I think that location and demographics are some kinds of homophily that we can measure and understand. And I think that the sort of low dimensional representation of a social graph represents another kind of homophily, another kind of way that people are sort of inherently bonded together by their actions in the real world and by the friendships that they’ve made.

So, this is one example of homophily that we can and do measure precisely. So, this is the idea that people who are of the same age are more likely to become friends. And you can see this really interesting pattern. So, if for every person you figure out what their age is and you look at the ages of all their friends, you can make this nice plot. It’s not quite a histogram, but it’s a nice plot. So, people who are of ages 15 to 20 are much more likely to friend people of that same age. But as you get older, you’re more likely to have friends of many different kinds of ages. This sort of shows how this idea of age homophily changes as people get older.
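
The age plot described here can be approximated by bucketing ages and, for each bucket, computing the distribution of friends' age buckets. The bucket width, user ids, ages, and friendships below are hypothetical.

```python
from collections import Counter, defaultdict


def age_bucket(age, width=5):
    """Lower bound of the age bucket, e.g. 17 -> 15 for 5-year buckets."""
    return (age // width) * width


def friend_age_distribution(ages, friendships, width=5):
    """For each age bucket, the share of friends falling in each age bucket."""
    counts = defaultdict(Counter)
    for a, b in friendships:
        bucket_a, bucket_b = age_bucket(ages[a], width), age_bucket(ages[b], width)
        counts[bucket_a][bucket_b] += 1  # friendship seen from a's side
        counts[bucket_b][bucket_a] += 1  # and from b's side
    return {
        bucket: {fb: c / sum(row.values()) for fb, c in row.items()}
        for bucket, row in counts.items()
    }


# Hypothetical users and mutual friendships.
ages = {"u1": 16, "u2": 17, "u3": 18, "u4": 42, "u5": 27, "u6": 55}
friendships = [
    ("u1", "u2"), ("u2", "u3"), ("u1", "u3"),
    ("u4", "u5"), ("u4", "u6"), ("u4", "u1"),
]
distribution = friend_age_distribution(ages, friendships)
print(distribution[15])
print(distribution[40])
```

In this toy data the 15 to 19 bucket is concentrated on itself, while the 40 to 44 bucket is spread across several other buckets, mirroring the pattern described in the talk.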

Yes?

Audience Member speaks.

Blake Shaw:  Yeah, totally. So, the question is, do we ever look at the differences between people who are explicitly friends with each other and people who are co-locating together a lot? Yes, absolutely. That is a very interesting signal. I think a lot of people don’t really necessarily act their age or act their demographics, and some people are very different from their friends or very similar to their friends. That’s why we’re very careful when we build recommendations from friendships to make sure that we understand whether or not they’re friends who share similar tastes as well. Great question.

So, there’s another aspect to this social graph that I want to talk about next. It’s this idea of influence. So, I think that we have a very unique social graph in the sense that people on Foursquare are constantly influencing other people to do things. And actually, not only to do stuff online, but to actually go out in the real world and do things there. So, to sort of look at this problem, we use our tip network. So, just to familiarize you with it, tips are these short phrases on Foursquare. It can be something like try the burger or ask for Bob at the bar or whatever. They’re these short phrases that people leave for each other that are like little mini recommendations. And when people do these tips, they can make lists of them and mark them as done.

So, given this network of 2,500,000 people doing the tips from other people and from brands, we can sort of figure out who is influential in our network. Oh, and just if you’re interested, here are different stats about this actual network.

So, the question is, how can we find these authoritative people? The answer is we use something very similar to PageRank. The idea is that people who are authoritative will then link to more authoritative people.

So, the most influential brands, in case you were curious, are the History Channel, Bravo TV, National Post, TV.com, MTV, et cetera. And the most influential users Lockhart #[28:23], Venus, Co-Founder. Yeah, so according to PageRank, these are the people who have the most influence, meaning they have the most links from influential people.
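
A compact sketch of PageRank-style scoring by power iteration, assuming an edge points from the person who marked a tip as done to the author of that tip. The edge direction, damping factor, and sample data are assumptions; the talk only says the method is "very similar to" PageRank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[u] / n
        rank = new
    return rank


# Hypothetical "done" edges: who completed a tip written by whom.
done_links = {
    "ann": ["brand"],
    "bob": ["brand", "ann"],
    "cat": ["brand"],
    "brand": [],
}
scores = pagerank(done_links)
most_influential = max(scores, key=scores.get)
print(most_influential, scores)
```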

So, I sort of gave you a sense of these networks of people, these networks of places, but now I want to talk about how it ties into our recommendation prop #[ 0:28:46.1].  Here’s a picture of Explore.  So, Explore’s a recommendation engine built from our social graph, from our place graph, and from billions of check-ins.  So, we really want people to be able to open Explore anywhere in the world and get a great recommendation about where to go, whether they’re interested in food or coffee or drinks or anything.  So, Foursquare Explore is going to take a variety of these signals into account to provide sort of a real-time rec.  So, it’s going to take your location, your time of day, your check-in history, your friends’ preferences, and everything I showed you about sort of the similarities and connections between venues.

So, the idea is to take all of these signals and put them all together and, at a moment’s notice, in less than 200 milliseconds, send these recommendations down to your mobile device. So, this real-time recommendation should match the user’s intent. It should be relevant to their context and should be curated by their social network.
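
How the signals are actually combined is not described, so the following is purely an illustrative assumption: a weighted sum of a distance term, a time-of-day term, a similarity term, and a friend term. Every name, weight, and number here is hypothetical.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class Candidate:
    name: str
    distance_km: float   # distance from the user's current location
    busy_now: float      # 0..1, how typical the current hour is for this venue
    similarity: float    # 0..1, similarity to places in the user's history
    friend_likes: int    # number of the user's friends who like the venue


def score(c, w_dist=1.0, w_time=1.0, w_sim=1.0, w_friend=0.5):
    """Blend the signals into one number (hypothetical weights)."""
    proximity = exp(-c.distance_km)        # closer venues score higher
    social = 1.0 - exp(-c.friend_likes)    # diminishing returns on friend count
    return w_dist * proximity + w_time * c.busy_now + w_sim * c.similarity + w_friend * social


def recommend(candidates, k=2):
    """Return the top-k candidates by blended score."""
    return sorted(candidates, key=score, reverse=True)[:k]


candidates = [
    Candidate("far bar", distance_km=3.0, busy_now=0.1, similarity=0.4, friend_likes=0),
    Candidate("nearby cafe", distance_km=0.2, busy_now=0.6, similarity=0.8, friend_likes=2),
]
top = recommend(candidates)
print([c.name for c in top])
```

A scoring function this cheap is one way to stay within a tight latency budget, since the expensive parts (similarities, time signatures) can be precomputed offline.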

None of this would be possible without our great data stack, MongoDB in particular. We also run on the Amazon S3 cloud. We rely on Hadoop and Hive and Ink for a lot of our data analysis and machine learning. We use Flume for data collection. We use #[31:11] and NetLab for analyses.

So, there’s a variety of open questions I like to sort of— all right, go ahead.

Audience Member speaks.

Blake Shaw:  More.

Audience Member speaks.

Blake Shaw:  Open questions. So, the thing that we are constantly thinking about is, what do check-ins reveal about the behavior of people and places and cities? Given these sorts of networks, how can we predict new connections? People who connect with each other, places that will connect with each other, and people who connect with places, et cetera. We talked a little bit about how we measure influence, but it’s still very much an open question.

Then finally, one question that we think about a lot is, can we infer a real world social network? Unlike a lot of social networks on the web, where people connect with each other, possibly anywhere in the world, and connect over all sorts of things such as sharing funny cat videos and all sorts of other aspects of online life, there’s a very interesting real world social network that’s sort of hidden, right? One that we haven’t really been able to measure yet because there weren’t any logs for people actually moving around in the real world until now.

So, we think that by looking at this amazing location data, we can really understand how people interact and form real world social networks. And we think that this is a very powerful idea, but we’ve only really begun to understand and ask the right questions about how to build this thing. So if you have any ideas about how to solve these questions, please, find me after the talk.

So, just to conclude, I really want to stress that this is a very unique data set created by millions of people interacting with each other and interacting with places all over the world.  And also, that we’re operating at a really massive scale.  Today I’m talking about millions of places and millions of people, but there’s now over a billion devices in the world that are just constantly emitting this signal of latitudes, longitudes and time stamps.  So, the opportunity to take advantage of that massive amount of data is just sitting right in front of us.

Thank you very much.