Vida Ha (Ifeelgoods) Interview Transcript
PETE: Hey, it's Pete. I'm here in Silicon Valley today and I'm talking to Vida Ha, who's the co-founder of Ifeelgoods. Hey, Vida.
PETE: How are you doing today?
PETE: Are you excited?
PETE: You pumped?
PETE: Good. So in this episode of g33ktalk, we're going to talk to Vida about her technology stack, and we have some really interesting things to talk about. Some infrastructure stuff; how to stop and prevent online fraud, which is pretty cool; and some of the data-related stuff that you guys are dabbling in and using, which is cool. And yeah, we're just going to generally geek out and have a good time. Are you ready?
VIDA: Sounds great
PETE: Alright. So, Vida, you were previously at Google, right?
PETE: And now your role here is co-founder of Ifeelgoods
VIDA: And Lead Engineer
PETE: Lead Engineer. Okay, cool. So tell us a little bit about Ifeelgoods and what you guys do, just so we have some context as to what your application is.
VIDA: Yes. We are a digital goods incentive platform. That's big jargon that basically says we power the redemption of free digital goods. Things like Facebook credits, Skype minutes, E-gift cards, anything that's instantly redeemable, that can be delivered electronically, so that retailers can offer them to consumers in place of discounts.
PETE: Okay. So give me a typical use case. Where would a customer see your service in the purchase flow for an e-commerce site, for instance?
VIDA: Right, so they would see banners that we host on the retailer's site that talk about the offers. So maybe in place of free shipping, you would get $5 worth of free Facebook credits instead. And then at the end of the flow, after the receipt confirmation page, you'd see a widget that's hosted by us. We actually do the whole front-end part of it also. You'd Facebook Connect, click it, we know who you are, and then we deposit credits directly into your account.
PETE: Okay. So this could be an incentive alongside an actual purchase, it could be hardware that's purchased, or some sort of incentive or digital promotion that is just powered by you guys.
VIDA: Exactly. You get to use the Facebook credits immediately today even if it takes a week for your new shoes or whatever to arrive.
PETE: Okay, okay, cool. Tell me a little bit about your technology stack. What are you guys using and what are you built on?
VIDA: We are using Ruby on Rails as our web development framework on top of Amazon EC2, with Amazon services all over the board. We also have to know the Facebook APIs, all the things like that.
PETE: Okay. You mentioned before that you guys are using a lot of different parts of the AWS stack, run me through those quickly.
VIDA: There is EC2, which is the servers that we actually run. There's RDS, which is our main SQL-style database. We also use Simple DB, which is more of a newer-flavor, NoSQL kind of database. And then there's also S3 for storage. We use that both for any kind of large data that we want to save for offline processing and for images, creatives, things that go on our website in our redemption flow. And then we use a couple of the other services: the load balancing layer that we use automatically with Amazon, and Elastic MapReduce.
PETE: Cool. So you're all the way in the cloud. Do you have any servers at all?
PETE: Do you have a backup server somewhere?
VIDA: No. You can back up things on Amazon.
PETE: Alright. A lot of people use it for that, right? S3 is cheap.
VIDA: They triplicate everything, typically. So I'm not worried about it.
PETE: Okay, Cool. Have you ever had an Amazon outage, just out of curiosity?
PETE: No? Alright.
VIDA: We've been lucky enough to not be hit with that problem.
PETE: Cool. Well if Amazon is listening, I'm sure they would like to hear that.
VID 2 - DATA
PETE: Okay, so you're a Rails shop, you're all in the cloud, you're using a lot of different Amazon services, and you mentioned that you had a lot of different data components or data services that you're using. For those of our audience who don't know, tell us a little bit about the difference between the AWS data options.
VIDA: Right. So for a lot of our data we use regular SQL-type storage, because it's quick to access, we can query across it, things like that. So, you know, we have a concept of a retailer with their campaigns and the offers that they are trying to give out to their customers. We use MySQL, or the SQL data store, to store all that kind of information. But then we also have a really large number of user objects. We'll only have a certain number of clients that are giving out offers, fairly small, but the number of consumers that are redeeming these offers is orders of magnitude larger. So we have a lot more objects in that space. And it needs to be a global data store: we want to have just one database of users across the whole world, rather than partitioning them out the way we can partition some of our enterprise clients onto different servers. So for our user store, we're actually using Simple DB to store some of the fast user data. And then we're also using offline storage, because as part of our process a user Facebook Connects with us and gives us their data, such as Likes, that we can use to predict which offers they'd be interested in from us, and we're also just trying to understand the shopping habits of these users according to their social data. That data is actually quite large. We could use a blob store in SQL to store it, but we don't need it to be served up. It's purely for offline processing, so we actually write it out to disk on S3.
PETE: Okay. Can you give me a quick use case on both sides of that fence? Like, what's an example of the type of data you are using real time and what's an example of the type of data that you're interested in more batch mode?
VIDA: In real time we maybe just want to know how many friends you have and how many likes you have in aggregate, so we just store integers to say how many friends and how many likes you have. In offline mode, a lot of our retailers are interested in what the top likes of our users are. So, we have to actually go in and store the individual users' likes, even down to the granularity of what's most popular in the retail category versus the music category.
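[Editor's note: a minimal Python sketch of the two modes Vida describes, cheap per-user aggregate counts for real-time use and a batch pass that finds the top likes per category. The profile layout and the function names (realtime_summary, top_likes_by_category) are assumptions made for illustration, not Ifeelgoods' actual code or the Facebook API's format.]

```python
from collections import Counter, defaultdict

# Hypothetical profile shape (an assumption for this sketch):
# {"friends": [...], "likes": [{"name": "Nike", "category": "retail"}, ...]}

def realtime_summary(profile):
    """Real-time side: keep only two integers per user."""
    return {
        "friend_count": len(profile.get("friends", [])),
        "like_count": len(profile.get("likes", [])),
    }

def top_likes_by_category(profiles, n=3):
    """Offline side: count every individual like, grouped by category."""
    counts = defaultdict(Counter)
    for profile in profiles:
        for like in profile.get("likes", []):
            counts[like["category"]][like["name"]] += 1
    # Keep the n most common like names in each category.
    return {
        category: [name for name, _ in counter.most_common(n)]
        for category, counter in counts.items()
    }
```

The real-time summary is small enough to live in a key-value store alongside the user record, while the batch function needs the full like lists, which is why that data would be read from bulk storage instead.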
VID 3 - SCALABILITY
PETE: So, what about scalability in your architecture? How do you handle that, and what are some considerations you're looking at to make sure your application architecture is set up so that you can scale? Because you guys are not huge right now, right? But obviously the goal is to be processing, I'm sure, hundreds of thousands or more than you are now, and to keep scaling that up.
VIDA: Right. So, it's naturally partitioned right now. We wanted our Simple DB store to be global, so we always want all of our users to be in just one database, readily available to us. On the other hand, since we're set up with retailers, every retailer's data can actually be completely separated and partitioned from the others. So, we can serve traffic for specific clients on specific servers with a different database. So, I'm never worried about one particular database having to handle a humongous load. For our consumer data, I didn't build Simple DB so I don't know how it works, but there are always things you can do for consumers. You could separate out their data and modulo it according to certain keys: put all your users that end with this number in this particular partition of user data. So, there are still ways to partition the user data. My guess is that Simple DB probably already does that and hides it from the developer, serving the data automatically from separate servers without you noticing and labeling the whole thing as just one big service.
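[Editor's note: the "modulo it according to certain keys" idea can be sketched in a few lines of Python. This assumes numeric user IDs and uses in-memory dictionaries as stand-ins for separate databases; ShardedUserStore and shard_for_user are hypothetical names, and nothing here describes how Simple DB works internally.]

```python
def shard_for_user(user_id: int, num_shards: int = 10) -> int:
    """Pick a partition by taking the user ID modulo the shard count.

    With 10 shards this is simply the last digit of the ID.
    """
    return user_id % num_shards

class ShardedUserStore:
    """Toy key-value store that spreads users across several partitions."""

    def __init__(self, num_shards: int = 10):
        self.num_shards = num_shards
        # Each dict stands in for a separate database or server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, user_id: int, data: dict) -> None:
        self.shards[shard_for_user(user_id, self.num_shards)][user_id] = data

    def get(self, user_id: int):
        # Returns None when the user is not stored.
        return self.shards[shard_for_user(user_id, self.num_shards)].get(user_id)
```

Callers see one store, but each read or write touches only one partition. A known trade-off of plain modulo partitioning is that changing the number of shards moves most keys to a different partition.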
PETE: So, you anticipate pushing Simple DB as far as you possibly can, you're just going to keep throwing users at it and if it starts to cry and scream then you'll reconsider. But right now, you think that's probably going to suit your scale.
VIDA: I hope so. I would definitely go with a key-value store rather than a SQL store. I think that SQL has a lot of overhead because you need to process queries, you need to query across it and build indexes across it and all the other stuff. And if you want scalability, just being able to store data in different places and partition it out makes a lot of sense. I mean, at Google we didn't use SQL at all in our back-end servers anyway, for speech recognition or even in our web server. So, I'm just not really used to thinking of SQL as the automatic database choice, as I think most of the industry does.
PETE: What about the breadth of social data that you have at your disposal from, say, a Facebook API? One of the challenges there is that the data is always changing, right? Facebook is basically changing the schema behind their API, adding fields and deleting fields and probably merging stuff and whatnot. They're known to do that, right?
PETE: Does that give you a lot of headaches and problems?
VIDA: No, because we didn't choose to store the social data in a SQL store. But it is true: if we had originally defined a user profile and assumed that there'd be certain fields, we'd always have to be updating the schema and things like that. So, Simple DB is really nice because it doesn't assume a strict user schema. You can still label parts of it, but you don't have to strictly label it that way and predefine it.
VIDA: So, that allows us to be really flexible.
PETE: So, those are a really great complement to each other. The social data you're getting from a “not always what you'd expect” Facebook API and the schema-less format of Simple DB.
VID 4 - FRAUD PREVENTION
PETE: I think one of the interesting things you guys are doing here is related to fraud prevention. How do you guys stop online fraud?
VIDA: Right. Well, it's a really important issue for us, because anytime you're giving out something for free, there's a possibility that fraud is going to creep into the system and just make your whole product kind of worthless to your retailer. So, we looked at some point into using IP addresses, knowing bad IP addresses from users, looking at browsers, the kind of traditional ways to fight fraud. And then we realized we had such a rich set of signals from the social data. So, in our first campaign, with Skype, where if you Like you can get a free Skype code, we found that our fraud rates were better than the industry average, basically because we have the social data to tell us whether the user was fraudulent or not.
So, if you think about it, it's so easy to sign up for multiple email accounts. I'm sure we all have several, the ones that you don't give to real people, those things. With Facebook, it becomes really apparent which account is real and which is not, because you have more friends on your real account and you update it more often. We get all that data: how often does someone update their status, what is their profile picture, how many friends do they have. It's such a rich set of data. Our algorithms are just a start, though I think they're solid. I don't think we've, by any means, gone to the limits of our social data fraud filtering capability, but because the signals are so strong, we're actually able to get a really good result.
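[Editor's note: a hedged Python sketch of how the signals Vida lists (friend count, update frequency, profile picture) might be combined into a simple rule. The field names, the thresholds, and the two-out-of-three rule are all assumptions chosen for illustration; the interview does not describe Ifeelgoods' actual algorithm.]

```python
def looks_genuine(account, min_friends=20, min_updates_per_month=1.0):
    """Return True when at least two of three social signals look real.

    `account` is a plain dict with hypothetical keys; missing keys
    count as the weakest possible value for that signal.
    """
    signals = [
        account.get("friend_count", 0) >= min_friends,
        account.get("status_updates_per_month", 0.0) >= min_updates_per_month,
        bool(account.get("has_profile_picture", False)),
    ]
    return sum(signals) >= 2
```

A throwaway account created only to redeem an offer would typically fail most of these checks at once, which is the point Vida makes about the strength of the signals.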
PETE: That's very interesting. Are you using aggregated social data from different social nets or is it mostly Facebook that has the biggest use to you right now?
VIDA: Facebook is the biggest use to us right now because Facebook credits are the most popular incentives that we're offering.
PETE: I see.
VIDA: We have a Twitter product where you can incentivize someone tweeting. We haven't actually used it for social data fraud filtering yet.
PETE: Got it, okay. So, if I'm an engineer and I'm building an application and I want to stop online fraud, what's the first thing I should do?
VIDA: I think engineers will tend to design an algorithm for it. Instead, think of things that you can do and the natural signals out there, because your AI algorithms, as crazy and complicated as they are, are only going to be as good as your signals. Facebook Connect is actually pretty decent: you can check if an account is verified or not. It needs to be verified by a mobile phone number rather than just an email account or something that's easy to get, and, you know, everyone has a limited number of mobile phones. I would actually really even consider policy things. That's my tendency, to go with the practical.
PETE: One of the interesting things that we talked about before was when you mentioned that the online fraud doesn't always come - It's not always a trickle. It often comes in sort of a blast like one big storm. Explain that phenomenon and what that means.
VIDA: So what happened for a while was that there were a number of people who figured out a credit card number that worked on one of our retailer sites that wasn't a real credit card number, or maybe it was a stolen credit card number, I don't even know. It just got out. They must have posted about it in a forum or something like that. So, all of a sudden, overnight, we had tons of fraud that came in one big burst, because what happens is one person figures out how to defraud the system. What's the first thing they're going to do? They're going to spread the word to all their friends. Especially since what we do a lot of is Facebook credits. So, these are people who play games and are used to gifting their friends things every day, and they gift them the ability to commit fraud. So, we'll basically get a big burst of fraud in one night and have to figure out the signal that will get rid of this fraud and then push it within 10 minutes. So, I'm going to push a change to shut this down.
The first thing you want to do is come up with some kind of control, some kind of velocity control, so you can alert yourself when there's a potential for fraud to be happening, because you're not realistically monitoring these things every day.
PETE: The velocity of some particular metric or set of metrics that you're kind of watching across time.
VIDA: Right. A particular campaign just seems to be picking up speed and it's just a little bit suspicious. So that's the first piece to code up, which is difficult. It's not straightforward how to calculate that kind of thing. And then when you go into it, you can look at the data and pretty quickly come up with some kind of signal. I think that's actually the easy part: as a person, when you look at just five or ten of these fraudulent users, you can usually find some characteristics pretty quickly that they all have in common.
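[Editor's note: one simple way to approximate the velocity control discussed above is a sliding-window counter per campaign that raises a flag when redemptions in the recent window exceed a fixed limit. This Python sketch is an assumption-laden illustration: VelocityMonitor, the window length, and the threshold are hypothetical, and it assumes events for a campaign arrive in time order. It is not the method Ifeelgoods used.]

```python
from collections import defaultdict, deque

class VelocityMonitor:
    """Flag a campaign when too many redemptions land in a time window."""

    def __init__(self, window_seconds=600, max_per_window=100):
        self.window_seconds = window_seconds
        self.max_per_window = max_per_window
        # campaign_id -> timestamps of recent redemptions, oldest first
        self._events = defaultdict(deque)

    def record(self, campaign_id, timestamp):
        """Record one redemption; return True if the campaign looks suspicious.

        Timestamps are seconds and must be non-decreasing per campaign.
        """
        events = self._events[campaign_id]
        events.append(timestamp)
        # Drop redemptions that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) > self.max_per_window
```

In practice a fixed threshold is the crude part: a campaign that is legitimately popular would also trip it, so the flag is best treated as a prompt for a person to look at a handful of the recent users, as Vida describes.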