At MediaMath, we’re big users of Elastic MapReduce (EMR). EMR’s incredible flexibility makes it a great fit for our data analytics team, which processes TBs of data each day to provide insights to our clients, to better understand our own business, and to power the various product back-ends that make Terminal 1 the “marketing operating system” that it is.
In this talk on Machine Learning Distributed GBM, Earl Hathaway, resident Data Scientist at 0xdata, talks about distributed GBM, one of the most popular machine learning algorithms used in data mining competitions. He discusses where distributed GBM is applicable and reviews recent KDD and Kaggle uses of machine learning and distributed GBM. Cliff Click, CTO of 0xdata, also talks about the implementation and design choices behind a distributed GBM. This talk was recorded at the SF Data Mining meetup at Trulia.
In this talk, "How RethinkDB Works," Joe Doliner, Lead Engineer at RethinkDB, discusses the value of RethinkDB's flexible schemas and ease of use, and how to scale a RethinkDB cluster from one to many nodes. He also talks about how RethinkDB fits into the CAP theorem and about its persistence semantics. Finally, Joe gives a live demo showing how to load and analyze data, how to scale out the cluster to achieve higher performance, and even how RethinkDB handles failure when a node is destroyed. This talk was recorded at the SF Data Engineering meetup at StumbleUpon Offices.
- Part 1: The Top 5 Metrics to Watch in MongoDB
- Part 2: Setting Up Actionable Alerts and Procedures in MMS
When releasing software, most teams focus on correctness, and rightly so. But great teams also QA their code for performance. MMS Monitoring can also be used to quantify the effect of code changes on your MongoDB database. Our staging environment is an exact mirror of our production environment, so we can test code in staging to reveal performance issues that are not evident in development. We take code changes to staging, where we pull data from MMS to determine if feature X will impact performance.
In this talk, Abhijit Lele from Hortonworks discusses YARN architecture and how to get started developing for the next generation of Hadoop. This talk was recorded at the New York Hadoop User Group meetup at Gilt.
In this talk, Jeff Magnusson, Manager of Data Platform Architecture at Netflix, discusses Lipstick, a tool that visualizes and monitors the progress and performance of Apache Pig scripts. This talk was recorded at the Big Data Gurus meetup at Samsung R&D. Comments are available here.