Machine Learning Distributed GBM by Earl Hathaway & Cliff Click

In this talk, Earl Hathaway, resident Data Scientist at 0xdata, covers distributed GBM, one of the most popular machine learning algorithms used in data mining competitions. He discusses where distributed GBM is applicable and reviews recent KDD and Kaggle uses of machine learning and distributed GBM. Cliff Click, CTO of 0xdata, then talks about the implementation and design choices of a distributed GBM. This talk was recorded at the SF Data Mining meetup at Trulia.


Most of us use GBM by way of its R implementation. However, the folks at 0xdata have recently written a distributed version for their open source H2O platform, and in the process have become quite the experts on the algorithm, as this video shows!

Explanation of gradient boosting (GBM) from Wikipedia

Gradient boosting is a machine learning technique which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in stages like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function.
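
To make the stage-wise idea concrete, here is a minimal single-machine sketch in plain Python. It is not the H2O or R implementation and says nothing about how the distributed version in the talk works. It assumes squared-error loss (for which the negative gradient is simply the residual), one-split decision stumps as the weak learners, and one-dimensional inputs. The names fit_stump, fit_gbm, predict_gbm and mse, as well as the n_stages and learning_rate parameters, are made up for this illustration.

```python
# Illustrative gradient boosting for squared-error regression on 1-D inputs.
# Assumption: all function and parameter names are hypothetical, chosen for this sketch.

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all xs equal): fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    """Return the stump's value for the side of the threshold x falls on."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbm(xs, ys, n_stages=50, learning_rate=0.1):
    """Build the model in stages; each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # stage 0: a constant model (the mean)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # For squared-error loss the negative gradient is y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small (shrunken) step in the direction the stump suggests.
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    """Sum the base value and the shrunken contributions of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((predict_gbm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [x * x for x in xs]
    for stages in (0, 1, 10, 100):
        print(stages, mse(fit_gbm(xs, ys, n_stages=stages), xs, ys))
```

Running the script prints the training error for an increasing number of stages, which should shrink as more stumps are added. Swapping in a different differentiable loss would only change how the residuals line is computed, which is the generalization the paragraph above describes.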

GBM History

The method was invented by Jerome H. Friedman in 1999 and was published in a series of two papers. The first introduced the method, and the second described an important tweak to the algorithm that improves its accuracy and performance.

