Chris Johnson


This awesome talk by Chris Johnson and Edward Newett, machine learning engineers at Spotify, shows how they imagined, tested, iterated on and built Spotify's highly popular "Discover Weekly" feature from start to finish.

In this talk, learn how product-oriented engineers think and what tradeoffs they make as they look for ways to rapidly test ideas and iterate. Because the product is built on music recommendations, you'll also see how they worked out exactly how to generate meaningful recommendations for their millions of users.


This talk was recorded at the DataEngConf 2015 event in NYC.

Yael Elmatad

Many data scientists work within the realm of machine learning, where problems are often addressable with techniques such as classifiers and recommendation engines. At Tapad, however, the data scientists have often had to look outside the standard machine learning toolkit and draw inspiration from more traditional engineering algorithms. This has enabled them to solve a scaling problem in computing the connected components of their Device Graph, as well as to keep cluster identification consistent from week to week.

In this talk, Yael Elmatad, Data Scientist at Tapad, will discuss two algorithms they use frequently for these problems: the Hash-to-Min connected component algorithm and the Stable Marriage algorithm.
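To make the connected-component idea concrete, here is a minimal single-machine sketch of Hash-to-Min. It assumes the graph arrives as a list of edges between comparable node ids; each loop iteration stands in for one distributed map/reduce round. Every node keeps a cluster (initially itself plus its neighbours), sends that whole cluster to the cluster's smallest member, and sends the smallest id to every member. When nothing changes, each component's minimum holds the full component and every other node holds just that minimum. The function name `hash_to_min` is hypothetical, and this is not Tapad's implementation.

```python
from collections import defaultdict


def hash_to_min(edges):
    """Label each node with the smallest node id in its connected component.

    Single-machine simulation of Hash-to-Min (hypothetical helper): each
    iteration of the while loop plays the role of one map/reduce round.
    """
    # Initial cluster of each node: the node itself plus its direct neighbours.
    clusters = defaultdict(set)
    for u, v in edges:
        clusters[u].update((u, v))
        clusters[v].update((u, v))
    clusters = {node: frozenset(c) for node, c in clusters.items()}

    while True:
        inbox = defaultdict(set)
        for node, cluster in clusters.items():
            v_min = min(cluster)
            # Send the whole cluster to its smallest member...
            inbox[v_min].update(cluster)
            # ...and send that smallest id to every member of the cluster.
            for member in cluster:
                inbox[member].add(v_min)
        new_clusters = {node: frozenset(c) for node, c in inbox.items()}
        if new_clusters == clusters:  # no cluster changed: converged
            break
        clusters = new_clusters

    # At convergence every node's cluster contains its component's minimum.
    return {node: min(c) for node, c in clusters.items()}
```

For example, `hash_to_min([(1, 2), (2, 3), (4, 5)])` returns `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}`. The appeal at scale is that information about the minimum spreads through a cluster in far fewer rounds than naive label propagation along single edges.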

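The Stable Marriage problem is classically solved with the Gale-Shapley algorithm, sketched minimally below. The abstract doesn't say how Tapad applies it, so the framing here is an assumption: treat this week's clusters as proposers and last week's clusters as acceptors, rank each side by shared members, and let each new cluster inherit the ID of the old cluster it is matched to. Both function names are hypothetical, and the sketch assumes complete preference lists on both sides.

```python
def stable_marriage(proposer_prefs, acceptor_prefs):
    """Gale-Shapley with proposers proposing in order of preference.

    Assumes complete preference lists on both sides. Returns {proposer: acceptor}.
    """
    # rank[a][p]: position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index each proposer will try
    engaged = {}                                  # acceptor -> currently held proposer
    free = list(proposer_prefs)                   # proposers without a partner
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposer_prefs[p]):
            continue  # p has been rejected by everyone and stays unmatched
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                        # a was free: accept provisionally
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                        # a trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)                        # a rejects p; p tries its next choice
    return {p: a for a, p in engaged.items()}


def prefs_by_overlap(clusters, other_clusters):
    """Hypothetical helper: rank other_clusters by shared members, most first."""
    return {
        name: sorted(other_clusters, key=lambda o: -len(members & other_clusters[o]))
        for name, members in clusters.items()
    }


# Usage: carry last week's cluster IDs over to this week's clusters.
last_week = {"c1": {1, 2, 3}, "c2": {4, 5}}
this_week = {"n1": {4, 5, 6}, "n2": {1, 2}}
matching = stable_marriage(prefs_by_overlap(this_week, last_week),
                           prefs_by_overlap(last_week, this_week))
```

Here `matching` is `{"n1": "c2", "n2": "c1"}`. Stability is what makes IDs consistent: no new cluster and old cluster would both prefer each other over the partners they were assigned.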


Fabrizio Milo

TensorFlow is one of the fastest-growing open-source deep learning frameworks available today. It was developed internally by Google and released as open source in November 2015.

Although it is best known for modeling deep learning architectures, TensorFlow's flexible interface also makes it a good candidate for production-level data science pipelines.

In this talk, you will learn the fundamentals of distributing TensorFlow models over multiple computers.

Fabrizio Milo is a deep learning architect and early TensorFlow contributor @

Sam Abrahams

Machine learning, especially deep learning, is becoming more and more important to integrate into day-to-day business infrastructure across all industries. TensorFlow, open-sourced by Google in 2015, has become one of the more popular modern deep learning frameworks available today, promising to bridge the gap between the development of new models and their deployment.

This talk is an in-depth workshop on the fundamentals of the TensorFlow framework. It aims to give listeners a firm grasp of the core TensorFlow classes and workflow, enabling better comprehension of deep learning models and tutorials built in TensorFlow.

Sam Abrahams is a freelance data scientist and engineer. He is a long-time contributor to the TensorFlow repository and a co-author on the upcoming book TensorFlow for Machine Intelligence.

Peadar Coyle

I've been working with Machine Learning models in both academic and industrial settings for a few years now. I've recently been watching Mikio Braun's excellent Scalable ML series to learn more about Scala and Spark.

His video series covers the practicalities of 'big data', and it made me think about what I wish I had known earlier about Machine Learning:

  1. Getting models into production is a lot more than just microservices.

  2. Feature selection and feature extraction are really hard to learn from a book.

  3. The evaluation phase is really important.

I'll take each in turn.


Ryan Adams

Ryan Adams is a machine learning researcher at Twitter and a professor of computer science at Harvard. He co-founded Whetlab, a machine learning startup that was acquired by Twitter in 2015. He co-hosts the Talking Machines podcast.

A big part of machine learning is the optimization of continuous functions. Whether for deep neural networks, structured prediction or variational inference, machine learners spend a lot of time taking gradients and verifying them. It turns out, however, that computers are good at doing this kind of calculus automatically, and automatic differentiation tools are becoming more mainstream and easier to use. In his talk, Adams will give an overview of automatic differentiation, with a particular focus on Autograd. He will also give several vignettes about using Autograd to learn hyperparameters in neural networks, perform variational inference, and design new organic molecules.
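To show what "computers doing the calculus automatically" means, here is a toy reverse-mode automatic differentiation sketch in pure Python, plus the finite-difference check people typically use to verify hand-derived gradients. Autograd itself differentiates ordinary NumPy code and exposes a `grad` function; this sketch only borrows that name. It handles nothing but `+`, `*` and `sin` on scalars, and `Var`, `sin` and `numeric_grad` are hypothetical names, not Autograd's internals.

```python
import math


class Var:
    """A scalar value that records how it was computed, for reverse-mode AD."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def sin(x):
    """sin that works on plain floats and on Var nodes."""
    if not isinstance(x, Var):
        return math.sin(x)
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def grad(f):
    """Return a function computing df/dx by reverse-mode accumulation."""
    def df(x):
        x_var = Var(x)
        out = f(x_var)
        # Post-order DFS: every node appears after the nodes it was computed from.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(out)
        out.grad = 1.0
        # Walk backwards, pushing each node's gradient to its parents (chain rule).
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad
        return x_var.grad
    return df


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, the usual way to verify a gradient."""
    return (f(x + h) - f(x - h)) / (2 * h)


f = lambda x: x * x + 3 * x + sin(x)  # exact derivative: 2x + 3 + cos(x)
```

`grad(f)(1.5)` agrees with `2 * 1.5 + 3 + math.cos(1.5)` to floating-point precision and with `numeric_grad(f, 1.5)` to within finite-difference error. Reverse mode fits machine learning well because one backward pass yields the gradient with respect to every input of a scalar loss.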

This talk is from the SF Data Science meetup in June 2016.

Ben Packer

With the world’s largest residential energy dataset at their fingertips, Opower is uniquely situated to use Machine Learning to tackle problems in demand-side management. Their communication platform, which reaches millions of energy customers, allows them to build those solutions into their products and make a measurable impact on energy efficiency, customer satisfaction and cost to utilities.

In this talk, Opower surveys several Machine Learning projects that they’ve been working on. These projects range from predicting customer propensity to clustering load curves for behavioral segmentation, and they leverage both supervised and unsupervised techniques.
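The abstract doesn't say which clustering method Opower uses. As an illustration only, here is plain k-means (our assumption, not Opower's method) applied to load curves, where each curve is a list of energy readings per time slot. Each curve is first scaled to sum to 1, so customers are grouped by the *shape* of their daily usage rather than its volume. `normalize` and `kmeans` are hypothetical names.

```python
def normalize(curve):
    """Scale a load curve to sum to 1 so clusters reflect shape, not volume."""
    total = sum(curve)
    return [x / total for x in curve] if total else list(curve)


def _dist2(a, b):
    """Squared Euclidean distance between two curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves, k, iters=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(curves[0])]
    while len(centroids) < k:
        farthest = max(curves, key=lambda c: min(_dist2(c, m) for m in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        # Assignment step: each curve joins its nearest centroid.
        labels = [min(range(k), key=lambda j: _dist2(c, centroids[j])) for c in curves]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy usage: four customers, four time slots; two morning peaks, two evening peaks.
raw = [[1, 5, 1, 1], [1, 6, 1, 1], [1, 1, 1, 5], [1, 1, 2, 6]]
labels, centroids = kmeans([normalize(c) for c in raw], k=2)
```

The two morning-peak customers end up sharing a label, as do the two evening-peak customers. Each resulting centroid reads directly as a "typical" usage shape for its segment.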

Ben Packer is the Principal Data Scientist at Opower. Ben earned a bachelor's degree in Cognitive Science and a master's degree in Computer Science at the University of Pennsylvania. He then spent half a year living in a cookie factory before coming out to the West Coast, where he did his Ph.D. in Machine Learning and Artificial Intelligence at Stanford.

Justine Kunz is a Data Scientist at Opower. She recently completed her master’s degree in Computer Science at the University of Michigan with a concentration in Big Data and Machine Learning. Now she works on turning ideas into products from the initial Machine Learning research to the production pipeline.

This talk is from the Data Science for Sustainability meetup in June 2016.

Tucker Balch

Quantitative trading strategy creation is a unique intellectual undertaking that draws on human insight, proprietary data, and nearly all aspects of computer science.

The goal of this presentation is to take you on a creative journey from the seed of a trading idea to a live-traded systematic strategy. Finding real value in systematic trading requires constant attention to detail to avoid the trap of overfitting. Forex is perhaps the deepest, most liquid market on Earth, but it poses new challenges for traders with equity experience.

This talk will carry you through our process for taking on this new market, from identifying relevant data and dealing with a 24-hour trading cycle to algorithm testing and validation.

This talk, by Lucena Research, was given at the SF Bay Area Machine Learning meetup in May 2016.

Pete Soderling

Dmitry Storcheus is an Engineer at Google Research NY, where he does scientific work on novel machine learning algorithms. Dmitry holds a Master of Science in Mathematics from the Courant Institute, and despite his very young age he is already an internationally recognized scientist in his field of expertise. He has published in JMLR, a top peer-reviewed machine learning journal, and has spoken at NIPS, an international conference. Dmitry received peer recognition for the foundational research contribution published in his paper “Foundations of Coupled Nonlinear Dimensionality Reduction”, which has been cited by scientists and engineers. He is a full member of reputable international academic associations: Sigma Xi, the New York Academy of Sciences and the American Mathematical Society. This year Dmitry is also a primary chair of the NIPS workshop “Feature Extraction: Modern Questions and Challenges”.



- Hi Dima, you were recently invited to give a talk at DataEngConf in NYC, and we are very excited to hear about your novel machine learning research. You have a pretty unique situation: you joined Google Research right after your master's, and as a very young scientist you work alongside top-notch professors and PhDs. Tell me about your path overall. How did you manage to get in?

- Let me first talk about my path. I studied at a Russian college called ICEF, and then I came to the USA for graduate studies, doing my master's in math at the Courant Institute. I started machine learning research very early: back in Russia I used machine learning to forecast financial time series, and in the USA I continued studying machine learning at a theoretical level. Straight after graduation I was hired by Google Research in New York to work on machine learning algorithms. I think the key to my employment was that Google recognized my unique skills and strong technical background. Google Research also recognized the foundational scientific work I had already done and the sound machine learning algorithms I had developed. Specifically, I derived generalization bounds for coupled dimensionality reduction and created an algorithm called SKPCA.

- So why did you choose Google over any other company?

- Two reasons. First, a job at Google followed naturally from my research in graduate school, since I could apply that research directly, and I wanted it to benefit the Machine Learning community. Second, I think Google is a company that values potential in people. It selects people based on their projected individual growth, and that appealed to me as a young professional because it guaranteed continued growth and mentorship by the best scientists in the industry.


Erik Bernhardsson

Vector models are used in many different fields: natural language processing, recommender systems, computer vision and more. They are fast and convenient, and they are often state of the art in terms of accuracy. One challenge with vector models is that as the number of dimensions increases, finding similar items gets harder. Erik Bernhardsson developed a library called "Annoy" that uses a forest of random trees to do fast approximate nearest neighbor queries in high-dimensional spaces. We will cover some specific applications of vector models and how Annoy works.
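Here is a simplified sketch of the forest-of-random-trees idea. Each tree recursively splits the points by the hyperplane equidistant from two randomly chosen points until leaves are small. A query descends every tree to one leaf, takes the union of those leaves as candidates, and ranks only the candidates by exact distance. `RandomTreeForest` and its parameters are hypothetical and are not Annoy's API. The real library is written in C++ and explores several nearby branches per query with a priority queue, rather than only one leaf per tree as here.

```python
import math
import random


def _hyperplane(points, idxs, rng):
    """Split on the hyperplane equidistant from two randomly chosen points."""
    a, b = rng.sample(idxs, 2)
    normal = [x - y for x, y in zip(points[a], points[b])]
    midpoint = [(x + y) / 2 for x, y in zip(points[a], points[b])]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    return normal, offset


def _goes_left(normal, offset, v):
    """Which side of the hyperplane v falls on."""
    return sum(n * x for n, x in zip(normal, v)) > offset


def _build(points, idxs, leaf_size, rng):
    """Recursively split idxs; a leaf is a list, an inner node a tuple."""
    if len(idxs) <= leaf_size:
        return idxs
    normal, offset = _hyperplane(points, idxs, rng)
    left = [i for i in idxs if _goes_left(normal, offset, points[i])]
    right = [i for i in idxs if not _goes_left(normal, offset, points[i])]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return idxs
    return (normal, offset,
            _build(points, left, leaf_size, rng),
            _build(points, right, leaf_size, rng))


def _leaf_for(node, v):
    """Descend one tree to the leaf whose region contains v."""
    while isinstance(node, tuple):
        normal, offset, left, right = node
        node = left if _goes_left(normal, offset, v) else right
    return node


class RandomTreeForest:
    """Hypothetical toy index; not Annoy's API. leaf_size must be >= 1."""

    def __init__(self, points, n_trees=10, leaf_size=8, seed=0):
        rng = random.Random(seed)
        self.points = points
        everything = list(range(len(points)))
        self.trees = [_build(points, everything, leaf_size, rng) for _ in range(n_trees)]

    def query(self, v, k):
        """Approximate k nearest neighbours of v, as indices into points."""
        candidates = set()
        for tree in self.trees:
            candidates.update(_leaf_for(tree, v))
        return sorted(candidates, key=lambda i: math.dist(self.points[i], v))[:k]
```

More trees raise recall because a true neighbour cut off by an unlucky split in one tree usually shares a leaf with the query in another. The cost is memory and more candidates to rerank.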


Speaker Bio:

Erik Bernhardsson is the CTO at Better, a startup in NYC working with mortgages. Before Better, he spent five years at Spotify managing teams working with machine learning and data analytics, in particular music recommendations. He is also the creator of Luigi and Annoy.
