Ajay Sharma

I'm a data scientist from SF who relocated to NYC this spring. I prudently spent the prior 8 months scoping and planning, making sure there was a healthy appetite for data scientists in the region. But when I got here, I didn't seem to be getting the responses to my outreach that I had anticipated ...

Why is No One Getting Back to Me?

I was a little skeptical that the slow start could be attributed solely to my own performance and the typical nature of a job search. From what I could tell, the pool of actively open jobs was quite shallow. Eager for an explanation, I decided to plot the number of data scientist job postings from this year and last year.

The data comes from Gary's Guide, which does an excellent job of curating tech job postings in NYC ('Data Scientist' was the search term). It isn't indicative of all jobs in NYC, and the curation makes it biased, but I'd imagine the trend would be similar for all data science jobs in NYC, and it's insightful from a seasonality perspective at a minimum.

What the Data Shows

Looking at hiring trends from last year, there are two peaks: the lion's share of hiring happens in the spring, followed by a lull in late summer/early fall and another upswing just before the holidays -- which is typical seasonality.
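The seasonal comparison above boils down to a simple monthly aggregation. A minimal sketch of that step, using hypothetical posting dates in place of the actual scraped Gary's Guide data:

```python
from collections import Counter
from datetime import date

# Hypothetical posting dates standing in for scraped Gary's Guide results.
postings = [
    date(2015, 3, 2), date(2015, 3, 15), date(2015, 4, 1),
    date(2015, 8, 20), date(2015, 11, 30), date(2016, 3, 7),
]

# Count postings per (year, month) to expose seasonality.
by_month = Counter((d.year, d.month) for d in postings)

for (year, month), n in sorted(by_month.items()):
    print(f"{year}-{month:02d}: {n} posting(s)")
```

Plotting those counts side by side for the two years is what surfaces the spring peak and late-summer lull.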



Chris Johnson


This awesome talk by Chris Johnson and Edward Newett, machine learning engineers at Spotify, shows how they imagined, tested, iterated and built the highly-popular "Discover Weekly" feature of Spotify from start to finish.

Learn how product-oriented engineers think in this talk and the tradeoffs they make as they're looking for ways to rapidly test ideas and iterate. Of course, this product was built on music recommendations, so you'll also get to see how they thought through the process to figure out exactly how to generate meaningful recommendations for their millions of users.


This talk was recorded at the DataEngConf 2015 event in NYC.

Daniel Blazevski

Dan Blazevski from Insight Data Science presents some recent progress on Apache Flink's machine learning library, focusing on a new implementation of the k-nearest neighbors (knn) algorithm for Flink.

In the spirit of the Kappa Architecture, Apache Flink is a distributed batch and stream processing tool that treats batch as a special case of stream processing. Dan discusses a few ways, both exact and approximate, to do distributed knn queries, focusing on using quadtrees to spatially partition the training set and using z-value based hashing to reduce dimensionality.
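To give a flavor of the z-value idea: Morton codes interleave the bits of a point's coordinates, mapping multi-dimensional points onto a one-dimensional curve that roughly preserves locality, so approximate neighbors can be found by ranking only the candidates nearby in sorted z-order. A toy single-machine illustration (my sketch, not Flink's implementation), assuming small non-negative integer coordinates:

```python
def z_value(point, bits=8):
    """Interleave the bits of non-negative integer coordinates
    into a single Morton code (z-value)."""
    code = 0
    for bit in range(bits):
        for dim, coord in enumerate(point):
            code |= ((coord >> bit) & 1) << (bit * len(point) + dim)
    return code

def approx_knn(query, points, k, window=4):
    """Approximate knn: sort points by z-value, then rank only the
    candidates that fall near the query's position in z-order."""
    ordered = sorted(points, key=z_value)
    qz = z_value(query)
    # Index of the first point whose z-value is >= the query's.
    i = next((j for j, p in enumerate(ordered) if z_value(p) >= qz),
             len(ordered))
    candidates = ordered[max(0, i - window): i + window]
    return sorted(candidates,
                  key=lambda p: sum((a - b) ** 2
                                    for a, b in zip(p, query)))[:k]
```

The `window` parameter trades recall for work: a wider window checks more candidates and misses fewer true neighbors.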


Yael Elmatad

Many data scientists work within the realm of machine learning, and their problems are often addressable with techniques such as classifiers and recommendation engines. However, at Tapad, they have often had to look outside the standard machine learning toolkit and find inspiration in more traditional engineering algorithms. This has enabled them to solve a scaling problem with their Device Graph's connected components, as well as to maintain time-consistency in cluster identification week over week.

In this talk Yael Elmatad, Data Scientist at Tapad, will discuss two algorithms they use frequently for these problems, namely the Hash-to-Min connected component algorithm and the Stable Marriage algorithm.
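For a rough sense of how Hash-to-Min behaves, here is a toy single-machine simulation of its rounds (my sketch of the published algorithm, not Tapad's code, which runs these steps in parallel over a distributed graph). Each node keeps a cluster, sends the whole cluster to its minimum node, and broadcasts that minimum to the rest; at convergence, every node knows its component's minimum, which serves as a stable component label:

```python
from collections import defaultdict

def hash_to_min(edges):
    """Label each node with the minimum node id of its connected component."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Each node starts with a cluster of itself plus its neighbors.
    cluster = {v: {v} | nbrs[v] for v in nbrs}
    changed = True
    while changed:
        inbox = defaultdict(set)
        for v, C in cluster.items():
            m = min(C)
            inbox[m] |= C              # send the full cluster to the min node
            for u in C:
                if u != m:
                    inbox[u].add(m)    # send just the min to everyone else
        new = {v: inbox[v] | {v} for v in cluster}
        changed = new != cluster
        cluster = new
    return {v: min(C) for v, C in cluster.items()}
```

For example, `hash_to_min([(1, 2), (2, 3), (4, 5)])` labels nodes 1-3 with `1` and nodes 4-5 with `4`. Using the minimum id as the label is also what helps keep cluster identities stable from week to week.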



Pete Soderling

We're excited to announce the Call for Papers for our next DataEngConf - to be held in NYC, late October 2016.

Talks fit into 3 categories - data engineering, data science and data analytics. We made it super-easy to apply, so submit your ideas here!

We'll be selecting two kinds of speakers for the event: some from top companies that are building fascinating systems to process huge amounts of data, and others chosen from the best talks submitted by members of the Hakka Labs community.

Don't delay - CFP ends Aug 15th, 2016.

Sam Abrahams

Machine learning, especially deep learning, is becoming more and more important to integrate into day-to-day business infrastructure across all industries. TensorFlow, open-sourced by Google in 2015, has become one of the more popular modern deep learning frameworks available today, promising to bridge the gap between the development of new models and their deployment.

This talk is an in-depth workshop on the fundamentals of the TensorFlow framework. It aims to give listeners a firm grasp of the core TensorFlow classes and workflow, enabling better comprehension of deep learning models and tutorials built in TensorFlow.
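The core workflow covered in workshops of this era (TensorFlow 1.x) separates building a computation graph from executing it in a session: operations are declared first, and nothing is computed until the graph is run with concrete inputs. A toy pure-Python analogue of that deferred-execution idea (an illustration only, not TensorFlow's actual API):

```python
class Node:
    """A node in a deferred-execution graph: construction does no arithmetic."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        """Evaluate this node, loosely like Session.run with a feed_dict."""
        if self.op == "placeholder":
            return feed[self.inputs[0]]
        if self.op == "const":
            return self.inputs[0]
        vals = [n.run(feed) for n in self.inputs]
        return {"add": sum, "mul": lambda v: v[0] * v[1]}[self.op](vals)

# Build the graph y = 3x + 1; no computation happens yet.
x = Node("placeholder", "x")
w = Node("const", 3.0)
b = Node("const", 1.0)
y = Node("add", Node("mul", w, x), b)

# Execution happens only when the graph is run with inputs.
print(y.run({"x": 2.0}))  # 7.0
```

The payoff of this separation in a real framework is that the declared graph can be optimized, differentiated, and deployed independently of any particular execution.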

Sam Abrahams is a freelance data scientist and engineer. He is a long-time contributor to the TensorFlow repository and a co-author of the upcoming book TensorFlow for Machine Intelligence.

Jeff Ma

Jeff Ma's life and career were totally transformed by the advent of the big data movement. From beating blackjack, to working with professional sports teams, to a variety of entrepreneurial efforts leveraging analytics, his path has led him to his current role at Twitter, where he works with some of the brightest minds and most interesting data available. Jeff will discuss his personal journey and where he sees the future of analytics in the workplace.

This talk is from the SF Data Science meetup in June 2016.

Peadar Coyle

I've been working with machine learning models in both academic and industrial settings for a few years now. I've recently been watching the excellent Scalable ML series from Mikio Braun to learn more about Scala and Spark.

His video series covers the practicalities of 'big data', which got me thinking about what I wish I had known earlier about machine learning:

1. Getting models into production is a lot more than just microservices
2. Feature selection and feature extraction are really hard to learn from a book
3. The evaluation phase is really important

I'll take each in turn.
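On the evaluation point, the classic trap is scoring a model on the very data it was fit to. A minimal illustration with toy data and a hypothetical "memorizer" model of my own invention: it looks perfect in-sample, while a held-out split reveals it learned nothing general.

```python
import random

def accuracy(model, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Toy data: the label is simply whether x is positive.
rng = random.Random(0)
xs = [i / 100 - 0.5 for i in range(100)]
rng.shuffle(xs)
data = [(x, int(x > 0)) for x in xs]
train, test = data[:75], data[75:]

# A model that memorizes the training set looks perfect in-sample...
memo = dict(train)
memorizer = lambda x: memo.get(x, 0)   # predicts 0 for anything unseen

# ...while a model that captured the actual rule generalizes.
rule = lambda x: int(x > 0)

print(accuracy(memorizer, train))  # 1.0
print(accuracy(memorizer, test))   # exposes the memorizer
print(accuracy(rule, test))        # 1.0
```

The same logic is why a proper held-out (or cross-validated) evaluation matters far more in practice than squeezing out another point of training accuracy.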


Ryan Adams

Ryan Adams is a machine learning researcher at Twitter and a professor of computer science at Harvard. He co-founded Whetlab, a machine learning startup that was acquired by Twitter in 2015. He co-hosts the Talking Machines podcast.

A big part of machine learning is optimization of continuous functions. Whether for deep neural networks, structured prediction or variational inference, machine learners spend a lot of time taking gradients and verifying them. It turns out, however, that computers are good at doing this kind of calculus automatically, and automatic differentiation tools are becoming more mainstream and easier to use. In his talk, Adams gives an overview of automatic differentiation, with a particular focus on Autograd. He also gives several vignettes about using Autograd to learn hyperparameters in neural networks, perform variational inference, and design new organic molecules.
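To give a flavor of what reverse-mode tools like Autograd do under the hood, here is a toy differentiator in plain Python. This is an illustration of the idea only, not Autograd's API, and a real tool traverses the recorded graph once rather than recursing per path:

```python
class Var:
    """A value that records how it was computed, so gradients can
    flow backward through the recorded operations."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(ab)/da = b, d(ab)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        """Accumulate gradients by the chain rule, output to inputs."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = x * x + x      # f(x) = x^2 + x, built by ordinary arithmetic
y.backward()
print(x.grad)      # 7.0, i.e. f'(3) = 2*3 + 1
```

The point Adams makes is exactly this: once evaluation traces are recorded, exact gradients come for free, with no hand-derived or finite-difference derivatives to verify.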

This talk is from the SF Data Science meetup in June 2016.

Nick Chamandy

Simple “random-user” A/B experiment designs fall short in the face of complex dependence structures. These can come in the form of large-scale social graphs or, more recently, spatio-temporal network interactions in a two-sided transportation marketplace. Naive designs are susceptible to statistical interference, which can lead to biased estimates of the treatment effect under study.

In this talk we discuss the implications of interference for the design and analysis of live experiments at Lyft. A link is drawn between design choices and a spectrum of bias-variance tradeoffs. We also motivate the use of large-scale simulation for two purposes: as an efficient filter on candidate tests, and as a means of justifying the assumptions underlying our choice of experimental design.
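A stylized fluid model of that interference (hypothetical numbers of my own, not Lyft's simulator): riders share a fixed driver pool, so boosting treated riders' request rate starves the control group, and the naive treatment-control difference reports an effect that vanishes at full launch.

```python
def experiment(treat_frac, drivers=100, riders=200, p_c=0.5, p_t=0.8):
    """Deterministic fluid model: total ride demand is capped by the
    shared driver pool and allocated proportionally across arms.
    Returns per-rider completion rates (treatment, control)."""
    demand = riders * (treat_frac * p_t + (1 - treat_frac) * p_c)
    scale = min(1.0, drivers / demand)
    return p_t * scale, p_c * scale

# Naive A/B estimate from a 50/50 rider split:
rt, rc = experiment(0.5)
naive = rt - rc                                        # about 0.23

# Ground truth: ship to everyone vs. no one; supply caps both at 0.5.
true_effect = experiment(1.0)[0] - experiment(0.0)[1]  # 0.0
```

Because treated and control riders compete for the same drivers, the naive estimate is biased upward even though the launched-vs-not difference is zero; this is the kind of effect that alternative designs and large-scale simulation are meant to surface.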

