Jared Polivka

In this post, you’ll learn about a special data science giveaway and get a sneak peek at the three talks I’m most excited about at DataEngConf NYC.

Two of my favorite Data Science Conferences are coming up in November (one in NYC and the other in SF), and you have a chance to win tickets to both of them! Here are the giveaway details:

Grand Prize:


Thanks to Our Sponsors:

This giveaway was made possible by the Learn Data Science Meetup community and by our amazing sponsors:

Hakka Labs, the creators of DataEngConf

Hakka Labs is an amazing community for data engineers and scientists, made up of thought leaders at influential tech companies like Google, Netflix, LinkedIn, Airbnb, Slack and many others. And as you know, DataEngConf is one of my favorite conferences (the content and the people attending are incredible!).


Created by my friend Courtney Burton, MLconf is a single-day, single-track event devoted to the Machine Learning and Data Science community in major cities, agnostic of any tool, platform or company.

*Note: You can read more about MLconf’s and DataEngConf’s backstory in my Quora Answer: “Data Science Conferences - One List to Rule Them All”

AltWork Stations

To me, AltWork represents a way to code comfortably and healthily. You can stand, sit or recline. These workstations are amazing (I feel like Captain Kirk while working at an AltWork Station...)

DataEngConf - The Three Talks I’m Most Excited About

Here are the three talks at DataEngConf that I’m most excited about:

1.) Peloton: the Self-Driving Database Management System

By Andy Pavlo - Carnegie Mellon University

Andy Pavlo, from Carnegie Mellon University, is one of the rising stars in databases. Even his job title is cool: Assistant Professor of Databaseology.

Andy argues that we need a DBMS that ‘manages’ itself and doesn’t require human decisions about the configuration or maintenance of the underlying database mechanisms.

2.) The Future of Column-Oriented Processing with Arrow and Parquet

By Julien Le Dem - Principal Architect at Dremio, Apache Parquet co-founder and PMC chair

Columnar storage has been one of the key innovations of the ‘big data’ era, and we’ll hear about the most up-to-date ways it’s currently being used in tools like Kudu, Ibis, Drill, Arrow and others.

3.) Kafka Streams: Stream Processing Made Easy

By Guozhang Wang - Kafka Committer & Software Engineer, Confluent

Considering how ubiquitous Kafka has become for processing large amounts of data on the largest web platforms (starting at LinkedIn), I’m fascinated to see how Kafka Streams compares to Spark Streaming, and which one will take the top spot in modern streaming architectures after Twitter’s Storm.

About Jared Polivka:

Jared is the Director of the Developer Evangelist team at Galvanize - the learning community for technology. Learn about Galvanize’s data science training here.

About the Galvanize Developer Evangelist Team:

The Galvanize Developer Evangelists currently create content for the data science and web development communities in 7 cities (San Francisco, Seattle, Austin, New York City, Denver, Boulder and Phoenix). Join a data science community near you via Learn Data Science.

Ajay Sharma

I'm a data scientist from SF who relocated to NYC this spring. I prudently spent the prior 8 months scoping & planning, making sure there was a healthy appetite for data scientists in the region. But when I got here it didn't seem like I was getting the responses to my outreach I had anticipated ...

Why is No One Getting Back to Me?

I was a little skeptical that the slow start could be attributed only to my own performance and the typical nature of a job search. From what I could tell, the pool of actively open jobs was quite shallow. Eagerly searching for an explanation, I decided to plot the number of data scientist job postings from this year and last year.

The data is from Gary's Guide, which does an excellent job of curating tech job postings in NYC (I used 'Data Scientist' as the search term). It doesn't cover every job in NYC and is quite biased given the curation, but I'd imagine the trend is similar for data science jobs across NYC, and at a minimum it's insightful from a seasonality perspective.

What the Data Shows

Looking at hiring trends from last year, there are two peaks separated by a lull: the lion's share of hiring is done in the spring, things slow down in late summer/early fall, and there is another upswing just before the holidays, which is typical seasonality.



Chris Johnson


This awesome talk by Chris Johnson and Edward Newett, machine learning engineers at Spotify, shows how they imagined, tested, iterated on and built the highly popular "Discover Weekly" feature of Spotify from start to finish.

Learn how product-oriented engineers think in this talk and the tradeoffs they make as they're looking for ways to rapidly test ideas and iterate. Of course, this product was built on music recommendations, so you'll also get to see how they thought through the process to figure out exactly how to generate meaningful recommendations for their millions of users.


This talk was recorded at the DataEngConf 2015 event in NYC.

Daniel Blazevski

Dan Blazevski from Insight Data Science presents some recent progress on Apache Flink's machine learning library, focusing on a new implementation of the k-nearest neighbors (knn) algorithm for Flink.

In the spirit of the Kappa Architecture, Apache Flink is a distributed batch and stream processing tool that treats batch as a special case of stream processing. Dan discusses a few ways, both exact and approximate, to do distributed knn queries, focusing on using quadtrees to spatially partition the training set and using z-value based hashing to reduce dimensionality.
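To give a feel for the z-value idea mentioned above, here is a minimal single-machine sketch in Python. It is not Flink's implementation: the function names `z_value` and `approx_knn`, the 2-D integer points and the `window` parameter are all assumptions made for illustration. The z-value (Morton code) interleaves the bits of the coordinates so that each point maps to a single number, and candidates for an approximate knn query are taken from a window around the query's position in that 1-D ordering.

```python
import bisect


def z_value(x, y, bits=16):
    """Interleave the bits of two non-negative ints into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def approx_knn(points, query, k, window=None):
    """Approximate k nearest neighbors of `query` using the z-order.

    `window` (hypothetical parameter) is how many points to inspect on each
    side of the query's position in the sorted z-order; defaults to 2 * k.
    """
    if window is None:
        window = 2 * k
    ordered = sorted(points, key=lambda p: z_value(*p))
    keys = [z_value(*p) for p in ordered]
    pos = bisect.bisect_left(keys, z_value(*query))
    candidates = ordered[max(0, pos - window): pos + window]

    def sq_dist(p):
        return (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2

    # Rank only the candidates by true distance; points outside the window
    # are never examined, which is why the result is approximate.
    return sorted(candidates, key=sq_dist)[:k]


if __name__ == "__main__":
    training = [(0, 0), (1, 1), (10, 10), (11, 10), (50, 50)]
    print(approx_knn(training, query=(10, 11), k=2))
```

Points that are close in space usually, though not always, end up close in z-order, so a larger `window` trades speed for accuracy.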


Yael Elmatad

Many data scientists work within the realm of machine learning, and their problems are often addressable with techniques such as classifiers and recommendation engines. However, at Tapad, they have often had to look outside the standard machine learning toolkit to find inspiration from more traditional engineering algorithms. This has enabled them to solve a scaling problem with their Device Graph’s connected component, as well as to maintain time-consistency in cluster identification week over week.

In this talk Yael Elmatad, Data Scientist at Tapad, will discuss two algorithms they use frequently for these problems, namely the Hash-to-Min connected component algorithm and the Stable Marriage algorithm.
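As a rough illustration of the first of these, here is a single-machine simulation of Hash-to-Min in Python. It is only a sketch, not Tapad's code: the function name `hash_to_min` and the edge-list input are assumptions, and a real deployment would run each round as a distributed map/reduce step. In every round each node sends its whole cluster to the smallest node in that cluster and sends that smallest node's id to every member; a node's new cluster is the union of what it received.

```python
from collections import defaultdict


def hash_to_min(edges):
    """Label every node with the smallest node id in its connected component."""
    # Each node starts with a cluster holding itself and its direct neighbors.
    clusters = defaultdict(set)
    for u, v in edges:
        clusters[u].update((u, v))
        clusters[v].update((u, v))
    clusters = dict(clusters)

    while True:
        inbox = defaultdict(set)
        for cluster in clusters.values():
            smallest = min(cluster)
            inbox[smallest] |= cluster        # whole cluster goes to the min node
            for member in cluster:
                inbox[member].add(smallest)   # every member learns the min node
        merged = dict(inbox)
        if merged == clusters:                # no cluster changed: converged
            break
        clusters = merged

    # At convergence the min node holds its full component and every other
    # node holds just the min node, so min(cluster) is the component label.
    return {node: min(cluster) for node, cluster in clusters.items()}


if __name__ == "__main__":
    print(hash_to_min([(1, 2), (2, 3), (3, 4), (10, 11)]))
```

Nodes with no edges never appear in the edge list, so this sketch does not label them; adding them as singleton clusters would be a small extension.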



Fabrizio Milo

TensorFlow is one of the fastest-growing open source deep learning frameworks available today. It was developed internally by Google and released as open source in November 2015.

Although it is mainly known for modeling deep learning architectures, TensorFlow's flexible interface makes it a good candidate for production-level data science pipelines as well.

In this talk, you will learn about the fundamentals of distributing TensorFlow models over multiple computers.

Fabrizio Milo is a deep learning architect and early TensorFlow contributor @

Pete Soderling

We're excited to announce the Call for Papers for our next DataEngConf - to be held in NYC, late October 2016.

Talks fit into 3 categories - data engineering, data science and data analytics. We made it super-easy to apply, so submit your ideas here!

We'll be selecting two kinds of speakers for the event: some from top companies that are building fascinating systems to process huge amounts of data, and others who submit the best talks from among the members of the Hakka Labs community.

Don't delay - CFP ends Aug 15th, 2016.

Sam Abrahams

Machine learning, especially deep learning, is becoming more and more important to integrate into day-to-day business infrastructure across all industries. TensorFlow, open-sourced by Google in 2015, has become one of the more popular modern deep learning frameworks available today, promising to bridge the gap between the development of new models and their deployment.

This talk is an in-depth workshop on the fundamentals of the TensorFlow framework. It aims to give listeners a firm grasp of the core TensorFlow classes and workflow, enabling better comprehension of deep learning models and tutorials built in TensorFlow.

Sam Abrahams is a freelance data scientist and engineer. He is a long-time contributor to the TensorFlow repository and a co-author on the upcoming book TensorFlow for Machine Intelligence.

Jeff Ma

Jeff Ma's life and career were totally transformed by the advent of the big data movement. Beating blackjack, working with professional sports teams and pursuing a variety of entrepreneurial efforts leveraging analytics have all led him to his current role at Twitter, where he works with some of the brightest minds and most interesting data available. Jeff will discuss his personal journey and where he sees the future of analytics in the workplace.

This talk is from the SF Data Science meetup in June 2016.

Peadar Coyle

I've been working with Machine Learning models in both academic and industrial settings for a few years now. I've recently been watching the excellent Scalable ML from Mikio Braun to learn some more about Scala and Spark.

His video series talks about the practicalities of 'big data', and it made me think about what I wish I had known earlier about Machine Learning:

  1. Getting models into production is a lot more than just microservices

  2. Feature selection and feature extraction are really hard to learn from a book

  3. The evaluation phase is really important

I'll take each in turn.

