
Hakka Labs

New York, NY

About Hakka Labs Engineering

Hakka Labs helps programmers discover top companies by surfacing the technical challenges they are working on.
Check out the amazing teams on Hakka, browse the content from their engineers, and connect directly with an engineer at the company if you decide to join the team!

Pete Soderling

Call for Papers! DataEngConf NYC 2016

We're excited to announce the Call for Papers for our next DataEngConf - to be held in NYC, late October 2016.

Talks fit into three categories: data engineering, data science, and data analytics. We made it super-easy to apply, so submit your ideas here!

We'll be selecting two kinds of speakers for the event: engineers from top companies that are building fascinating systems to process huge amounts of data, and authors of the best talks submitted by members of the Hakka Labs community.

Don't delay - CFP ends Aug 15th, 2016.

Hakka Labs

Sneak Peek: Practical Machine Learning for Engineers

Check out the first 20 minutes of our previous Practical Machine Learning training taught by Juan M. Huerta, Senior Data Scientist at PlaceIQ.


Join us for our next 3-day training! November 10th-12th. This course is designed to help engineers collaborate with data scientists and create code that tackles increasingly complex machine learning problems. The course will be taught by Rachit Srivastava (Senior Data Scientist, PlaceIQ) and supervised by Juan.

By the end of this training, you will be able to:

  • Apply common classification methods for supervised learning when given a data set

  • Apply algorithms for unsupervised learning problems

  • Select/reduce features for both supervised and unsupervised learning problems

  • Optimize code for common machine learning tasks by correcting inefficiencies with advanced data structures

  • Choose basic tools and criteria to perform predictive analysis
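To give a flavor of the first outcome, here is a minimal sketch of one common supervised classification method, a nearest-centroid classifier, written in plain Python on a toy data set. This is an illustration of the kind of technique the course covers, not material from the course itself:

```python
# Minimal nearest-centroid classifier on a toy 2-D data set.
from collections import defaultdict
from math import dist

def fit_centroids(X, y):
    """Average the feature vectors of each class into one centroid per class."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }

def predict(centroids, point):
    """Assign the label of the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Two well-separated clusters, labeled "a" and "b".
X = [(1.0, 1.0), (1.2, 0.8), (9.0, 9.0), (8.8, 9.2)]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)
print(predict(centroids, (1.1, 0.9)))  # -> a
print(predict(centroids, (9.1, 9.1)))  # -> b
```

Real projects would reach for a library implementation, but the fit/predict split shown here is the shape most classification APIs follow.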

We screen applicants for engineering ability and drive, so you'll be in a room full of passionate devs who ask the right questions. Applicants should have 3+ years of coding experience, knowledge of Python, and previous exposure to linear algebra concepts.

You can apply for a seat on our course page.

Unknown author

Two Hulu Engineers Explain Their Approach to Scalability

Matt Jurik's talk about the architecture of Hulu's video progress service piqued my interest in Hulu's approach to scalability. In a Planet Cassandra interview, Matt's teammate Andreas Rangel (Senior Software Engineer, Hulu) explains how they've scaled their data model. Here's what we learned:

1. Hulu's keyspace is a beast:

  • Primary C* cluster runs version 1.2.12

  • 16 nodes split between 2 datacenters

  • The watch history keyspace contains several billion CQL3 rows, with approximately 1TB of data per datacenter

  • Individual nodes are 12-core machines with 48GB of RAM, using multiple SSDs in a RAID5 configuration
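A watch-history keyspace like this implies a wide-row data model: rows partitioned by user, clustered by video, with the playback position as the value. The sketch below models that shape in plain Python purely for illustration; the class and field names are invented, not Hulu's actual schema:

```python
# Hypothetical in-memory model of a watch-history wide row:
# partition key = user_id, clustering key = video_id, value = position.
from collections import defaultdict

class WatchHistory:
    def __init__(self):
        # user_id -> {video_id -> furthest position in seconds}
        self._rows = defaultdict(dict)

    def record_progress(self, user_id, video_id, position_s):
        """Upsert the user's furthest position; stale writes are ignored."""
        row = self._rows[user_id]
        row[video_id] = max(row.get(video_id, 0), position_s)

    def resume_point(self, user_id, video_id):
        """Where playback should resume; 0 if the user never watched it."""
        return self._rows[user_id].get(video_id, 0)

history = WatchHistory()
history.record_progress("u1", "ep101", 480)
history.record_progress("u1", "ep101", 300)   # out-of-order write, ignored
print(history.resume_point("u1", "ep101"))    # -> 480
```

Keeping updates idempotent and monotonic like this is what lets such a model tolerate retried or reordered writes across datacenters.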

Unknown author

Diagrams for Devs: Hulu's Architecture Before and After Cassandra

Matt Jurik (Software Developer, Hulu) gave an excellent talk at Cassandra Day Silicon Valley about Hulu's migration to Cassandra. The talk features awesome diagrams of Hulu's architecture with a focus on the Hugetop service. Hugetop tracks users' progress in content. Hulu has been able to scale this service to accommodate over 400 million monthly plays. Here are my favorite snapshots from the talk.

1. Hulu's old architecture (MySQL)

Hulu's old architecture

"The old architecture was based on MySQL. As you can see, at the top, we have, devices, and other services - these aren't exposed to the Internet. These are the primary three sources of web requests that come through Hugetop. Hugetop itself is really just a Python application. We use TFS to make sure that it's Async. As you can imagine, this service is one of our busier ones - there are lots of people watching videos, getting updates on where people are on videos, or processing requests for various traits on the progress indicator. There's a lot of concurrency and we use Python to make sure it can handle an extremely high for data stores we used Redis and originally MySQL...the Redis shards are there for cacheing..."

Unknown author

Level-Up This Weekend: 5 GoSF Meetup Tech Talks

Go SF Meetup

The Go Programming Language SF Meetup Group is one of our favorites. Since the meetup is geared toward advanced engineers, the talks are cutting-edge, thought-provoking, and highly practical.

Check out our collection of GoSF talk recordings this weekend. Here are the five most recent GoSF talks in the library:

  • Stream Multiplexing in Go by Alan Shreve

  • Building Distributed Systems with Go and Mesos by Niklas Nielsen of Mesosphere

  • Go Dependency Management by Keith Rarick, formerly of Heroku

  • Dependency Management, CoreOS and Go by Brandon Phillips of CoreOS

  • Building Web Services in Go by Richard Crowley of Betable

Unknown author

NYC and SF Tech Talks: April 7 - 13th

We hope to see you at a few of these talks! As always, we'll post the video of talks we attend as they are recorded and edited.

NYC Events

Bluetooth Night: Akbar Dhanaliwala, Chris Mollis, Craig Miller, and Ernst Schmidt (Tues, Apr 8)
Akbar Dhanaliwala of Pocobor will present CoreBluetooth with custom hardware. Chris Mollis, Craig Miller, and Ernst Schmidt of Objectlab will run through a live end-to-end demo of iBeacons and CoreLocation.

Writing DSLs with Parslet (Tues, Apr 8)
Parslet makes it easy to write well-designed DSLs in pure Ruby. In this talk you’ll learn the basics, feel out the limitations of several approaches and find some common solutions.
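Parslet itself is a Ruby library, but the parser-combinator idea behind it, composing small rules into a grammar, translates to any language. Here is a deliberately tiny Python sketch of that style; every name in it is invented for illustration and has nothing to do with Parslet's actual API:

```python
# A toy parser-combinator sketch: small rules composed into a grammar.
import re

def token(pattern):
    """A rule that matches a regex at the current offset, skipping spaces."""
    rx = re.compile(pattern)
    def rule(text, pos):
        while pos < len(text) and text[pos] == " ":
            pos += 1
        m = rx.match(text, pos)
        if not m:
            raise SyntaxError(f"expected {pattern!r} at offset {pos}")
        return m.group(), m.end()
    return rule

def sequence(*rules):
    """Compose rules so each one consumes where the previous one stopped."""
    def rule(text, pos):
        out = []
        for r in rules:
            value, pos = r(text, pos)
            out.append(value)
        return out, pos
    return rule

# A one-statement "DSL": assignments like `set retries = 3`.
assign = sequence(token(r"set"), token(r"\w+"), token(r"="), token(r"\d+"))
parsed, _ = assign("set retries = 3", 0)
print(parsed)  # -> ['set', 'retries', '=', '3']
```

Parslet layers niceties like transforms and readable error trees on top of this basic composition idea.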

Pete Soderling

Big, Small, Hot or Cold - Your Data Needs a Robust Pipeline (Examples from Stripe, Tapad, Etsy & Square)

In response to a recent post from MongoHQ entitled “You don’t have big data” - I would generally agree with many of the author’s points.

However, regardless of whether you call it big data, small data, hot data or cold data - we can all agree that *more* data is here to stay - and that’s due to many different factors.

Perhaps primarily, as the article mentions, this is due to the decreasing cost of storage over time. Other factors include access to open APIs, the sheer volume of ever-increasing consumer activity online, as well as a plethora of other incentives that are developing (mostly) behind the scenes as companies “share” data with each other. (You know they do this, right?)

But one of the most important things I’ve learned over the past couple of years is that it’s crucial for forward-thinking companies to start designing more robust data pipelines in order to collect, aggregate and process their ever-increasing volumes of data. The main reason is to tee up the data in a consistent way for the seemingly magical, quant-like operations that infer relationships in the data that would otherwise have gone unnoticed - ingeniously described in the referenced article as correctly “determining the nature of needles from a needle-stack.”

But this raises the question - what are the characteristics of a well-designed data pipeline? Can’t you just throw all your data in Hadoop and call it a day?

As many engineers are discovering - the answer is a resounding "no!" We've rounded up four examples from smart engineers at Stripe, Tapad, Etsy & Square that show aspects of some real-world data pipelines you'll actually see in the wild.
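To make the "more than just Hadoop" point concrete, here is a toy Python sketch of the collect-aggregate shape a pipeline gives you. The stage names and event fields are invented for illustration; none of this is drawn from the Stripe, Tapad, Etsy or Square examples:

```python
# Toy two-stage pipeline: collect raw events, then aggregate them
# into a consistent, query-ready shape for downstream consumers.
from collections import Counter

def collect(raw_lines):
    """Parse raw event lines, dropping malformed ones instead of crashing."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"user": parts[0], "action": parts[1]}

def aggregate(events):
    """Roll events up into per-action counts."""
    return Counter(event["action"] for event in events)

raw = ["u1,click", "u2,click", "garbage-line", "u1,purchase"]
totals = aggregate(collect(raw))
print(totals["click"])     # -> 2
print(totals["purchase"])  # -> 1
```

The robustness the post argues for lives in details like the malformed-line handling above: each stage has one job and a defined contract, so bad input degrades gracefully instead of poisoning everything downstream.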

Unknown author

DevOps Week Schedule!

Monday, Jan 27th

Stop Hiring DevOps Experts and Start Growing Them (Video)

Released 12 PM, Jan 27

Everyone is putting “DevOps” on their LinkedIn profile, and everyone is trying to hire them. In this post, Jez Humble from ThoughtWorks will argue that this is not a recruitment problem but an organizational failure. If you want to learn some of the best DevOps practices that your organization can employ, this is the talk for you.

Selenium for Automation – Testing Axial-Style Part 1 (Article)

Released 2 PM, Jan 27

At Axial, they recognized the need to make their testing efforts more reusable, and they accomplished this by doing what their company does best: building. They built Axium, a test automation suite that is easily executed, understood, maintained and configured. This article is part one of an insightful three-part series.

DevOps Best Practices: Artsy (Article)

Released 5 PM, Jan 27

Great Q&A in this article, as Daniel Doubrovkine, Head of Engineering at Artsy, talks about some of the team's DevOps best practices.
