Brooklyn, NY

About Etsy Engineering

At Etsy, our mission is to enable people to make a living making things. The engineers who make Etsy make our living making something we love: software.
The Etsy engineering team believes that code is craft, that good software and systems designs are works of art, and that the work we do is part of a larger creative culture represented by the hundreds of thousands of inspired makers who make Etsy such a wondrous marketplace. We believe that small, empowered, self-motivated teams can do big things. We also believe in the right tool for the job, not language-as-religion. Our current systems run PHP, Java, Scala, Python, Ruby, Solr/Lucene, Postgres, MySQL, and more.

Rafe Colburn

Migrating to Kafka in Three Short Years

Three years ago, Etsy's analytics data pipeline was built around a pixel hosted on Akamai, FTP uploads, and Amazon EMR. Rafe Colburn, manager of the data engineering team at Etsy, talks about their migration to a data ingestion pipeline based on Kafka. He gives an overview on how they rebuilt their data pipeline without disrupting ongoing analytics work, as well as the tradeoffs made in building these systems.


This talk was given at the NYC Data Engineering meetup at Spotify.

This talk is included in our collection of the Top 12 Apache Kafka talks. Check out the others here.

Mike Brittain

Distributed Release Management: Deploying 40+ Times per Day

Etsy engineers deploy 40+ times per day. How does a team of 175+ committers maintain uptime for 60+ million unique monthly visitors?

In this talk, Mike Brittain (Engineering Director, Etsy) explains the structures and processes behind continuous delivery at Etsy.


You can see Mike's slides here:

This talk was recorded at the Full-Stack Engineering Meetup hosted by Gilt Group in NYC.

Daniel Schauenberg

Scaling Deployment at Etsy by Daniel Schauenberg

In this talk, "Scaling Deployment," Daniel Schauenberg from Etsy talks about the development and deployment infrastructure in use at Etsy. This talk was recorded at the Continuous Delivery NYC meetup at Etsy Labs. Etsy has over 100 engineers deploying more than 60 times a day. This culture of continuously deploying small change sets enables them to build and release robust features, all while serving over a billion page views per month. To keep up this pace, they have development and deployment infrastructure in place that makes it comfortable and simple to make changes. So simple that as an engineer at Etsy you deploy the site on your first day - even if you're a dog.


But how is it possible to deploy so frequently among so many engineers and yet maintain a stable system? To answer this, Daniel gives a high-level overview of the basic application structure to introduce the specific architecture at Etsy and how their development environment is set up. For development, every engineer gets their own VM with the full application stack configured. This makes it easy to get started and puts every developer in the same, familiar setup. This is a crucial part of removing confusion and ambiguity about how to work on and deploy changes.

For the actual deployment, Etsy uses a one-button deploy system - Deployinator - which they developed and open sourced. This system is tightly integrated with their company-wide IRC server and a set of tools that they've built to foster confidence, fast feedback, and easy communication and collaboration between engineers. The talk gives a detailed overview of how the system works, how they use it, and what problems they had to solve to make sure everyone can deploy as easily and quickly as possible. One of the pillars of keeping it fast and scalable was implementing Atomic Deploys for their web stack. The talk goes into detail about the challenges they faced and how Etsy made it work with minimal interruption to the developer workflow.
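The abstract does not say how Etsy's Atomic Deploys are implemented. As a rough illustration of the general idea only, here is a minimal Python sketch of one common technique: each release lives in its own directory, and a "current" symlink is switched to the new release with a single rename, so a reader never sees a half-updated tree. The function name and directory layout are hypothetical and are not taken from Deployinator or from the talk.

```python
import os
import tempfile


def activate_release(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in one atomic step (POSIX).

    Hypothetical helper for illustration; not Etsy's or Deployinator's code.
    """
    # Build the new symlink under a temporary name next to the live one.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up after an interrupted earlier run
    os.symlink(release_dir, tmp_link)
    # os.replace uses rename(2) on POSIX, which swaps the link atomically:
    # at every moment `current_link` resolves to the old or the new release.
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    old = os.path.join(root, "release-1")
    new = os.path.join(root, "release-2")
    os.mkdir(old)
    os.mkdir(new)
    current = os.path.join(root, "current")

    activate_release(old, current)
    activate_release(new, current)
    print(os.readlink(current))  # path ending in "release-2"
```

A web server whose document root is the "current" link would pick up the new release on the next request. Real setups also have to deal with requests that are in flight during the switch and with opcode or path caches, which this sketch ignores.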

Abe Stanway

Etsy - A Deep Dive into Monitoring with Skyline by Abe Stanway - Transcript

(Original post with video of talk here)

Abe Stanway: Okay. Hi, I’m Abe. I’m a data engineer at Etsy and today we’re going to talk about Skyline, of which I was the primary author. And so we’re going to talk about how we monitor, why we decided to build this, and how it advances the art of monitoring. So let’s start.

So Etsy is the world’s handmade and vintage marketplace. We are based right here in Dumbo, so this wasn’t too much of a pain to get up here. We have a large stack. We’ve got a lot of stuff going on. Specifically, or actually not specifically at all, these are just some of our numbers of some of the servers that we’re dealing with – 41 MySQL shards, 24 API servers, 72 web servers, 42 Gearman boxes, a 150-node Hadoop cluster, 15 memcached boxes, and around 60 search machines, and a lot more than that. Probably on a scale of a hundred to two hundred, for sure other various services come with a lot of things that we have.

And that’s not to mention the app itself, which is running on top of all these machines, and all the services that are actually running on these machines. In addition to that, we practice something called continuous deployment, which is kind of the new hotness with DevOps, right. It’s kind of always deploying every single day, so we deploy around thirty to sixty times a day, every day, and we make this really really easy to do for all our engineers.

Abe Stanway

Etsy - A Deep Dive into Monitoring with Skyline

Abe Stanway, data engineer at Etsy, talks about Skyline, a real-time anomaly detection tool, and takes a deep dive into its architecture and design. This talk was recorded at the NYC Data Engineering meetup at eBay NYC.

Skyline is a real-time anomaly detection system, built to enable passive monitoring of hundreds of thousands of metrics without the need to configure a model or thresholds for each one, as you might do with Nagios. It is designed to be used wherever there is a large quantity of high-resolution time series that need constant monitoring. Once a metrics stream is set up (from StatsD, Graphite, or another source), additional metrics are automatically added to Skyline for analysis. Skyline's easily extensible algorithms allow you to define what each metric's baseline should be, thereby also defining anomalous behavior. After Skyline detects an anomalous metric, it surfaces the entire time series to the web app, where the anomaly can be viewed and acted upon.
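To make "defining a baseline" concrete, here is a minimal Python sketch of the kind of check such an algorithm might perform: it treats the mean of the earlier datapoints as the baseline and flags the newest datapoint if it sits more than a chosen number of standard deviations away. The function name, the (timestamp, value) layout, and the three-sigma default are assumptions made for illustration; this is not Skyline's actual code or API.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# A time series as (timestamp, value) pairs; an assumed layout for this sketch.
Series = List[Tuple[float, float]]


def tail_deviates_from_mean(series: Series, sigmas: float = 3.0) -> bool:
    """Return True if the latest value is far from the historical baseline.

    Hypothetical example algorithm; not taken from the Skyline code base.
    """
    values = [value for _, value in series]
    if len(values) < 3:
        return False  # too little history to say anything useful

    history, latest = values[:-1], values[-1]
    baseline = mean(history)   # what "normal" looks like for this metric
    spread = pstdev(history)   # how much the metric usually varies

    if spread == 0:
        # A perfectly flat metric: any change at all is a deviation.
        return latest != baseline
    return abs(latest - baseline) > sigmas * spread


if __name__ == "__main__":
    steady = [(float(t), 10.0 + (t % 2)) for t in range(60)]
    spike = steady + [(60.0, 50.0)]
    print(tail_deviates_from_mean(steady))  # False
    print(tail_deviates_from_mean(spike))   # True
```

Because the baseline is derived from each metric's own history, the same function can be applied to many metrics without hand-tuned thresholds, which is the property the description above emphasizes.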
