Fabrizio Milo

TensorFlow is one of the fastest-growing open source deep learning frameworks available today. It was developed internally by Google and released as open source in November 2015.

Although it is mainly known for modeling deep learning architectures, TensorFlow's flexible interface also makes it a good candidate for production-level data science pipelines.

In this talk, you will learn the fundamentals of distributing TensorFlow models over multiple computers.

Fabrizio Milo is a deep learning architect and early TensorFlow contributor at H2O.ai.

Silviu Calinoiu

This talk shows how to build an ETL pipeline with Google Cloud Dataflow/Apache Beam that ingests textual data into a BigQuery table. Google engineer Silviu Calinoiu gives a live coding demo and discusses concepts as he codes. You don't need any previous background with big data frameworks, although people familiar with Spark or Flink will recognize some similar concepts. Because of the way the framework operates, the same code scales easily from gigabyte-sized files to terabyte-sized files.

This talk was given at a joint event of SF Data Engineering and SF Data Science.

Peter Bakas

Peter Bakas from Netflix discusses Keystone, the company's new data pipeline. Hear in detail how the team deploys, operates, and scales Kafka, Samza, Docker, and Apache Mesos in AWS to manage 8 million events and 17 GB per second at peak.

This talk was given at a joint event of SF Data Engineering and SF Data Science. The speaker, Peter Bakas, is Director of Engineering, Real-Time Data Infrastructure, at Netflix.

Pete Soderling

After the success of our last event in NYC, we decided to bring DataEngConf to San Francisco, April 7-8, 2016!


DataEngConf is the first engineering conference that tackles real-world issues with data processing architectures and covers essential concepts of data science from an engineer's perspective.

Hear real-world war stories from data engineering and data science heroes at companies like Google, Airbnb, Slack, Stripe, Netflix, Clover Health, Yammer, Lyft, and many more.

Use code "site20" for 20% off regularly priced tickets through 3/24.

Full info & tickets: http://www.dataengconf.com

Joe Doliner

As companies become more data-driven, data pipelines have grown much more complicated, and we need new tools and workflows for managing them. In this talk, Joe Doliner, co-founder of Pachyderm, looks at some of the current data pipelining challenges and how he envisions them being solved in the future.


This talk was recorded at the SF Data Engineering Meetup at New Relic in San Francisco.



Interested in learning more about data engineering and data science? Don't miss our two-day DataEngConf with top engineers in San Francisco in April 2016.

In this article we put together 12 of the top Kafka talks on Hakka Labs.


Introduction to Apache Kafka

Apache Kafka is a commit log for your entire data center and infrastructure. In this lightning talk, Joe Stein, founder of Big Data Open Source Security LLC, gives a brief introduction to Kafka and talks about the producers, consumers, and client libraries it has to offer. This talk was given at the Apache Kafka NYC meetup at Tapad.



Kafka and Hadoop

Getting data from Kafka to Hadoop should be simple, which is why the community has so many options to choose from. Cloudera engineer Gwen Shapira reviews some popular solutions: Storm, Spark, Flume, and Camus. She goes over the pros and cons of each and covers recommended use cases and future development plans. This talk was given at the Apache Kafka NYC meetup at Tapad.



Joe Crobak

Big data processing with Apache Hadoop, Spark, Storm and friends is all the rage right now. But getting started with one of these systems requires an enormous amount of infrastructure, and there are an overwhelming number of decisions to be made. Oftentimes you don't even know what kinds of questions you can or should be answering with your data.

As a first step, Joe Crobak (Software Engineer, Project Florida) describes the types of problems that people typically solve with a data pipeline, such as A/B testing and data warehousing. Then, drawing on his personal experience building data tools at Foursquare and a from-scratch data pipeline at a new startup, he highlights the key questions to ask and the best practices to implement to encourage success.


This talk was presented at the Axial Lyceum in NYC.

