Hakka Labs

Check out the first 20 minutes of our previous Practical Machine Learning training taught by Juan M. Huerta, Senior Data Scientist at PlaceIQ.

20:22

Join us for our next 3-day training, November 10th-12th. This course is designed to help engineers collaborate with data scientists and create code that tackles increasingly complex machine learning problems. The course will be taught by Rachit Srivastava (Senior Data Scientist, PlaceIQ) and supervised by Juan.

By the end of this training, you will be able to:


  • Apply common classification methods for supervised learning when given a data set

  • Apply algorithms for unsupervised learning problems

  • Select/reduce features for both supervised and unsupervised learning problems

  • Optimize code for common machine learning tasks by correcting inefficiencies with advanced data structures

  • Choose basic tools and criteria to perform predictive analysis


We screen applicants for engineering ability and drive, so you'll be in a room full of passionate devs who ask the right questions. Applicants should have 3+ years of coding experience, knowledge of Python, and previous exposure to linear algebra concepts.

You can apply for a seat on our course page.

Joe Stein

Apache Kafka is a commit log for your entire data center and infrastructure. In this lightning talk, Joe Stein, founder of Big Data Open Source Security LLC, gives a brief introduction to Kafka and talks about the producers, consumers, and client libraries it has to offer.

This talk was given at the Apache Kafka NYC meetup at Tapad.

07:11

 
We have many more articles on Apache Kafka. Check out our collection of top 12 tech talks on Apache Kafka.

Jay Kreps

Apache Kafka committer Jay Kreps of LinkedIn walks through a brief production timeline for Kafka. Jay goes over what's new in 0.8.2 and how to get the most out of new features like Log Compaction and the new Java producer. He also gives an overview of what to expect from 0.9(?): a new consumer, better security, and operational improvements.

This talk was given at the Apache Kafka NYC meetup at Tapad.

16:45

 


Gwen Shapira

Getting data from Kafka to Hadoop should be simple, which is why the community has so many options to choose from. Cloudera engineer Gwen Shapira reviews some popular solutions: Storm, Spark, Flume, and Camus. She goes over the pros and cons of each, recommends use cases, and covers future development plans.

This talk was given at the Apache Kafka NYC meetup at Tapad.

20:22


Sameena Shah

Dr. Shah, team lead at Thomson Reuters R&D, discusses strategies that have (and have not) worked in dealing with massive datasets.

She draws on two Thomson Reuters projects that create novel insights from large-scale data:
1. Using automation to identify expert stock recommenders on Twitter.
2. "Language Magnet", a project that uses natural language processing to mine abnormalities in SEC filings.

01:05:12

This talk was filmed at the Machine Learning meetup at Pivotal Labs in New York.

 

Interested in Machine Learning? Check out our 3-day course Practical Machine Learning for Engineers, Nov 10-12th in NYC.

Nick Gorski

TellApart Software Engineer Nick Gorski takes us through a technical deep-dive into TellApart's personalization system. He discusses the machine learning data pipeline at TellApart that powers the models, real-time calculations of the expected value of shoppers, and how to translate that value into a bid price for every bid request received (hundreds of thousands per second).

01:08:49

This talk was given at the SF Data Mining meetup hosted at TellApart.

 


Anand Henry

In this talk, Anand Henry, Senior Software Engineer at Eventbrite, discusses Eventbrite's use of Apache Cassandra. Anand focuses on the Eventbrite data model and access patterns, and on the architecture of a Cassandra-powered recommendation engine. He also goes over Cassandra as a data store to serve recommendations based on email, mobile push notifications, and web APIs. Later in the talk he touches on user audit logging with Apache Cassandra.

30:43

This talk was recorded at the DataStax Cassandra SF Users meetup.

 


Toby Matejovsky

Tapad's data pipeline is an elastic combination of technologies (Kafka, Hadoop, Avro, Scalding) that forms a reliable system for analytics, realtime and batch graph-building, and logging. In this talk, Tapad Senior Software Developer Toby Matejovsky speaks about the creation and evolution of the pipeline. He demonstrates a concrete example – a day in the life of an event tracking pixel. Toby also talks about common challenges that his team has overcome such as integrating different pieces of the system, schema evolution, queuing, and data retention policies.

01:01:19


This talk was given at the NYC Data Engineering meetup hosted at Spotify HQ in NYC.

 

Kiyoto Tamura

Fluentd is an open source data collector, started by Treasure Data, that helps simplify and scale log management. In this talk, Kiyoto Tamura, Director of Developer Relations at Treasure Data, Inc., gives an overview and demo of Fluentd and goes through its most popular use cases.

38:46

This talk was given at the Logging meetup hosted by Salesforce in SF.

Prasanna Swaminathan

As online advertising has grown from an experiment on a marketer’s checklist to a critical tool in the proverbial toolbox, so has the demand for actionable metrics of performance.

At first, measuring engagement was straightforward. A site serves a user an ad (delivered by an unbiased third party, the ad server), and a user clicks on that ad to go to whatever page the marketer desired. Ad servers then collect the number of clicks and impressions, which serves two primary purposes. The first is that marketers use these numbers to draw insights into how their campaigns are performing. The second is that marketers pay their advertising partners based on metrics such as the number of clicks.

Soon, marketers clamored to gain deeper insights. Technology vendors introduced cookies to attribute actions on the site, such as a product purchase or online signup, called a “conversion,” to an ad impression or click. It is this process, attributing actions on a site to ad impressions and clicks, where things get tricky, and which this blog post will attempt to explain.
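
To make the idea of attribution concrete, here is a minimal Python sketch of matching a conversion to an earlier ad event that shares the same cookie. Everything in it is an assumption made for illustration and is not taken from the post: the `AdEvent` and `Conversion` types, the `attribute` function, the rule that a click beats an impression, and the 30-day lookback window. Real ad servers apply their own, often more involved, rules.

```python
from dataclasses import dataclass
from typing import List, Optional

THIRTY_DAYS = 30 * 24 * 3600  # hypothetical lookback window, in seconds


@dataclass(frozen=True)
class AdEvent:
    cookie_id: str   # cookie that ties the ad event to a browser
    kind: str        # "impression" or "click"
    campaign: str
    timestamp: int   # seconds since epoch


@dataclass(frozen=True)
class Conversion:
    cookie_id: str
    action: str      # e.g. "purchase" or "signup"
    timestamp: int


def attribute(conversion: Conversion,
              events: List[AdEvent],
              lookback_seconds: int = THIRTY_DAYS) -> Optional[AdEvent]:
    """Return the ad event credited with the conversion, or None.

    Assumed rule: only events with the same cookie that happened before
    the conversion and inside the lookback window are eligible; the most
    recent click wins, and the most recent impression is used only when
    there is no eligible click.
    """
    eligible = [
        e for e in events
        if e.cookie_id == conversion.cookie_id
        and 0 <= conversion.timestamp - e.timestamp <= lookback_seconds
    ]
    if not eligible:
        return None
    clicks = [e for e in eligible if e.kind == "click"]
    pool = clicks or eligible
    return max(pool, key=lambda e: e.timestamp)


events = [
    AdEvent("c1", "impression", "spring", 100),
    AdEvent("c1", "click", "spring", 200),
    AdEvent("c1", "impression", "summer", 300),
    AdEvent("c2", "click", "fall", 250),
]
credited = attribute(Conversion("c1", "purchase", 400), events)
print(credited)
```

Even in this toy version the result depends on choices that are not obvious: the later "summer" impression is passed over in favor of the earlier "spring" click, and shrinking the lookback window changes which event gets credit.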
