Xiangrui Meng, a committer on Apache Spark, talks about how to make machine learning easy and scalable with Spark MLlib. Xiangrui has been actively involved in the development of Spark MLlib and the new DataFrame API. MLlib is an Apache Spark component that focuses on large-scale machine learning (ML). With 50+ organizations and 110+ individuals contributing, MLlib is one of the most active open-source ML projects. In this talk, Xiangrui shares his experience developing MLlib. The talk covers both the higher-level APIs (ML pipelines) that make MLlib easy to use and the lower-level optimizations that let MLlib scale to massive datasets.
Amy is a digital personal assistant who schedules meetings for you. Amy understands your calendar preferences and negotiates meeting times with other people on your behalf, through email. Your meeting guests talk to Amy as they would to a human.
Hosted by Hakka Labs
This 3-day course will demonstrate the fundamental concepts of machine learning by working on a dataset of moderate size, using open source software tools.
This course is designed to help engineers collaborate with data scientists and create code that tackles increasingly complex machine learning problems. By the end of this course, you will be able to:
-Apply common classification methods for supervised learning when given a data set
-Apply algorithms for unsupervised learning problems
-Select/reduce features for both supervised and unsupervised learning problems
-Optimize code for common machine learning tasks by correcting inefficiencies with advanced data structures
-Choose basic tools and criteria to perform predictive analysis
The intended audience of this Machine Learning course is engineers with strong programming skills and some exposure to linear algebra and probability. Students should understand the basic problem of prediction and be comfortable with Python.
Day 1: Linear Algebra/Probability Fundamentals and Supervised Learning
The goal of day one is to give engineers the linear algebra/probability foundation they need to tackle problems during the rest of the course and introduce tools for supervised learning problems.
-Quick Introduction to Machine Learning
-Linear Algebra, Probability and Statistics
-Linear and Quadratic Discriminant Analysis
-Support Vector Machines and Kernels
-Lab: Solving classification problems on a data set
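As a rough illustration of the Linear Discriminant Analysis topic, here is a minimal two-class LDA sketch in plain Python. It assumes 2-D inputs, labels 0/1, and a pooled covariance matrix that is invertible; the function names `fit_lda_2d` and `predict` are hypothetical and are not taken from the course materials.

```python
from math import log
from statistics import mean


def fit_lda_2d(X, y):
    """Fit two-class LDA on 2-D points (hypothetical helper).

    Returns (w, b) such that w . x + b > 0 predicts class 1.
    Assumes labels are 0/1 and the pooled covariance is invertible.
    """
    X0 = [p for p, label in zip(X, y) if label == 0]
    X1 = [p for p, label in zip(X, y) if label == 1]
    mu0 = (mean(p[0] for p in X0), mean(p[1] for p in X0))
    mu1 = (mean(p[0] for p in X1), mean(p[1] for p in X1))

    # Pooled within-class covariance: LDA assumes both classes share it.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, mu in ((X0, mu0), (X1, mu1)):
        for p in points:
            d = (p[0] - mu[0], p[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(X) - 2
    s = [[v / dof for v in row] for row in s]

    # Closed-form inverse of a 2x2 matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    # Discriminant direction: w = Sigma^{-1} (mu1 - mu0)
    diff = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])

    # Threshold halfway between the class means, shifted by the log prior ratio.
    mid = ((mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1]) + log(len(X1) / len(X0))
    return w, b


def predict(w, b, p):
    """Return 1 if the point falls on the class-1 side of the LDA boundary."""
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0
```

With equal class sizes the log prior term is zero, so the boundary passes through the midpoint of the two class means. QDA drops the shared-covariance assumption and fits one covariance per class, which makes the boundary quadratic.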
Day 2: Unsupervised Learning, Feature Selection and Reduction
The goal of day two is to help students understand the mindset and tools of data scientists.
-K-Nearest Neighbors, Random Forests, Naive Bayes Classifier
-Information Theoretic Approaches
-Feature Selection and Model Selection/Creation
-Principal Component Analysis/Kernel PCA
-Independent Component Analysis
-Lab: Choosing features and applying unsupervised learning methods to a data set
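One common information-theoretic approach to feature selection is to rank features by their mutual information with the label, I(X; Y) = H(Y) - H(Y | X). The sketch below is a minimal plain-Python version under the assumption that feature values are discrete (continuous features would need binning first); `entropy`, `mutual_information`, and `rank_features` are hypothetical names, not part of any course code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(X; Y) = H(Y) - H(Y | X) for a discrete feature X and labels Y."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that determines the label exactly scores H(Y), while a feature independent of the label scores zero. Scoring each feature on its own like this ignores redundancy between features, which is one reason wrapper methods and model selection are covered alongside it.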
Day 3: Performance Optimization of Machine Learning Algorithms
The goal of day three is to help students understand how developers contribute to complex machine learning projects.
-Unsupervised Learning Continued
-DBSCAN and K-D Trees
-Recommendation Systems and Matrix Factorization Methods
-Lab: Longer lab working on back-end Machine Learning optimization programming problems in Python
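To give a flavor of how a data structure can speed up a common ML task, here is a minimal k-d tree sketch for nearest-neighbor search in plain Python (3.8+ for `math.dist`). It is a textbook median-split version assuming Euclidean distance and points given as equal-length tuples; `KDNode`, `build_kdtree`, and `nearest` are hypothetical names and not the lab's actual code.

```python
from dataclasses import dataclass
from math import dist  # Python 3.8+
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int                  # coordinate this node splits on
    left: Optional["KDNode"]   # points sorted before the median on `axis`
    right: Optional["KDNode"]  # points sorted after the median on `axis`


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on the median point, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node: Optional[KDNode], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point with the smallest Euclidean distance to `target`."""
    if node is None:
        return best
    if best is None or dist(target, node.point) < dist(target, best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < dist(target, best):
        best = nearest(far, target, best)
    return best
```

The pruning test is what saves work: whole subtrees on the far side of a splitting plane are skipped when they cannot contain a closer point, so queries on well-spread low-dimensional data usually touch far fewer points than a brute-force scan. The same neighborhood queries are the core cost in DBSCAN and k-nearest neighbors.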