Outlier Selection and One-Class Classification
In this talk, Jeroen Janssens, senior data scientist at YPlan, introduces both the outlier-selection and the one-class classification settings. He then presents a novel algorithm called Stochastic Outlier Selection (SOS). The SOS algorithm computes an outlier probability for each data point. These probabilities are more intuitive than the unbounded outlier scores computed by existing outlier-selection algorithms. Jeroen has evaluated SOS on a variety of real-world and synthetic datasets and compared it to four state-of-the-art outlier-selection algorithms. The results show that SOS performs better while being more robust to data perturbations and parameter settings. Jeroen's blog post on the subject contains links to the d3 demo. This talk was recorded at the NYC Machine Learning meetup at Pivotal Labs.
What do a terrorist attack, a forged painting, and a rotten apple have in common? The answer is: all three are anomalies; they are real-world observations that deviate from what is considered to be normal. Detecting anomalies is of utmost importance because an undetected anomaly can be dangerous or expensive. A human domain expert may suffer from three cognitive limitations: fatigue, information overload, and emotional bias. These limitations hamper the detection of anomalies. Outlier-selection and one-class classification algorithms can automatically classify data points as outliers in large amounts of data. During his Ph.D., Jeroen studied to what extent outlier-selection and one-class classification algorithms can support domain experts with real-world anomaly detection.
This talk is largely based on chapters 1, 2, and 4 of Jeroen's Ph.D. thesis (see https://github.com/jeroenjanssens/phd-thesis). If you are only interested in the SOS algorithm itself, you can download the Technical Report, which corresponds to chapter 4 (see https://github.com/jeroenjanssens/sos). Jeroen will soon add a Python implementation of the SOS algorithm to the latter repository.
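To give a feel for what a per-point outlier probability looks like, here is a minimal Python sketch in the spirit of SOS. It is not Jeroen's implementation. It follows the general recipe of turning pairwise distances into affinities, normalising them into binding probabilities, and multiplying the chances that no other point binds to a given point. As a simplifying assumption it uses one fixed bandwidth `sigma` for all points, whereas the actual algorithm tunes a variance per point from a perplexity parameter. The names `outlier_probabilities` and `sigma` are hypothetical and chosen for illustration only.

```python
import numpy as np


def outlier_probabilities(X, sigma=1.0):
    """Simplified SOS-style outlier probabilities.

    Assumption: a single fixed bandwidth `sigma` replaces the per-point,
    perplexity-based variance used by the real SOS algorithm.
    """
    X = np.asarray(X, dtype=float)

    # Squared Euclidean distance between every pair of points.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)

    # Affinity that point i has with point j; a point has none with itself.
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Binding probabilities: each row is normalised to sum to 1.
    B = A / A.sum(axis=1, keepdims=True)

    # Point j is an outlier when no other point i binds to it.
    return np.prod(1.0 - B, axis=0)


# Four points in a tight square plus one point far away.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5]]
p = outlier_probabilities(X)
print(np.round(p, 3))
```

Every value lies between 0 and 1, so a threshold such as 0.5 has a direct interpretation, unlike a threshold on an unbounded score. In this toy example the distant point receives a probability close to 1, while the four clustered points stay well below 0.5.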
Bio: Jeroen Janssens is a senior data scientist at YPlan, tonight's going-out app, where he is responsible for making event recommendations more personal. Jeroen holds an M.Sc. in Artificial Intelligence from Maastricht University and a Ph.D. in Machine Learning from Tilburg University. He is writing a book called "Data Science at the Command Line", which will be published by O'Reilly in summer 2014. Jeroen enjoys biking across the Brooklyn Bridge, building tools, and blogging at http://jeroenjanssens.com.