Yunliang Jiang, an engineer at Thumbtack, shares findings from his PhD research on the data mining techniques he has applied to the wealth of unstructured health data available online.
Deep learning is all the rage in advanced analytics. How does it work, and how can it scale? Adam Gibson, data scientist and co-founder of Skymind, explains why representation learning is an advance over traditional machine learning techniques. He also demos a working deep belief net (DBN) with a tour through DL4J's API, showing how a DBN extracts features and classifies data.
This talk was given at the SF Data Mining meetup at Trulia.
Mining big data can be an incredibly frustrating experience because of its inherent complexity and a lack of tools. Reynold Xin and Aaron Davidson, committers and PMC members for Apache Spark, use the framework to mine big data at Databricks. In this presentation and interactive demo, you'll learn about data mining workflows, Spark's architecture and benefits, and practical use cases for the framework.
Open source tools usually delegate support to community forums. How reliable is this strategy? In this talk, Rosaria Silipo answers that question along with another: "Who says that open source software does not have support?" She measures the efficiency of the community forum of KNIME, an open source data analytics platform, from 2007 to 2012. Techniques commonly used in social media analysis, such as web crawling, web analytics, text mining, and network analytics, are applied to investigate the forum's characteristics, and each is described in detail during the presentation. This talk was recorded at the SF Data Mining meetup at inPowered.
In this talk, Vitaly Gordon and Patrick Philips of LinkedIn present how the LinkedIn data science team hacks data science, using sophisticated data mining and crowdsourcing techniques to leverage the data it already has and to create the data that's missing. This talk was recorded at the SF Data Mining meetup at Trulia.
John Jensen and Mike Sherman speak about their problem domain at Rich Relevance, which provides content personalization as a service, mostly to retailers. Unlike Pandora, they don't rely on intrinsic similarity metrics built on in-depth knowledge of the domain whose items they recommend. This talk was recorded at the SF Data Mining meetup at Pandora HQ.
Recommendation engines typically produce a list of recommendations in one of two ways: through collaborative filtering or through content-based filtering. Collaborative filtering approaches build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users, then use that model to predict items (or ratings for items) that the user may be interested in. Content-based filtering approaches use a set of discrete characteristics of an item to recommend additional items with similar properties.
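To make the distinction between the two approaches concrete, here is a minimal sketch in Python. It is not the method of any speaker or product above; it assumes hypothetical toy data (the `RATINGS` and `FEATURES` dictionaries) and illustrative helper names. The collaborative part is user-based filtering with cosine similarity over co-rated items, and the content-based part scores unseen items by Jaccard overlap of their feature tags with items the user liked. These are just one common choice of similarity measure for each approach.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
RATINGS = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

# Hypothetical item characteristics used by content-based filtering
FEATURES = {
    "a": {"jazz", "instrumental"},
    "b": {"rock", "vocal"},
    "c": {"jazz", "vocal"},
    "d": {"jazz", "instrumental", "live"},
    "e": {"rock", "live"},
    "f": {"classical", "instrumental"},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_collaborative(user, item, ratings):
    """Collaborative filtering: similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue  # only neighbors who actually rated the item contribute
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbor rated this item


def jaccard(a, b):
    """Fraction of shared features between two items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_content_based(user, ratings, features, like_threshold=4, k=2):
    """Content-based filtering: rank unseen items by feature overlap with liked items."""
    liked = [i for i, r in ratings[user].items() if r >= like_threshold]
    unseen = [i for i in features if i not in ratings[user]]
    scored = [
        (max((jaccard(features[i], features[j]) for j in liked), default=0.0), i)
        for i in unseen
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    print(round(predict_collaborative("alice", "d", RATINGS), 2))  # 3.18
    print(recommend_content_based("alice", RATINGS, FEATURES))      # ['d', 'f']
```

Note the difference in inputs: the collaborative prediction for Alice on item `d` uses only other users' ratings (Bob's 4 weighs more than Carol's 2 because Bob's past ratings resemble Alice's), while the content-based ranking uses only item characteristics and Alice's own likes.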