Hadoop Summit 2015: How to Use PageRank for Fraud Detection in Healthcare Data
Anomaly detection in healthcare data is an enabling technology for the detection of overpayment and fraud. In this talk, we show how to use PageRank with Hadoop and SociaLite (a distributed query language for large-scale graph analysis) to identify anomalies in healthcare payment information. We apply a variant of PageRank to graph data generated from the Medicare-B dataset and show real anomalies discovered in that dataset.
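For readers unfamiliar with PageRank, the following is a minimal sketch of the standard power-iteration form in plain Python. It is not the variant presented in the talk and does not use Hadoop or SociaLite; the function name `pagerank`, the damping value of 0.85, and the toy edge list are assumptions made purely for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over a list of (src, dst) edges.

    Illustrative only: the talk uses a PageRank variant that is not
    reproduced here.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every other node splits its rank evenly among its out-links.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank

    return rank


# Hypothetical toy graph: "a" and "b" both point to "c"; "c" points to "a".
toy_edges = [("a", "c"), ("b", "c"), ("c", "a")]
toy_ranks = pagerank(toy_edges)
```

In this toy graph, "c" ends up with the highest score because it receives links from both other nodes. An anomaly-detection use would look for nodes whose score is unusual relative to their peers, though how the talk defines that comparison is not stated in the abstract.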
Hadoop Summit 2015: Using Natural Language Processing on Non-Textual Data with MLLib
Word2Vec is an unsupervised method for constructing vector representations of words, which can serve as features for downstream algorithms or as a basis for similarity searches. We look at using the Spark implementation of Word2Vec shipped in MLLib to help us organize and make sense of some non-textual data by treating discrete clinical events (i.e., diagnoses, drugs prescribed, etc.) in a medical dataset as non-textual "words".
Information Theoretic Metrics for Multi-Class Predictor Evaluation
The most common metrics used to evaluate a classifier are accuracy, precision, recall and F1 score.
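As a reminder of how these four metrics are defined, here is a minimal sketch for the binary case in plain Python. The function name `binary_metrics` and the sample labels are hypothetical; a multi-class setting would typically compute precision, recall, and F1 per class and then average them.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or seen.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 3 positives and 5 negatives, with 2 mistakes.
sample_true = [1, 1, 1, 0, 0, 0, 0, 0]
sample_pred = [1, 1, 0, 1, 0, 0, 0, 0]
sample_scores = binary_metrics(sample_true, sample_pred)
```

With these sample labels there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 while precision, recall, and F1 are each 2/3.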