Word2Vec is an unsupervised method for constructing vector representations of words, which can then act as features for downstream algorithms or as a basis for similarity searches. We look at using the Spark implementation of Word2Vec shipped in MLlib to help us organize and make sense of some non-textual data by treating discrete clinical events (e.g., diagnoses and drugs prescribed) in a medical dataset as non-textual "words".
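
To make the "events as words" idea concrete, here is a minimal sketch of how event records might be turned into the token sequences Word2Vec expects. The record layout, the `to_sentences` helper, the `type:code` token format, and the sample data are all assumptions made up for illustration, not taken from the dataset discussed here.

```python
from collections import defaultdict

# Hypothetical event records: (patient_id, day, event_type, code).
# The values are illustrative sample data, not real patient records.
events = [
    ("p1", 1, "dx", "E11.9"),
    ("p1", 1, "rx", "metformin"),
    ("p1", 30, "dx", "I10"),
    ("p2", 5, "dx", "I10"),
    ("p2", 5, "rx", "lisinopril"),
]


def to_sentences(records):
    """Group events by patient and order them by day.

    Each event becomes one token ("word"), and each patient's ordered
    history becomes one "sentence".
    """
    by_patient = defaultdict(list)
    for patient_id, day, event_type, code in records:
        # Prefix the code with its type so a diagnosis and a drug
        # can never collide on the same token.
        by_patient[patient_id].append((day, f"{event_type}:{code}"))
    return {
        patient_id: [
            # Sort on day only; the stable sort keeps input order for ties.
            token for _, token in sorted(history, key=lambda e: e[0])
        ]
        for patient_id, history in by_patient.items()
    }


sentences = to_sentences(events)
```

Each resulting list of tokens plays the role of a sentence. In Spark, such lists would typically live in an array-of-strings column of a DataFrame, which is the input that `pyspark.ml.feature.Word2Vec` reads through its `inputCol` parameter.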
