Machine Learning Tutorial: High Performance Text Processing

This talk is part lecture, part hands-on machine learning tutorial. Daniel Krasner (co-founder of KFit Solutions and research scholar at Columbia University) covers the rapid development of high-performance, scalable text processing solutions for tasks such as classification, semantic analysis, topic modeling, and general machine learning.


Watch and learn:

  • How Python modules, and in particular the Rosetta Python library, can be used to process, clean, tokenize, extract features, and finally build statistical models with large volumes of text data.

  • Details about the Rosetta library, which focuses on creating small, simple modules (each with a command line interface) that use very little memory and are parallelized with the multiprocessing package.

  • How to get started with LDA (latent Dirichlet allocation) topic modeling and with two of its implementations, Vowpal Wabbit and Gensim.
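The Rosetta API itself is not shown in this abstract, so the following is only a minimal, standard-library sketch of the design idea the Rosetta bullet describes: a small tokenizing module that pulls documents from an iterable one at a time and can fan work out to worker processes with multiprocessing. The names `tokenize`, `tokenize_stream`, and `TOKEN_RE`, and the ASCII-only tokenization rule, are hypothetical choices for illustration; the command line interface is left out.

```python
"""Hypothetical sketch of a streaming tokenizer parallelized with multiprocessing.

This is NOT the Rosetta API; every name below is illustrative only.
"""
import re
from multiprocessing import Pool
from typing import Iterable, Iterator, List

# Assumed tokenization rule: lowercase runs of ASCII letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase one document and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def tokenize_stream(docs: Iterable[str], n_jobs: int = 1,
                    chunksize: int = 1000) -> Iterator[List[str]]:
    """Yield one token list per input document, in input order.

    Documents are pulled from the iterable as needed instead of being
    loaded into a list first. With n_jobs > 1, Pool.imap sends chunks of
    `chunksize` documents to worker processes and yields results in order.
    """
    if n_jobs <= 1:
        for doc in docs:
            yield tokenize(doc)
        return
    # tokenize is a module-level function, so it can be pickled and
    # shipped to the worker processes.
    with Pool(processes=n_jobs) as pool:
        yield from pool.imap(tokenize, docs, chunksize=chunksize)
```

For example, iterating `tokenize_stream(open("corpus.txt", encoding="utf-8"), n_jobs=4)` (a hypothetical file with one document per line) yields token lists without reading the whole file up front. When `n_jobs > 1`, call it from under an `if __name__ == "__main__":` guard, since the spawn and forkserver start methods re-import the main module in each worker. Also note that `Pool.imap` can run ahead of a slow consumer and buffer finished results, so memory is bounded less tightly than in the serial path.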
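To show what an LDA model estimates without depending on Vowpal Wabbit or Gensim, here is a small pure-Python sketch. Note that it uses collapsed Gibbs sampling, a different and simpler inference method than the one either library implements. The function name `lda_gibbs` and the default hyperparameters `alpha` and `beta` are hypothetical choices for illustration. Documents are assumed to be lists of integer word ids in `range(vocab_size)`.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: Sequence[Sequence[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01, n_iter: int = 200,
              seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit LDA by collapsed Gibbs sampling (illustrative, not Gensim/VW).

    Returns (theta, phi): theta[d][k] is document d's weight on topic k,
    phi[k][w] is topic k's probability of word w.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]              # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z: List[List[int]] = []                             # topic assignment per token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi
```

Each sweep costs roughly (tokens × topics) operations in interpreted Python, so this is only suitable for toy corpora; for real workloads, an optimized implementation such as Gensim or Vowpal Wabbit is the practical choice.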

This talk was recorded at the NYC Machine Learning meetup at Pivotal Labs.