Convex Relaxations for Weakly Supervised Information Extraction

Machine learning researcher Edouard Grave gives a presentation on information extraction, the task of pulling structured data out of unstructured documents. He discusses current challenges in the field and introduces distant supervision for relation extraction.

Distant supervision is a recent paradigm for learning to extract information that uses an existing knowledge base, instead of labeled data, as the source of supervision. The resulting learning problem is an instance of multiple-label, multiple-instance learning. Edouard shows how to obtain a convex formulation of this problem, inspired by the discriminative clustering framework.
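To make the multiple-label, multiple-instance structure concrete, here is a minimal sketch of the usual distant supervision labeling step, assuming a toy knowledge base and a corpus whose entity mentions are already detected. The names `KB`, `SENTENCES` and `build_bags` are hypothetical and chosen for illustration; this covers only how the training bags arise, not the convex formulation presented in the talk.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, relation, object) facts.
KB = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Barack Obama", "president_of", "United States"),
}

# Hypothetical corpus: each sentence comes with the entity mentions found in it
# (in practice these would come from a named entity recognizer).
SENTENCES = [
    ("Barack Obama was born in Honolulu.", ["Barack Obama", "Honolulu"]),
    ("Barack Obama visited Honolulu last week.", ["Barack Obama", "Honolulu"]),
    ("Barack Obama addressed the United States Congress.", ["Barack Obama", "United States"]),
    ("Honolulu is in Hawaii.", ["Honolulu", "Hawaii"]),
]


def build_bags(kb, sentences):
    """Group sentences by ordered entity pair and label each bag with every
    KB relation holding between that pair (empty set if the pair is not in the KB)."""
    relations = defaultdict(set)
    for subj, rel, obj in kb:
        relations[(subj, obj)].add(rel)

    instances = defaultdict(list)
    for text, mentions in sentences:
        for subj in mentions:
            for obj in mentions:
                if subj != obj:
                    instances[(subj, obj)].append(text)

    # Each bag pairs several sentences (instances) with a set of relations (labels);
    # which sentence expresses which relation is not observed.
    return {pair: (texts, relations.get(pair, set())) for pair, texts in instances.items()}


if __name__ == "__main__":
    for pair, (texts, labels) in build_bags(KB, SENTENCES).items():
        print(pair, sorted(labels), len(texts))
```

Both sentences mentioning Barack Obama and Honolulu end up in a bag labeled `born_in`, although only the first one actually expresses that relation. This label noise is why the problem is treated at the level of bags rather than individual sentences.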

He also presents a method for learning to extract named entities from a seed list of such entities. This problem can be formulated as PU learning (learning from positive and unlabeled examples only), and Edouard describes a convex formulation for it.
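For readers new to PU learning, the sketch below shows one simple convex baseline: a weighted logistic regression in which seed-list entities are positives and every other candidate is treated as a down-weighted negative. This is a swapped-in illustration, not the convex formulation from the talk. The feature vectors, the `unlabeled_weight` value and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pu_logistic(positives, unlabeled, unlabeled_weight=0.3, l2=0.01, lr=0.5, epochs=500):
    """Full-batch gradient descent on a convex objective: weighted logistic loss
    where seed positives get label 1 (weight 1) and unlabeled examples get label 0
    with a smaller weight, plus an L2 penalty on the weights."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0, 1.0) for x in positives] + [(x, 0.0, unlabeled_weight) for x in unlabeled]
    total = sum(weight for _, _, weight in data)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]
        grad_b = 0.0
        for x, y, weight in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = weight * (p - y) / total  # gradient of the weighted log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, x):
    # Higher score means the candidate looks more like the seed entities.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical features per candidate word: [is_capitalized, preceded_by_in].
    seeds = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.8]]  # words from the seed list
    unlabeled = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.1], [1.0, 0.0], [0.0, 0.0]]
    w, b = train_pu_logistic(seeds, unlabeled)
    print(score(w, b, [1.0, 1.0]), score(w, b, [0.0, 0.0]))
```

Because some unlabeled candidates are in fact positives, the scores from such a classifier are best read as a ranking of candidates rather than as calibrated probabilities of being a named entity.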


This talk was presented at the NYC Machine Learning Meetup at Pivotal Labs.