Simon Chan

In this talk, Simon Chan (co-founder of PredictionIO) introduces the latest developments and shows how to use PredictionIO to build and deploy predictive engines in real production environments. PredictionIO is an open source machine learning server built on Apache Spark and MLlib. It is designed for data scientists and developers to build predictive engines for real-world applications in a fraction of the time normally required.

Using PredictionIO’s DASE design pattern, Simon illustrates how developers can develop machine learning applications with the separation of concerns (SoC) in mind.
“D” stands for Data Source and Data Preparator, which together prepare the data for model training.
“A” stands for Algorithm, which is where the code for one or more algorithms is implemented. MLlib, the machine learning library of Apache Spark, is natively supported here.
“S” stands for Serving, which handles the application logic during the retrieval of predicted results.
Finally, “E” stands for Evaluation.
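
As a rough illustration of how the four DASE roles separate concerns, here is a minimal sketch in plain Python. It is not PredictionIO's API (the real framework is built on Apache Spark and MLlib); every class and function name below is hypothetical, and the "algorithm" is a toy that ranks items by mean rating.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical names throughout; this only mirrors the DASE roles,
# it does not use or imitate PredictionIO's real classes.
Rating = Tuple[str, str, float]  # (user, item, score)


class DataSource:
    """D: reads raw events from wherever they are stored."""

    def __init__(self, events: List[Rating]):
        self.events = events

    def read_training(self) -> List[Rating]:
        return list(self.events)


class Preparator:
    """D: reshapes raw events into the structure the algorithm expects."""

    def prepare(self, events: List[Rating]) -> Dict[str, List[float]]:
        by_item: Dict[str, List[float]] = {}
        for _user, item, score in events:
            by_item.setdefault(item, []).append(score)
        return by_item


class MeanRatingAlgorithm:
    """A: a toy algorithm that scores each item by its mean rating."""

    def train(self, prepared: Dict[str, List[float]]) -> Dict[str, float]:
        return {item: sum(s) / len(s) for item, s in prepared.items()}

    def predict(self, model: Dict[str, float], num: int) -> List[Tuple[str, float]]:
        # Highest mean first; ties broken by item name for determinism.
        ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:num]


class Serving:
    """S: applies application logic to predictions before returning them."""

    def __init__(self, blocked: Set[str]):
        self.blocked = blocked

    def serve(self, predictions: List[Tuple[str, float]]) -> List[str]:
        return [item for item, _score in predictions if item not in self.blocked]


class Evaluation:
    """E: measures how well a trained model matches held-out ratings."""

    def mean_absolute_error(self, model: Dict[str, float], held_out: List[Rating]) -> float:
        errors = [abs(model[item] - score) for _u, item, score in held_out if item in model]
        return sum(errors) / len(errors) if errors else 0.0


def run_engine(events: List[Rating], num: int, blocked: Set[str]) -> List[str]:
    """Wires D, A and S together for a single query."""
    prepared = Preparator().prepare(DataSource(events).read_training())
    algorithm = MeanRatingAlgorithm()
    model = algorithm.train(prepared)
    return Serving(blocked).serve(algorithm.predict(model, num))
```

Because each role sits behind its own small interface, swapping the toy algorithm for a real one, or changing the serving rules, would not require touching the data-reading code.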

Simon also covers upcoming development work, including new Engine Templates for various business scenarios.


This video was recorded at the SF Data Mining meetup in SF.

Unknown author

Nick Elprin, founder of Domino Data Lab, talks about how to deploy predictive models into production, specifically in the context of a corporate enterprise use case. Nick demonstrates an easy way to “operationalize” your predictive models by exposing them as low-latency web services that can be consumed by production applications. In a real-world use case, this brings more subtle requirements for hosting predictive models, including zero-downtime upgrades and retraining and redeploying against new data. Nick also covers best practices for writing code that makes your predictive models easier to deploy.
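
To make the idea of a model exposed as a web service concrete, here is a minimal sketch using only the Python standard library. It is not Domino's API and not necessarily the approach shown in the talk; the names `ModelHolder`, `linear_model`, `handle_predict` and the JSON request shape are all assumptions made for illustration. It shows one simple way to approximate a zero-downtime upgrade: swapping a retrained model in place while the server keeps answering requests.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ModelHolder:
    """Holds the live model so a retrained one can be swapped in atomically."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def swap(self, model, version):
        with self._lock:
            self._model, self._version = model, version

    def predict(self, features):
        with self._lock:
            model, version = self._model, self._version
        # Run the model outside the lock so a slow prediction never blocks a swap.
        return {"prediction": model(features), "model_version": version}


def linear_model(weights, bias):
    """Stand-in for a trained model: returns a function from features to a score."""
    def predict(features):
        return sum(w * x for w, x in zip(weights, features)) + bias
    return predict


HOLDER = ModelHolder(linear_model([0.5, 2.0], 1.0), version="v1")


def handle_predict(body):
    """Parses a JSON request body and returns (HTTP status, response payload)."""
    try:
        features = [float(x) for x in json.loads(body)["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with a 'features' list of numbers"}
    return 200, HOLDER.predict(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Blocks and serves predictions on localhost; the port is an arbitrary choice."""
    ThreadingHTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service, and a background retraining job could call `HOLDER.swap(new_model, "v2")` without restarting it. Keeping the parsing and prediction logic in `handle_predict`, separate from the HTTP plumbing, also makes the model code easy to test before it is deployed.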