Distributed Data Structures in R for General, Large-Scale Computing

In this presentation, Dr. Michael Kane (Associate Research Scientist, Yale University) introduces a scalable, distributed software design that supports communication patterns beyond those offered by existing MapReduce frameworks, making it suitable for a more general class of computing challenges.


Unlike existing frameworks (e.g., MPI), this design is elastic and supports operations on mixed sparse and dense data representations, including numerical computing. This generality is achieved through a generative communication scheme that moves data transparently between peers. The design integrates advances in both "cloud" computing and distributed numerical computing and can be used to implement scalable computing solutions.
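To make the idea of generative communication more concrete, here is a minimal sketch in Python. It is not the design presented in the talk. It assumes a Linda-style tuple space, the setting in which the term "generative communication" originated, and it uses threads in a single process as stand-ins for peers. The names `TupleSpace`, `out`, `take`, `worker`, and `distributed_sum` are hypothetical and chosen only for illustration.

```python
import threading


class TupleSpace:
    """Hypothetical in-process tuple space (illustration only, not the talk's design)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *fields):
        """Publish a tuple. The producer never names a recipient."""
        with self._cond:
            self._tuples.append(tuple(fields))
            self._cond.notify_all()

    def _find(self, pattern):
        # A field of None in the pattern acts as a wildcard.
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, t)
            ):
                return t
        return None

    def take(self, *pattern, timeout=None):
        """Block until a tuple matches the pattern, then remove and return it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout=timeout
            )
            if not found:
                raise TimeoutError("no matching tuple")
            match = self._find(pattern)
            self._tuples.remove(match)
            return match


def worker(space):
    # Any worker may claim any chunk; the data "moves" by being matched.
    _, chunk_id, values = space.take("chunk", None, None)
    space.out("partial", chunk_id, sum(values))


def distributed_sum(data, n_chunks=4):
    """Sum `data` by publishing chunks and collecting partial results."""
    space = TupleSpace()
    for i in range(n_chunks):
        space.out("chunk", i, data[i::n_chunks])
    threads = [threading.Thread(target=worker, args=(space,)) for _ in range(n_chunks)]
    for t in threads:
        t.start()
    total = sum(space.take("partial", None, None)[2] for _ in range(n_chunks))
    for t in threads:
        t.join()
    return total
```

Calling `distributed_sum(list(range(100)))` splits the input into four chunks and combines the partial sums. Producers and consumers never address each other directly, so workers could in principle be added or removed without changing the code that publishes work, which is one way to read the elasticity claim above.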

This talk was given at the Data Science: Industry Applications and Theory meetup group hosted by WeWork in NYC.