Speaker: Girish Kathalagiri, Staff Engineer, Samsung SDS Research America

Online decision making over time requires interacting with an ever-changing environment, and the underlying machine learning models must adapt to that environment. We discuss a class of algorithms and detail how the computation is parallelized using the Spark framework. Our implementation follows the architectural style of the Lambda Architecture: a batch layer that processes bulk data and creates models, a speed layer that processes incremental data and produces updates to those models, and a serving layer that responds to decision requests in near real time. The batch layer is implemented as a Spark application, the speed layer as a Spark Streaming application, and the serving layer with the Play Framework. Spark's MLlib and low-level API are used for training and creating models in both the batch and speed layers.
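The batch/speed/serving split described above can be illustrated with a minimal, self-contained sketch. This is not the talk's Spark code; it is a plain-Python stand-in (all function names here are illustrative) where the "batch layer" refits a linear model from scratch on bulk data, the "speed layer" applies a single incremental SGD step per new record, and the "serving layer" answers requests with the latest model:

```python
# Illustrative stand-in for the Lambda Architecture model flow
# (batch refit + incremental update + serving), not the Spark code
# from the talk.

def batch_fit(data, lr=0.01, epochs=200):
    """Batch layer: train weights (w, b) from scratch on all data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def speed_update(model, record, lr=0.01):
    """Speed layer: one SGD step on a single incoming record."""
    w, b = model
    x, y = record
    err = (w * x + b) - y
    return (w - lr * err * x, b - lr * err)

def serve(model, x):
    """Serving layer: answer a decision request with the latest model."""
    w, b = model
    return w * x + b

# Bulk historical data follows y = 2x + 1; the batch layer learns it.
history = [(x, 2 * x + 1) for x in range(10)]
model = batch_fit(history)

# New records stream in; the speed layer nudges the model without a full refit.
for record in [(10, 21), (11, 23)]:
    model = speed_update(model, record)

print(serve(model, 5))  # close to 11.0 (= 2*5 + 1)
```

In the actual system, `batch_fit` corresponds to a Spark job over the full dataset, `speed_update` to a Spark Streaming micro-batch applying model updates, and `serve` to a Play Framework endpoint reading the most recent model.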

Video version


Slideshow version