Distributed machine learning algorithms are available in the widely known Spark MLlib library, but the current implementation of ML algorithms in Spark has several disadvantages: an awkward transition from standard Spark SQL types to ML-specific types, a lack of support for ensemble methods such as stacking, boosting, and bagging, a low level of adaptation of the algorithms to distributed computing, a relatively slow pace of adding new algorithms to the library, and only basic support for hyper-parameter tuning (the Grid Search and Cross-Validation methods).
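To see why brute-force Grid Search combined with cross-validation gets expensive, consider a rough cost sketch. The parameter names and values below are hypothetical, chosen only to illustrate the combinatorial growth; they are not taken from any specific Spark pipeline.

```python
from itertools import product

# Hypothetical hyper-parameter grid (illustration only).
grid = {
    "maxDepth": [3, 5, 7, 10],
    "regParam": [0.0, 0.01, 0.1, 1.0],
    "stepSize": [0.001, 0.01, 0.1],
}

# Grid search must evaluate every combination, and with k-fold
# cross-validation each combination is trained k times.
combinations = list(product(*grid.values()))
k_folds = 5
total_fits = len(combinations) * k_folds
print(len(combinations), total_fits)  # → 48 240
```

Adding one more parameter with a handful of values multiplies the number of model fits again, which is why smarter search strategies matter.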

Currently, Apache Ignite ML implements new features such as Evolutionary Strategy (a genetic algorithm) and Random Search for finding the best hyper-parameters, online learning for all algorithms (not only for KMeans and LinReg, unlike Apache Spark), and the main new feature in Ignite ML: model ensembles! They are easy to use, and it is easy to combine a few low-level trainers such as Logistic Regression or Decision Tree to predict better.
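The ensemble idea can be sketched in a few lines. This is a minimal, self-contained illustration of bagging with majority voting, not Ignite ML's actual trainer API; the stub "threshold model" stands in for a low-level trainer such as a decision tree.

```python
import random

def train_threshold_model(data):
    """Stub base learner: learns a single split point on a 1-D feature.
    Stands in for a real low-level trainer (e.g. a decision tree)."""
    xs = sorted(x for x, _ in data)
    threshold = xs[len(xs) // 2]
    return lambda x: 1 if x >= threshold else 0

def bagging(data, n_models=5, seed=42):
    """Train n_models base learners on bootstrap samples and
    combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw with replacement from the training set.
        sample = [rng.choice(data) for _ in data]
        models.append(train_threshold_model(sample))

    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if votes * 2 >= len(models) else 0

    return predict

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
model = bagging(data)
print(model(0.05), model(0.95))  # → 0 1
```

The same pattern generalizes: stacking replaces the vote with a meta-model trained on the base models' outputs, and boosting trains the base learners sequentially on reweighted data.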

During this session, I will show a few examples of how to find the best model with hyper-parameter tuning 10x faster than with the brute-force Grid Search strategy, using the new techniques introduced in Apache Ignite 2.8 and 2.9.
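The speed-up intuition behind Random Search can be sketched with a toy objective. The scoring function below is a made-up stand-in for a full train-and-evaluate cycle, and the 10x budget reduction is illustrative; it is not a measurement of Ignite ML itself.

```python
import random

def score(depth, reg):
    """Toy validation score peaking at depth=6, reg=0.1.
    In practice this would be a full train/evaluate cycle."""
    return -((depth - 6) ** 2) - 10 * (reg - 0.1) ** 2

depths = list(range(1, 21))            # 20 candidate values
regs = [i / 100 for i in range(21)]    # 21 candidate values

# Brute-force grid search: 20 * 21 = 420 evaluations.
grid_best = max(score(d, r) for d in depths for r in regs)

# Random search with a fixed budget of 40 evaluations (~10x fewer)
# often lands close to the grid optimum on smooth objectives.
rng = random.Random(0)
random_best = max(
    score(rng.choice(depths), rng.choice(regs)) for _ in range(40)
)
print(grid_best, random_best)
```

Evolutionary Strategy pushes the same idea further by mutating and recombining the best candidates found so far instead of sampling blindly.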

ML Engineer
Just like Charon from the Greek myths, Alexey helps people get from one side to the other, the sides in his case being Java and Big Data. In simpler words, he is a Java/Big Data trainer. He has worked with Hadoop/Spark and other Big Data projects since 2012, has forked such projects and sent pull requests since 2014, and has given talks since 2015. His favorite areas are text data and large graphs. Alexey is also a contributor to Ignite ML: he hand-wrote SVM, KNN, Logistic Regression, and a lot of preprocessing code, and he is the author of the official Ignite ML tutorial.


(Pacific Time Zone)