SnappyData: Apache Spark meets Embedded In-Memory Database

Schedule
Room: Edward 5-7

I will introduce SnappyData, an in-memory big data processing platform that is seamlessly integrated with Apache Spark.

Apache Spark is an excellent distributed computing framework. However, Spark has to read the data from a data source each time it processes it, and it has to write the processing results to some data store. These reads and writes take time, which makes Spark difficult to use for real-time analytics.

SnappyData can solve this problem. SnappyData integrates the features of a distributed in-memory database into the Spark JVM. This makes both distributed computing and data store features available in one cluster. In other words, the data already resides in the distributed in-memory database cluster, and you can run Spark processing directly on that database.

The advantages of using SnappyData are as follows:

  1. Simple (Because both distributed computing and database features can be used in one cluster)
  2. Fast (No need to access another data store when reading / writing data)
  3. Spark Tuning (Optimized DAG generation, with extensions for part of the Spark SQL workload)
  4. Mutable DataFrame (Can update DataFrame)
  5. Real-time state sharing between Spark workers and transactions
  6. Unified data access API by table and SQL
  7. Synopsis Data Engine (Returns processing results quickly at the expense of accuracy)
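
The trade-off in item 7 (faster answers in exchange for some accuracy) can be illustrated with a minimal sketch. This is not SnappyData's implementation or API. It is a plain Python example of sampling-based approximate aggregation, and the data set, the function name `approximate_mean`, and the 1% sample size are all assumptions made for illustration.

```python
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Hypothetical helper for illustration only; it is not part of any
    SnappyData or Spark API.
    """
    rng = random.Random(seed)                 # fixed seed keeps the sketch reproducible
    sample = rng.sample(values, sample_size)  # uniform sample without replacement
    return statistics.fmean(sample)


# Assumed synthetic data set: 100,000 numeric readings.
values = list(range(100_000))

exact = statistics.fmean(values)             # touches every value
estimate = approximate_mean(values, 1_000)   # touches only 1% of the values
relative_error = abs(estimate - exact) / exact

print(f"exact={exact:.1f} estimate={estimate:.1f} error={relative_error:.2%}")
```

The estimate reads only a fraction of the data, so it costs less to compute, but it carries a sampling error that shrinks as the sample grows. An engine built on this idea would typically report or bound that error, which this sketch does not attempt.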

SnappyData is an extension of Spark. Therefore, in addition to SnappyData's own features, Spark's various features are also available.

Speakers
Masaki Yamakawa
Managing consultant at UL Systems, Inc.
Masaki Yamakawa is an IT consultant at UL Systems, Inc. He specializes in distributed systems and in-memory computing, and draws on a range of technologies to solve enterprise IT problems. About 10 years ago, he was responsible for building a scalable distributed system based on an in-memory data grid at a securities company. Since then, he has built numerous scalable data platforms as an IT architect of distributed systems. In recent years, he has been evaluating various distributed system technologies and running PoCs with them. He is also currently involved in launching the "Japan Apache Geode User Group" community.