Implementing User-Defined Data Structures in In-Memory Data Grids

Speaker: Dr. William L. Bain, Founder & CEO, ScaleOut Software, Inc.

In-memory data grids (IMDGs) are widely used as distributed, key-value stores for serialized objects, providing fast data access, location transparency, scalability, and high availability. With its support for built-in data structures, such as hashed sets and lists, Redis has demonstrated the value of enhancing standard create/read/update/delete (CRUD) APIs to provide extended functionality and performance gains. This talk describes new techniques which can be used to generalize this concept and enable the straightforward creation of arbitrary, user-defined data structures both within single objects and sharded across the IMDG.

A key challenge for IMDGs is to minimize network traffic when accessing and updating stored data. Standard CRUD APIs place the burden of implementing data structures on the client and require that full objects move between client and server on every operation. In contrast, implementing data structures within the server streamlines communication since only incremental changes to stored objects or requested subsets of this data need to be transferred. However, building extended data structures within IMDG servers creates several challenges, including, how to extend this mechanism, how to efficiently implement data-parallel operations spanning multiple shards, and how to protect the IMDG from errors in user-defined extensions.

This talk will describe two techniques which enable IMDGs to be extended to implement user-defined data structures. One technique, called single method invocation (SMI), allows users to define a class which implements a user-defined data structure stored as an IMDG object and then remotely execute a set of class methods within the IMDG. This enables IMDG clients to pass parameters to the IMDG and receive a result from method execution.

A second technique, called parallel method invocation (PMI), extends this approach to execute a method in parallel on multiple objects sharded across IMDG servers. PMI also provides an efficient mechanism for combining the results of method execution and returning a single result to the invoking client. In contrast to client-based techniques, this combining mechanism is integrated into the IMDG and completes in O(logN) time, where N is the number of IMDG servers.

The talk will describe how user-defined data structures can be implemented within the IMDG to run in a separate process (e.g., a JVM) to ensure that execution errors do not impair the stability of the IMDG. It will examine the associated performance trade-offs and techniques that can be used to minimize overhead.

Lastly, the talk will describe how popular Redis data structures, such as hashed sets, can be implemented as a user-defined data structure using SMI and then extended using both SMI and PMI to build a scalable hashed set that spans multiple shards. It will also examine other examples of user-defined data structures that can be built using these techniques.

The audience will learn (1) how to extend an IMDG to incorporate user-defined data structures, (2) the trade-offs between an extensible mechanism and the use of built-in data structures, such as in Redis, and (3) examples of using this mechanism in various applications.

Download Slides