Running 100 Million Queries Reliably on In-Memory Distributed SQL Engine

Schedule: June 21, 11:45am
Room: Matterhorn 2

Presto is open source software, initially released by Facebook, that has been widely adopted for enterprise use cases such as advertising and the web. It is a fast, in-memory distributed SQL engine suited to ad hoc and interactive workloads. Treasure Data provides a cloud-based data management platform built on Presto. Because our customers now rely on Presto for critical use cases, we must deliver high performance and stability across millions of queries. At the same time, the volume of data our Presto clusters process keeps growing day by day.
This session covers the following key solutions:

  • Multi-resource-aware node scheduling
  • A distributed, RDBMS-based job queue called 'PerfectQueue'
  • A partition-based resource scheduler
  • Resource isolation semantics suited to in-memory workloads
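To make the first item more concrete, here is a minimal sketch of multi-resource-aware node selection. It is a generic illustration, not Treasure Data's or Presto's actual scheduler: the `Node` and `Task` types, the `pick_node` function, and the scoring rule (place the task where the most-constrained resource ends up least utilized) are all hypothetical assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


# Hypothetical view of a worker node's capacity and current headroom.
@dataclass
class Node:
    name: str
    free_memory_gb: float
    free_cpus: float
    total_memory_gb: float
    total_cpus: float


# Hypothetical resource request for one unit of query work.
@dataclass
class Task:
    memory_gb: float
    cpus: float


def pick_node(nodes: Sequence[Node], task: Task) -> Optional[Node]:
    """Return the node that fits the task on every resource and whose
    dominant (most utilized) resource would be lowest after placement.
    Returns None when no node has enough headroom."""
    best: Optional[Node] = None
    best_score = float("inf")
    for node in nodes:
        # A node qualifies only if it has headroom on *all* resources,
        # not just memory or just CPU.
        if node.free_memory_gb < task.memory_gb or node.free_cpus < task.cpus:
            continue
        mem_util = 1 - (node.free_memory_gb - task.memory_gb) / node.total_memory_gb
        cpu_util = 1 - (node.free_cpus - task.cpus) / node.total_cpus
        # Score by the dominant resource after placement; lower is better.
        score = max(mem_util, cpu_util)
        if score < best_score:
            best, best_score = node, score
    return best


if __name__ == "__main__":
    cluster = [
        Node("w1", 8, 2, 64, 16),   # too little free memory
        Node("w2", 40, 1, 64, 16),  # plenty of memory, too little CPU
        Node("w3", 20, 8, 64, 16),
        Node("w4", 48, 4, 64, 16),
    ]
    chosen = pick_node(cluster, Task(memory_gb=16, cpus=2))
    print(chosen.name if chosen else "no node fits")
```

A memory-only scheduler would pick `w2` here because it has the most free memory, even though it cannot supply the two CPUs the task needs; checking every resource rules it out. Between the two nodes that fit, `w4` wins because its worst post-placement utilization (87.5% CPU) is lower than `w3`'s (93.75% memory).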

This session introduces the technologies we use in real-world deployments to build a reliable in-memory distributed cluster that is robust enough to be offered as PaaS.

Speakers
Kai Sasaki
Software Engineer at Treasure Data Inc

Kai Sasaki is a software engineer at Treasure Data. Treasure Data provides a big data management platform in the cloud and is the original creator of Fluentd, a unified log collector that has joined the Cloud Native Computing Foundation (CNCF). He develops and maintains the distributed processing platform behind the service. His area of expertise is distributed computing with open source software such as Hadoop, Spark, and Presto. He is also a committer of Apache Hivemall, a scalable machine learning library that runs on Hive and Spark.