Best Practices for Monitoring Distributed In-Memory Systems

When you add a distributed cluster between existing systems or new APIs, you introduce many moving parts that can be very hard to track and troubleshoot when performance issues or failures occur. Learn how veterans monitor the components of a distributed cluster for network, memory, and node-specific issues, and how they troubleshoot and resolve them. By the end of this session, you'll have a handy checklist and a set of tools to consider for your own deployments. This session will cover:

  • How to monitor applications, cluster node logs and metrics, the JVM, the operating system, and the network
  • Which tools work best for different scenarios, including:
    • Log-based monitoring with Logstash, Elasticsearch, and Kibana, or with Splunk
    • Grafana
    • Application monitoring (throughput, latency, and GC)
    • Local node metrics monitoring (memory, GC, and CPU)
    • Network monitoring (checking node connectivity and latency)
    • GridGain Web Console
  • Tips and tricks for configuring and optimizing monitoring
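
As a rough illustration of the local node metrics item above (memory, GC, and CPU), the sketch below reads a few values from the JVM's standard `java.lang.management` beans and formats them as a single log line that a log-based pipeline could pick up. The class name `NodeMetricsSnapshot`, its method names, and the output format are hypothetical and chosen only for this example; they do not come from the session and are not a GridGain API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of any product API.
class NodeMetricsSnapshot {

    /** Returns the number of heap bytes currently used by this JVM. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Returns the approximate total GC time in milliseconds since JVM start. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Formats one key=value line with memory, GC, and CPU-related values. */
    static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return String.format(Locale.ROOT,
                "heapUsedBytes=%d gcTimeMs=%d loadAvg=%.2f cpus=%d",
                heapUsedBytes(),
                totalGcTimeMillis(),
                os.getSystemLoadAverage(), // negative when the platform cannot provide it
                os.getAvailableProcessors());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

In practice a line like this would be emitted on a schedule and shipped to whichever log or metrics backend the deployment already uses; the one-shot `main` here only keeps the sketch short.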



Albert 2-3
Client Service Lead
