Speaker: Alexey Kharlamov, VP of Technology, Integral Ad Science Inc

Integral Ad Science employs stream processing systems to extract value from data. We rely on Apache Storm to collect and aggregate data and to make decisions in real time. Such systems frequently keep persistent state in external storage, both to hold the real-time view of the data and to provide failure recovery.

However, in heavily loaded systems, disk- and SSD-based storage easily becomes a performance bottleneck and complicates software evolution. Given typical data consistency and performance requirements, external state and reliance on a global wall clock become a costly choice that is hard to maintain.

In this talk, we will discuss how we handled these challenges when building a global processing system that handles 1.5M msg/sec with Apache Storm and Apache Kafka. We will review the benefits of volatile in-memory state and inspect technology-agnostic patterns that reemerge across multiple applications, including stream rewind, derived logical time and synchronization, and precision/performance trade-offs.