How to Test the Ability of Large-Scale, Distributed Software Systems to Cope with Failures

We conduct functional testing, check performance, and write unit tests. However, these activities may not be enough when it comes to large-scale, heavily loaded distributed systems.

What will happen to your distributed system if network problems split it into isolated partitions?
Will your system respond correctly to the failure of cluster nodes?
Are you sure that your database does not lose data?
Have you ever thought about the reliability and security of your system?

In this presentation, I will share the story of how, drawing on the experience of Amazon, Netflix and Twitter, I created a framework to test a system's ability to cope with failures. You will learn which technologies and approaches are useful for testing distributed in-memory systems.

Schedule: Tue, 06/04/2019 - 14:40

Room: Albert 2-3

Tracks: Tales from the Trenches

Speakers

Pavel Lipsky

Principal Software Engineer at Dell Technologies

Pavel is a principal software engineer at Dell Technologies. He is an expert in performance optimization and in testing the ability of distributed software systems to cope with failures. Pavel has spent the last decade building high-load software systems for companies around the world. Before joining Dell Technologies, he worked for Rambler, Opera Software and Sberbank. Pavel actively contributes to open source communities.

