How to Test the Ability of Large-Scale, Distributed Software Systems to Cope with Failures
We run functional tests, measure performance, and write unit tests. However, these activities may not be enough for large-scale, heavily loaded distributed systems.
- What will happen to your distributed system if a network failure splits it into isolated segments (a network partition)?
- Will your system respond correctly to the failure of cluster nodes?
- Are you sure that your database does not lose data?
- Have you ever thought about the reliability and security of your system?
In this presentation, I will share the story of how, drawing on the experience of Amazon, Netflix, and Twitter, I created a framework to test a system's ability to cope with failures. You will learn which technologies and approaches are useful for testing distributed in-memory systems.