Traps in Chaos Engineering
Notes on this post
- Goal is resilience instead of finding risks
- Don’t measure KR by errors discovers - they show only ignorance instead of risk
- Human under pressure IS part of the system to test. Need to optimize how they react
- Focus on things done right rather than pointing out risks. Don’t chase the fixes
- Providing context and let service owner decide what to do with the vulnerability
- Automate the discovery of what is wrong, but not what should be
- Manual gamedays and chaos experiements in non-prod env are the prerequisite of prod testing
- Creating experiement often has more value than running them
- Define “steady state” at first