Notes on this post

  • The goal is resilience, not finding risks
  • Don’t measure KRs by errors discovered; they show only ignorance, not risk
  • Humans under pressure ARE part of the system under test. Optimize how they react
  • Focus on what is done right rather than pointing out risks. Don’t chase the fixes
  • Provide context and let the service owner decide what to do about the vulnerability
  • Automate the discovery of what is wrong, but not what should be
  • Manual game days and chaos experiments in non-prod environments are prerequisites for testing in prod
  • Creating experiments often has more value than running them
  • Define the “steady state” first
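
To make the last few notes concrete, here is a minimal sketch of an experiment that starts from a steady-state definition. Everything in it is an assumption for illustration, not something taken from the post: the metric names (error_rate, p99_latency_ms), the SteadyState class, and the probe/inject/rollback callables are hypothetical stand-ins for whatever monitoring and fault-injection tooling is actually in use.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Metrics = Dict[str, float]


@dataclass
class SteadyState:
    """Hypothetical steady-state definition: upper bounds on two metrics."""
    max_error_rate: float
    max_p99_latency_ms: float

    def holds(self, metrics: Metrics) -> bool:
        # The system counts as "steady" only if every metric is within bounds.
        return (
            metrics["error_rate"] <= self.max_error_rate
            and metrics["p99_latency_ms"] <= self.max_p99_latency_ms
        )


def run_experiment(
    steady: SteadyState,
    probe: Callable[[], Metrics],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> str:
    """Check steady state, inject a fault, check again, always roll back."""
    # Refuse to inject anything if the system is already unhealthy.
    if not steady.holds(probe()):
        return "aborted: system not in steady state before injection"

    inject()
    try:
        held = steady.holds(probe())
    finally:
        # Roll back even if probing raises, so the fault never lingers.
        rollback()

    return "steady state held" if held else "steady state violated"


if __name__ == "__main__":
    # Fake in-memory "system" standing in for real monitoring data.
    system = {"error_rate": 0.001, "p99_latency_ms": 120.0}
    steady = SteadyState(max_error_rate=0.01, max_p99_latency_ms=300.0)

    def add_latency() -> None:
        system["p99_latency_ms"] = 900.0

    def remove_latency() -> None:
        system["p99_latency_ms"] = 120.0

    print(run_experiment(steady, lambda: dict(system), add_latency, remove_latency))
```

Writing the SteadyState bounds down is most of the work here, which fits the note that creating an experiment often has more value than running it. The sketch only reports whether the hypothesis held; what to do about a violation is left to the service owner.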