My notes from the DevOps Handbook

by Gene Kim, Jez Humble, Patrick Debois, John Willis

55. Enable and Inject Learning into Daily Work

When we work within a complex system, it is impossible for us to predict all the outcomes for the actions we take.

Organizations must become ever better at self-diagnostics and self-improvement and must be skilled at detecting problems, solving them, and multiplying the effects by making the solutions available throughout the organization.

It is this responsiveness that is the source of reliability.

Rearchitecting to be cloud native -> the system must be resilient enough to survive significant failures.

This requires the system to be loosely coupled, with each component having aggressive timeouts to ensure that failing components don't bring the entire system down. Instead, each feature and component should be designed to degrade gracefully.
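A minimal Python sketch of the timeout-plus-degradation idea, not taken from the book. The names call_with_fallback, slow_recommendations, broken_recommendations, fast_recommendations and DEFAULT are hypothetical, and the thread pool stands in for whatever remote call a real component would make.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool (assumption: calls to the dependency run in threads).
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(func, timeout_s, fallback):
    """Run func under a hard deadline; return fallback if it is slow or fails."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a thread that is already running cannot be interrupted.
        future.cancel()
        return fallback
    except Exception:
        # Any other failure in the dependency also degrades to the fallback.
        return fallback


# Hypothetical dependencies used only for illustration.
def slow_recommendations():
    time.sleep(0.5)  # simulates a dependency that has become slow
    return ["personalized-1", "personalized-2"]


def broken_recommendations():
    raise ConnectionError("dependency is down")


def fast_recommendations():
    return ["personalized-1"]


# Degraded but still useful answer shown when the dependency misbehaves.
DEFAULT = ["popular-1", "popular-2"]

if __name__ == "__main__":
    print(call_with_fallback(slow_recommendations, 0.05, DEFAULT))
    print(call_with_fallback(broken_recommendations, 0.05, DEFAULT))
    print(call_with_fallback(fast_recommendations, 1.0, DEFAULT))
```

The caller always gets an answer within roughly timeout_s: the real one when the dependency is healthy, the generic one otherwise. In a real service the timeout would more likely be set on the network client itself, so that the slow call is actually abandoned rather than left running in the background.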

Engineering teams got used to a constant level of failure in the cloud, so services were designed to recover automatically, without any manual intervention.

Human error is a consequence of the design of the tools that we gave people.

Maximize opportunities for organizational learning. Continually reinforce that we value actions that expose and share more widely the problems in our daily work. This is what enables us to improve the quality and safety of the system we operate within and reinforce the relationships between everyone who operates within that system.

Two effective practices that help create a just, learning-based culture are blameless postmortems and the controlled introduction of failures into production, which creates opportunities to practice for the inevitable problems that arise within complex systems.
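A rough sketch of what controlled failure injection can look like in code, written in Python under my own assumptions. The names inject_failures, with_retries, InjectedFailure and fetch_profile are hypothetical, and this is far simpler than any production tooling: it only wraps a function so that a chosen fraction of calls fail on purpose, which lets us check that the recovery path really works.

```python
import random


class InjectedFailure(Exception):
    """Raised on purpose by the fault injector, never by real code."""


def inject_failures(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def with_retries(func, attempts):
    """Call func up to `attempts` times; re-raise if the last attempt fails."""
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except InjectedFailure:
                if attempt == attempts - 1:
                    raise
    return wrapper


# Hypothetical service call used only for illustration.
def fetch_profile(user_id):
    return {"id": user_id, "name": "example"}


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so an experiment can be replayed
    flaky = inject_failures(fetch_profile, failure_rate=0.3, rng=rng)
    resilient = with_retries(flaky, attempts=5)
    print(resilient(7))
```

Passing the random generator in explicitly keeps the experiment controlled: the failure rate is a dial, and a seeded generator makes a run repeatable when it is discussed afterwards in a postmortem.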