Non functional requirements: availability, scalability, capacity, security.
Many of those are fulfilled by correct configuration of our environments, so write tests that validate that our environments have been build and configured properly.
Enforce consistency and correctness of: * Supporting applications, databases, libraries * Language interpreters, compilers * Operating systems * All dependencies
When we use infrastructure as code configuration tools, we can use the same testing frameworks that we use our code to also test that our environments are configured and operating correctly.
Also run tools that analyze the IaC files in terms of static analysis.
Whenever someone introduces a change that causes our build or tests to fail, no new work is allowed to enter the system until the problem is fixed. If someone needs help to resolve the problem, they can brin in whatever help they need.
When our deployment pipeline is broken, we notify the entire team of failure, so anyone can fix the problem or rollback. We can ever configure VCS to prevent further commits.
Our job is not just to write code but run a service.
To increase visibility of automated tests failures, we should create highly visible indicators so that the entire team can see when the build fails.
This step is more challenging - it requires changing human behavior and incentives.
The consequences of not pulling the cord immediately:
Water-Scrum-Fall anti-pattern - organization claims to use Agile practices but in reality all testing is performed at the end of the project.