My notes from the DevOps Handbook

by Gene Kim, Jez Humble, Patrick Debois, John Willis

Centralized telemetry infrastructure (cont)

Once we have centralized logs, we transform them into metrics by counting them in the event router.
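As an illustration only, here is a minimal Python sketch of turning logs into metrics by counting them. It assumes a hypothetical plain-text format of "<ISO timestamp> <level> <message>" and buckets events per minute and level. A real event router would do this continuously on a stream rather than on a list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw log lines: "<ISO timestamp> <level> <message>"
LOG_LINES = [
    "2024-01-01T10:00:05 ERROR payment declined",
    "2024-01-01T10:00:40 INFO user login",
    "2024-01-01T10:01:10 ERROR payment declined",
    "2024-01-01T10:01:15 ERROR db timeout",
]

def logs_to_metrics(lines):
    """Count log events per (minute, level), turning logs into a time series."""
    counts = Counter()
    for line in lines:
        timestamp, level, _message = line.split(" ", 2)
        minute = datetime.fromisoformat(timestamp).strftime("%Y-%m-%dT%H:%M")
        counts[(minute, level)] += 1
    return counts

metrics = logs_to_metrics(LOG_LINES)
for (minute, level), count in sorted(metrics.items()):
    print(minute, level, count)
```

Once the counts exist as a series, the individual log lines are no longer needed for trend analysis.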

By transforming logs into metrics, we can now perform statistical operations on them, such as using anomaly detection to find outliers and variances even earlier in the problem cycle.
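Anomaly detection can be far more sophisticated, but one simple technique, swapped in here purely for illustration, is a standard-deviation (z-score) threshold over a metric series. The sketch below assumes hypothetical errors-per-minute counts and flags points lying more than `threshold` standard deviations from the mean.

```python
from statistics import mean, stdev

def find_outliers(series, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations from the mean."""
    mu = mean(series)
    sigma = stdev(series)  # sample standard deviation; needs at least two points
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]

# Hypothetical errors-per-minute counts; minute 9 contains a spike.
errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 2, 3, 40, 3, 2]
print(find_outliers(errors_per_minute, threshold=2.0))
```

Note that a large spike also inflates the standard deviation itself, which is one reason production systems often prefer more robust statistics.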

We must also collect telemetry from our deployment pipeline whenever important events occur: for example, when our automated tests pass or fail, when we deploy to any environment, and how long our builds and tests take to execute.
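A minimal sketch of emitting deployment-pipeline telemetry as structured events. The `emit` function, the in-memory `EVENTS` list (a stand-in for a real telemetry sink), the stage names and the version string are all hypothetical. The context manager records whether a stage passed or failed and how long it took.

```python
import json
import time
from contextlib import contextmanager

EVENTS = []  # stand-in for a hypothetical telemetry sink

def emit(event_type, **fields):
    """Record a pipeline event as a structured, timestamped record."""
    record = {"event": event_type, "ts": time.time(), **fields}
    EVENTS.append(record)
    print(json.dumps(record))

@contextmanager
def timed_stage(stage):
    """Emit start/finish events, the outcome, and the duration of a pipeline stage."""
    start = time.monotonic()
    emit("stage_started", stage=stage)
    status = "failed"  # assume failure unless the body completes
    try:
        yield
        status = "passed"
    finally:
        emit("stage_finished", stage=stage, status=status,
             duration_s=round(time.monotonic() - start, 3))

with timed_stage("unit_tests"):
    pass  # run the test suite here

emit("deployment", environment="staging", version="1.4.2")
```

If the stage raises, the `finally` block still records a `failed` event before the exception propagates, so failures are never silently missing from the telemetry.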

It must be easy to retrieve information from our telemetry infrastructure through self-service APIs.

Telemetry should tell us exactly when anything of interest happens, as well as where and how. It should also be suitable for both manual and automated analysis, and it should be possible to analyze it without access to the application that produced the logs.
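One common way to make logs analyzable without the producing application is to emit self-describing structured records. This is a minimal sketch using Python's standard `logging` module; the JSON field names (`when`, `where`, `context`) and the `checkout` logger are assumptions chosen to mirror the when/where/how idea, not a format the book prescribes.

```python
import json
import logging
import socket
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one self-describing JSON line."""
    def format(self, record):
        payload = {
            # When: an unambiguous UTC timestamp.
            "when": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # Where: host plus the code location that emitted the record.
            "where": {"host": socket.gethostname(), "logger": record.name,
                      "module": record.module, "line": record.lineno},
            "level": record.levelname,
            "message": record.getMessage(),
            # How: structured details passed via extra={"context": {...}}.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order placed", extra={"context": {"order_id": "A123", "latency_ms": 87}})
```

Because every line is plain JSON, any tool can parse it later without the application's code or configuration.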

Monitoring is so important that it should be more available than the systems that are being monitored.

Creating application logging telemetry that helps production

Dev and Ops create production telemetry as part of their daily work. Creating application telemetry is one of the highest-return investments we can make, and every feature should be instrumented.
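As a rough sketch of what "every feature should be instrumented" can look like in code, here is a hypothetical decorator that records call counts, error counts and latency per feature. The in-process dictionaries stand in for a real metrics backend, and the `search` feature is made up for the example.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process metric store; a real system would ship these to telemetry.
CALLS = defaultdict(int)
ERRORS = defaultdict(int)
LATENCIES_MS = defaultdict(list)

def instrumented(feature):
    """Decorator recording call count, error count and latency for a feature."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            CALLS[feature] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                ERRORS[feature] += 1
                raise  # instrumentation must not swallow failures
            finally:
                LATENCIES_MS[feature].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator

@instrumented("search")
def search(query):
    if not query:
        raise ValueError("empty query")
    return [f"result for {query}"]

search("devops")
try:
    search("")
except ValueError:
    pass
print(CALLS["search"], ERRORS["search"])
```

Putting instrumentation in a reusable wrapper keeps the cost of instrumenting each new feature close to one line.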

Every member of our value stream will use telemetry in a variety of ways, for example to diagnose production problems. Infosec might also use it to confirm the effectiveness of controls, and managers can track business outcomes, feature usage, and conversion rates.

Logging levels:

When deciding whether a message should be ERROR or WARN, imagine being woken up at 4 a.m.: low printer toner is not an error.
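A small Python sketch of applying the "4 a.m. test" when picking levels with the standard `logging` module. The event names and their assigned levels are hypothetical examples, not a list from the book.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("ops")

# Hypothetical events, each tagged using the "4 a.m. test":
# would this justify waking someone up?
EVENT_LEVELS = {
    "printer_toner_low": logging.WARNING,      # annoying, but can wait until morning
    "order_queue_unreachable": logging.ERROR,  # customers are affected right now
    "nightly_report_generated": logging.INFO,  # routine, informational only
}

def report(event):
    """Log an event at its agreed level (INFO if unknown) and return that level."""
    level = EVENT_LEVELS.get(event, logging.INFO)
    log.log(level, event)
    return level

report("printer_toner_low")
report("order_queue_unreachable")
```

Keeping the level decision in one table makes it easy to review which events can page someone.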

Create hierarchical logging categories, such as categories for non-functional attributes (performance, security) and for attributes related to features (search, ranking).
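Python's standard `logging` module supports this kind of hierarchy directly through dotted logger names. In this minimal sketch the `shop.*` category names are hypothetical; the point is that a whole branch (all non-functional logs, or all feature logs) can be configured at once, and children inherit the setting.

```python
import logging

logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

# Hypothetical hierarchy: dotted names form parent/child categories.
perf = logging.getLogger("shop.nonfunctional.performance")
security = logging.getLogger("shop.nonfunctional.security")
search = logging.getLogger("shop.feature.search")
ranking = logging.getLogger("shop.feature.ranking")

# Configure whole branches: children inherit the nearest ancestor's level.
logging.getLogger("shop").setLevel(logging.WARNING)
logging.getLogger("shop.feature").setLevel(logging.DEBUG)  # verbose for features only

search.debug("query parsed")   # emitted: inherits DEBUG from shop.feature
perf.debug("cache stats")      # suppressed: inherits WARNING from shop
security.warning("failed login burst")  # emitted: WARNING meets the shop threshold
```

Turning verbosity up or down for one category then becomes a single configuration change rather than an edit to every call site.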