My notes from The DevOps Handbook

by Gene Kim, Jez Humble, Patrick Debois, John Willis

Non-Gaussian distribution of telemetry data

When a data set does not follow a Gaussian (bell curve) distribution, the properties we rely on when using standard deviations, such as about 99.7% of values falling within three standard deviations of the mean, do not apply. For example, consider a scenario in which we are monitoring the number of file downloads per minute.

When we create a histogram of the number of downloads per minute, we can see that it does not have the classic, symmetrical bell curve shape. The distribution is clearly skewed toward the lower end: most of the time we have very few downloads per minute, but download counts frequently spike three standard deviations higher.
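To make that shape concrete, here is a minimal Python sketch that bins a small, made-up series of downloads-per-minute counts into a text histogram and computes its skewness. The data, the bucket width, and the helper names `histogram` and `skewness` are hypothetical illustrations, not from the book. Skewness is close to 0 for a symmetric bell curve and positive when the long tail sits on the high side.

```python
from collections import Counter

# Hypothetical downloads-per-minute samples (made-up data, not from the book):
# mostly small counts with two occasional large spikes.
downloads_per_minute = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0, 1, 2, 0, 0, 25, 1, 0, 2, 0, 30]

def histogram(values, bucket_width):
    """Count how many values fall into each bucket of the given width (empty buckets omitted)."""
    counts = Counter(v // bucket_width * bucket_width for v in values)
    return dict(sorted(counts.items()))

def skewness(values):
    """Population skewness: about 0 for a symmetric distribution, > 0 for a long tail on the high side."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Print one row of '#' characters per non-empty bucket.
for bucket, count in histogram(downloads_per_minute, 5).items():
    print(f"{bucket:>3}-{bucket + 4:<3} {'#' * count}")
print(f"skewness = {skewness(downloads_per_minute):.2f}")
```

With this data, 18 of the 20 minutes land in the lowest bucket, and the two spikes form the long right tail that pushes the skewness well above zero.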

Many production data sets have non-Gaussian distributions. Using standard deviations on such data not only causes over- or under-alerting, it also produces nonsensical results.
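A minimal sketch of the problem, assuming a made-up, right-skewed series of download counts (hypothetical data, not from the book): alert thresholds at the mean plus or minus three standard deviations, which would be reasonable for Gaussian data, behave oddly here.

```python
import statistics

# Hypothetical right-skewed telemetry: downloads per minute (made-up numbers).
downloads = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0, 1, 2, 0, 0, 25, 1, 0, 2, 0, 30]

mean = statistics.mean(downloads)
sigma = statistics.pstdev(downloads)  # population standard deviation

# Mean +/- 3 sigma covers about 99.7% of values only if the data is Gaussian.
lower = mean - 3 * sigma
upper = mean + 3 * sigma

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}")
print(f"lower threshold = {lower:.2f}")  # negative, although a download count never is
print(f"upper threshold = {upper:.2f}")
print("spikes above the upper threshold:", [d for d in downloads if d > upper])
```

The lower threshold comes out negative, so a "downloads dropped too low" alert could never fire, while the upper threshold lands between the two spikes, catching one and missing the other.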

Over-alerting causes Ops engineers to be woken up in the middle of the night, often for protracted periods, even when there are few actions they can take. Under-alerting is an equally significant problem.

Netflix took advantage of the fact that its customers' viewing patterns were surprisingly consistent and predictable, despite not following a Gaussian distribution.

Using anomaly detection techniques

Anomaly detection is the search for items or events that do not conform to an expected pattern.
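One way to search for events that do not conform to an expected pattern is to compare the distribution of a recent window of telemetry against a historical baseline window. The sketch below does this with a two-sample Kolmogorov-Smirnov (KS) test, one of the techniques these notes list. It is a minimal pure-Python illustration under stated assumptions: the latency numbers are made up, the function names are hypothetical, and the critical value uses the standard large-sample approximation, which is only rough for windows this small.

```python
import math
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(baseline, recent):
    """Two-sample KS statistic D: the largest vertical gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(recent)
    # Both ECDFs only change at observed values, so checking those points finds the maximum gap.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

def is_anomalous(baseline, recent, alpha=0.05):
    """Flag the recent window if D exceeds the asymptotic critical value at significance level alpha."""
    n, m = len(baseline), len(recent)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.36 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, recent) > critical

# Hypothetical response times in milliseconds.
baseline = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
recent_ok = [101, 99, 100, 98, 102, 100, 97, 104, 99, 101]
recent_bad = [130, 128, 135, 131, 129, 133, 127, 132, 130, 134]

print(is_anomalous(baseline, recent_ok))   # False
print(is_anomalous(baseline, recent_bad))  # True
```

Because the KS test compares whole distributions rather than assuming a bell curve, it does not depend on the data being Gaussian.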

We even have an engineer trained in statistics who writes R code. This engineer has her own backlog, filled with requests from other teams inside the company who want to find variance even earlier, before it causes an even larger variance that could affect our customers.

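One statistical technique we can use is smoothing, which is especially suitable when our data is a time series, meaning each data point has a timestamp. Smoothing often uses a moving average, which replaces each point with the average of the data within a sliding window. Other smoothing filters include weighted moving averages and exponential smoothing; more advanced techniques include the fast Fourier transform (FFT) and the Kolmogorov-Smirnov test.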
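A minimal Python sketch of the moving average, weighted moving average, and exponential smoothing filters, using trailing windows (one common choice; centered windows are another). The function names and the sample series are hypothetical. The weighted moving average here weights points linearly and exponential smoothing weights them exponentially, so both give recent data more influence than older data.

```python
def moving_average(series, window):
    """Simple moving average over a trailing window of `window` points."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def weighted_moving_average(series, window):
    """Linearly weighted moving average: weights 1..window, newest point weighted most."""
    weights = range(1, window + 1)
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i - window + 1 : i + 1])) / total
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1], with s[0] = x[0]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical per-minute request counts with one spike.
requests = [10, 12, 11, 50, 12, 11, 13]
print(moving_average(requests, 3))
print(weighted_moving_average(requests, 3))
print(exponential_smoothing(requests, 0.5))
```

A smaller `alpha` smooths more aggressively; a larger one tracks recent changes more closely.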

We can expect a large percentage of telemetry concerning user data to be periodic and seasonal. This lets us detect situations that vary from historical norms, such as when our order transaction rate on a Thursday drops to 50% of its usual Thursday level.
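A minimal sketch of this comparison, assuming we keep a history of (weekday, order rate) samples taken at the same time of day: the baseline for a weekday is the median of past values recorded on that same weekday, and the current value is flagged if it has dropped to half of that baseline or lower. The data, the choice of the median, and the function names are hypothetical illustrations, not the book's method.

```python
from statistics import median

def seasonal_baseline(history, weekday):
    """Median of past values observed on the same weekday; history holds (weekday, value) pairs."""
    return median([value for day, value in history if day == weekday])

def below_seasonal_norm(history, weekday, current, ratio=0.5):
    """True if `current` is at or below `ratio` times the usual level for this weekday."""
    return current <= ratio * seasonal_baseline(history, weekday)

# Hypothetical orders per hour from previous weeks, same time of day.
history = [
    ("Thu", 1200), ("Thu", 1150), ("Thu", 1230), ("Thu", 1180),
    ("Sun", 400), ("Sun", 420), ("Sun", 410),
]

print(below_seasonal_norm(history, "Thu", 590))  # True: 590 <= 0.5 * 1190
print(below_seasonal_norm(history, "Sun", 450))  # False: normal for a Sunday
```

Comparing against the same weekday keeps Sunday's naturally lower volume from being judged against busier weekdays.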

Because these techniques are useful in forecasting, we may be able to find people in our marketing or business intelligence (BI) departments with the knowledge and skills necessary to analyze this data.