# My notes from the DevOps Handbook

*by Gene Kim, Jez Humble, Patrick Debois, John Willis*

# Non-Gaussian distribution of telemetry data

When a data set does not follow a Gaussian (bell curve) distribution, the
properties associated with standard deviations do not apply. For example,
consider the scenario in which we are monitoring the number of file downloads.

When we create a histogram that shows the frequency of downloads per minute, we
can see that it does not have the classic, symmetrical bell curve shape.
Instead, the distribution is skewed toward the lower end: most of the time we
have very few downloads per minute, yet download counts frequently spike three
standard deviations higher.

Many production data sets have a non-Gaussian distribution. Using standard
deviation on such data not only leads to over- or under-alerting but can also
produce nonsensical results.
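To make this concrete, here is a minimal sketch that compares a "mean plus three standard deviations" threshold on bell-shaped data and on skewed data. The data is synthetic and purely an assumption for illustration: a log-normal sample stands in for downloads per minute, and the helper names `three_sigma_bounds` and `share_above` are hypothetical, not from the book.

```python
import random
import statistics


def three_sigma_bounds(samples):
    """Return (lower, upper) thresholds at mean -/+ 3 standard deviations."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    return mean - 3 * stdev, mean + 3 * stdev


def share_above(samples, threshold):
    """Return the fraction of samples strictly above the threshold."""
    return sum(1 for x in samples if x > threshold) / len(samples)


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical data, not real telemetry.
gaussian = [rng.gauss(100, 15) for _ in range(100_000)]
# Skewed stand-in for downloads per minute: mostly small values, rare large spikes.
skewed = [rng.lognormvariate(0, 1.5) for _ in range(100_000)]

_, gaussian_upper = three_sigma_bounds(gaussian)
skewed_lower, skewed_upper = three_sigma_bounds(skewed)

gaussian_share = share_above(gaussian, gaussian_upper)
skewed_share = share_above(skewed, skewed_upper)

print(f"Gaussian: {gaussian_share:.2%} of samples above mean + 3 sigma")
print(f"Skewed:   {skewed_share:.2%} of samples above mean + 3 sigma")
print(f"Skewed lower threshold: {skewed_lower:.2f}")
```

For the Gaussian sample, roughly 0.1% of points should land above the upper threshold. The skewed sample should cross it several times more often (over-alerting), while its lower threshold should come out negative, a value a download count can never reach, so an alert on it would never fire (under-alerting).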

Over-alerting causes ops engineers to be woken up in the middle of the night for
protracted periods of time even when there are few actions that they can take.
The problem with under-alerting is just as significant: a real problem that
stays within three standard deviations of the mean generates no alert at all.

*Netflix took advantage of the fact that their consumer viewing patterns were*
*surprisingly consistent and predictable, despite not having Gaussian*
*distribution.*

## Using anomaly detection techniques

Anomaly detection is the search for items or events which do not conform to an
expected pattern.

We even have an engineer trained in statistics who writes R code - this engineer
has her own backlog, filled with requests from other teams inside the company
who want to find variance even earlier, before it causes an even larger variance
that could affect our customers.

One statistical technique we can use is called smoothing, which is especially
suitable if our data is a time series, meaning each data point has a time stamp.
Examples of smoothing filters include weighted averages and exponential
smoothing. More exotic techniques include the FFT (fast Fourier transform) and
the Kolmogorov-Smirnov test, which compares distributions rather than smoothing
them.
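As a rough illustration of smoothing, the sketch below implements two simple filters: a trailing moving average (the equal-weight case of a weighted average) and simple exponential smoothing. The function names, the window size, the `alpha` value, and the sample series are all assumptions made for this example.

```python
def moving_average(values, window):
    """Average each point with up to window - 1 preceding points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the point that just slid out of the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def exponential_smoothing(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical per-minute metric with one short-lived spike.
series = [10, 12, 11, 50, 12, 10, 11]
print(moving_average(series, window=3))
print(exponential_smoothing(series, alpha=0.3))
```

Both filters dampen the single spike in `series`. A larger `window` or a smaller `alpha` smooths more aggressively, at the cost of reacting more slowly to genuine changes.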

We can expect that a large percentage of telemetry concerning user data will be
periodic and seasonal. This enables us to detect situations that vary from
historical norms, such as when our order transaction rate on Thursday drops to
50% of the weekly norm.
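One simple way to exploit this seasonality, sketched under assumptions, is to compare the current value against a baseline built from the same weekday in previous weeks. The function name `check_against_weekday_norm`, the use of the median as the baseline, the 50% cut-off, and the sample numbers are all hypothetical choices for illustration.

```python
from statistics import median


def check_against_weekday_norm(history, weekday, observed, min_ratio=0.5):
    """Compare an observed value with the historical norm for that weekday.

    history maps a weekday name to a list of past values for that weekday.
    Returns (is_anomalous, ratio), where ratio = observed / baseline.
    """
    baseline = median(history[weekday])
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a ratio")
    ratio = observed / baseline
    return ratio <= min_ratio, ratio


# Hypothetical order transaction rates for the last four Thursdays.
history = {"Thu": [1000, 1040, 980, 1020]}

alert, ratio = check_against_weekday_norm(history, "Thu", observed=480)
print(f"alert={alert}, observed is {ratio:.0%} of the Thursday norm")
```

The median is used here instead of the mean so that a single unusual week has less influence on the baseline. That is a design choice of this sketch, not something the book prescribes.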

Because of the usefulness of these techniques in forecasting, we may be able to
find people in marketing or BI departments with the knowledge and skills necessary
to analyze this data.