Pandas Standard Deviation

Calculate standard deviation in pandas

Standard deviation measures how spread out your data is around its mean. It is the square root of the variance, so it is measured in the same units as your data points (dollars, temperature, minutes, etc.). To find standard deviation in pandas, you simply call .std() on your Series or DataFrame.

pandas.DataFrame.std()
pandas.Series.std()
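As a minimal sketch of both calls, here is a small made-up DataFrame (the column names and values are illustrative, not from a real dataset). Calling .std() on the DataFrame returns one value per column, while calling it on a single column returns one number.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "temp_f": [70, 72, 68, 71, 69],
    "humidity": [40, 55, 35, 60, 50],
})

# DataFrame.std() returns a Series with one standard deviation per column
col_std = df.std()
print(col_std)

# Series.std() returns a single number
temp_std = df["temp_f"].std()
print(temp_std)
```

Note that pandas computes the sample standard deviation by default (it divides by n - 1).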

I do this most often when I’m working with anomaly detection, where I’m trying to find the outliers in a specific dataset. For example, if I’m looking at a time series of daily temperature readings, which days were unusually hot? Looking at standard deviation would help me with this.
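One common way to turn that idea into code is to flag readings that sit more than a chosen number of standard deviations above the mean. The sketch below uses made-up daily temperatures and a cutoff of 2 standard deviations; both the data and the cutoff are assumptions for illustration, and the right threshold depends on your data.

```python
import pandas as pd

# Hypothetical daily temperature readings (degrees F)
days = pd.date_range("2024-01-01", periods=10, freq="D")
temps = pd.Series([70, 71, 69, 72, 70, 68, 71, 95, 70, 69], index=days, name="temp_f")

mean = temps.mean()
std = temps.std()

# How many standard deviations each day is from the mean
z_scores = (temps - mean) / std

# Assumed cutoff: anything more than 2 standard deviations above the mean
threshold = 2
hot_days = temps[z_scores > threshold]
print(hot_days)
```

Here only the 95 degree day is flagged. Keep in mind that a large outlier also inflates the standard deviation itself, so on small samples this simple rule can miss milder anomalies.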

Pseudo Code: With your Series or DataFrame, measure how spread out your data points are around their mean.

Standard deviation describes how spread out your data is. In the picture below, the chart on the left does not have a wide spread along the Y axis, meaning the data points are close together. This is called a low standard deviation.

The chart on the right has a wide spread of data along the Y axis. The data points are far apart, which means there is a high standard deviation.
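The same contrast can be shown with numbers. In this small sketch, two made-up Series share the same mean but differ in how far their values sit from it; the values are chosen purely for illustration.

```python
import pandas as pd

# Two hypothetical Series with the same mean (10) but different spread
low_spread = pd.Series([9, 10, 11, 10, 10])
high_spread = pd.Series([0, 20, 5, 15, 10])

print(low_spread.mean(), high_spread.mean())  # both means are 10.0
print(low_spread.std())    # small: points are close to the mean
print(high_spread.std())   # large: points are far from the mean
```

The mean alone cannot tell these two datasets apart; the standard deviation can.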

Pandas STD Parameters

The standard deviation function is pretty standard, but you may want to play with a few parameters.

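axis = Do you want to compute the standard deviation down the rows or across the columns? Index (rows) = 0, columns = 1. With axis=0 (the default) you get one value per column; with axis=1 you get one value per row.
skipna = By default, pandas will skip the NAs in your dataset. If you set skipna=False, any column (or row) that contains an NA will return NaN, so make sure you understand how your NAs are impacting your results.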
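level = For when you have a multi index. 95% of the time this won’t matter because you’ll be on a single index. If not, then set your level to the level you want to compute the STD for. Note that this parameter was deprecated in pandas 1.3 and removed in pandas 2.0; on newer versions, use groupby(level=...) followed by .std() instead.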
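Others: The one worth knowing is ddof, which defaults to 1 and gives the sample standard deviation (dividing by n - 1); set ddof=0 for the population standard deviation. For the other lesser-used parameters, see the official documentation.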
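The parameters above can be sketched on a small made-up DataFrame as follows. The data and names are hypothetical. For the multi index case, the example uses groupby(level=...) rather than a level argument, on the assumption that you are running pandas 2.0 or later, where level is no longer accepted by .std().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column "a"
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
})

per_column = df.std(axis=0)       # default: one value per column
per_row = df.std(axis=1)          # one value per row
with_nan = df.std(skipna=False)   # column "a" becomes NaN because of the missing value
population = df.std(ddof=0)       # population standard deviation (divide by n)

# Multi index: group on the index level you care about, then call .std()
idx = pd.MultiIndex.from_product([["NYC", "SF"], [1, 2, 3]], names=["city", "day"])
temps = pd.Series([30.0, 32.0, 34.0, 60.0, 60.0, 60.0], index=idx)
by_city = temps.groupby(level="city").std()
print(by_city)
```

A row with only one non-missing value, like the third row here, returns NaN for axis=1 because the sample standard deviation needs at least two values.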

Now the fun part: let’s take a look at a code sample.

Link to code

Official Documentation
