Pandas Standard Deviation - pd.Series.std()
Calculate standard deviation in pandas
Standard deviation measures how spread out your data is around its mean. It is measured in the same units as your data points (dollars, temperature, minutes, etc.). To find standard deviation in pandas, you simply call .std() on your Series or DataFrame. By default, pandas computes the sample standard deviation (ddof=1).
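As a minimal sketch of that call, here is .std() on a Series and on a DataFrame. The data and variable names are made up for illustration; ddof is the pandas parameter that switches between the sample (ddof=1, the default) and population (ddof=0) calculation.

```python
import pandas as pd

# Hypothetical sample data: daily sales in dollars
sales = pd.Series([10.0, 12.0, 23.0, 23.0, 16.0, 23.0, 21.0, 16.0])

# Series.std() returns a single number, in the same units as the data
sample_std = sales.std()            # default ddof=1: divide by N - 1
population_std = sales.std(ddof=0)  # divide by N instead

# DataFrame.std() returns a Series with one value per column
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0, 4.0], "minutes": [10.0, 10.0, 10.0, 10.0]}
)
column_std = df.std()

print(sample_std, population_std)
print(column_std)
```

The "minutes" column has no spread at all, so its standard deviation comes out as 0.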
I do this most often when I'm working with anomaly detection and trying to find the outliers of a specific dataset. For example: if I'm looking at a time series of temperature readings per day, which days were unusually hot? Looking at standard deviation would help me with this.
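One way the temperature example could look in code is sketched below. The readings, the dates, and the "more than 2 standard deviations above the mean" cutoff are all assumptions chosen for illustration, not a fixed rule.

```python
import pandas as pd

# Hypothetical daily high temperatures (degrees F); dates and values are made up
temps = pd.Series(
    [71.0, 73.0, 70.0, 72.0, 74.0, 71.0, 95.0, 72.0, 73.0, 70.0],
    index=pd.date_range("2021-07-01", periods=10, freq="D"),
)

mean = temps.mean()
std = temps.std()

# Assumed rule: flag days more than 2 standard deviations above the mean
threshold = mean + 2 * std
hot_days = temps[temps > threshold]

print(hot_days)
```

With this data, only the 95-degree reading on 2021-07-07 clears the threshold. A stricter or looser cutoff (3 standard deviations, say) is a judgment call that depends on your data.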
Pseudo Code: With your Series or DataFrame, find how spread out your data points are.
Pandas Standard Deviation
Standard deviation describes how spread out your data is. In the picture below, the chart on the left does not have a wide spread along the Y axis, meaning the data points are close together. This is called low standard deviation.
The chart on the right has a high spread of data along the Y axis. The data points are spread out, which means there is a high standard deviation.
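The same contrast can be sketched with numbers. The two Series below are made-up data with the same mean but very different spread; the names tight and wide are only for illustration.

```python
import pandas as pd

# Two made-up series with the same mean (50) but different spread
tight = pd.Series([49.0, 50.0, 51.0, 50.0, 50.0])
wide = pd.Series([10.0, 90.0, 30.0, 70.0, 50.0])

tight_std = tight.std()  # points sit close to the mean: low standard deviation
wide_std = wide.std()    # points sit far from the mean: high standard deviation

print(tight.mean(), wide.mean())
print(tight_std, wide_std)
```

Both means are identical, so the mean alone cannot tell these two datasets apart. The standard deviation can.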
Pandas STD Parameters
The standard deviation function is pretty standard, but you may want to play with a few items.
- axis = Do you want to compute the standard deviation down the rows or across the columns? Index (rows) = 0 gives one result per column (the default); columns = 1 gives one result per row.
- skipna = By default, pandas will skip the NAs in your dataset. If you set skipna=False, any row or column that contains an NA returns NaN, so make sure you understand how your NAs are impacting your results.
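- level = For when you have a multi index. 95% of the time this won't matter because you'll be on a single index. If not, then set your level to the level you want to compute the STD for. Note that level was deprecated in pandas 1.3 and removed in 2.0; in current versions, use groupby(level=...).std() instead.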
- Others: For the other lesser-used parameters, see the official documentation.
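The parameters above can be tried out with a small sketch like the following. The DataFrame, its column names, and its index labels are hypothetical. The per-level calculation uses groupby(level=...) rather than the level argument, on the assumption that you are on a recent pandas version where level is no longer accepted by .std().

```python
import numpy as np
import pandas as pd

# Hypothetical data with a two-level index and one missing value
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, np.nan], "b": [2.0, 4.0, 6.0, 8.0]},
    index=pd.MultiIndex.from_tuples(
        [("x", 1), ("x", 2), ("y", 1), ("y", 2)], names=["group", "item"]
    ),
)

per_column = df.std(axis=0)      # default: one value per column
per_row = df.std(axis=1)         # one value per row
with_nan = df.std(skipna=False)  # column "a" contains a NaN, so its result is NaN

# Standard deviation within each value of the "group" index level
by_group = df.groupby(level="group").std()

print(per_column)
print(per_row)
print(with_nan)
print(by_group)
```

Note that a row or group with only one non-missing value also comes back as NaN, because the sample standard deviation (ddof=1) is undefined for a single point.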
Now the fun part, let's take a look at a code sample.