Pandas Diff - pd.df.diff()
Calculate differences in data using Pandas Diff. Assess changes over specified periods in rows or columns
Pandas Diff will difference your data. This means calculating the change in your row(s)/column(s) over a set number of periods. Or simply, pandas diff subtracts one cell value from another cell value in the same column (or in the same row, when differencing across columns).
Diff is very helpful when calculating rates of change. For example, if you have temperature readings per day, calculating the difference will tell you how the temperatures have changed Day-Over-Day.
You can also think of this as a discrete version of taking the derivative (rate of change) of the data. This is also helpful when working with time series data and calculating Week-Over-Week changes.
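To make the Day-Over-Day and Week-Over-Week idea concrete, here is a minimal sketch. The temperature values, the dates, and the name temp_f are invented for illustration, and the Week-Over-Week line assumes exactly one reading per day, so that 7 periods equals one week.

```python
import pandas as pd

# Hypothetical daily temperature readings (values invented for illustration)
temps = pd.Series(
    [60, 63, 61, 68, 70, 66, 65, 67, 71, 69],
    index=pd.date_range("2023-01-01", periods=10, freq="D"),
    name="temp_f",
)

# Day-Over-Day: each day minus the previous day
day_over_day = temps.diff()

# Week-Over-Week: each day minus the reading 7 rows (one week) earlier
week_over_week = temps.diff(periods=7)

print(day_over_day.head(3))
print(week_over_week.tail(3))
```

If your readings are not evenly spaced, counting rows no longer lines up with counting days, so treat the periods=7 shortcut with care.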
There is 1 core concept you'll need to grasp:
- Period = How many observations do you want to difference your data by? Most of the time this will be 1 period diff, but you can select as many as you want.
Pseudo code: For a given DataFrame or Series, find the difference (or rate of change) between rows/columns.
Pandas Diff
Your first row in your resulting diff DataFrame will generally be NaN. This is because there is no earlier observation to difference it with. If you had periods=2, then the first 2 rows would be NaN.
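A quick way to see the NaN behavior is to diff a small DataFrame with two different period settings. This is only a sketch; the column names a and b and their values are made up for the example.

```python
import pandas as pd

# Small made-up DataFrame
df = pd.DataFrame({"a": [1, 3, 6, 10, 15], "b": [2, 4, 8, 16, 32]})

diff_1 = df.diff()           # periods=1 by default: first row is NaN
diff_2 = df.diff(periods=2)  # first two rows are NaN

print(diff_1)
print(diff_2)

# Count the leading NaNs in each result
print(diff_1["a"].isna().sum(), diff_2["a"].isna().sum())
```

Note that the results come back as floats even though the inputs were integers, because NaN needs a float column.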
Diff Parameters
- Periods (Default=1): You can select how many periods you'd like to difference by via the periods parameter. An easier way to think about this is, "how many rows would you like to difference from each cell?" In the picture above, our periods=1 so we take the difference from each neighboring cell above.
- Axis (Default=0): We usually talk about differencing rows (axis=0), but pandas also allows you to difference columns (axis=1).
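The difference between the two axis settings is easiest to see side by side. The sketch below uses a hypothetical sales table (store names, months, and numbers are all invented) and runs diff once down the rows and once across the columns.

```python
import pandas as pd

# Hypothetical monthly sales per store (values invented for illustration)
sales = pd.DataFrame(
    {"jan": [100, 200], "feb": [110, 180], "mar": [125, 210]},
    index=["store_a", "store_b"],
)

# axis=0: each row minus the row above it, column by column
by_row = sales.diff(axis=0)

# axis=1: each column minus the column to its left, row by row
by_col = sales.diff(axis=1)

print(by_row)
print(by_col)
```

With axis=0 the first row is NaN; with axis=1 the first column is NaN instead, for the same reason: there is nothing earlier to subtract.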
Let's take a look at a code sample