Pandas Diff – pd.DataFrame.diff()

Calculate differences in your data using Pandas Diff, and assess how values change over a specified number of periods across rows or columns.

Pandas Diff will difference your data. This means calculating how the values in your row(s)/column(s) change over a set number of periods. Put simply, pandas diff subtracts an earlier cell's value from each cell in the same column (or, with axis=1, in the same row).
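Here is a minimal sketch of that subtraction on made-up numbers. A Series gets one difference per value, and a DataFrame is differenced column by column, each column independently.

```python
import pandas as pd

# diff() subtracts the previous value from each value
readings = pd.Series([10, 12, 15, 11])
changes = readings.diff()
print(changes.tolist())  # [nan, 2.0, 3.0, -4.0]

# On a DataFrame, each column is differenced independently (axis=0 by default)
df = pd.DataFrame({"a": [1, 4, 9], "b": [10, 8, 5]})
col_changes = df.diff()
print(col_changes)
```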

Diff is very helpful when calculating rates of change. For example, if you have one temperature reading per day, differencing the readings tells you how the temperature changed Day-Over-Day.
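The temperature example might look like this. The readings and dates are hypothetical, and idxmax() is just one convenient way to find the day with the biggest increase (it skips the leading NaN by default).

```python
import pandas as pd

# Hypothetical daily temperature readings (°C), indexed by date
temps = pd.Series(
    [20.5, 22.0, 19.5, 23.0, 23.5],
    index=pd.date_range("2024-07-01", periods=5, freq="D"),
    name="temp_c",
)

# Day-Over-Day change: today's reading minus yesterday's reading
day_over_day = temps.diff()

# Date with the largest single-day increase
biggest_jump_day = day_over_day.idxmax()
print(day_over_day)
print("Largest increase on:", biggest_jump_day.date())
```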

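You can also think of this as a discrete version of taking the derivative (rate of change) of the data: with evenly spaced observations and periods=1, each difference is the change per time step. This is also helpful when working with time series data, for example when calculating Week-Over-Week changes.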
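A sketch of both ideas, using hypothetical daily counts. Week-Over-Week can be read two ways: each day versus the same weekday a week earlier (periods=7), or weekly totals compared with the previous week (via resample, which is an assumption about how you want to bucket weeks). For unevenly spaced samples, dividing the difference by the elapsed time gives a rate per unit of time, which is closer to the derivative analogy.

```python
import pandas as pd

# Hypothetical daily counts for two full weeks (2024-01-01 is a Monday)
daily = pd.Series(
    [100, 120, 130, 125, 140, 90, 80,    # week 1
     110, 125, 128, 135, 150, 95, 85],   # week 2
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
)

# Option 1: compare each day with the same weekday one week earlier
same_day_last_week = daily.diff(periods=7)

# Option 2: total each week (weeks ending Sunday), then difference consecutive weeks
weekly_totals = daily.resample("W").sum()
week_over_week = weekly_totals.diff()

# Unevenly spaced samples: divide each change by the days elapsed since the previous sample
levels = pd.Series(
    [0.0, 6.0, 9.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-04"]),
)
elapsed_days = levels.index.to_series().diff().dt.days
rate_per_day = levels.diff() / elapsed_days  # NaN, 3.0, 3.0
```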

There is one core concept you’ll need to grasp:

  1. Period = How many observations back do you want to difference your data by? Most of the time this will be a 1-period diff, but you can choose a larger number of periods.
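To see what the period changes, here is a small made-up Series differenced with periods=1 and periods=2 side by side.

```python
import pandas as pd

s = pd.Series([5, 8, 6, 10, 7])

one_period = s.diff(periods=1)   # each value minus the value 1 row earlier
two_periods = s.diff(periods=2)  # each value minus the value 2 rows earlier

print(pd.DataFrame({"value": s, "diff_1": one_period, "diff_2": two_periods}))
```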
1. pd.DataFrame.diff(periods=1)

Pseudo code: For a given DataFrame or Series, find the difference (or rate of change) between rows/columns.
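One way to make the pseudo code concrete is to subtract a shifted copy of the data from itself. The helper manual_diff below is hypothetical and only handles the row direction; it is meant to show what diff computes, not how pandas implements it internally.

```python
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, 21.5, 19.0, 24.0],
    "humidity": [40.0, 42.0, 47.0, 45.0],
})

def manual_diff(frame: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Hypothetical helper: each row minus the row `periods` rows earlier."""
    # shift() moves every column down by `periods` rows, filling the gap with NaN
    return frame - frame.shift(periods)

print(manual_diff(df).equals(df.diff()))  # True
```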

Pandas Diff

The first row of your resulting diff DataFrame will be NaN, because there is no earlier observation to subtract from it. If you had periods=2, the first 2 rows would be NaN; in general, a positive periods value leaves that many leading rows of NaN.
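A quick check of that NaN rule on a made-up Series, plus dropna() as one common way to keep only the rows that have a valid difference. Whether to drop, fill, or keep those NaNs depends on your analysis.

```python
import pandas as pd

s = pd.Series([3, 7, 4, 9, 12, 8])

# Count the NaNs produced for a few period sizes
nan_counts = {p: int(s.diff(periods=p).isna().sum()) for p in (1, 2, 3)}
print(nan_counts)  # {1: 1, 2: 2, 3: 3}

# Drop the rows that have no earlier observation to compare against
valid_changes = s.diff(periods=2).dropna()
print(valid_changes.tolist())  # [1.0, 2.0, 8.0, -1.0]
```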

Diff Parameters

  • Periods (Default=1): You can select how many periods you’d like to difference by via the periods parameter. An easier way to think about this is, ‘how many rows back should be subtracted from each cell?’ In the picture above, periods=1, so each cell has the value of the cell directly above it subtracted from it.
  • Axis (Default=0): We usually talk about differencing rows (axis=0, each row minus an earlier row), but pandas also allows you to difference columns (axis=1, each column minus an earlier column).
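To compare the two axis settings, here is a sketch on hypothetical quarterly revenue with one column per quarter. With axis=1 the first column becomes NaN instead of the first row.

```python
import pandas as pd

# Hypothetical quarterly revenue per product, one column per quarter
revenue = pd.DataFrame(
    {"Q1": [100, 80], "Q2": [120, 70], "Q3": [150, 90]},
    index=["product_a", "product_b"],
)

by_row = revenue.diff(axis=0)     # product_b minus product_a, for each quarter
by_column = revenue.diff(axis=1)  # each quarter minus the previous quarter, for each product
print(by_column)
```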

Let’s take a look at a code sample
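A short sample, assuming hypothetical daily sales for two stores. Besides the default, it uses a negative periods value, which pandas accepts: each row is then compared with a later row, so the NaN moves to the end.

```python
import pandas as pd

# Hypothetical daily sales for two stores
sales = pd.DataFrame(
    {"store_a": [200, 220, 210, 260], "store_b": [150, 145, 170, 165]},
    index=pd.date_range("2024-03-01", periods=4, freq="D"),
)

# Default: each day minus the previous day, computed per column
backward = sales.diff()
print(backward)

# periods=-1: each day minus the NEXT day, so the last row is NaN
forward = sales.diff(periods=-1)
print(forward)
```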

Link to code

Official Documentation
