
Pandas Rank – Rank Your Data – pd.DataFrame.rank()

Rank data in pandas DataFrame and subgroups

Pandas Rank computes the rank of each data point within a larger dataset. It is extremely useful for filtering the ‘first’ or 2nd item of a subset of your data. We will look at two methods today:

  1. Rank data within your entire DataFrame: pd.DataFrame.rank()
  2. Rank data within subgroups (group by): pd.DataFrame.groupby().rank()

Pseudo code: For a given data point, rank its value within the total DataFrame or Series.
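Here is a minimal sketch of that pseudo code. It uses a small Series of test scores; the names and values are made up for illustration. With the default settings, the two tied scores of 88 share the average of ranks 3 and 4:

```python
import pandas as pd

# Hypothetical example data: test scores, with one tie (88 appears twice)
scores = pd.Series([72, 88, 95, 88, 60], index=["Ann", "Bob", "Cal", "Dee", "Eve"])

# Default settings: ascending=True, method="average"
ranks = scores.rank()
print(ranks)
# Ann 2.0, Bob 3.5, Cal 5.0, Dee 3.5, Eve 1.0
```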

Pandas Rank

There are two core concepts you’ll need to grasp with .rank(): Rank order (ascending or not) and method (how to rank data points with the same value).

  • Rank Order: Ascending means you are climbing something, “I am ascending stairs.” This means you are going up in number. With ascending=True, Pandas will start at your lowest values and go up, meaning your lowest values will have the lowest rank and your highest values will have the highest rank. Usually I use ascending=False so the highest value has rank=1.
  • Method: There are many ways you can handle data points of the same value (ties). Should you force a distinct rank on each one, or should tied values share an averaged rank that can end in .5? Check out the parameters below for a list of how to handle these.
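The sketch below shows both concepts side by side on the same kind of hypothetical score data. Only `ascending` and `method` change between columns, and everything else uses the defaults:

```python
import pandas as pd

# Hypothetical example data with a tie at 88
scores = pd.Series([72, 88, 95, 88, 60], index=["Ann", "Bob", "Cal", "Dee", "Eve"])

# ascending=True (default): the lowest value gets rank 1
low_first = scores.rank(ascending=True)

# ascending=False: the highest value gets rank 1; the tie still averages to 2.5
high_first = scores.rank(ascending=False)

# method="min": tied values share the lowest rank of their group (2 instead of 2.5)
high_first_min = scores.rank(ascending=False, method="min")

print(pd.DataFrame({
    "score": scores,
    "low_first": low_first,
    "high_first": high_first,
    "high_first_min": high_first_min,
}))
```

Most "top N" style questions read more naturally with `ascending=False`, which is why the author tends to prefer it.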

Rank Pro Tip: Group By

Did you know that .rank() can also be called on a group by object? Simply call .rank() on the result of .groupby() and you’ll get ranks computed separately within each subgroup of your DataFrame. Unlike an aggregate function such as .sum(), it returns one rank per row rather than one value per group.

Check out the code sample below for a preview of this.
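Here is a minimal sketch of ranking within subgroups. It assumes a hypothetical sales DataFrame with `region`, `rep`, and `sales` columns. Because `.rank()` returns one value per row, the grouped result lines up with the original index and can be assigned straight back as a new column:

```python
import pandas as pd

# Hypothetical sales data for two regions
df = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West"],
    "rep": ["A", "B", "C", "D", "E"],
    "sales": [100, 250, 175, 300, 120],
})

# Rank across the whole DataFrame (highest sales = rank 1)
df["overall_rank"] = df["sales"].rank(ascending=False)

# Rank within each region: ranks restart at 1 for every group
df["region_rank"] = df.groupby("region")["sales"].rank(ascending=False)

print(df)
```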

Rank Parameters

  • axis (Default=0): Believe it or not, you can rank either by rows or by columns. With the default axis=0, each column is ranked on its own, so rows are ranked against each other. Set axis=1 to rank across each row instead, so columns are ranked against each other. 99% of the time we rank rows.
  • method (Default=‘average’; options ‘average’, ‘min’, ‘max’, ‘first’, ‘dense’): What should you do with data points that have the same value? First, think of them as a group, then see which method you want:
    • average: Use the average rank of the group and apply to all items
    • min: Take the lowest rank of the group and apply to all items
    • max: Take the highest rank of the group and apply to all items
    • first: Ranks are assigned in the order the data points appear in the DataFrame or Series. This essentially forces a unique rank on each item.
    • dense: Like min, but ranks always increase by exactly 1 between groups, so there are no gaps. We don’t use this one often.
  • numeric_only (Default=False in pandas 2.0 and later): If True, only your numeric columns are ranked. If False, .rank() will also rank non-numeric columns such as strings.
  • ascending (Default=True): True ranks from the lowest value up, so the lowest value gets rank 1. False ranks from the highest value down, so the highest value gets rank 1.
  • pct (Default=False): You can also normalize your ranks by setting pct=True. This expresses each rank as a fraction of the total, so all ranks fall between 0 and 1.
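The following sketch runs the parameters above on hypothetical data. It first applies the five tie-handling methods to a Series with a three-way tie, then shows `pct=True`, then compares `axis=0` with `axis=1` on a small DataFrame:

```python
import pandas as pd

# Hypothetical data with a three-way tie at 20 (sorted positions 2, 3 and 4)
values = pd.Series([10, 20, 20, 20, 30])

# Apply every tie-handling method and line the results up as columns
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: values.rank(method=m) for m in methods})
comparison.insert(0, "value", values)
print(comparison)
# average: 1, 3, 3, 3, 5    min: 1, 2, 2, 2, 5    max: 1, 4, 4, 4, 5
# first:   1, 2, 3, 4, 5    dense: 1, 2, 2, 2, 3

# pct=True expresses each (average) rank as a fraction of the 5 values
pct_ranks = values.rank(pct=True)  # 0.2, 0.6, 0.6, 0.6, 1.0

# axis=0 (default) ranks down each column; axis=1 ranks across each row
grid = pd.DataFrame({"q1": [5, 1], "q2": [3, 4], "q3": [9, 2]}, index=["x", "y"])
down_columns = grid.rank(axis=0)  # row x: 2, 1, 2   row y: 1, 2, 1
across_rows = grid.rank(axis=1)   # row x: 2, 1, 3   row y: 1, 3, 2
print(down_columns)
print(across_rows)
```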

Let’s take a look at a code sample
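Here is a minimal sketch of the use case from the introduction: filtering the first or 2nd row of each subgroup. The sales DataFrame and its column names are hypothetical. It uses method="first" so that tied values still get distinct ranks, which guarantees exactly one row per rank in each group:

```python
import pandas as pd

# Hypothetical sales data; reps D and F are tied at 300 in the West region
df = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West", "West"],
    "rep": ["A", "B", "C", "D", "E", "F"],
    "sales": [100, 250, 175, 300, 120, 300],
})

# Rank reps within each region, best seller first.
# method="first" breaks ties by order of appearance,
# so every rep gets a unique rank inside its region.
df["region_rank"] = (
    df.groupby("region")["sales"].rank(ascending=False, method="first")
)

# Keep only the top seller in each region...
top_1 = df[df["region_rank"] == 1]

# ...or only the second-best seller in each region
second = df[df["region_rank"] == 2]

print(top_1)
print(second)
```

With the default method="average", the D/F tie would give both reps a rank of 1.5, and neither filter would match them.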

Link to code

Official Documentation
