Pandas Rank - Rank Your Data - pd.df.rank()
Rank data in pandas DataFrame and subgroups
Pandas Rank will compute the rank of your data point within a larger dataset. It is extremely useful for filtering the "first" or 2nd item of a sub dataset. We will look at two methods today:
- Rank data within your entire DataFrame
- Rank data within subgroups (group by)
Pseudo code: For a given data point, rank its value within the total DataFrame or Series.
Pandas Rank
There are two core concepts you'll need to grasp with .rank(): rank order (ascending or not) and method (how to rank data points with the same value).
- Rank Order: Ascending means you are climbing something, "I am ascending stairs." This means you are going up in number. With ascending=True, Pandas will start at your lowest values and go up, meaning your lowest values will have the lowest rank and highest values will have the highest rank. Usually I do ascending=False so the highest value has rank=1.
- Method: There are many ways you can handle data points of the same value. Should you force a distinct rank? Or should you have a rank end in .5? Check out the parameters below for a list of how to handle these.
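To make the two concepts concrete, here is a minimal sketch. The scores Series is made-up data for illustration, with two tied values so you can see the default method ("average") produce a rank ending in .5.

```python
import pandas as pd

# Hypothetical scores; "b" and "c" are tied at 90.
scores = pd.Series([70, 90, 90, 50], index=["a", "b", "c", "d"])

# Default: ascending=True, so the lowest value gets rank 1.
asc = scores.rank()

# ascending=False: the highest value gets the best (lowest) rank.
desc = scores.rank(ascending=False)

print(asc.tolist())   # [2.0, 3.5, 3.5, 1.0]
print(desc.tolist())  # [3.0, 1.5, 1.5, 4.0]
```

The tied 90s would occupy positions 3 and 4 when ascending (1 and 2 when descending), so each one gets the average of those two positions.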
Rank Pro Tip: Group By
Did you know that .rank() can be used with group by too? Unlike a true aggregate function, it returns one rank per row rather than one value per group. Simply call .rank() on top of your group by and you'll get the ranks specific to each subgroup in your DataFrame.
Check out the code sample below for a preview of this.
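As a quick sketch of the group by idea, the snippet below ranks rows within each subgroup and then filters for the top row of each one. The column names (team, points, rank_in_team) and the data are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data: points scored by players on two teams.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B"],
    "points": [10, 30, 20, 5, 15],
})

# Rank points within each team; the highest points get rank 1.
# method="first" forces a unique rank even if two rows tie.
df["rank_in_team"] = (
    df.groupby("team")["points"].rank(ascending=False, method="first")
)

# Keep only the top-ranked row of each team.
top_per_team = df[df["rank_in_team"] == 1]

print(df["rank_in_team"].tolist())      # [3.0, 1.0, 2.0, 2.0, 1.0]
print(top_per_team["points"].tolist())  # [30, 15]
```

Because the result has the same length as the original DataFrame, it can be assigned straight back as a new column.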
Rank Parameters
- axis (Default=0): Believe it or not, you can rank either by rows or columns. By default (axis=0) you will be ranking by rows: each value is ranked against the other rows in its column. Change to axis=1 to rank across the columns within each row. 99% of the time we are ranking rows.
- method ("average", "min", "max", "first", "dense"; Default="average"): What should you do with your data points that have the same value? First, think of them as a group, then see which method you want:
  - average: Use the average rank of the group and apply to all items
  - min: Take the lowest rank of the group and apply to all items
  - max: Take the highest rank of the group and apply to all items
  - first: Ranks are assigned in the order the data points appear in the DataFrame or Series. This is essentially forcing a unique rank on each item.
  - dense: Like min, but the rank will increase only +1 between groups. We don't use this one often.
- numeric_only (Default=False in pandas 2.0 and later; older versions behaved differently): If True, only rank your numeric columns. If False, .rank() will also rank your strings.
- ascending (Default=True): True if you want the ranks in ascending order, False if you do not.
- pct (Default=False): You can also normalize your ranks by setting pct=True. This will express each rank as a fraction of the number of data points, putting them all between 0 and 1.
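Before the full sample, here is a small side-by-side sketch of how the method, pct, and axis parameters behave. The values Series and the tiny grid DataFrame are made-up data chosen so the tie at 20 is easy to follow.

```python
import pandas as pd

# Hypothetical values; the two 20s form a tied group.
values = pd.Series([10, 20, 20, 30])

# One column per tie-breaking method, so they can be compared row by row.
comparison = pd.DataFrame({
    method: values.rank(method=method)
    for method in ["average", "min", "max", "first", "dense"]
})

# pct=True divides each (average) rank by the number of data points.
comparison["pct"] = values.rank(pct=True)

print(comparison)
#    average  min  max  first  dense    pct
# 0      1.0  1.0  1.0    1.0    1.0  0.250
# 1      2.5  2.0  3.0    2.0    2.0  0.625
# 2      2.5  2.0  3.0    3.0    2.0  0.625
# 3      4.0  4.0  4.0    4.0    3.0  1.000

# axis=1 ranks across the columns within each row instead.
grid = pd.DataFrame({"x": [1, 5], "y": [3, 2]})
across = grid.rank(axis=1)
print(across.values.tolist())  # [[1.0, 2.0], [2.0, 1.0]]
```

Notice that dense is the only method where the last value gets rank 3 instead of 4, since it never skips a rank after a tie.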
Let's take a look at a code sample