
Pandas Sort By Column – pd.DataFrame.sort_values()

Sort datasets in Pandas by multiple columns and customize sorting parameters

One of the beautiful things about Pandas is the ability to sort datasets.

Through sorting, you’re able to see your relevant data at the top (or bottom) of your table. There isn’t a ton you need to know out of the box. The magic starts to happen when you sort multiple columns and use sort keys.

YourDataFrame.sort_values('your_column_to_sort')

Pseudo code: Take a DataFrame column (or columns) and sort the rows by it in ascending or descending order.
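
Here’s a minimal sketch of that pseudo code in action. The DataFrame, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "city": ["Denver", "Austin", "Boston", "Chicago"],
    "rating": [4.1, 4.7, 3.9, 4.4],
})

# Sort rows by the 'rating' column, lowest to highest (the default)
sorted_df = df.sort_values("rating")

print(sorted_df)

# The original DataFrame keeps its row order because inplace defaults to False
print(df)
```

Notice that the index labels travel with their rows, so you can see where each row started out.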

Reasons to sort a dataset include:

  • Seeing top ranked items in order
  • Sorting strings in alphabetical order

Pandas Sort

Let’s take a look at the different parameters you can pass pd.DataFrame.sort_values():

  • by – Single name, or list of names, that you want to sort by. This can either be column names, or index names. Pass a list of names when you want to sort by multiple columns.
  • axis (Default: ‘index’ or 0) – This is the axis to be sorted. You need to tell Pandas: do you want to reorder the rows based on column values (axis=‘index’ or 0), or reorder the columns based on row values (axis=‘columns’ or 1)?
  • ascending (Default: True) – You can pass a single boolean (True or False), or a list of booleans ([True, False]) with one entry per column if you’re sorting by multiple columns. See the Ascending vs Descending section below for the difference.
  • inplace (Default: False) – If True, the new sort order will write over your current DataFrame. It will change it in place, and the call returns None. If False, your current DataFrame is left untouched and a new, sorted DataFrame is returned to you.
    • I usually do inplace=True if I’m working with the DataFrame later in my code. I’ll do inplace=False if I’m just viewing the sort order visually.
  • kind (Default: ‘quicksort’) – Your choice of sorting algorithm: ‘quicksort’, ‘mergesort’, ‘heapsort’, or ‘stable’. This won’t matter much unless you’re dealing with huge datasets, and even then you’d have to know the tradeoffs of each one. For DataFrames, this option only applies when you sort on a single column or label.
  • na_position (Default: ‘last’) – You can tell pandas where you would like to put your NAs (if you have them). At the top (‘first’), or at the bottom (‘last’).
  • ignore_index (Default: False) – If false, then your index values will move with the sorting. This is useful when you want to see how the rows have moved around. However, if you want your index to remain in order, then set ignore_index=True and it’ll remain 0, 1, 2, 3, …, n-1.
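  • key – Super awesome parameter! With key you can pass a function that takes the column (or row) you’re sorting by and returns derived values of the same shape. Pandas then sorts on those derived values instead of the originals. If you sort by multiple columns, the function is applied to each one independently. Check out the example below.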
    • Example: Say you wanted to sort by the absolute value of a column. You could create a derived column with absolute values and sort that, but that feels cumbersome. Instead, sort your column with a key function that grabs the absolute values.
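
Here’s a rough sketch of how several of these parameters behave. The DataFrame, its column names (team, change), and its values are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "team": ["B", "A", "B", "A", "C"],
    "change": [-7.0, 3.0, 2.0, -5.0, np.nan],
})

# by + ascending as lists: team A-to-Z, then change highest-to-lowest within each team
multi = df.sort_values(by=["team", "change"], ascending=[True, False])

# na_position: put missing values at the top instead of the bottom
na_first = df.sort_values("change", na_position="first")

# ignore_index: relabel the sorted rows 0, 1, 2, ... instead of keeping the old labels
relabeled = df.sort_values("change", ignore_index=True)

# key: sort by absolute value without creating a helper column.
# The function receives the whole column as a Series and returns a Series of the same length.
by_abs = df.sort_values("change", key=lambda col: col.abs())

# inplace: reorder the DataFrame itself; the call returns None
scratch = df.copy()
result = scratch.sort_values("change", inplace=True)

print(multi)
print(by_abs)
print(result)
```

The key function only changes the values used for ordering. The rows in by_abs still show the original signed numbers.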

Pandas Ascending vs Descending

One key decision you’ll need to make…do you want your values sorted highest to lowest (descending) or lowest to highest (ascending)?

  • Ascending = Values that are lowest will appear first or on top
  • Descending = Values that are highest will appear first or on top
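
To make the difference concrete, here’s a small sketch that sorts the same column both ways. The player names and scores are made up for illustration.

```python
import pandas as pd

# Hypothetical scores, made up for illustration
df = pd.DataFrame({"player": ["Ana", "Ben", "Cy"], "score": [72, 95, 88]})

# ascending=True is the default: lowest score on top
lowest_first = df.sort_values("score")

# ascending=False flips it: highest score on top
highest_first = df.sort_values("score", ascending=False)

print(lowest_first["player"].tolist())   # ['Ana', 'Cy', 'Ben']
print(highest_first["player"].tolist())  # ['Ben', 'Cy', 'Ana']
```

Descending order is usually what you want for seeing top ranked items first.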

Here’s a Jupyter notebook showing the different ways to sort a Pandas DataFrame.

Link to code
