
Pandas Group By Guide – 3 Methods

Pandas Group By methods for data aggregation

Pandas Group By is the foundation of any data analysis, and a MUST-know function when working with the pandas library. Most analyses will require some form of grouping and aggregating data.

This post will focus directly on how to do a group by in Pandas. To learn what a group by is, check out our upcoming business analytics post.

Pandas Group By will aggregate your data around the distinct values within your 'group by' columns, then apply a function (an aggregate function) to each group.

When it comes to group by functions, you’ll need two things from pandas:

  1. The group by function – The function that tells pandas how you would like to consolidate your data.
  2. An aggregate function – The function that tells pandas what you would like to do with your consolidated data.
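To make the two pieces concrete, here is a minimal sketch on a small hypothetical DataFrame (the `region` and `sales` columns are made up for illustration). Calling `groupby` on its own only describes how rows are split into groups; nothing is computed until you apply an aggregate function.

```python
import pandas as pd

# Hypothetical sample data: one row per sale
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [100, 200, 150, 50, 25],
})

# Piece 1: the group by function only defines how rows are consolidated
grouped = df.groupby("region")
print(type(grouped))  # a DataFrameGroupBy object, not a result yet

# Piece 2: the aggregate function computes one value per group
totals = grouped["sales"].sum()
print(totals)  # East 275, West 250
```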

We are going to go over 3 ways to do Group Bys in Pandas. They are listed from quickest to most complete. Pick the one that works best for your situation.

1. your_dataframe.groupby('column_to_group_by')['col_to_agg'].your_aggregate_function()
2. your_dataframe.groupby('column_to_group_by').agg({
       'col_to_agg1': aggregate_function1,
       'col_to_agg2': aggregate_function2,
   })
3. your_dataframe.groupby('column_to_group_by').agg(
       new_column_name1=pd.NamedAgg(column='col_to_agg1', aggfunc=aggfunc1),
       new_column_name2=pd.NamedAgg(column='col_to_agg2', aggfunc=aggfunc2),
   )

Pseudo code: For a given column, find the distinct groups within that column. Then, for each group, combine the values from another column and apply a function to them.
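The pseudo code above can be written out in plain Python to show what pandas is doing for you. This is a sketch for intuition only: `group_and_aggregate` is a hypothetical helper, not a pandas function, and the rows are made-up sample data.

```python
import pandas as pd

def group_and_aggregate(rows, group_col, agg_col, func):
    """Hypothetical helper: split rows by group_col, then apply func to each group's agg_col values."""
    groups = {}
    for row in rows:
        # Find the distinct groups and collect the values that belong to each one
        groups.setdefault(row[group_col], []).append(row[agg_col])
    # Apply the aggregate function to each group's collected values
    return {key: func(values) for key, values in groups.items()}

rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 200},
    {"region": "East", "sales": 150},
    {"region": "West", "sales": 75},
]

manual = group_and_aggregate(rows, "region", "sales", sum)
print(manual)  # {'East': 250, 'West': 275}

# pandas performs the same split, apply, combine steps in one line
pandas_result = pd.DataFrame(rows).groupby("region")["sales"].sum().to_dict()
print(pandas_result == manual)  # True
```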

Pandas Group By – 3 Methods

Method 1 – Single Aggregate Function

1. your_dataframe.groupby('column_to_group_by')['col_to_agg'].your_aggregate_function()

In method 1 we are doing the simplest type of group by in pandas. This method uses only one aggregate function. You start by defining the column (or columns) you’d like to group by, then the column you’d like to aggregate, then your aggregate function.

This method works great when you’re looking for a quick group by. Because you select a single column to aggregate, the result returned will be a Pandas Series.
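Here is a minimal sketch of method 1, assuming a small hypothetical DataFrame with `region`, `product`, and `sales` columns. It also shows the "or columns" case: passing a list to `groupby` still returns a Series, but its index becomes a MultiIndex with one level per grouping column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "sales": [100, 200, 150, 50, 25],
})

# Group by one column, select one column, apply one aggregate function
by_region = df.groupby("region")["sales"].sum()
print(by_region)  # East 275, West 250

# Group by several columns: still a Series, now indexed by (region, product)
by_region_product = df.groupby(["region", "product"])["sales"].mean()
print(by_region_product)
```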

Method 2 – Multiple Aggregate Functions

2. your_dataframe.groupby('column_to_group_by').agg({
       'col_to_agg1': aggregate_function1,
       'col_to_agg2': aggregate_function2,
   })

Say you have different aggregate functions you’d like to apply to different columns. The way we like to do this is with method 2: passing a dictionary to .agg(). Unfortunately, with this method you cannot specify new column names; the output columns keep the names of the columns you aggregated.

To do this, you’ll need to specify the column you want to group by, the column(s) you want to aggregate, and an aggregate function for each column. If you want to see a list of potential aggregate functions, check out the Pandas Series documentation.
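A minimal sketch of method 2, again on hypothetical `region`, `sales`, and `units` columns. Each dictionary key is a column to aggregate and each value is the aggregate function for it (passed here as a string name). Since the output keeps the original column names, one common workaround is a separate `DataFrame.rename` call afterward.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [100, 200, 150, 50, 25],
    "units": [1, 4, 3, 1, 1],
})

# One aggregate function per column, given as {column: function}
summary = df.groupby("region").agg({
    "sales": "sum",  # total sales per region
    "units": "max",  # largest single order per region
})
print(summary)

# Output columns are still called "sales" and "units"; rename them in an extra step
summary = summary.rename(columns={"sales": "total_sales", "units": "max_units"})
print(summary.columns.tolist())  # ['total_sales', 'max_units']
```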

Method 3 – Multiple Aggregate Functions with new column names

3. your_dataframe.groupby('column_to_group_by').agg(
       new_column_name1=pd.NamedAgg(column='col_to_agg1', aggfunc=aggfunc1),
       new_column_name2=pd.NamedAgg(column='col_to_agg2', aggfunc=aggfunc2),
   )

In method 3 you’ll get to specify your new column names. This method is the most explicit and flexible. It takes the longest to write out, but the output is clear so it’s a winner in our book.
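A minimal sketch of method 3 (named aggregation, available since pandas 0.25) on the same kind of hypothetical data. Each keyword argument becomes an output column name. pandas also accepts a plain `(column, aggfunc)` tuple in place of `pd.NamedAgg`, which is a shorter way to write the same thing.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [100, 200, 150, 50, 25],
    "units": [1, 4, 3, 1, 1],
})

# new_column_name=pd.NamedAgg(column=..., aggfunc=...)
summary = df.groupby("region").agg(
    total_sales=pd.NamedAgg(column="sales", aggfunc="sum"),
    avg_sales=pd.NamedAgg(column="sales", aggfunc="mean"),
    max_units=pd.NamedAgg(column="units", aggfunc="max"),
)
print(summary)

# Tuple shorthand: (column, aggfunc) instead of pd.NamedAgg
shorthand = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
    max_units=("units", "max"),
)
print(shorthand.equals(summary))  # True
```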

Let’s take a look at some code examples.
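As a quick side-by-side (a self-contained sketch on hypothetical data, not the linked code), the three methods below compute the same totals. What differs is the shape and labeling of the result: method 1 returns a Series named after the aggregated column, method 2 a DataFrame with the original column name, and method 3 a DataFrame with the name you chose.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [100, 200, 150, 50, 25],
})

# Method 1: single aggregate function, returns a Series
m1 = df.groupby("region")["sales"].sum()

# Method 2: dictionary of {column: function}, returns a DataFrame with original names
m2 = df.groupby("region").agg({"sales": "sum"})

# Method 3: named aggregation, returns a DataFrame with your own column names
m3 = df.groupby("region").agg(total_sales=pd.NamedAgg(column="sales", aggfunc="sum"))

print(type(m1).__name__, type(m2).__name__, type(m3).__name__)  # Series DataFrame DataFrame
print(m1.tolist(), m2["sales"].tolist(), m3["total_sales"].tolist())  # [275, 250] [275, 250] [275, 250]
```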

