Pandas Group By Guide: 3 Methods
Pandas Group By methods for data aggregation
Pandas Group By is the foundation of most data analysis, and it is a must-know function when working with the pandas library. Nearly every analysis requires some form of grouping and aggregating data.
This post focuses on how to do a group by in pandas. To learn what a group by is, check out our upcoming business analytics post.
Pandas Group By splits your data into groups based on the distinct values in your "group by" columns, then applies a function (an aggregate function) to each group.
When it comes to group by functions, you'll need two things from pandas:
- The group by function: the function that tells pandas how you would like to consolidate your data.
- An aggregate function: the function that tells pandas what you would like to do with your consolidated data.
We are going to go over 3 ways to do Group Bys in Pandas. They are listed from quickest to most complete. Pick the one that works best for your situation.
Pseudo code: For a given column, find the distinct groups within that column. Then combine all of the values in those groups from another column and apply a function.
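To make the pseudo code concrete, here is a minimal sketch that spells out those steps by hand with a plain loop. The DataFrame and its column names (`city`, `sales`) are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "sales": [100, 150, 200, 50, 25],
})

results = {}
# Step 1: find the distinct groups within the "group by" column
for group in df["city"].unique():
    # Step 2: collect the values from another column that belong to this group
    values = df.loc[df["city"] == group, "sales"]
    # Step 3: apply a function (here, a sum) to those values
    results[group] = int(values.sum())

print(results)
```

In practice you would let pandas do this for you with `groupby`; the loop is only here to show what happens conceptually.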
Pandas Group By: 3 Methods
Method 1: Single Aggregate Function
In method 1 we are doing the simplest type of group by in pandas. This method has only one aggregate function. You start by defining the column (or columns) you'd like to group by, then the column you'd like to aggregate, then specify your aggregate function.
This method works great when you're looking for a quick group by. The result returned will be a pandas Series.
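As a rough sketch of method 1, assuming a small made-up DataFrame (the `city`, `sales`, and `units` columns are hypothetical), the pattern might look like this:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "sales": [100, 150, 200, 50, 25],
    "units": [1, 2, 3, 1, 1],
})

# Column to group by -> column to aggregate -> aggregate function
total_sales = df.groupby("city")["sales"].sum()

print(type(total_sales))  # a pandas Series, indexed by city
print(total_sales)
```

Swapping `.sum()` for another aggregate such as `.mean()` or `.max()` follows the same shape.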
Method 2: Multiple Aggregate Functions
Say you have different aggregate functions you'd like to apply to different columns. The way we like to do this is with method 2: using a dictionary within .agg(). Unfortunately, with this method you cannot specify your new column names.
To do this, you'll need to specify the column you want to group by, the column(s) you want to aggregate, and finally an aggregate function for each column. If you want to see a list of potential aggregate functions, check out the Pandas Series documentation.
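Here is a minimal sketch of the dictionary approach, again using a made-up DataFrame with hypothetical `city`, `sales`, and `units` columns. Each dictionary key is a column to aggregate and each value is the aggregate function for that column.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "sales": [100, 150, 200, 50, 25],
    "units": [1, 2, 3, 1, 1],
})

# Keys are the columns to aggregate, values are the aggregate functions
summary = df.groupby("city").agg({"sales": "sum", "units": "mean"})

# The output columns keep their original names ("sales", "units")
print(summary)
```

Notice that the result is a DataFrame rather than a Series, and that the column names no longer tell you which function was applied.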
Method 3: Multiple Aggregate Functions with new column names
In method 3 you'll get to specify your new column names. This method is the most explicit and flexible. It takes the longest to write out, but the output is clear so it's a winner in our book.
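One way to get custom column names is pandas named aggregation, available in pandas 0.25 and later. We are assuming that is the approach method 3 refers to, so treat the sketch below as one possible version rather than the definitive one. The data and the output names (`total_sales`, `avg_units`, `max_sale`) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "sales": [100, 150, 200, 50, 25],
    "units": [1, 2, 3, 1, 1],
})

# Named aggregation: new_column_name=(source_column, aggregate_function)
summary = df.groupby("city").agg(
    total_sales=("sales", "sum"),
    avg_units=("units", "mean"),
    max_sale=("sales", "max"),
)

print(summary)
```

Because each output column is named explicitly, the same source column can be aggregated more than once without the names colliding.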
Let's take a look at some code examples.