
Pandas Value Counts – pd.Series.value_counts()

Pandas value counts to analyze column value distribution

Often when you’re doing exploratory data analysis (EDA), you’ll need to get a better feel for a column. One of the best ways to do this is to understand the distribution of values within your column. This is where Pandas Value Counts comes in.

The Pandas Series.value_counts() function returns a Series containing the counts of unique values in your Series. By default the resulting Series is sorted in descending order, so the first element is the most frequent value.

YourDataFrame['your_column'].value_counts()
YourSeries.value_counts()
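
For example, here’s a minimal sketch (the DataFrame and its 'city' column are made up for illustration) showing what the call returns:

import pandas as pd

# Hypothetical example data for illustration
df = pd.DataFrame({'city': ['Austin', 'Denver', 'Austin', 'Boston', 'Austin', 'Denver']})

df['city'].value_counts()
# Austin    3
# Denver    2
# Boston    1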

I usually do this when I want to get a bit more intimate with my data. My workflow goes:

  • Run pandas.Series.nunique() first – This will count how many unique values I have. If it’s 100K+ it’ll slow down my computer once I call value_counts()
  • Run pandas.Series.value_counts() – This will tell me which values appear most frequently (see the sketch after this list)
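
A minimal sketch of that workflow, using a hypothetical 'category' column:

import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'a', 'c', 'a', 'b', None]})

# Step 1: how many unique values are there? (NaNs are excluded)
df['category'].nunique()        # 3

# Step 2: if that number is manageable, look at the distribution
df['category'].value_counts()
# a    3
# b    2
# c    1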

Pseudo code: Take a DataFrame column (or Series) and find the distinct values. Then count how many times each distinct value occurs.

Hint: You can also do this across unique rows in a DataFrame by calling pandas.DataFrame.value_counts()
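For instance, here’s a quick sketch (with made-up columns) of counting unique rows across a whole DataFrame, available in recent pandas versions:

import pandas as pd

df = pd.DataFrame({'color': ['red', 'red', 'blue'],
                   'size':  ['S',   'S',   'M']})

# Counts each unique (color, size) row combination
df.value_counts()
# color  size
# red    S       2
# blue   M       1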

Pandas Value Counts

By default, you don’t need to input any parameters when counting the values. Let’s take a look at the different parameters you can pass to pd.Series.value_counts():

  • normalize (Default: False): If True, you’ll get back the relative frequencies of unique values. Instead of counts, the Series returned shows the share each unique value makes up of the whole Series.
  • sort (Default: True): This will return your values sorted by frequency. The exact order is determined by the next parameter (ascending).
  • ascending (Default: False): If True, your values are returned in ascending order (least frequent on top). By default the most frequent values appear first.
  • bins: Sometimes you’re working with a continuous variable (think a range of numbers vs. discrete labels). In this case you’ll have too many unique values to pull signal from your data. If you set bins (Ex: [0, .25, .5, .75, 1]), each value is assigned to a bin based on where it falls, and value_counts will count bin frequency instead of distinct value frequency. Check out the video or code below for more.
  • dropna (Default: True): This will either count (False) or exclude (True) the NaNs in your Series.
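
Here’s a minimal sketch (with made-up numbers) of how those parameters change the output:

import pandas as pd
import numpy as np

s = pd.Series([0.1, 0.4, 0.35, 0.8, 0.8, np.nan])

# Relative frequencies instead of raw counts
s.value_counts(normalize=True)

# Least frequent values first
s.value_counts(ascending=True)

# Bucket a continuous variable into bins, then count per bin
s.value_counts(bins=[0, .25, .5, .75, 1])

# Include NaNs in the counts
s.value_counts(dropna=False)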

Here’s a Jupyter notebook showing how to use value_counts in Pandas

Link to code

Official Documentation
