
Histogram – DataFrame.hist()


Histograms are the backbone of understanding how values are distributed within a series of data. Pandas DataFrame.hist() provides an easy way to plot one right from your data.

A histogram traditionally needs only one dimension of data. It shows the count of values, or of buckets of values, within your series.

Pandas DataFrame.hist() will take your DataFrame and output a histogram plot that shows the distribution of values within your series. The default values will get you started, but there are plenty of customization options available.

There are multiple ways to make a histogram plot in pandas. We are going to focus mainly on the first:

1. pd.DataFrame.hist(column='your_data_column')
2. pd.DataFrame.plot(kind='hist')
3. pd.DataFrame.plot.hist()
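
As a minimal sketch of how the three calls compare, assuming a made-up DataFrame with a single numeric column named "score" (the data and column name are illustrative, not from the original post):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical sample data: 200 random values in a column named "score"
rng = np.random.default_rng(0)
df = pd.DataFrame({"score": rng.normal(loc=50, scale=10, size=200)})

# 1. DataFrame.hist() returns an array of Axes, one per column plotted
axes_grid = df.hist(column="score")

# 2. DataFrame.plot(kind="hist") draws every numeric column on a single Axes
ax_kind = df.plot(kind="hist")

# 3. DataFrame.plot.hist() is the accessor form of option 2
ax_accessor = df.plot.hist()
```

All three use 10 bins unless you say otherwise. The main practical difference is that .hist() gives each column its own chart, while the two .plot variants overlay the columns on one chart.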

This function is heavily used when displaying large amounts of data. Pandas will show you one histogram per column that you pass to .hist().

Pseudo code: For each column in my DataFrame, draw a histogram showing the distribution of data points.

Pandas Histogram

The default .hist() function will take care of most of your needs. However, the real magic starts to happen when you customize the parameters, specifically the bins parameter.

Bins are the buckets that your histogram will be grouped by. On the back end, pandas will group your data into bins, or buckets. Then pandas will count how many values fell into each bucket and plot the result.

Another way to think about bins: how many bars do you want in your histogram chart? A lot or a few?
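
To make the bucketing concrete, here is a small sketch that does the counting step by hand. It uses numpy.histogram as a stand-in for what the plotting call computes before drawing, and the ten sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so the bin edges come out as round numbers
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range of the data (1 to 9) into 4 equal-width bins
# and count how many values fall into each one
counts, edges = np.histogram(values, bins=4)

print(edges.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(counts.tolist())  # [3, 5, 1, 1]
```

Four bins means five edges. Each bar in the chart would have the height given in counts, and the last bin includes its right edge, which is why the value 9 is counted.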

Histogram Parameters

Before we get into the histogram specific parameters, keep in mind that Pandas charts inherit other parameters from the general Pandas Plot function. These other parameters deal with general chart formatting rather than histogram specific attributes. We recommend viewing these for full chart flexibility. We’ll use some in our example below.

  • column: The specific column(s) that you want to call histogram on. By default, pandas will create a chart for every numeric column in your DataFrame.
  • by: This parameter will split your data into different groups and make a chart for each of them. Check out the example below where we split on another column.
  • bins (either a scalar or a list): The number of bars you’d like to have in your chart. Put another way, the number of buckets you would like to group your data into. The default is 10. If you pass a list instead of a scalar, pandas will use your list values as the bin edges.
  • formatting parameters: There are a bunch of other formatting parameters that will help you customize the look of your chart. I encourage you to check them out on the official pandas hist page.

Let’s look at a fun example.
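
A minimal sketch of the column, by, and bins parameters in action, assuming a made-up DataFrame with a numeric "height_cm" column and a categorical "group" column (both names and the data are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: 300 heights, each randomly assigned to group "A" or "B"
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "height_cm": rng.normal(170, 8, size=300),
    "group": rng.choice(["A", "B"], size=300),
})

# Scalar bins: 20 equal-width buckets for a single column
axes_scalar = df.hist(column="height_cm", bins=20)

# List bins: the values are used as bin edges, so 5 edges give 4 buckets
axes_edges = df.hist(column="height_cm", bins=[140, 160, 170, 180, 200])

# by: one chart per distinct value in the "group" column
axes_by = df.hist(column="height_cm", by="group", bins=20)
```

Passing a list of edges is useful when the buckets have a meaning of their own (for example, size categories). Values that fall outside the first and last edge are left out of the chart.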

