Pandas Scatter Plot – DataFrame.plot.scatter()

Create scatter plots with Pandas

Scatter plots are a beautiful way to display your data. Luckily, a Pandas scatter plot can be created directly from your DataFrame.

Scatter plots traditionally show your data up to 4 dimensions – X-axis, Y-axis, Size, and Color. Of course you can do more (transparency, movement, textures, etc.) but be careful you aren’t overloading your chart.

Pandas DataFrame.plot.scatter() will take your DataFrame and output a scatter plot. The default values will get you started, but there are a ton of customization abilities available.

# Call scatter on your own DataFrame (df); x and y are column names
df.plot.scatter(x='your_x_axis',
                y='your_y_axis',
                s=df['your_size_values'],
                c='your_color_values')

This function is heavily used when displaying large amounts of data.

Pseudo code: For each row in my DataFrame, use the columns specified for each chart attribute and make a scatter plot.
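As a minimal sketch of that pseudo code, the snippet below builds a small DataFrame and plots one point per row. The data and the column names (height_cm, weight_kg) are made up for illustration, and the Agg backend line is only there so the script runs without a display; in a notebook you would likely drop it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 80, 92],
})

# One point per row: x and y are passed as column names, not as Series.
ax = df.plot.scatter(x="height_cm", y="weight_kg")
```

The call returns a matplotlib Axes object, so you can keep customizing the chart afterwards (for example with ax.set_title).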

Pandas Scatter Plot

The first question to keep in mind when displaying data: what is the message I’m trying to convey?

With this in mind, do not overload your charts. Make sure they are saying exactly what you want and nothing more. No fancy colors if you don’t need them, no exaggerated sizes that don’t provide value.

Picture of the final scatter plot we make below.

Scatter Parameters

Before we get into the scatter plot specific parameters, keep in mind that Pandas charts inherit other parameters from the general Pandas Plot function. These other parameters will deal with general chart formatting vs scatter specific attributes. We recommend viewing these for full chart flexibility. We’ll use some in our example below.
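To illustrate how the general plot parameters combine with a scatter call, here is a small sketch that passes title, figsize, and grid alongside x and y. The data, column names, and chart title are hypothetical, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 80, 92],
})

# title, figsize, and grid are general plot parameters, not scatter-specific ones.
ax = df.plot.scatter(
    x="height_cm",
    y="weight_kg",
    title="Height vs. weight",
    figsize=(8, 5),
    grid=True,
)
```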

  • x: This is where you specify a column name to be your X (horizontal) axis
  • y: This is where you specify a column name to be your Y (vertical) axis
  • s: Size – How big do you want your points to be? You can specify:
    • Single number (scalar): This will set all of your points to the same size
    • Column name: This will set your sizes per data point according to a value in a column.
    • Array: This will set your data points size alternating between the values in your array. Ex: Passing [3,5] will set every other datapoint 3, then 5.
  • c: Color – You can pass:
    • Single color – Either a hex string like '#b31d59' or a color name like 'red'
    • Array of colors – Setting your data points alternating between array values. Ex: ['green', 'red', 'blue'] means pandas will color your points green, red, blue alternating.
  • **kwargs: There are a huge number of extra parameters you could pass to scatter. Check out the general parameters that come with all pandas charts in the DataFrame.plot documentation.
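The size and color options above can be sketched as follows. The data and column names (including point_size) are hypothetical. To stay on the safe side, the second chart passes one size and one color per row rather than a shorter array, since whether shorter arrays get recycled may depend on your installed matplotlib version. alpha is an example of an extra keyword argument that is handed through to matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "point_size": [20, 40, 80, 160, 320, 640],
})

# Scalar size and a single hex color: every point looks the same.
ax_uniform = df.plot.scatter(x="x", y="y", s=50, c="#b31d59")

# Per-point sizes from a column, one color per row, plus transparency.
colors = ["green", "red", "blue"] * 2  # six entries, one for each row
ax_varied = df.plot.scatter(
    x="x",
    y="y",
    s=df["point_size"],
    c=colors,
    alpha=0.6,
)
```

Each call draws on its own new figure, so the two charts do not overwrite each other.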

Here’s a Jupyter notebook with a few examples

Link to code

Official Documentation