Pandas Scatter Plot: DataFrame.plot.scatter()
Create scatter plots with Pandas
Scatter plots are a beautiful way to display your data. Luckily, a Pandas scatter plot can be created right from your DataFrame.
Scatter plots traditionally show your data in up to 4 dimensions: X-axis, Y-axis, Size, and Color. Of course you can do more (transparency, movement, textures, etc.), but be careful you aren't overloading your chart.
Pandas DataFrame.plot.scatter() will take your DataFrame and output a scatter plot. The default values will get you started, but there is plenty of room for customization.
Scatter plots are especially handy when you have a large number of data points and want to see how two columns relate to each other.
Pseudo code: For each row in my DataFrame, use the columns specified for each chart attribute and make a scatter plot.
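The pseudo code above can be sketched as a minimal example. The DataFrame and its column names ("height", "weight") are made up for illustration, and the `matplotlib.use("Agg")` line is only there so the script runs without a display; in a Jupyter notebook you can drop it and the chart will render inline.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available); omit in Jupyter
import pandas as pd

# Hypothetical sample data: each row will become one point on the chart
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 61, 68, 80, 91],
})

# x and y are column names; pandas returns a Matplotlib Axes object
ax = df.plot.scatter(x="height", y="weight")
```

Because the call returns a Matplotlib Axes, you can keep customizing the chart afterwards with regular Matplotlib methods.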
Pandas Scatter Plot
The first question you always want to keep in mind when displaying data: what is the message I'm trying to convey?
With this in mind, do not overload your charts. Make sure they are saying exactly what you want and nothing more. No fancy colors if you don't need them, no exaggerated sizes that don't provide value.
Picture of the final scatter plot we make below.
Scatter Parameters
Before we get into the scatter-plot-specific parameters, keep in mind that Pandas charts inherit other parameters from the general Pandas plot function. These other parameters deal with general chart formatting rather than scatter-specific attributes. We recommend reviewing them for full chart flexibility. We'll use some in our example below.
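As a rough sketch of what those general parameters look like in practice, the snippet below passes `title`, `figsize`, and `grid` alongside the scatter arguments. The data, column names, and title text are hypothetical; the headless backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available); omit in Jupyter
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [12, 25, 31, 48, 55],
})

# title, figsize and grid are general pandas plot parameters,
# not scatter-specific ones
ax = df.plot.scatter(
    x="ad_spend",
    y="sales",
    title="Sales vs. ad spend",
    figsize=(8, 5),
    grid=True,
)
```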
- x: This is where you specify a column name to be your X (horizontal) axis
- y: This is where you specify a column name to be your Y (vertical) axis
- s: Size. How big do you want your points to be? You can specify:
  - Single number (scalar): This will set all of your points to the same size
  - Column name: This will set the size of each data point according to the value in that column
  - Array: This will set your data point sizes by alternating between the values in your array. Ex: Passing [3,5] will set every other data point to size 3, then 5.
- c: Color. You can pass:
  - Single color: Either a hex string like "#b31d59" or a color name like "red"
  - Array of colors: Sets your data points by alternating between the array values. Ex: ["green", "red", "blue"] means pandas will color your points green, red, blue, alternating.
- **kwargs: There are a huge number of extra parameters you can pass to scatter. Check out the general parameters that come with all pandas charts in the pandas DataFrame.plot documentation.
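Here is a small sketch of the size and color options listed above. The data and column names ("population", "score") are hypothetical. Passing a column name to `s` assumes a reasonably recent pandas version, and the color list has one entry per row so the example does not rely on the alternating behavior. The last call colors points by a numeric column through a colormap, which is another option pandas supports for `c`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available); omit in Jupyter
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 1, 5, 3],
    "population": [20, 60, 100, 140, 180],
    "score": [0.1, 0.4, 0.5, 0.8, 0.9],
})

# Scalar size and a single hex color: every point looks the same
ax1 = df.plot.scatter(x="x", y="y", s=50, c="#b31d59")

# Size taken from a column, plus one color per row
ax2 = df.plot.scatter(
    x="x",
    y="y",
    s="population",
    c=["green", "red", "blue", "green", "red"],
)

# Color mapped from a numeric column through a colormap
ax3 = df.plot.scatter(x="x", y="y", c="score", colormap="viridis")
```

If you find yourself using size and color at the same time, check that each one still supports the message of the chart.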
Here's a Jupyter notebook with a few examples