Pandas Describe – pd.DataFrame.describe()

Pandas describe for DataFrame and Series analysis

I once had a data teacher who told me, “You need to get intimate with your data.” One of the best ways to do this is through pandas describe.

pandas.DataFrame.describe()
pandas.Series.describe()

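Pandas describe does exactly what it sounds like: it describes your data. Called on a Series, it returns a Series of descriptive statistics. Called on a DataFrame, it returns a DataFrame with one column of statistics per described column. Depending on the data type (counts and frequencies for text-like data, distribution statistics for numeric data), the result will tell you: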

  • The count of values
  • The number of unique values
  • The top (most frequent) value
  • The frequency of your top value
  • The mean, standard deviation, min and max values
  • The percentiles of your data: 25%, 50%, 75% by default

Pseudo Code: With your Series or DataFrame, return a summary that tells us what the distribution of values looks like.
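To make the pseudo code concrete, here is a minimal sketch. The DataFrame, its column names, and its values are made up for illustration; they are not from a real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    "name": ["Foreign Cinema", "Liho Liho", "500 Club", "The Square", "Liho Liho"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# DataFrame in, DataFrame out: by default only numeric columns are described
numeric_summary = df.describe()
print(numeric_summary)

# Series in, Series out: count, mean, std, min, percentiles, max
price_summary = df["price"].describe()
print(price_summary)

# A text column gets count, unique, top, and freq instead
name_summary = df["name"].describe()
print(name_summary)
```

Notice that the "name" column is missing from the first result. That is the default numeric-only behavior, which the parameters described later let you change.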

Pandas Describe

In order to evaluate a dataset, you need to get a feel for your data. This means you need to get an intuitive sense of how your data is distributed and what spectrum of values you have. This is the first step to launching a successful data analysis.

Often, the process of ‘getting to know your data’ is called Exploratory Data Analysis (EDA).

Pandas Describe Parameters

The describe function is pretty standard, but you may want to play with a few parameters.

  • percentiles = By default, pandas will include the 25th, 50th, and 75th percentiles. However, you can ask for whichever ones you want. Simply pass a list of numbers between 0 and 1 to percentiles and pandas will do the rest.
  • include = You may want to ‘describe’ all of your columns, or you may just want the numeric columns. By default, pandas will only describe your numeric columns. Pass include='all' to describe every column, or pass a list of data types to describe only those.
  • exclude = The inverse of include: you can tell pandas which column data types you would like to leave out. Simply pass a list of data types you would like to exclude here.
  • datetime_is_numeric = By default, pandas will treat your datetimes as objects, meaning pandas will not calculate things like the ‘average time/date’. However, if you set datetime_is_numeric=True, then pandas will apply the min, max, and percentiles to your datetimes. Note that this parameter was removed in pandas 2.0, where datetimes are always summarized as numeric data.
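The percentiles, include, and exclude parameters above can be sketched as follows. The data is again hypothetical, and the variable names are only for illustration. The datetime_is_numeric parameter is left out because its availability depends on your pandas version.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    "city": ["SF", "SF", "NYC", "LA"],
    "visits": [1, 2, 3, 4],
    "rating": [4.5, 3.0, 5.0, 4.0],
})

# percentiles: ask for the 10th and 90th percentiles (values between 0 and 1)
custom_pct = df.describe(percentiles=[0.1, 0.9])
print(custom_pct)

# include="all": describe every column; statistics that do not apply show NaN
all_cols = df.describe(include="all")
print(all_cols)

# exclude: leave out columns of the listed data types (here, floats)
no_floats = df.describe(exclude=["float64"])
print(no_floats)
```

With include="all", expect NaN in the mean and std rows for the text column, and NaN in the unique, top, and freq rows for the numeric columns.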

Now the fun part: let’s take a look at a code sample.

Link to code

Official Documentation
