Pandas Describe - pd.DataFrame.describe()
Pandas describe for DataFrame and Series analysis
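I once had a data teacher tell me, "You need to get intimate with your data." One of the best ways to do this is through pandas describe.
Pandas describe does exactly what it sounds like: it describes your data. Called on a Series, it returns a Series of descriptive statistics. Called on a DataFrame, it returns a DataFrame with one column of statistics for each column it describes. Depending on the data type, the result will tell you: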
- The count of values
- The number of unique values (non-numeric data)
- The top (most frequent) value (non-numeric data)
- The frequency of your top value (non-numeric data)
- The mean, standard deviation, min and max values (numeric data)
- The percentiles of your data: 25%, 50%, 75% by default (numeric data)
Pseudo Code: With your Series or DataFrame, return a summary that tells us what the distribution of values looks like.
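To make the list above concrete, here is a minimal sketch. The DataFrame and its column names (price, city) are made up for illustration and are not from a real dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "city": ["NYC", "SF", "NYC", "LA"],
})

# Numeric Series: count, mean, std, min, percentiles, max
numeric_summary = df["price"].describe()

# Text Series: count, unique, top, freq
text_summary = df["city"].describe()

# DataFrame with mixed types: only the numeric columns are described by default
frame_summary = df.describe()

print(numeric_summary)
print(text_summary)
print(frame_summary)
```

Notice that the numeric column and the text column get different sets of statistics, which is why not every item in the list shows up for every column.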
Pandas Describe
In order to evaluate a dataset, you need to get a feel for your data. This means you need to get an intuitive sense of how your data is distributed and what spectrum of values you have. This is the first step to launching a successful data analysis.
Often, the process of "getting to know your data" is called Exploratory Data Analysis (EDA).
Pandas Describe Parameters
The describe function is pretty standard, but you may want to play with a few items.
- percentiles = By default, pandas will include the 25th, 50th, and 75th percentiles. However, you can ask for whichever ones you want. Simply pass a list of numbers between 0 and 1 to percentiles and pandas will do the rest.
- include = You may want to "describe" all of your columns, or you may just want to do the numeric columns. By default, on a DataFrame with mixed types, pandas will only describe your numeric columns. Pass include="all" to include all columns, or pass a list of data types to describe only those.
- exclude = The inverse of include. You can tell pandas which column data types to leave out. Simply pass a list of data types you would like to exclude here.
- datetime_is_numeric = By default, pandas will treat your datetimes as objects, meaning pandas will not calculate things like "average time/date". However, if you pass datetime_is_numeric=True, then pandas will apply the min, max, and percentiles to your datetimes. Note that this parameter was removed in pandas 2.0, where datetimes are always summarized numerically.
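The parameters above can be sketched roughly as follows. The DataFrame and its column names (price, units, city) are hypothetical. datetime_is_numeric is left out on purpose, since it is not accepted by pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "units": [1, 2, 3, 4],
    "city": ["NYC", "SF", "NYC", "LA"],
})

# percentiles: ask for the 10th and 90th percentiles (values between 0 and 1)
custom = df.describe(percentiles=[0.1, 0.9])

# include="all": describe every column, numeric or not
everything = df.describe(include="all")

# exclude: drop the numeric columns, leaving only the text column
non_numeric = df.describe(exclude=["number"])

print(custom)
print(everything)
print(non_numeric)
```

With include="all", statistics that do not apply to a column (for example, mean for city) show up as NaN.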
Now the fun part, let's take a look at a code sample.