Pandas Sample: pd.DataFrame.sample()
Sample random rows or columns in a pandas DataFrame
Pandas Sample is used when you need to pull random rows or columns from a DataFrame.
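Why would you ever want random rows? Say you're running a data science model and you want to test it on a subset of the data. If you're not using a train/test split, you can call DataFrame.sample() to pull a small section of rows. (Note that sample is a method on a DataFrame or Series; there is no top-level pd.sample() function.)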
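As a rough sketch of that idea, here is one way to hold out a random subset of rows and keep the rest. The DataFrame and its column names are made up for illustration, and the sample-then-drop approach is just one simple option, not a replacement for a proper train/test split utility.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Hold out 20% of the rows as a test subset.
# random_state makes the draw repeatable.
test = df.sample(frac=0.2, random_state=42)

# Every row that was not sampled becomes the training subset.
train = df.drop(test.index)

print(len(train), len(test))
```

This relies on the index being unique, since drop removes rows by index label.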
I use Pandas Sample mostly when I want to view a small section of data, but DataFrame.head() shows me data that is too homogeneous. I want some variability!
Pseudo Code: With your DataFrame, return random rows or columns.
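To make that concrete, here is a minimal sketch comparing head() with sample() on data that is sorted, so the first rows all look alike. The city and sales values are invented for the example.

```python
import pandas as pd

# Hypothetical data, sorted by city so the top of the frame is homogeneous.
df = pd.DataFrame({
    "city": ["Austin"] * 4 + ["Boston"] * 4 + ["Chicago"] * 4,
    "sales": [10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33],
})

# head() always returns the first rows: here, four "Austin" rows.
first_rows = df.head(4)
print(first_rows)

# sample() draws rows at random from the whole DataFrame.
random_rows = df.sample(n=4)
print(random_rows)

# axis=1 draws random columns instead of rows.
random_column = df.sample(n=1, axis=1)
print(random_column)
```

Because no random_state is set, the sampled rows and column will differ from run to run.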
Sample Parameters
Sample has some of my favorite parameters of any Pandas function. Each one is packed with dense functionality.
- n: The number of items you want returned. Specify either n or frac (below), not both; if you give neither, pandas returns a single row. Unless replace=True, n cannot be larger than the number of rows in your DataFrame.
- frac: If you did not specify n (above), you can specify frac, or fraction. As in, what fraction of your dataset do you want returned? Ex: "Return me 10% of my DataFrame" is frac=0.1. A frac greater than 1 only works with replace=True.
- replace (Default: False): Do you want your rows to be able to be randomly picked twice? By default, once pandas has picked a row it will not pick it again. However, if replace=True, the same row can be picked more than once.
- weights (Optional): Super awesome parameter! By default, pandas applies the same weight to all of your rows, meaning each row has an equal chance of being randomly picked. But what if you wanted some rows to have a higher chance of being picked than others? You can set a weight per row (for example, by passing the name of a column that holds the weights), which causes pandas to pick some rows more often than others. Weights that do not sum to 1 are normalized, and rows with a weight of 0 are never picked. Check out the example for details.
- random_state (Optional): By default, pandas will pick different random rows each time you sample. However, what if you wanted to pick the same random rows each time? By setting random_state to an int, you'll ensure consistency.
- axis (Default: 0 or 'index'): Did you know you could also select random columns from your DataFrame? If you want to, set axis=1 or 'columns'.
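The parameters above can be sketched one at a time as follows. The DataFrame is made up, and "weight" is just an illustrative column name, not a pandas keyword.

```python
import pandas as pd

# Hypothetical data; the "weight" column exists only to drive the weights example.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [1, 2, 3, 4, 5],
    "weight": [0, 0, 0, 1, 1],
})

# n: an exact number of rows.
two_rows = df.sample(n=2, random_state=1)

# frac: a fraction of the rows (40% of 5 rows is 2 rows).
forty_pct = df.sample(frac=0.4, random_state=1)

# replace=True: rows may repeat, so n can exceed the number of rows.
with_repeats = df.sample(n=8, replace=True, random_state=1)

# weights: rows with weight 0 are never picked, so only "d" and "e" can appear.
weighted = df.sample(n=2, weights="weight", random_state=1)

# random_state: the same seed returns the same rows on every call.
same_again = df.sample(n=2, random_state=1)

# axis=1 (or "columns"): sample columns instead of rows.
one_column = df.sample(n=1, axis=1, random_state=1)
```

Passing a column name to weights only works when sampling rows (axis=0); for other cases you can pass a Series or list of weights instead.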
Now the fun part: let's take a look at a code sample.