
Pandas Sample – pd.DataFrame.sample()

Sample random rows or columns in pandas DataFrame

Pandas Sample is used when you need to pull random rows or columns from a DataFrame.

Why would you ever want random rows? Say you’re building a data science model and you want to test it on a subset of your data. If you’re not using a train/test split, you can use DataFrame.sample() to pull a small section of rows.
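Here is a rough sketch of that idea: sample a share of the rows for training and treat whatever is left as the test set. The toy DataFrame, the column names, and the 80/20 split are my own assumptions for illustration, and it relies on the index being unique.

```python
import pandas as pd

# Hypothetical toy data; any DataFrame with a unique index works the same way.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Randomly pick 80% of the rows for training (random_state makes it repeatable).
train = df.sample(frac=0.8, random_state=42)

# Everything that was not sampled becomes the test set.
test = df.drop(train.index)

print(len(train), len(test))
```

This is only a quick stand-in for a proper train/test split, handy when you just want a fast hold-out set.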

I use Pandas Sample mostly when I want to view a small section of data, but DataFrame.head() shows me data that is too homogeneous. I want some variability!

pd.DataFrame.sample(n=number_of_samples, axis=rows_or_columns)

Pseudo Code: With your DataFrame, return random rows or columns.
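As a minimal sketch of the pseudo code above, here is sample() pulling random rows and then random columns. The DataFrame and its column names are made up for the example, and the output will differ on every run because no seed is set.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [10, 20, 30, 40, 50],
    "city": ["x", "y", "z", "x", "y"],
})

# 3 random rows (axis=0 is the default), all columns kept.
random_rows = df.sample(n=3)

# 2 random columns, all rows kept.
random_cols = df.sample(n=2, axis=1)

print(random_rows)
print(random_cols)
```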

Sample Parameters

Sample has some of my favorite parameters of any Pandas function. Each one is packed with dense functionality.

  • n – The number of samples you want to return. Specify either n or frac (below), not both; if you give neither, pandas returns 1 sample. Unless replace=True, ‘n’ cannot be larger than the number of rows in your DataFrame.
  • frac – If you did not specify an ‘n’ (above), you can specify ‘frac’, or fraction, instead. As in, what fraction of your dataset do you want returned to you? Ex: “Return 10% of my DataFrame: frac=.1”
  • replace (Default: False) – Do you want your rows to be able to be randomly picked twice? By default, if pandas randomly selects a row that has already been picked, then it will not pick it again. However, if replace=True, then pandas will pick a row again.
  • weights (Optional) – Super awesome parameter! By default, pandas will apply the same weights to all of your rows. Meaning, each row has an equal chance of being randomly picked. But what if you wanted some rows to have a higher chance to be picked than others? You can set a weight per row which will cause pandas to more heavily pick some rows than others. Check out the example for details.
  • random_state (Optional) – By default, pandas will pick different random numbers each time you sample. However, what if you wanted to pick the same random numbers each time? By setting random_state to an int, you’ll ensure consistency.
  • axis (Default: 0 or ‘index’) – Did you know you could also select random columns from your DataFrame? If you want to, set axis=1 or ‘columns’.
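To make the parameters above concrete, here is a small sketch that exercises frac, replace, weights, and random_state. The DataFrame, the "weight" column, and the seed value are assumptions I picked for illustration, not anything special to pandas.

```python
import pandas as pd

# Hypothetical data: the "weight" column drives how likely each row is to be picked.
df = pd.DataFrame({"item": ["a", "b", "c", "d"], "weight": [0, 0, 1, 3]})

# frac: return half of the rows (2 of 4).
half = df.sample(frac=0.5)

# replace=True: rows may repeat, so n can exceed the number of rows.
oversampled = df.sample(n=10, replace=True)

# weights: rows "a" and "b" have weight 0 and are never picked;
# "d" is three times as likely as "c". Passing a column name uses that column.
weighted = df.sample(n=10, replace=True, weights="weight")

# random_state: the same seed returns the same rows every time.
first = df.sample(n=2, random_state=1)
second = df.sample(n=2, random_state=1)

print(weighted["item"].value_counts())
print(first.equals(second))  # True
```

Weights do not need to sum to 1; pandas normalizes them for you.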

Now the fun part: let’s take a look at a code sample.

Link to code

Official Documentation
