
Selecting Data – Pandas loc & iloc[] – The Guide

Select data in DataFrame using Pandas loc and iloc

When it comes to selecting data from your DataFrame, Pandas loc and iloc are two top favorites. They are fast and easy to read when reviewing code later. Let’s see how loc and iloc compare.

Pandas loc will select data based on the label of your index (row/column labels), whereas Pandas iloc will select data based on the position of your index (position 0, 1, 2, etc.).

Pandas loc/iloc are best used when you want a range of data. To select or set a single cell, check out Pandas .at[].
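As a rough illustration of the single-cell case, here is a minimal sketch. The DataFrame, its labels, and its values are made up for this example; .iat[] is the position-based counterpart of .at[].

```python
import pandas as pd

# Hypothetical sample data, not from the original article.
df = pd.DataFrame({"score": [70, 85]}, index=["amy", "bo"])

# Read one cell by row label and column label.
bo_score = df.at["bo", "score"]

# Set one cell by row label and column label.
df.at["amy", "score"] = 75

# Read one cell by integer position (row 1, column 0).
same_cell = df.iat[1, 0]
```

Note that .at[] and .iat[] use square brackets, just like loc and iloc.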

Let’s break down index label vs position:

  1. Index Labels (df.loc[]) – This is the label, or what your row/column actually says. With the default index, your row labels will be the same as their positions because your row labels are your row numbers. Your column labels will be the names of your columns.
  2. Index Positions (df.iloc[]) – On the other hand, your index position will be an integer representing where your index sits in relation to other indexes. Think of this as row numbers and column numbers. Remember that Python starts its index at 0 (vs 1 like humans).
1. pd.DataFrame.loc['row_label']
2. pd.DataFrame.loc['row_label', 'column_label']
3. pd.DataFrame.iloc[row_position]
4. pd.DataFrame.iloc[row_position, column_position]

Pseudo code: For a given DataFrame, return a subset of rows/columns based off of their label (loc) or position (iloc)
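To make the label vs position distinction concrete, here is a small sketch. The DataFrame is hypothetical sample data; it uses a string index on purpose so that labels and positions are clearly different things.

```python
import pandas as pd

# Hypothetical sample data; the string index makes labels differ from positions.
df = pd.DataFrame(
    {"price": [3.5, 12.0, 7.25], "stock": [10, 0, 4]},
    index=["apple", "melon", "grape"],
)

row_by_label = df.loc["melon"]            # row whose index label is "melon"
row_by_position = df.iloc[1]              # second row (positions start at 0)
cell_by_label = df.loc["melon", "stock"]  # row label + column label
cell_by_position = df.iloc[1, 1]          # row position + column position
```

Both row selections return the same row, and both cell selections return the same value. Only the way you point at the data differs.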

Selecting Data Via Pandas loc & iloc[]

3 Methods To Select Data Via loc & iloc

Method 1 – Via Scalar

You can select a single row, single column, or single value via scalar (single) values.

pd.DataFrame.loc['row_label', 'column_label']
pd.DataFrame.loc[:, 'column_label'] # To select single column

pd.DataFrame.iloc[row_position, col_position]
pd.DataFrame.iloc[:, col_position] # To select single column
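Here is a minimal sketch of scalar selection on a made-up DataFrame (the column names and values are assumptions for illustration). It also shows what comes back: a single cell gives a scalar, while a single row or single column gives a Series.

```python
import pandas as pd

# Hypothetical sample data, not from the original article.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4, 19, 31]},
    index=["a", "b", "c"],
)

one_value = df.loc["b", "temp_c"]   # single cell -> scalar
one_row = df.iloc[0]                # single row -> Series indexed by column names
one_col_label = df.loc[:, "city"]   # single column by label -> Series
one_col_pos = df.iloc[:, 0]         # single column by position -> Series
```

The bare colon means "all rows", which is why it appears in the row slot when selecting a single column.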

If you want to select multiple columns you’ll need to use a list or slice.

Method 2 – Multiple Rows & Columns Via List

You can select multiple items by passing a list of labels or index positions. Remember to use labels with loc and integer positions with iloc. This comes in handy when you are selecting rows, or rows and columns, from your DataFrame.

pd.DataFrame.loc[['row_label1', 'row_label2'],
                 ['column_label1', 'column_label2']]

pd.DataFrame.iloc[[row_pos1, row_pos2],
                  [col_pos1, col_pos2]]
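A short sketch of list-based selection follows, using a hypothetical DataFrame. One detail worth seeing in action: the result comes back in the order of the lists you pass, not the order of the original DataFrame.

```python
import pandas as pd

# Hypothetical sample data, not from the original article.
df = pd.DataFrame(
    {"q1": [10, 20, 30], "q2": [11, 21, 31], "q3": [12, 22, 32]},
    index=["north", "south", "east"],
)

# Pick two rows and two columns by label, in the order listed.
by_label = df.loc[["east", "north"], ["q3", "q1"]]

# Pick the same rows and columns by integer position.
by_position = df.iloc[[2, 0], [2, 0]]
```

Both results are 2x2 DataFrames with rows east, north and columns q3, q1.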

Method 3 – Multiple Items Via Slicing

The last way to do data selection is via slices. You can either slice with labels or index positions. Think of it as “select all rows/columns between item 1 and item 2.”

pd.DataFrame.loc['row_label1':'row_label2',
                 'column_label1':'column_label2'] # Slices go directly in the brackets; loc includes both end labels

pd.DataFrame.iloc[row_pos1:row_pos2,
                  col_pos1:col_pos2] # iloc excludes the end positions
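The sketch below shows slicing on a made-up DataFrame. It highlights the one behavior that tends to surprise people: label slices with loc include the end label, while position slices with iloc stop before the end position, like regular Python slices.

```python
import pandas as pd

# Hypothetical sample data, not from the original article.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

# Label slice: rows "x" through "y" and columns "a" through "b", ends included.
label_slice = df.loc["x":"y", "a":"b"]

# Position slice: rows 1 and 2, columns 0 and 1; end positions 3 and 2 are excluded.
position_slice = df.iloc[1:3, 0:2]
```

Here both slices return the same 2x2 block, even though the end values are written differently.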

All of the patterns above should give valid output for your pandas DataFrame. Let’s take a look at a code sample.

Link to code

Official Documentation
