Pandas Iterate Over Rows: 5 Methods
Iterate over rows in pandas DataFrame
Folks come to me and often say, "I have a Pandas DataFrame and I want to iterate over rows." My first response is: are you sure? Ok, fine, let's continue.
Depending on your situation, you have a menu of methods to choose from, each with its own performance and usability tradeoffs. Here are the methods in recommended order:
- DataFrame.apply()
- DataFrame.iterrows()
- DataFrame.itertuples()
- Convert DataFrame to Dictionary
- DataFrame.iloc
Pseudo code: Go through each one of my DataFrame's rows and do something with row data
Warning: Iterating through pandas objects is slow. In many cases, iterating manually over the rows is not needed.
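Before reaching for any of the methods below, it's worth checking whether a column-wise (vectorized) operation already does the job. Here is a minimal sketch, using a hypothetical `df` with `price` and `qty` columns, that computes the same per-row totals both ways:

```python
import pandas as pd

# Hypothetical example data: item prices and quantities
df = pd.DataFrame({"price": [2.50, 1.25, 4.00], "qty": [4, 8, 1]})

# Row-by-row version: a Python loop computes each total one row at a time
totals_loop = [row.price * row.qty for row in df.itertuples(index=False)]

# Vectorized version: pandas multiplies the whole columns at once, no explicit loop
df["total"] = df["price"] * df["qty"]

print(df["total"].tolist())
```

Both versions produce the same numbers, but the vectorized line hands the work to pandas in one step instead of visiting each row from Python.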
Pandas Iterate Over Rows: Priority Order
DataFrame.apply()
DataFrame.apply() is our first choice for iterating through rows. .apply() applies a function along a specific axis of a DataFrame: pass axis=1 to call your function once per row (the default, axis=0, calls it once per column). It's concise, and pandas handles the looping for you internally, though with axis=1 your function is still called in Python for every row.
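A minimal sketch of row-wise .apply(), assuming a hypothetical `df` with `first` and `last` name columns and a hypothetical helper `full_name`:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]})

def full_name(row: pd.Series) -> str:
    # With axis=1, each `row` is a Series indexed by column name
    return f"{row['first']} {row['last']}"

# axis=1 calls full_name once per row; the default axis=0 would pass columns instead
df["full"] = df.apply(full_name, axis=1)

print(df["full"].tolist())  # ['Ada Lovelace', 'Alan Turing']
```

Forgetting `axis=1` is the most common .apply() mistake: the function then receives whole columns rather than rows.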
DataFrame.iterrows()
iterrows() is a generator that iterates over the rows of your DataFrame and yields pairs of 1. the row's index label and 2. a Series containing the row itself. Think of this function as going through each row, building a Series for it, and handing it back to you.
That's a lot of compute on the backend you don't see.
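A minimal sketch of the (index, Series) pairs that .iterrows() yields, using a hypothetical `df` of cities and temperatures:

```python
import pandas as pd

# Hypothetical example data with a string column and an integer column
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 19]})

lines = []
for idx, row in df.iterrows():
    # idx is the row's index label; row is a new pandas Series built for this row
    lines.append(f"{idx}: {row['city']} is {row['temp_c']} C")

print(lines)  # ['0: Oslo is 4 C', '1: Lima is 19 C']
```

Because each row is packed into a single Series, columns with different dtypes get merged into one common dtype (here `object`), so .iterrows() does not preserve column dtypes inside the row.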
DataFrame.itertuples()
DataFrame.itertuples() is a cousin of .iterrows(), but instead of returning a Series, .itertuples() will return… you guessed it, a tuple. In this case, it'll be a named tuple. A named tuple is a data type from Python's collections module that acts like a tuple but also lets you look up its fields by name.
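You don't have to import collections yourself, since pandas builds the named tuples for you, but each row you get back is a plain Python tuple rather than a pandas object. That's why many people like to stay in pandas and use .iterrows() or .apply(), even though .itertuples() is generally faster than .iterrows().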
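A minimal sketch of .itertuples() on the same kind of hypothetical `df`, showing attribute-style access to each field:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 19]})

lines = []
for row in df.itertuples():
    # Each row is a named tuple; its first field, Index, holds the row's index label
    lines.append(f"{row.Index}: {row.city} is {row.temp_c} C")

print(lines)  # ['0: Oslo is 4 C', '1: Lima is 19 C']
```

Passing `index=False` drops the `Index` field if you don't need it.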
Convert Your DataFrame to a Dictionary
Not the most elegant, but you can convert your DataFrame to a dictionary with .to_dict() and then iterate over the new dictionary. This won't give you any special pandas functionality, but it'll get the job done.
This is the reverse direction of Pandas DataFrame From Dict
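A minimal sketch of the dictionary route, assuming a hypothetical `df` and using `orient="records"` so each row becomes its own plain dict:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 19]})

# orient="records" returns a list with one dict per row: {column: value}
records = df.to_dict(orient="records")

lines = []
for rec in records:
    # rec is an ordinary Python dict, with no pandas methods attached
    lines.append(f"{rec['city']} is {rec['temp_c']} C")

print(lines)  # ['Oslo is 4 C', 'Lima is 19 C']
```

The orientation matters: plain `.to_dict()` defaults to a column-oriented `{column: {index: value}}` layout, so for row iteration `orient="records"` (or `orient="index"`, keyed by row label) is usually the one you want.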
DataFrame.iloc[]
As a last resort, you could also run a plain for loop over row positions and fetch each row of your DataFrame one at a time with .iloc[]. This method is not recommended because it is slow.
You're holding yourself back by using this method. Push yourself to learn one of the methods above.
This is the equivalent of having 20 items on your grocery list, going to the store, but limiting yourself to 1 item per visit. Get your walking shoes on.
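For completeness, here is what the last-resort .iloc[] loop looks like. It is a sketch on a hypothetical `df`, where `range(len(df))` walks the row positions and `df.iloc[i]` fetches one row per pass:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 19]})

lines = []
for i in range(len(df)):
    # df.iloc[i] does a separate positional lookup and builds a new Series each time
    row = df.iloc[i]
    lines.append(f"{row['city']} is {row['temp_c']} C")

print(lines)  # ['Oslo is 4 C', 'Lima is 19 C']
```

Every `df.iloc[i]` call is its own trip to the store: one lookup, one freshly built Series, one row.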