Pandas Apply – pd.DataFrame.apply()
Apply function in pandas for DataFrame columns and rows
Pandas apply is the Swiss Army knife of the pandas family: a general-purpose workhorse.
Pandas apply will run a function on your DataFrame Columns, DataFrame rows, or a pandas Series. This is very useful when you want to apply a complicated function or special aggregation across your data.
Here’s an example:
Pseudo code: Iterate through a DataFrame’s columns or rows, and apply a certain function to the data.
Example: Below we show two examples of how apply iterates through a DataFrame. Either column by column, or row by row.
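As a minimal sketch of the two directions, the snippet below runs the same function once per column and once per row. The DataFrame and the helper name `value_range` are made up for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def value_range(values):
    # 'values' is one column or one row, depending on axis
    return values.max() - values.min()

# axis=0 (the default): the function is called once per column
col_ranges = df.apply(value_range)

# axis=1: the function is called once per row
row_ranges = df.apply(value_range, axis=1)

print(col_ranges)  # one result per column, labelled 'a' and 'b'
print(row_ranges)  # one result per row, labelled 0, 1, 2
```

In both cases the result is a Series: its labels are the column names when going column by column, and the row labels when going row by row.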
Keep in mind! When apply “receives” a column or a row, it’s actually receiving a pandas Series, not a list. So when you’re writing your custom functions, access the values through the Series’ index labels: a row is indexed by column names, and a column is indexed by row labels.
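Here is a small sketch of what that means in practice. The data and the function names `line_total` and `label_of_max` are hypothetical; the point is that each function works with labels, not positions.

```python
import pandas as pd

# Hypothetical data with meaningful row labels
df = pd.DataFrame(
    {"price": [2.0, 3.5], "qty": [3, 2]},
    index=["apple", "pear"],
)

def line_total(row):
    # With axis=1, 'row' is a Series indexed by the column names
    return row["price"] * row["qty"]

def label_of_max(col):
    # With axis=0, 'col' is a Series indexed by the row labels
    return col.idxmax()

totals = df.apply(line_total, axis=1)
largest = df.apply(label_of_max)

print(totals)   # one total per row label
print(largest)  # row label of the largest value in each column
```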
One very common use case for .apply() is pandas apply with a lambda. This is when you pass a Python lambda function to .apply() so it runs on each column, row, or value of your data. Python lambda functions are small anonymous functions, handy for one-off logic you don’t plan to reuse.
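A quick sketch of apply with a lambda, on made-up data. The column names here are assumptions for the example only.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"first": ["ada", "alan"], "last": ["lovelace", "turing"]})

# Row-wise lambda: each 'row' is a Series indexed by column name
df["full_name"] = df.apply(
    lambda row: f"{row['first']} {row['last']}".title(), axis=1
)

# Series.apply with a lambda: the function receives one value at a time
df["initial"] = df["first"].apply(lambda name: name[0].upper())

print(df)
```

If the logic grows beyond a single expression, a named function is usually easier to read and test than a lambda.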
Pandas Apply
Let’s take a look at the different parameters you can pass pd.DataFrame.apply():
- func (required) – This is where most of the magic happens. You’ll pass a function into ‘func’ which will then get applied to your data. You can use a custom function (below) or use an out of the box function.
- axis (Default: 0) – You can set axis to specify whether the function is applied to each column or to each row. However, it’s a bit counterintuitive compared with other places: axis=0 or ‘index’ tells pandas you want to apply a function to each column, while axis=1 or ‘columns’ tells pandas you want to apply a function to each row.
- raw (Default: False) – This tells .apply() whether to pass your function a Series (False) or a NumPy ndarray (True). Sometimes you’ll only be applying a quick NumPy function to a column. In this case, instead of passing a pandas Series to your function, you can pass just the values (raw=True), which can speed up your code.
- result_type (Default: None) – You’ll likely not use this parameter too often because pandas does some guessing as to what you want. I’ve honestly never had a use for this yet. That being said, it only takes effect when axis=1: ‘expand’ turns list-like results into columns, ‘reduce’ returns a Series where possible instead of expanding list-like results, and ‘broadcast’ fits the results back into the original shape of the DataFrame, keeping its index and columns.
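To make raw and result_type a little more concrete, here is a hedged sketch on a toy DataFrame. The data and the helper `min_and_max` are invented for illustration; the behavior shown is what the pandas documentation describes for these options.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# raw=True: the function receives a NumPy ndarray instead of a Series
spread = df.apply(lambda arr: arr.max() - arr.min(), raw=True)

def min_and_max(row):
    # Returns a list-like result for each row
    return [row.min(), row.max()]

# Default result_type: list-like results come back as a Series of lists
as_lists = df.apply(min_and_max, axis=1)

# 'expand': each element of the list becomes its own column (0, 1, ...)
expanded = df.apply(min_and_max, axis=1, result_type="expand")

# 'broadcast': results are fitted back into the original columns
broadcast = df.apply(min_and_max, axis=1, result_type="broadcast")

print(spread)
print(as_lists)
print(expanded)
print(broadcast)
```

With raw=True the function no longer has access to index labels, so it only suits functions that work on plain arrays.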
Here’s a Jupyter notebook example of pandas DataFrame apply, showing how to apply a function to a column in pandas.