
Pandas Query (Filter Data) – df.query()

Filter data in Pandas using DataFrame.query

Pandas Query: the way to filter your data that you haven’t heard of. Well, I guess you have, because you’re here.

Pandas DataFrame.query() filters the rows of your DataFrame with a True/False (boolean) expression, keeping only the rows where the expression is True. This is super helpful when you want a filter written as one readable string.

pandas.DataFrame.query('your_query_expression')

Pseudo Code: Evaluate the expression given, return only the rows where it is true.
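To make the pseudo code concrete, here is a minimal sketch. The DataFrame, its column names (name, rating) and its values are made up for illustration; only the .query() call itself comes from pandas.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    "name": ["Foreign Cinema", "Liho Liho", "500 Club", "The Square"],
    "rating": [4.5, 4.0, 3.5, 5.0],
})

# The string is evaluated row by row; only rows where it is True are kept
high_rated = df.query("rating > 4")
print(high_rated)
```

Note that the column name goes inside the string unquoted, so the expression reads close to plain English.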

Pandas Query

.query() is simple, but the magic lies in how creative you get with your expression. Check out a few examples below.

  • expr – The string query that pandas will evaluate. Awesomely, you can also use variables within your string by starting them with ‘@’. Ex: ‘@myvariable’
  • inplace (Default: False) – Whether you want the DataFrame to be modified directly (True), or a new filtered DataFrame returned to you (False).
  • **kwargs – You can go crazy by adding a ton of other parameters from pandas.eval(). I won’t list them out here, explore for yourself!
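The expr and inplace parameters above can be sketched like this. The data, the column names (item, price) and the variable min_price are assumptions made up for the example, not anything from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({"item": ["a", "b", "c"], "price": [5, 12, 20]})

# A regular Python variable, referenced inside the string with '@'
min_price = 10
expensive = df.query("price > @min_price")
print(expensive)

# With inplace=True the DataFrame itself is filtered and None is returned
df_copy = df.copy()
result = df_copy.query("price > @min_price", inplace=True)
print(result)        # None
print(len(df_copy))  # 2
```

Working on a copy here keeps the original df intact, which is usually the safer habit when experimenting with inplace=True.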

Why isn’t this function more popular? I’m guessing it is because Pandas education revolves around handling Series and doing Series manipulation. Filtering your DataFrame the traditional way (by putting a boolean Series in your df) forces students to manually munge their data.

Here is a traditional filter for reference:

filter_1 = df['column_1'] > df['column_2']
df[filter_1] # Your filtered data

Now a code sample for .query()
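As a stand-in for the linked notebook, here is a minimal sketch that runs the traditional filter and .query() side by side. The column names column_1 and column_2 match the traditional filter above; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    "column_1": [1, 5, 3, 8],
    "column_2": [2, 4, 3, 6],
})

# Traditional way: build a boolean Series, then index with it
filter_1 = df["column_1"] > df["column_2"]
traditional = df[filter_1]

# Same filter as a single query string
queried = df.query("column_1 > column_2")

# Both approaches keep the same rows
print(queried.equals(traditional))
```

Both versions keep the rows where column_1 is larger than column_2; the query string just skips the intermediate boolean Series.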

Link to code

Official Documentation
