
Pandas Find – pd.Series.str.find()

Find substrings in Pandas Series

If you’re looking for information on how to find data or a cell within a Pandas DataFrame or Series, check out a future post – Locating Data Within A DataFrame. This post is about finding substrings within a Series of strings.

Often you may want to know where a substring sits within a bigger string. You could be trying to extract an address, remove a piece of text, or simply find the first instance of a substring.

pd.Series.str.find() helps you locate substrings within larger strings. It works much like =FIND() in Excel or Google Sheets, with two differences: positions are counted from 0 rather than 1, and a missing substring returns -1 instead of an error.

Example: “day” is a substring of “Monday.” However, “day” is not a substring of “November,” since “day” does not appear in “November.”

Pseudo code: "Monday".find("day") returns 3. Positions are counted from 0, so “day” starts at index 3, which is the 4th character in “Monday.”
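To make the pseudo code concrete, here is a minimal sketch using Python's built-in str.find(), which follows the same counting rules. The variable names are only for illustration.

```python
# Plain Python strings have a .find() method that counts positions from 0.
word = "Monday"

# "day" begins at index 3 (M=0, o=1, n=2, d=3).
position = word.find("day")

# "November" does not contain "day", so .find() returns -1.
missing = "November".find("day")

print(position, missing)
```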

But first, what is a string and substring?

  • String = Data type within Python that represents text
  • Substring = A piece of text within a larger piece of text

To find where a substring exists (if it exists at all) within each string of a Series, call pd.Series.str.find().

Pandas Find

Pandas find returns an integer giving the location (number of characters from the left, starting at 0) of the first occurrence of a substring. It returns -1 if the substring does not exist.
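Here is a small sketch of that behavior on a Series. The sample strings and variable names are made up for illustration. Because -1 means "not found," comparing the result against -1 is one simple way to keep only the rows that contain the substring.

```python
import pandas as pd

# Hypothetical sample data: two rows contain "to:", one does not.
notes = pd.Series([
    "Ship to: 12 Oak St",
    "No address given",
    "Please ship to: 9 Elm Rd",
])

# One integer per row: the index where "to:" starts, or -1 if it is absent.
positions = notes.str.find("to:")

# Keep only the rows where the substring was found.
matched = notes[positions != -1]

print(positions.tolist())
print(matched.tolist())
```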

Find has two important arguments that go along with the function: start and end.

  • Start (default = 0): Where you want .find() to start looking for your substring. By default you’ll start at the beginning of the string (location 0).
  • End (default = None): Where you want .find() to finish looking for your substring. By default it searches through to the end of the string.

Note: You would only use start & end if you didn’t want to search the entire string.
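The sketch below shows how start and end narrow the search. The sample phrases are invented for illustration. Note that the returned position is still counted from the beginning of the full string, not from start.

```python
import pandas as pd

# Hypothetical sample data.
phrases = pd.Series(["day by day", "Monday", "daylight"])

# Default search covers the whole string.
first = phrases.str.find("day")

# Skip index 0, so a match at the very beginning is ignored.
after_first = phrases.str.find("day", start=1)

# Only look at indexes 0 through 4; the match must fit entirely inside that window.
limited = phrases.str.find("day", start=0, end=5)

print(first.tolist())
print(after_first.tolist())
print(limited.tolist())
```

With start=1, "day by day" reports the second occurrence and "daylight" reports no match at all. With end=5, "Monday" reports no match because “day” would run past the end of the window.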

