Pandas Find - pd.Series.str.find()
Find substrings in Pandas Series
If you're looking for information on how to find data or a cell within a Pandas DataFrame or Series, check out a future post, Locating Data Within A DataFrame. This post is about finding substrings within a Series of strings.
You will often want to know where a substring sits inside a larger string. You might be trying to extract an address, remove a piece of text, or simply find the first instance of a substring.
pd.Series.str.find() helps you locate substrings within larger strings. It works much like =FIND() in Excel or Google Sheets, with two differences: positions are counted from 0 rather than 1, and a missing substring returns -1 instead of an error.
Example: "day" is a substring of "Monday." However, "day" is not a substring of "November," since "day" does not appear in "November."
Pseudo code: "Monday".find("day") returns 3. "day" starts at the 4th character in "Monday", and since positions are counted from 0, the 4th character is position 3.
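The same idea can be tried with plain Python strings before bringing Pandas in. This is a minimal sketch; the variable names are only for illustration.

```python
# Python's built-in str.find() counts positions from 0.
position = "Monday".find("day")   # "day" begins at the 4th character
missing = "November".find("day")  # "day" never appears in "November"

print(position)  # 3
print(missing)   # -1
```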
But first, what is a string and substring?
- String = Data type within Python that represents text
- Substring = A piece of text within a larger piece of text
To find where a substring sits (if it appears at all) in each string of a Series, call pd.Series.str.find().
Pandas Find
Pandas find returns an integer for each string in the Series: the location (number of characters from the left, counting from 0) where the substring first appears. It returns -1 if the substring does not exist in that string.
Besides the substring itself, find takes two optional arguments: start and end.
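As a rough illustration of that behavior, the sketch below runs find over a small made-up Series. The sample values and variable names are assumptions chosen for the example.

```python
import pandas as pd

# A small illustrative Series of strings (sample data, not from a real dataset).
days = pd.Series(["Monday", "Tuesday", "November", "daydream"])

# .str.find() is applied to each string and returns one integer per row.
positions = days.str.find("day")

print(positions.tolist())  # [3, 4, -1, 0]
```

The -1 for "November" marks a row where the substring was not found, so it can be used to filter those rows out.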
- start (default = 0): Where you want .find() to start looking for your substring. By default you'll start at the beginning of the string (location 0).
- end (default = end of the string): Where you want .find() to stop looking for your substring.
Note: You would only use start and end if you didn't want to search the entire string.
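Here is a short sketch of how start and end narrow the search. The sample phrases are made up for the example. Note that the returned position is still measured from the beginning of the full string, not from start.

```python
import pandas as pd

# Illustrative sample data.
phrases = pd.Series(["day by day", "Monday", "daylight"])

# Search the whole string (start=0, end=None by default).
whole = phrases.str.find("day")

# Skip position 0, so a match at the very beginning is ignored.
from_one = phrases.str.find("day", start=1)

# Only look at the first five characters of each string.
first_five = phrases.str.find("day", start=0, end=5)

print(whole.tolist())       # [0, 3, 0]
print(from_one.tolist())    # [7, 3, -1]
print(first_five.tolist())  # [0, -1, 0]
```

In "Monday", the substring occupies positions 3 to 5, so it does not fit inside the first five characters and the result is -1.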