Pandas Set Index – pd.DataFrame.set_index()
Set index in pandas DataFrame
Sometimes using row numbers as your index just doesn’t cut it, and you need to set a new index on your DataFrame.
You may want to replace your current index with another column in your DataFrame. No problem with pd.DataFrame.set_index()!
You’re usually doing this when you want your index to be a list of names or unique IDs. For example, you imported a CSV but forgot to set index_col in pd.read_csv().
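Here is a minimal sketch of that CSV scenario. It reads a small, made-up CSV (the user_id, name and score columns are hypothetical) without index_col, then promotes user_id to the index with set_index(). It also compares the result against reading the same data with index_col set up front.

```python
import io

import pandas as pd

# Hypothetical CSV data: "user_id" is the column we want as the index.
csv_text = "user_id,name,score\n101,Alice,88\n102,Bob,92\n103,Cara,75\n"

# Forgot index_col, so pandas assigns a default RangeIndex (0, 1, 2).
df = pd.read_csv(io.StringIO(csv_text))

# Fix it after the fact: "user_id" becomes the index and is dropped as a column.
df = df.set_index("user_id")

# Reading with index_col up front gives the same DataFrame.
expected = pd.read_csv(io.StringIO(csv_text), index_col="user_id")

print(df)
print(df.equals(expected))
```

set_index() returns a new DataFrame by default, which is why the result is assigned back to df.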
Pseudo code: Take a DataFrame column (or columns) and set it (or them) as your DataFrame’s index.
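The pseudo code above translates to a single call. The example uses a made-up DataFrame where the store, month and sales columns are purely illustrative. Passing one column name gives you a regular index. Passing a list of column names gives you a MultiIndex with one level per column.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame(
    {
        "store": ["A", "A", "B"],
        "month": [1, 2, 1],
        "sales": [10, 12, 7],
    }
)

# One column -> a regular Index labelled by store.
by_store = df.set_index("store")

# A list of columns -> a MultiIndex with levels (store, month).
by_store_month = df.set_index(["store", "month"])

print(by_store)
print(by_store_month)
print(by_store_month.loc[("A", 2), "sales"])
```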
The questions you’ll have to ask yourself are:
- Do you want to drop the column you’re setting as the index?
- Do you want the transformation to happen in place, or do you want a new DataFrame returned to you?
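Here is a quick sketch of both choices on a hypothetical two-row DataFrame. drop=False keeps the column after it becomes the index. The default returns a new DataFrame and leaves the original alone. inplace=True changes the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben"], "age": [31, 27]})

# drop=False: "name" becomes the index AND stays as a regular column.
kept = df.set_index("name", drop=False)

# Default (drop=True, inplace=False): a new DataFrame comes back, df is unchanged.
returned = df.set_index("name")

# inplace=True: df itself is modified and the call returns None.
result = df.set_index("name", inplace=True)

print(kept)
print(returned)
print(result)
print(df)
```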
Pandas Set Index
Let’s take a look at the different parameters you can pass to pd.DataFrame.set_index():
- keys: What you want to be the new index. This can be 1) the name of a DataFrame column, 2) a pandas Series, Index, or NumPy array of the same length as your DataFrame, or 3) a list combining any of these, which gives you a MultiIndex.
- drop (Default: True): If True, the column(s) you use as the new index are removed from the DataFrame’s columns. This doesn’t apply to keys passed as a Series or array, since they aren’t columns to begin with.
- append (Default: False): If True, the column you’re referencing is added to your current index instead of replacing it, creating a MultiIndex. My guess is most people will never use this.
- inplace (Default: False): If True, the DataFrame is modified in place and the method returns None. If False, you get a new DataFrame with the new index, and the original is left unchanged.
- verify_integrity (Default: False): If True, pandas checks the new index for duplicate values and raises a ValueError if it finds any. The default skips this check, which is faster, so turn it on only when you need a guarantee that your index is unique.
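The sketch below exercises the less common parameters on a toy DataFrame where the code and value columns are made up. It passes an array as keys, uses append=True, and sets verify_integrity=True on a column that contains a duplicate value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"code": ["x", "y", "x"], "value": [1, 2, 3]})

# keys as an array of the same length: no column is dropped, since it isn't a column.
with_array = df.set_index(np.array([10, 20, 30]))

# append=True: "code" becomes a second level under the existing RangeIndex.
appended = df.set_index("code", append=True)

# verify_integrity=True: "code" contains "x" twice, so pandas raises ValueError.
try:
    df.set_index("code", verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

print(with_array)
print(appended)
```

A plain Python list passed as keys is read as a list of column names, so the sketch wraps the new labels in a NumPy array instead.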
Here’s a Jupyter notebook showing how to set an index in pandas.