-->

Wednesday, June 19, 2019

Renaming a Column in Pandas to One That Already Exists Can Break Things

Today I wrestled with an irritating issue: I had a perfectly fine DataFrame, renamed some columns, and suddenly it was broken. It turns out that in Pandas (v. 0.24.1), renaming a column to a name that already exists doesn't raise an error or overwrite anything. It silently leaves you with two columns sharing the same name, and code that accesses that name stops working. Try this example:

import pandas as pd

df = pd.DataFrame({"colA": [1,2], "colB": [3,4]})
df = df.rename(columns={"colA": "colB"})

df.colB.unique()

Instead of printing "[1,2]" as you'd expect, it throws AttributeError: 'DataFrame' object has no attribute 'unique'. Other than that, the DataFrame appears to be fine. Inspecting df.columns shows that there are now two columns with identical names. When you access that column name, Pandas returns both columns in a DataFrame rather than a single Series for one column. A DataFrame doesn't have a unique() method, which is why we get the error above.
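One way to catch this early is to check for duplicate column labels right after renaming. The following is only a rough sketch: safe_rename is a hypothetical helper name I made up for illustration, not part of Pandas, and raising a ValueError is just one possible policy. It also shows that positional access with iloc still reaches a single column when labels collide.

```python
import pandas as pd

df = pd.DataFrame({"colA": [1, 2], "colB": [3, 4]})
renamed = df.rename(columns={"colA": "colB"})

# Selecting the duplicated label returns a DataFrame, not a Series.
selection = renamed["colB"]
is_frame = isinstance(selection, pd.DataFrame)

# Index.duplicated() marks every repeat of a label after its first occurrence.
dup_labels = renamed.columns[renamed.columns.duplicated()].tolist()

# Positional access sidesteps the ambiguous label and yields a Series.
first_col_values = renamed.iloc[:, 0].unique().tolist()


def safe_rename(frame, mapping):
    """Hypothetical helper: rename columns, but refuse to create duplicate labels."""
    result = frame.rename(columns=mapping)
    if not result.columns.is_unique:
        dups = result.columns[result.columns.duplicated()].tolist()
        raise ValueError(f"rename would create duplicate columns: {dups}")
    return result
```

With a guard like this, safe_rename(df, {"colA": "colB"}) fails loudly at the rename itself instead of much later at df.colB.unique(), while a rename to an unused name such as "colC" goes through unchanged.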