>>> import pandas as pd
>>> df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})
If you group a DataFrame, select a single column, and aggregate it, pandas returns a Series indexed by the group keys. For example:
>>> df3.groupby(['X'])['Y'].sum()
X
A    4
B    6
Name: Y, dtype: int64
Now if we want to find out which groups have a specific aggregate value, say which groups have a sum equal to 4, we can do something like:
>>> df3.groupby(['X'])['Y'].sum().eq(4)
X
A     True
B    False
Name: Y, dtype: bool
Now the question is: how do we get the *index* label of each row where the sum equals 4? (In this example we want `A`, since its value is `True` in the Series.)
>>> groupings = df3.groupby(['X'])['Y'].sum().eq(4)
>>> groupings.index[groupings == True]
Index([u'A'], dtype='object', name=u'X')
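PS. `groupings.index[groupings is True]` doesn't work, even though PEP8 checkers will warn you to switch to it. `is` tests object identity, so `groupings is True` evaluates to the plain Python value `False` instead of producing an element-wise mask. The syntax `groupings.index[groupings.eq(True)]` is an alternative that avoids the warning. Since `groupings` is already a boolean Series, you can also use it directly as the mask: `groupings[groupings].index`.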
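For reference, here is the whole thing as one small script, as a sketch rather than the only way to do it. The variable names `sums`, `mask`, `matching` and `matching_alt` are my own and not part of the pandas API; the script also shows why the `is True` version fails.

```python
import pandas as pd

df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})

# Aggregate column Y per group; the result is a Series indexed by X.
sums = df3.groupby('X')['Y'].sum()

# Element-wise comparison gives a boolean Series with the same index.
mask = sums.eq(4)

# Filter with the mask, then read the surviving index labels.
matching = sums[mask].index.tolist()

# Equivalent: index the mask by itself, no "== True" needed.
matching_alt = mask[mask].index.tolist()

# "is" compares identity, so this is a plain bool, not a mask.
identity_check = mask is True

print(matching)        # ['A']
print(identity_check)  # False
```

Calling `.tolist()` at the end is optional; drop it if you want to keep the result as a pandas Index.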