df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})

If you do a GroupBy operation on a specific column of the DataFrame Pandas returns a Series object. Like

df3.groupby(['X'])['Y'].sum()

X

A 4

B 6

Name: Y, dtype: int64

Now if we want to found out which groups had a specific aggregate value - say which groups had a sum == 4, we can do something like:

>>> df3.groupby(['X'])['Y'].sum().eq(4)

X

A True

B False

Name: Y, dtype: bool

Now the question is, how do we get the *index* name where the row equals 4 (in this example we want `A` since it's value is `True` in the Series).

>>> groupings = df3.groupby(['X'])['Y'].sum().eq(4)

>>> groupings.index[groupings == True]

Index([u'A'], dtype='object', name=u'X')

PS. groupings.index[groupings is True] doesn't work even though PEP8 checkers will warn you to switch to it. The groupings object isn't Truthy. The syntax groupings.index[groupings.eq(True)] is an alternative.

## No comments:

## Post a Comment