Wednesday, April 10, 2019

Get index of Pandas Series row when column matches certain value

Say you have a Pandas DataFrame that looks like:

df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})

If you do a GroupBy operation on a specific column of the DataFrame Pandas returns a Series object. Like

A    4
B    6
Name: Y, dtype: int64

Now if we want to found out which groups had a specific aggregate value - say which groups had a sum == 4, we can do something like:

>>> df3.groupby(['X'])['Y'].sum().eq(4)
A     True
B    False
Name: Y, dtype: bool

Now the question is, how do we get the *index* name where the row equals 4 (in this example we want `A` since it's value is `True` in the Series).

>>> groupings = df3.groupby(['X'])['Y'].sum().eq(4)
>>> groupings.index[groupings == True]
Index([u'A'], dtype='object', name=u'X')

PS. groupings.index[groupings is True] doesn't work even though PEP8 checkers will warn you to switch to it. The groupings object isn't Truthy. The syntax groupings.index[groupings.eq(True)] is an alternative.

Tuesday, April 9, 2019

Python Doesn't Require Commas in Lists .. Sorta

Today I helped a colleague with a subtle Python bug. We have a system that queries for data given a list of IDs. The list of IDs looked like this:

ids = [
    '7d38c515-d543-4186-a6a6-e46d4e356a81' # location 1
    'f384fc68-3030-473f-95a8-52d5fee6cfd4' # location 2
    'b27fef7f-9e5d-4af5-8596-a6949dd257a5' # location 3

It look us an unfortunate amount of time to realize we were missing commas in that list. Python blissfully will concatenate string elements inside of a list for you.

bad_list = ['a' 'b' 'c']
bad_list[0] == 'abc'

This is because ,
"a""b" == "ab"
I'm failing to come up with a helpful example of where this behavior is useful though.