Python Pandas Groupby Resetting Values Based on Index


I have a DataFrame that contains some wrong information that I want to fix:

    import pandas as pd

    tuples_index = [(1, 1990), (2, 1999), (2, 2002), (3, 1992), (3, 1994), (3, 1996)]
    index = pd.MultiIndex.from_tuples(tuples_index, names=['id', 'firstyear'])
    df = pd.DataFrame([2007, 2006, 2006, 2000, 2000, 2000], index=index, columns=['lastyear'])

    df
    Out[4]:
                  lastyear
    id firstyear
    1  1990           2007
    2  1999           2006
       2002           2006
    3  1992           2000
       1994           2000
       1996           2000

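The id refers to a business, and this DataFrame is a small example slice of a larger one that shows how a business moves. Each record is a unique location, and I want to capture the first and last year it was there. The current 'lastyear' is accurate for businesses with one record, and accurate for the latest record of businesses with more than one record. For every earlier record, 'lastyear' should instead be the 'firstyear' of the business's next record. The df should look like this at the end: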

                  lastyear
    id firstyear
    1  1990           2007
    2  1999           2002
       2002           2006
    3  1992           1994
       1994           1996
       1996           2000

The way I got there is super clunky:

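    # Businesses with more than one record need their 'lastyear' corrected.
    multirecord = df.groupby(level=0).filter(lambda x: len(x) > 1)
    multirecord_grouped = multirecord.groupby(level=0)

    ls = []
    for _, group in multirecord_grouped:
        # All 'firstyear' values of the group, followed by the group's final 'lastyear'.
        levels = group.index.get_level_values(level=1).tolist() + [group['lastyear'].iloc[-1]]
        # Dropping the first entry leaves, for each record, the start of the next one.
        ls += levels[1:]

    multirecord['lastyear'] = pd.Series(ls, index=multirecord.index.copy())
    # Recombine with the single-record businesses, which were already correct.
    final_joined = pd.concat([df.groupby(level=0).filter(lambda x: len(x) == 1), multirecord]).sort_index()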

Is there a better way?

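    # For each group, take the 'firstyear' level and shift it up by one row,
    # so every record gets the 'firstyear' of the next record (NaN for the last).
    shift_year = lambda df: df.index.get_level_values('firstyear').to_series().shift(-1)

    # Fill the NaN left on each group's last record from the original 'lastyear'.
    df.groupby(level=0).apply(shift_year) \
        .combine_first(df.lastyear).astype(int) \
        .rename('lastyear').to_frame()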
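As a possible alternative, here is a minimal sketch of the same idea without `apply`: shift the 'firstyear' values within each id and fall back to the original 'lastyear' where there is no next record. It assumes the rows are already sorted by 'firstyear' within each id, as in the example; the names `firstyear`, `next_start` and `result` are just illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_tuples(
    [(1, 1990), (2, 1999), (2, 2002), (3, 1992), (3, 1994), (3, 1996)],
    names=["id", "firstyear"],
)
df = pd.DataFrame({"lastyear": [2007, 2006, 2006, 2000, 2000, 2000]}, index=index)

# Copy the 'firstyear' index level into a Series that shares df's index.
firstyear = pd.Series(df.index.get_level_values("firstyear").to_numpy(), index=df.index)

# Within each id, look one row ahead; the last record of each id gets NaN.
# Assumption: rows are sorted by 'firstyear' within each id.
next_start = firstyear.groupby(level="id").shift(-1)

# Where there is no next record, keep the original 'lastyear'.
result = df.copy()
result["lastyear"] = next_start.fillna(df["lastyear"]).astype(int)

print(result)
```

If the real data is not sorted, calling `df.sort_index()` first should put the records of each id in 'firstyear' order before shifting.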
