Python Pandas Groupby Resetting Values Based on Index
So I have a DataFrame that contains some wrong information that I want to fix:
    import pandas as pd

    tuples_index = [(1, 1990), (2, 1999), (2, 2002), (3, 1992), (3, 1994), (3, 1996)]
    index = pd.MultiIndex.from_tuples(tuples_index, names=['id', 'firstyear'])
    df = pd.DataFrame([2007, 2006, 2006, 2000, 2000, 2000],
                      index=index, columns=['lastyear'])
    df

    Out[4]:
                  lastyear
    id firstyear
    1  1990           2007
    2  1999           2006
       2002           2006
    3  1992           2000
       1994           2000
       1996           2000
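The id refers to a business, and this DataFrame is a small example slice of a larger one that shows how a business moves. Each record is a unique location, and I want to capture the first and last year the business was there. The current 'lastyear' is accurate for businesses with one record, and accurate for the latest record of businesses with more than one record. For every earlier record, 'lastyear' should be the 'firstyear' of the business's next record. The df should look like this at the end: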
                  lastyear
    id firstyear
    1  1990           2007
    2  1999           2002
       2002           2006
    3  1992           1994
       1994           1996
       1996           2000
What I did to get there was super clunky:
    # Keep only the businesses with more than one record
    multirecord = df.groupby(level=0).filter(lambda x: len(x) > 1)
    multirecord_grouped = multirecord.groupby(level=0)

    ls = []
    for _, group in multirecord_grouped:
        # All firstyear values, followed by the group's final lastyear
        levels = group.index.get_level_values(level=1).tolist() + [group['lastyear'].iloc[-1]]
        # Dropping the first entry pairs each record with the next record's firstyear
        ls += levels[1:]

    multirecord['lastyear'] = pd.Series(ls, index=multirecord.index.copy())
    final_joined = pd.concat([df.groupby(level=0).filter(lambda x: len(x) == 1),
                              multirecord]).sort_index()
Is there a better way?
    # Within each group, shift the firstyear values up by one (NaN for the last record)
    shift_year = lambda df: df.index.get_level_values('firstyear').to_series().shift(-1)

    df.groupby(level=0).apply(shift_year) \
        .combine_first(df.lastyear).astype(int) \
        .rename('lastyear').to_frame()
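For comparison, here is a minimal sketch of the same idea without `apply`. It is not from the original answer. It assumes the rows are already sorted by 'firstyear' within each 'id', as in the example, and the names `flat`, `next_first`, and `result` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
tuples_index = [(1, 1990), (2, 1999), (2, 2002), (3, 1992), (3, 1994), (3, 1996)]
index = pd.MultiIndex.from_tuples(tuples_index, names=['id', 'firstyear'])
df = pd.DataFrame([2007, 2006, 2006, 2000, 2000, 2000],
                  index=index, columns=['lastyear'])

# Move 'id' and 'firstyear' out of the index so they behave like regular columns
flat = df.reset_index()

# Within each id, look up the next record's firstyear.
# The last record of each id has no successor, so it gets NaN.
# Assumption: rows are sorted by firstyear within each id.
next_first = flat.groupby('id')['firstyear'].shift(-1)

# Where there is no later record, keep the existing lastyear
flat['lastyear'] = next_first.fillna(flat['lastyear']).astype(int)

# Restore the original (id, firstyear) index
result = flat.set_index(['id', 'firstyear'])
print(result)
```

If the real data is not sorted, calling `df.sort_index()` first should make the shift line up with the chronological order of each business's records.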