pandas datetime slicing: junkdf.ix['2015-08-03':'2015-08-06'] not working -
junkdf:
rev dtime 2015-08-03 20.45 2015-08-04 -2.57 2015-08-05 12.53 2015-08-06 -8.16 2015-08-07 -4.41
junkdf.reset_index().to_dict('rec')
[{'dtime': datetime.date(2015, 8, 3), 'rev': 20.45}, {'dtime': datetime.date(2015, 8, 4), 'rev': -2.5699999999999994}, {'dtime': datetime.date(2015, 8, 5), 'rev': 12.53}, {'dtime': datetime.date(2015, 8, 6), 'rev': -8.16}, {'dtime': datetime.date(2015, 8, 7), 'rev': -4.41}] junkdf.set_index('dtime',inplace=true)
why can't datetime slicing described at:
python-pandas-dataframe-slicing-by-date-conditions
junkdf['2015-08-03':]
c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter) 1112 def searchsorted(self, key, side='left', sorter=none): 1113 # needs coercion on key (datetimeindex already) -> 1114 return self.values.searchsorted(key, side=side, sorter=sorter) 1115 1116 _shared_docs['drop_duplicates'] = ( typeerror: unorderable types: datetime.date() > str()
junkdf.ix['2015-08-03':'2015-08-06']
c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter) 1112 def searchsorted(self, key, side='left', sorter=none): 1113 # needs coercion on key (datetimeindex already) -> 1114 return self.values.searchsorted(key, side=side, sorter=sorter) 1115 1116 _shared_docs['drop_duplicates'] = ( typeerror: unorderable types: datetime.date() > str()
start = junkdf.index.searchsorted(dt.datetime(2015, 8, 4))
c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter) 1112 def searchsorted(self, key, side='left', sorter=none): 1113 # needs coercion on key (datetimeindex already) -> 1114 return self.values.searchsorted(key, side=side, sorter=sorter) 1115 1116 _shared_docs['drop_duplicates'] = ( typeerror: can't compare datetime.datetime datetime.date))
however, following works if use dt.date():
start = junkdf.index.searchsorted(dt.date(2015, 8, 4)) end = junkdf.index.searchsorted(dt.date(2015, 8, 6)) junkdf.ix[start:end] rev dtime 2015-08-04 -2.57 2015-08-05 12.53
update:
junkdf = df[['dtime','rev']].groupby((df.dtime).dt.date).sum().copy()
where df[['dtime','rev']]
looks like:
dtime rev 0 2015-08-03 07:59:59 -0.18 1 2015-08-03 08:59:59 -0.11 2 2015-08-03 09:59:59 -0.29 3 2015-08-03 10:59:59 -0.08 4 2015-08-03 11:59:59 0.69
update2:
i tried:
df[['dtime','rev']].head() dtime rev 0 2015-08-03 07:59:59 -0.18 1 2015-08-03 08:59:59 -0.11 2 2015-08-03 09:59:59 -0.29 3 2015-08-03 10:59:59 -0.08 4 2015-08-03 11:59:59 0.69 df[['dtime','rev']].groupby(pd.timegrouper('d', key=df.dtime)).sum() c:\users\blah\anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self) 804 def __hash__(self): 805 raise typeerror('{0!r} objects mutable, cannot be' --> 806 ' hashed'.format(self.__class__.__name__)) 807 808 def __iter__(self): typeerror: 'series' objects mutable, cannot hashed
assuming have following source df (i took previous question , made changes have data multiple days):
in [85]: df out[85]: datetime hour rev 0 2016-05-01 01:00:00 1 -0.02 1 2016-05-01 02:00:00 2 -0.01 2 2016-05-01 03:00:00 3 -0.02 3 2016-05-01 04:00:00 4 -0.02 4 2016-05-01 05:00:00 5 -0.01 5 2016-05-02 06:00:00 6 -0.03 6 2016-05-02 07:00:00 7 -0.10 7 2016-05-02 08:00:00 8 -0.09 8 2016-05-03 09:00:00 9 -0.08 9 2016-05-03 10:00:00 10 -0.10 10 2016-05-03 11:00:00 11 -0.12 11 2016-05-04 12:00:00 12 -0.14 12 2016-05-04 13:00:00 13 -0.17 13 2016-05-04 14:00:00 14 -0.16 14 2016-05-05 15:00:00 15 -0.15 15 2016-05-05 16:00:00 16 -0.15 16 2016-05-05 17:00:00 17 -0.17 17 2016-05-06 18:00:00 18 -0.16 18 2016-05-06 19:00:00 19 -0.18 19 2016-05-06 20:00:00 20 -0.17 20 2016-05-07 21:00:00 21 -0.14 21 2016-05-07 22:00:00 22 -0.16 22 2016-05-08 23:00:00 23 -0.08 23 2016-05-08 00:00:00 24 -0.06
let's group day , calculate sum
:
in [89]: rslt = (df.assign(t=df.datetime - pd.timedelta(hours=1)) ....: .groupby(pd.timegrouper('d', key='t'))['rev'] ....: .sum()) in [90]: rslt out[90]: t 2016-05-01 -0.08 2016-05-02 -0.22 2016-05-03 -0.30 2016-05-04 -0.47 2016-05-05 -0.47 2016-05-06 -0.51 2016-05-07 -0.36 2016-05-08 -0.08 freq: d, name: rev, dtype: float64 in [92]: rslt.index.dtype out[92]: dtype('<m8[ns]')
now slicing should work (because index has datetime
dtype):
in [91]: rslt.ix['2016-05-03':'2016-05-06'] out[91]: t 2016-05-03 -0.30 2016-05-04 -0.47 2016-05-05 -0.47 2016-05-06 -0.51 freq: d, name: rev, dtype: float64
Comments
Post a Comment