pandas datetime slicing: junkdf.ix['2015-08-03':'2015-08-06'] not working -


junkdf:

            rev dtime    2015-08-03  20.45 2015-08-04  -2.57 2015-08-05  12.53 2015-08-06  -8.16 2015-08-07  -4.41 

junkdf.reset_index().to_dict('rec')

[{'dtime': datetime.date(2015, 8, 3), 'rev': 20.45},  {'dtime': datetime.date(2015, 8, 4), 'rev': -2.5699999999999994},  {'dtime': datetime.date(2015, 8, 5), 'rev': 12.53},  {'dtime': datetime.date(2015, 8, 6), 'rev': -8.16},  {'dtime': datetime.date(2015, 8, 7), 'rev': -4.41}]  junkdf.set_index('dtime',inplace=true) 

why can't datetime slicing described at:

python-pandas-dataframe-slicing-by-date-conditions

time series datetime slicing

junkdf['2015-08-03':]

c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)    1112     def searchsorted(self, key, side='left', sorter=none):    1113         # needs coercion on key (datetimeindex already) -> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)    1115     1116     _shared_docs['drop_duplicates'] = (  typeerror: unorderable types: datetime.date() > str() 

junkdf.ix['2015-08-03':'2015-08-06']

c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)    1112     def searchsorted(self, key, side='left', sorter=none):    1113         # needs coercion on key (datetimeindex already) -> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)    1115     1116     _shared_docs['drop_duplicates'] = (  typeerror: unorderable types: datetime.date() > str() 

start = junkdf.index.searchsorted(dt.datetime(2015, 8, 4))

c:\users\blah\anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)    1112     def searchsorted(self, key, side='left', sorter=none):    1113         # needs coercion on key (datetimeindex already) -> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)    1115     1116     _shared_docs['drop_duplicates'] = (  typeerror: can't compare datetime.datetime datetime.date)) 

however, following works if use dt.date():

start = junkdf.index.searchsorted(dt.date(2015, 8, 4)) end = junkdf.index.searchsorted(dt.date(2015, 8, 6))  junkdf.ix[start:end]                  rev     dtime        2015-08-04  -2.57     2015-08-05  12.53 

update:

junkdf = df[['dtime','rev']].groupby((df.dtime).dt.date).sum().copy() 

where df[['dtime','rev']] looks like:

dtime   rev 0   2015-08-03 07:59:59 -0.18 1   2015-08-03 08:59:59 -0.11 2   2015-08-03 09:59:59 -0.29 3   2015-08-03 10:59:59 -0.08 4   2015-08-03 11:59:59 0.69 

update2:

i tried:

df[['dtime','rev']].head()  dtime   rev 0   2015-08-03 07:59:59 -0.18 1   2015-08-03 08:59:59 -0.11 2   2015-08-03 09:59:59 -0.29 3   2015-08-03 10:59:59 -0.08 4   2015-08-03 11:59:59 0.69  df[['dtime','rev']].groupby(pd.timegrouper('d', key=df.dtime)).sum()  c:\users\blah\anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)     804     def __hash__(self):     805         raise typeerror('{0!r} objects mutable, cannot be' --> 806                         ' hashed'.format(self.__class__.__name__))     807      808     def __iter__(self):  typeerror: 'series' objects mutable, cannot hashed 

assuming have following source df (i took previous question , made changes have data multiple days):

in [85]: df out[85]:               datetime  hour   rev 0  2016-05-01 01:00:00     1 -0.02 1  2016-05-01 02:00:00     2 -0.01 2  2016-05-01 03:00:00     3 -0.02 3  2016-05-01 04:00:00     4 -0.02 4  2016-05-01 05:00:00     5 -0.01 5  2016-05-02 06:00:00     6 -0.03 6  2016-05-02 07:00:00     7 -0.10 7  2016-05-02 08:00:00     8 -0.09 8  2016-05-03 09:00:00     9 -0.08 9  2016-05-03 10:00:00    10 -0.10 10 2016-05-03 11:00:00    11 -0.12 11 2016-05-04 12:00:00    12 -0.14 12 2016-05-04 13:00:00    13 -0.17 13 2016-05-04 14:00:00    14 -0.16 14 2016-05-05 15:00:00    15 -0.15 15 2016-05-05 16:00:00    16 -0.15 16 2016-05-05 17:00:00    17 -0.17 17 2016-05-06 18:00:00    18 -0.16 18 2016-05-06 19:00:00    19 -0.18 19 2016-05-06 20:00:00    20 -0.17 20 2016-05-07 21:00:00    21 -0.14 21 2016-05-07 22:00:00    22 -0.16 22 2016-05-08 23:00:00    23 -0.08 23 2016-05-08 00:00:00    24 -0.06 

let's group day , calculate sum:

in [89]: rslt = (df.assign(t=df.datetime - pd.timedelta(hours=1))    ....:           .groupby(pd.timegrouper('d', key='t'))['rev']    ....:           .sum())  in [90]: rslt out[90]: t 2016-05-01   -0.08 2016-05-02   -0.22 2016-05-03   -0.30 2016-05-04   -0.47 2016-05-05   -0.47 2016-05-06   -0.51 2016-05-07   -0.36 2016-05-08   -0.08 freq: d, name: rev, dtype: float64  in [92]: rslt.index.dtype out[92]: dtype('<m8[ns]') 

now slicing should work (because index has datetime dtype):

in [91]: rslt.ix['2016-05-03':'2016-05-06'] out[91]: t 2016-05-03   -0.30 2016-05-04   -0.47 2016-05-05   -0.47 2016-05-06   -0.51 freq: d, name: rev, dtype: float64 

Comments

Popular posts from this blog

mysql - Dreamhost PyCharm Django Python 3 Launching a Site -

java - Sending SMS with SMSLib and Web Services -

java - How to resolve The method toString() in the type Object is not applicable for the arguments (InputStream) -