python - Dictionary comprehension to calculate statistics across dict of dicts for each key in inner dicts

- July 15, 2015

I have a dictionary like this:

property2region2value = {
    'countrya': {
        'a': 24,
        'b': 56,
        'c': 78
    },
    'countryb': {
        'a': 3,
        'b': 98
    },
    'countryc': {
        'a': 121,
        'b': 12121,
        'c': 12989121,
        'd': 16171
    },
    'countryd': {
        'a': 123,
        'b': 1312,
        'c': 1231
    },
    'countrye': {
        'a': 1011,
        'b': 1911
    },
    'countryf': {
        'a': 1433,
        'b': 19829,
        'c': 1132,
        'd': 1791
    }
}

and I am trying to create multiple dictionaries, each containing one statistic (min, max, std, etc.) computed from the master dictionary over the values of each property ('a', 'b', 'c', etc.) across countries (e.g. countrya, countryb).

So, for example: {'a': min of 'a' across countries, 'b': min of 'b' across countries, ...} might be one dictionary.

At the moment, I have a huge loop of code that a) isn't efficient and b) doesn't allow me to calculate statistics using, e.g., the np.min() or np.max() functions in numpy.

How can I use a dictionary comprehension to achieve this? My current code for calculating min and max:

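property2rangemax = {}
property2rangemin = {}
for country, property2value in property2region2value.items():
    for property, value in property2value.items():
        # Seed each range with the first value seen for this property;
        # seeding the minimum with 0 would never be replaced by positive values.
        if property not in property2rangemax:
            property2rangemax[property] = value
        if property2rangemax[property] < value:
            property2rangemax[property] = value
        if property not in property2rangemin:
            property2rangemin[property] = value
        if property2rangemin[property] > value:
            property2rangemin[property] = value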

You should use pandas for this task:

Edit

pandas can accomplish what you want:

In [1]: import numpy as np
   ...: import pandas as pd

In [2]: df = pd.DataFrame(property2region2value)

In [3]: pd.DataFrame(property2region2value)
Out[3]:
   countrya  countryb  countryc  countryd  countrye  countryf
a      24.0       3.0       121     123.0    1011.0      1433
b      56.0      98.0     12121    1312.0    1911.0     19829
c      78.0       NaN  12989121    1231.0       NaN      1132
d       NaN       NaN     16171       NaN       NaN      1791

In [4]: df.apply(np.min, axis=1)
Out[4]:
a       3.0
b      56.0
c      78.0
d    1791.0
dtype: float64

In [5]: df.apply(np.mean, axis=1)
Out[5]:
a    4.525000e+02
b    5.887833e+03
c    3.247890e+06
d    8.981000e+03
dtype: float64

In [6]: mean_dict = df.apply(np.mean, axis=1).to_dict()

In [7]: mean_dict
Out[7]: {'a': 452.5, 'b': 5887.833333333333, 'c': 3247890.5, 'd': 8981.0}
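The same per-property reductions can also be written with the DataFrame's built-in methods rather than apply with a NumPy function. This is only a minimal sketch, assuming pandas is installed; the names min_dict and max_dict are illustrative and do not come from the original answer.

```python
import pandas as pd

# Same data as in the question.
property2region2value = {
    'countrya': {'a': 24, 'b': 56, 'c': 78},
    'countryb': {'a': 3, 'b': 98},
    'countryc': {'a': 121, 'b': 12121, 'c': 12989121, 'd': 16171},
    'countryd': {'a': 123, 'b': 1312, 'c': 1231},
    'countrye': {'a': 1011, 'b': 1911},
    'countryf': {'a': 1433, 'b': 19829, 'c': 1132, 'd': 1791},
}

# Rows are properties ('a'..'d'), columns are countries.
df = pd.DataFrame(property2region2value)

# axis=1 reduces across the columns, i.e. across countries, for each property.
# Missing entries (NaN) are skipped by default.
min_dict = df.min(axis=1).to_dict()    # illustrative name
max_dict = df.max(axis=1).to_dict()    # illustrative name
mean_dict = df.mean(axis=1).to_dict()

print(min_dict)
print(max_dict)
print(mean_dict)
```

Each result is keyed by property, which is the shape the question asks for.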

Or, more easily, you can transpose the DataFrame:

In [20]: df.T
Out[20]:
               a        b           c        d
countrya    24.0     56.0        78.0      NaN
countryb     3.0     98.0         NaN      NaN
countryc   121.0  12121.0  12989121.0  16171.0
countryd   123.0   1312.0      1231.0      NaN
countrye  1011.0   1911.0         NaN      NaN
countryf  1433.0  19829.0      1132.0   1791.0

In [21]: df.T.describe()
Out[21]:
                 a             b             c             d
count     6.000000      6.000000  4.000000e+00      2.000000
mean    452.500000   5887.833333  3.247890e+06   8981.000000
std     612.768717   8215.770187  6.494154e+06  10168.195513
min       3.000000     56.000000  7.800000e+01   1791.000000
25%      48.250000    401.500000  8.685000e+02   5386.000000
50%     122.000000   1611.500000  1.181500e+03   8981.000000
75%     789.000000   9568.500000  3.248204e+06  12576.000000
max    1433.000000  19829.000000  1.298912e+07  16171.000000

In [22]: df.T.describe().to_dict()
Out[22]:
{'a': {'25%': 48.25,
  '50%': 122.0,
  '75%': 789.0,
  'count': 6.0,
  'max': 1433.0,
  'mean': 452.5,
  'min': 3.0,
  'std': 612.76871656441472},
 'b': {'25%': 401.5,
  '50%': 1611.5,
  '75%': 9568.5,
  'count': 6.0,
  'max': 19829.0,
  'mean': 5887.833333333333,
  'min': 56.0,
  'std': 8215.770187065038},
 'c': {'25%': 868.5,
  '50%': 1181.5,
  '75%': 3248203.5,
  'count': 4.0,
  'max': 12989121.0,
  'mean': 3247890.5,
  'min': 78.0,
  'std': 6494153.687626767},
 'd': {'25%': 5386.0,
  '50%': 8981.0,
  '75%': 12576.0,
  'count': 2.0,
  'max': 16171.0,
  'mean': 8981.0,
  'min': 1791.0,
  'std': 10168.195513462553}}

And if you want finer control, you can pick and choose:

In [24]: df.T.describe().loc[['mean','std','min','max'],:]
Out[24]:
                a             b             c             d
mean   452.500000   5887.833333  3.247890e+06   8981.000000
std    612.768717   8215.770187  6.494154e+06  10168.195513
min      3.000000     56.000000  7.800000e+01   1791.000000
max   1433.000000  19829.000000  1.298912e+07  16171.000000

In [25]: df.T.describe().loc[['mean','std','min','max'],:].to_dict()
Out[25]:
{'a': {'max': 1433.0,
       'mean': 452.5,
       'min': 3.0,
       'std': 612.76871656441472},
 'b': {'max': 19829.0,
       'mean': 5887.833333333333,
       'min': 56.0,
       'std': 8215.770187065038},
 'c': {'max': 12989121.0,
       'mean': 3247890.5,
       'min': 78.0,
       'std': 6494153.687626767},
 'd': {'max': 16171.0,
       'mean': 8981.0,
       'min': 1791.0,
       'std': 10168.195513462553}}
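The dictionary in Out[25] is keyed by property first and statistic second. The question asks for the opposite nesting: one dictionary per statistic, keyed by property. One possible way to get that is to_dict with orient='index', which keys the outer dictionary by row label. The sketch below assumes pandas is installed; stat2property2value is a made-up name, and property2rangemin / property2rangemax are borrowed from the question's loop.

```python
import pandas as pd

# Same data as in the question.
property2region2value = {
    'countrya': {'a': 24, 'b': 56, 'c': 78},
    'countryb': {'a': 3, 'b': 98},
    'countryc': {'a': 121, 'b': 12121, 'c': 12989121, 'd': 16171},
    'countryd': {'a': 123, 'b': 1312, 'c': 1231},
    'countrye': {'a': 1011, 'b': 1911},
    'countryf': {'a': 1433, 'b': 19829, 'c': 1132, 'd': 1791},
}
df = pd.DataFrame(property2region2value)

# One row per statistic, one column per property.
stats = df.T.describe().loc[['mean', 'std', 'min', 'max'], :]

# orient='index' makes the row labels (statistic names) the outer keys.
stat2property2value = stats.to_dict(orient='index')  # hypothetical name

property2rangemin = stat2property2value['min']
property2rangemax = stat2property2value['max']

print(property2rangemin)
print(property2rangemax)
```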

From the original answer

Then you can achieve whatever you want:

In [8]: df.apply(np.min)
Out[8]:
countrya      24.0
countryb       3.0
countryc     121.0
countryd     123.0
countrye    1011.0
countryf    1132.0
dtype: float64

In [9]: df.apply(np.max)
Out[9]:
countrya          78.0
countryb          98.0
countryc    12989121.0
countryd        1312.0
countrye        1911.0
countryf       19829.0
dtype: float64

In [10]: df.apply(np.std)
Out[10]:
countrya    2.217105e+01
countryb    4.750000e+01
countryc    5.620356e+06
countryd    5.424170e+02
countrye    4.500000e+02
countryf    7.960893e+03
dtype: float64
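Note that the np.std figures in Out[10] are population standard deviations (ddof=0), while describe() reports the sample standard deviation (ddof=1). That is why countrya shows 2.217105e+01 here but 27.153882 in the describe() output further down. A small sketch with plain NumPy, using countrya's three values from the question; the variable names are illustrative:

```python
import numpy as np

# countrya's values for properties 'a', 'b', 'c' (from the question).
values = np.array([24, 56, 78])

# NumPy divides by N by default (population standard deviation).
population_std = np.std(values)

# ddof=1 divides by N - 1 (sample standard deviation).
sample_std = np.std(values, ddof=1)

print(population_std, sample_std)
```

Which of the two is appropriate depends on whether the countries are treated as the whole population or as a sample.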

You can bring the results back into dictionaries with ease:

In [11]: df.apply(np.min).to_dict()
Out[11]:
{'countrya': 24.0,
 'countryb': 3.0,
 'countryc': 121.0,
 'countryd': 123.0,
 'countrye': 1011.0,
 'countryf': 1132.0}

Go nuts! Your data-processing needs just got easier:

In [12]: df.describe()
Out[12]:
        countrya   countryb      countryc     countryd     countrye  \
count   3.000000   2.000000  4.000000e+00     3.000000     2.000000
mean   52.666667  50.500000  3.254384e+06   888.666667  1461.000000
std    27.153882  67.175144  6.489829e+06   664.322462   636.396103
min    24.000000   3.000000  1.210000e+02   123.000000  1011.000000
25%    40.000000  26.750000  9.121000e+03   677.000000  1236.000000
50%    56.000000  50.500000  1.414600e+04  1231.000000  1461.000000
75%    67.000000  74.250000  3.259408e+06  1271.500000  1686.000000
max    78.000000  98.000000  1.298912e+07  1312.000000  1911.000000

           countryf
count      4.000000
mean    6046.250000
std     9192.447602
min     1132.000000
25%     1357.750000
50%     1612.000000
75%     6300.500000
max    19829.000000
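For completeness, the question asked specifically about a dictionary comprehension. If pandas is not an option, something along these lines might do. It is a sketch using only the standard library, and the helper stat_by_property and the names property2values, property2mean and property2std are invented for illustration. The idea is to regroup the values by property once, then apply any reducing function in a comprehension.

```python
import statistics
from collections import defaultdict

# Same data as in the question.
property2region2value = {
    'countrya': {'a': 24, 'b': 56, 'c': 78},
    'countryb': {'a': 3, 'b': 98},
    'countryc': {'a': 121, 'b': 12121, 'c': 12989121, 'd': 16171},
    'countryd': {'a': 123, 'b': 1312, 'c': 1231},
    'countrye': {'a': 1011, 'b': 1911},
    'countryf': {'a': 1433, 'b': 19829, 'c': 1132, 'd': 1791},
}

# Regroup: property -> list of its values across all countries.
property2values = defaultdict(list)
for property2value in property2region2value.values():
    for prop, value in property2value.items():
        property2values[prop].append(value)


def stat_by_property(func):
    """Apply func to each property's list of values (hypothetical helper)."""
    return {prop: func(values) for prop, values in property2values.items()}


property2rangemin = stat_by_property(min)
property2rangemax = stat_by_property(max)
property2mean = stat_by_property(statistics.mean)
# statistics.stdev is the sample standard deviation and needs >= 2 values.
property2std = stat_by_property(statistics.stdev)

print(property2rangemin)
print(property2rangemax)
```

Because stat_by_property accepts any callable that reduces a list to a number, NumPy functions such as np.min or np.max could be passed in the same way.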
