Python URL encode/decode - Convert %-escaped hexadecimal digits into a string
For example, if I have an encoded string such as:
url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pav%c3%a9+cafe&postalcode=5067'
The name parameter contains the characters %c3%a9, which represent the character é.
Hence, I would like the output to be:
new_url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067'
I tried the following steps in the Python terminal:
>>> import urllib2
>>> url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pav%c3%a9+cafe&postalcode=5067'
>>> new_url=urllib2.unquote(url).decode('utf8')
>>> print new_url
locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067
>>>
However, when I tried the same thing within a Python script and ran myscript.py, I got the following stack trace:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 88: ordinal not in range(128)
I am using Python 2.6.6 and cannot switch to other versions due to work reasons.
How can I overcome this error?
Any help is appreciated. Thanks in advance!
######################################################
Edit
I realized that I am getting the expected output shown above.
However, I would like to convert the parameters in new_url into a dictionary as follows. While doing so, I am not able to retain the special character 'é' in the name parameter.
print new_url
params_list = new_url.split("&")
print(params_list)
params_dict={}
for p in params_list:
    temp = p.split("=")
    params_dict[temp[0]] = temp[1]
print(params_dict)
outputs:
new_url
locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067
params_list
[u'locality=norwood', u'address=138+the+parade', u'region=sa', u'country=au', u'name=pav\xe9+cafe', u'postalcode=5067']
params_dict
{u'name': u'pav\xe9+cafe', u'locality': u'norwood', u'country': u'au', u'region': u'sa', u'address': u'138+the+parade', u'postalcode': u'5067'}
Basically ... the name is shown as 'pav\xe9+cafe' as opposed to the required 'pavé'.
How can I still retain the same special character in params_dict?
This is due to the difference between __repr__ and __str__. When printing a unicode string, __str__ is used, which results in the é you see when printing new_url. However, when a list or dict is printed, __repr__ is used, which in turn uses __repr__ for each object within the list or dict. If you print the items separately, they print as you desire.
# -*- coding: utf-8 -*-
new_url = u'name=pavé+cafe&postalcode=5067'
print(new_url)  # name=pavé+cafe&postalcode=5067
params_list = [s for s in new_url.split("&")]
print(params_list)  # [u'name=pav\xe9+cafe', u'postalcode=5067']
print(params_list[0])  # name=pavé+cafe
print(params_list[1])  # postalcode=5067
params_dict = {}
for p in params_list:
    temp = p.split("=")
    params_dict[temp[0]] = temp[1]
print(params_dict)  # {u'postalcode': u'5067', u'name': u'pav\xe9+cafe'}
print(params_dict.values()[0])  # 5067
print(params_dict.values()[1])  # pavé+cafe
One way to print a list or dict is to take its string representation and decode it with unicode-escape:
print(str(params_list).decode('unicode-escape'))  # [u'name=pavé+cafe', u'postalcode=5067']
print(str(params_dict).decode('unicode-escape'))  # {u'postalcode': u'5067', u'name': u'pavé+cafe'}
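Since items printed separately display correctly, another option is to build the output line from the individual keys and values rather than from the container's repr. The following is only a minimal sketch of that idea: format_params is a hypothetical helper name, not something from the question, and it assumes all keys and values are unicode strings.

```python
# -*- coding: utf-8 -*-
# Sketch only: format_params is a hypothetical helper, not a library function.
# It formats each key and value as text, so no repr() escaping is involved.


def format_params(params):
    """Return 'key=value, key=value' with keys in sorted order."""
    pairs = sorted(params.items())
    return u', '.join(u'%s=%s' % (key, value) for key, value in pairs)


params_dict = {u'name': u'pav\xe9+cafe', u'postalcode': u'5067'}
line = format_params(params_dict)
# line now holds the text: name=pavé+cafe, postalcode=5067
```

Calling print(line) should then show the é itself, provided the terminal's encoding can represent it. Unlike the unicode-escape approach, this drops the u'...' quoting, so it is meant for display rather than for output that must look like a Python literal.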
Note: This is only an issue in Python 2. Python 3 prints the characters as you would expect. Also, you may want to look into urlparse for parsing the URL instead of doing it manually.
# -*- coding: utf-8 -*-
import urlparse
new_url = u'name=pavé+cafe&postalcode=5067'
print dict(urlparse.parse_qsl(new_url))  # {u'postalcode': u'5067', u'name': u'pav\xe9 cafe'}
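As a rough sketch of combining both steps, the original percent-encoded string from the question can be handed to parse_qsl directly, so each value is unescaped after splitting rather than before. The helper name parse_query is hypothetical, and the code assumes the escapes are UTF-8 and that on Python 2 the query is a byte string (not unicode). The try/except import is there only so the same sketch also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urlparse import parse_qsl      # Python 2.6+
except ImportError:
    from urllib.parse import parse_qsl  # Python 3


def parse_query(query):
    """Parse a percent-encoded query string into a dict of text values.

    Hypothetical helper: assumes UTF-8 escapes and, on Python 2, a
    byte-string query. Repeated keys keep only their last value.
    """
    params = {}
    for key, value in parse_qsl(query):
        # Python 2 returns byte strings here; Python 3 already returns text.
        if isinstance(key, bytes):
            key = key.decode('utf-8')
        if isinstance(value, bytes):
            value = value.decode('utf-8')
        params[key] = value
    return params


url = ('locality=norwood&address=138+the+parade&region=sa'
       '&country=au&name=pav%c3%a9+cafe&postalcode=5067')
params_dict = parse_query(url)
# params_dict[u'name'] is the text "pavé cafe" ('+' is decoded to a space)
```

Because parse_qsl splits on & and = before unescaping, a value that itself contains an escaped & or = is not split by mistake, which can happen when the whole string is unquoted first.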