Python URL encode/decode - Convert %-escaped hexadecimal digits into a string


For example, if I have an encoded string such as:

url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pav%c3%a9+cafe&postalcode=5067' 

The name parameter contains the characters %c3%a9, which represent the character é.

Hence, the output should be:

new_url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067' 

I tried the following steps in a Python terminal:

>>> import urllib2
>>> url='locality=norwood&address=138+the+parade&region=sa&country=au&name=pav%c3%a9+cafe&postalcode=5067'
>>> new_url=urllib2.unquote(url).decode('utf8')
>>> print new_url
locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067
>>>

However, when I tried the same thing within a Python script and ran it as myscript.py, I got the following stack trace:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 88: ordinal not in range(128)

I am using Python 2.6.6 and cannot switch to other versions due to work reasons.

How can I overcome this error?

Any help is appreciated. Thanks in advance!

###################################################### 

Edit

I realized that I am in fact getting the expected output shown above.

However, I want to convert the parameters in new_url into a dictionary as follows. While doing so, I am not able to retain the special character 'é' in the name parameter.

print new_url
params_list = new_url.split("&")
print(params_list)
params_dict = {}
for p in params_list:
    temp = p.split("=")
    params_dict[temp[0]] = temp[1]
print(params_dict)

Outputs:

new_url

locality=norwood&address=138+the+parade&region=sa&country=au&name=pavé+cafe&postalcode=5067

params_list

[u'locality=norwood', u'address=138+the+parade', u'region=sa', u'country=au', u'name=pav\xe9+cafe', u'postalcode=5067']

params_dict

{u'name': u'pav\xe9+cafe', u'locality': u'norwood', u'country': u'au', u'region': u'sa', u'address': u'138+the+parade', u'postalcode': u'5067'}

Basically, the name is 'pav\xe9+cafe' as opposed to the required 'pavé'.

How can I still retain the same special character in params_dict?

This is due to the difference between __repr__ and __str__. When printing a unicode string, __str__ is used, which results in the é you see when printing new_url. However, when a list or dict is printed, __repr__ is used, which in turn uses __repr__ for each object within the list or dict. If you print the items separately, they print as you desire.

# -*- coding: utf-8 -*-
new_url = u'name=pavé+cafe&postalcode=5067'
print(new_url)  # name=pavé+cafe&postalcode=5067

params_list = [s for s in new_url.split("&")]
print(params_list)  # [u'name=pav\xe9+cafe', u'postalcode=5067']
print(params_list[0])  # name=pavé+cafe
print(params_list[1])  # postalcode=5067

params_dict = {}
for p in params_list:
    temp = p.split("=")
    params_dict[temp[0]] = temp[1]
print(params_dict)  # {u'postalcode': u'5067', u'name': u'pav\xe9+cafe'}
print(params_dict.values()[0])  # 5067
print(params_dict.values()[1])  # pavé+cafe
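The same rule can be seen without any Unicode involved. The following is a minimal sketch using a hypothetical class, Cafe, that is not part of the original code and whose two string forms are deliberately different. Converting the object directly uses __str__, while converting a list or dict that contains it uses __repr__ for the element.

```python
class Cafe(object):
    """Hypothetical class, used only to tell __str__ and __repr__ apart."""

    def __str__(self):
        return 'str form'

    def __repr__(self):
        return 'repr form'


cafe = Cafe()

# Converting the object itself calls __str__.
direct = str(cafe)

# Converting a container calls __repr__ on each element it holds.
in_list = str([cafe])
in_dict = str({'name': cafe})
```

Here direct holds the __str__ text, whereas in_list and in_dict embed the __repr__ text, which is why the é appears as \xe9 only once the string sits inside a list or dict.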

One way is to print the list and dict via their string representation, then decode them with unicode-escape:

print(str(params_list).decode('unicode-escape'))  # [u'name=pavé+cafe', u'postalcode=5067']
print(str(params_dict).decode('unicode-escape'))  # {u'postalcode': u'5067', u'name': u'pavé+cafe'}
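If the escaped form only matters for display, another option is to build the text from the individual items, so that the container's __repr__ is never involved. This is a sketch rather than part of the original answer. The variable names pairs and readable are illustrative, and the "key=value" layout is just one possible choice.

```python
# Same kind of data as params_dict above, written with an escape so that
# the source file itself stays pure ASCII.
params_dict = {u'name': u'pav\xe9+cafe', u'postalcode': u'5067'}

# Format each key and value on its own, so the text of each item is used
# instead of the escaped repr() that a dict produces for its contents.
pairs = [u'%s=%s' % (key, value) for key, value in sorted(params_dict.items())]
readable = u', '.join(pairs)
```

Printing readable should then show the é directly, provided the terminal encoding can represent it.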

Note: this is only an issue in Python 2. Python 3 prints the characters as you expect. Also, you may want to use urlparse for parsing the URL instead of doing it manually.

# -*- coding: utf-8 -*-
import urlparse

new_url = u'name=pavé+cafe&postalcode=5067'
print dict(urlparse.parse_qsl(new_url))  # {u'postalcode': u'5067', u'name': u'pav\xe9 cafe'}
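parse_qsl can also be given the still-encoded query string, in which case it handles the % escapes and turns + into a space by itself. The following is a sketch under a few assumptions: the try/except import is only there so the same lines run on Python 2 and Python 3, to_text is a hypothetical helper that is not part of any library, and the query is assumed to be UTF-8 encoded.

```python
try:
    from urlparse import parse_qsl        # Python 2
except ImportError:
    from urllib.parse import parse_qsl    # Python 3


def to_text(value):
    """Hypothetical helper: decode byte strings as UTF-8, pass text through."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value


query = 'name=pav%c3%a9+cafe&postalcode=5067'

# parse_qsl splits on '&' and '=', resolves the % escapes and maps '+' to ' '.
# On Python 2 it returns byte strings for this input, hence the decoding step.
params_dict = {}
for key, value in parse_qsl(query):
    params_dict[to_text(key)] = to_text(value)
```

With this, the name entry should come out as a unicode string containing é and a space, rather than the raw %c3%a9 and + from the query.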
