Parse out a URL with a regex operation in Python

- July 15, 2010

I have data as follows:

data

    url
    http://hostname.com/part1/part2/part3/a+b+c+d
    http://m.hostname.com/part3.html?nk!e+f+g+h&_junk
    http://hostname.com/as/ck$st=f+g+h+k+i/
    http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa

I want to find the first + symbol in the URL, move backward until I reach a special character such as /, ?, = or any other special character, and then, starting from there, go forward until I reach a space, the end of the line, &, or /. My output should be:

    parsed
    abcd
    efgh
    fghki
    qwert

My aim is to find the first + in the URL, go back until I find a special character, and then go forward until I find the end of the line, a space, or an & symbol.
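For comparison, the procedure described in the question can also be written without a regex by scanning the string directly. This is a minimal sketch, assuming that letters and digits are "ordinary" characters and that everything else counts as a special character when walking backward; the forward stop set (space, &, /, end of string) is taken from the question. The function name `parse_plus_token` is hypothetical.

```python
from typing import Optional


def parse_plus_token(url: str) -> Optional[str]:
    """Hypothetical helper: join the +-separated token around the first '+'."""
    plus = url.find("+")
    if plus == -1:
        return None  # no '+' anywhere in the URL

    # Walk backward from the first '+' while characters are letters or digits;
    # stop at the first "special" character (assumption: anything else).
    start = plus
    while start > 0 and url[start - 1].isalnum():
        start -= 1

    # Walk forward until a space, '&', '/', or the end of the string.
    end = plus
    while end < len(url) and url[end] not in " &/":
        end += 1

    # Drop the pluses: "a+b+c+d" -> "abcd".
    return url[start:end].replace("+", "")


urls = [
    "http://hostname.com/part1/part2/part3/a+b+c+d",
    "http://m.hostname.com/part3.html?nk!e+f+g+h&_junk",
    "http://hostname.com/as/ck$st=f+g+h+k+i/",
    "http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa",
]

# Prints abcd, efgh, fghki and qwert, one per line.
for url in urls:
    print(parse_plus_token(url))
```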

I am new to regex and still learning, and since this is a bit complex I am finding it difficult to write. Can someone help me write a regex in Python to parse these out?

Thanks

Here is an expression that works for your sample use cases:

    >>> import re
    >>>
    >>> l = [
    ...     "http://hostname.com/part1/part2/part3/a+b+c+d",
    ...     "http://m.hostname.com/part3.html?nk!e+f+g+h&_junk",
    ...     "http://hostname.com/as/ck$st=f+g+h+k+i/",
    ...     "http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa"
    ... ]
    >>>
    >>> pattern = re.compile(r"[^\w\+]([\w\+]+\+[\w\+]+)(?:[^\w\+]|$)")
    >>> for item in l:
    ...     print("".join(pattern.search(item).group(1).split("+")))
    ...
    abcd
    efgh
    fghki
    qwert

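The idea is to capture a run of word characters (\w: letters, digits, underscore) and plus signs, containing at least one plus, that sits between a character that is neither a word character nor a plus and either another such character or the end of the string. Then split the captured text on the plus signs and join the pieces.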
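To make the explanation easier to follow, here is a sketch of the same pattern written with `re.VERBOSE`, so each piece can carry its own comment. It also guards against URLs with no match: `pattern.search(item).group(1)` in the one-liner raises `AttributeError` when `search()` returns `None`. The names `PLUS_RUN` and `parse` are hypothetical.

```python
import re
from typing import Optional

# The answer's pattern, rewritten with re.VERBOSE so each piece can carry a
# comment (whitespace and '#' comments outside character classes are ignored).
PLUS_RUN = re.compile(
    r"""
    [^\w\+]              # a character that is neither a word character nor '+'
    ([\w\+]+\+[\w\+]+)   # group 1: word characters and '+', containing a '+'
    (?:[^\w\+]|$)        # then another such character, or the end of the string
    """,
    re.VERBOSE,
)


def parse(url: str) -> Optional[str]:
    match = PLUS_RUN.search(url)
    if match is None:
        # search() returns None when nothing matches; calling .group(1) on it
        # directly would raise AttributeError.
        return None
    # Split the captured run on '+' and join the pieces back together.
    return "".join(match.group(1).split("+"))


print(parse("http://hostname.com/as/ck$st=f+g+h+k+i/"))  # fghki
print(parse("http://hostname.com/no-plus-here"))  # None
```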

regex101 link.

I have a feeling this can be further simplified/improved.
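One possible simplification, offered as a sketch rather than a drop-in replacement: if every + sits between word characters, as in all four sample URLs, the leftmost run of the form word+word+... can be matched directly with `\w+(?:\+\w+)+`, and `str.replace` can drop the pluses instead of split/join. Unlike the pattern above, it does not require a delimiter before the run (so it also matches at the very start of a string), and it will not match a run with consecutive pluses such as `a++b`, which the original pattern turns into `ab`. The names `SIMPLE` and `parse_simple` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical simplified pattern: one or more word characters followed by one
# or more "+word" groups, e.g. "a+b+c+d". Assumes each '+' sits between word
# characters, as in the sample URLs.
SIMPLE = re.compile(r"\w+(?:\+\w+)+")


def parse_simple(url: str) -> Optional[str]:
    # Leftmost match; under the assumption above, the run holding the first '+'.
    match = SIMPLE.search(url)
    # str.replace drops the pluses in one step instead of split/join.
    return match.group(0).replace("+", "") if match else None


urls = [
    "http://hostname.com/part1/part2/part3/a+b+c+d",
    "http://m.hostname.com/part3.html?nk!e+f+g+h&_junk",
    "http://hostname.com/as/ck$st=f+g+h+k+i/",
    "http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa",
]

# Prints abcd, efgh, fghki and qwert, one per line.
for url in urls:
    print(parse_simple(url))
```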
