Parse out part of a URL with a regex in Python
I have data as follows:
data
url
http://hostname.com/part1/part2/part3/a+b+c+d
http://m.hostname.com/part3.html?nk!e+f+g+h&_junk
http://hostname.com/as/ck$st=f+g+h+k+i/
http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa
I want to find the first + symbol in each URL, move backward until I hit a special character such as /, ?, = or any other special character, start there, and then continue forward until I reach a space, the end of the line, & or /. My output should be:
parsed
abcd
efgh
fghki
qwert
My aim is to find the first + in the URL, go back until a special character is found, then go forward until the end of the line, a space, or an & symbol.
I am new to regex and still learning; since this is a bit complex, I am finding it difficult to write. Can anyone help me write a regex in Python to parse these out?
Thanks.
Here is an expression that works for the sample use cases:
>>> import re
>>>
>>> l = [
...     "http://hostname.com/part1/part2/part3/a+b+c+d",
...     "http://m.hostname.com/part3.html?nk!e+f+g+h&_junk",
...     "http://hostname.com/as/ck$st=f+g+h+k+i/",
...     "http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa"
... ]
>>>
>>> pattern = re.compile(r"[^\w\+]([\w\+]+\+[\w\+]+)(?:[^\w\+]|$)")
>>> for item in l:
...     print("".join(pattern.search(item).group(1).split("+")))
...
abcd
efgh
fghki
qwert
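The idea is to capture a run of alphanumerics and plus characters that sits between a character that is neither alphanumeric nor a plus and either another such character or the end of the string. Then split the captured run on the plus sign and join the pieces.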
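To make the pieces of that pattern easier to follow, here is the same expression written with re.VERBOSE and one comment per part. The name PATTERN is only illustrative, and the backslash before + inside the character classes is dropped because it is not needed there.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
PATTERN = re.compile(
    r"""
    [^\w+]          # a delimiter: any character that is not a word character or a plus
    (               # group 1: the run we want to extract
        [\w+]+      #   word characters and/or plus signs
        \+          #   at least one literal plus sign
        [\w+]+      #   more word characters and/or plus signs
    )
    (?:[^\w+]|$)    # followed by another delimiter or the end of the string
    """,
    re.VERBOSE,
)

match = PATTERN.search("http://hostname.com/as/ck$st=f+g+h+k+i/")
captured = match.group(1)            # the run including the plus signs
parsed = "".join(captured.split("+"))  # drop the plus signs
print(parsed)
```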
I have a feeling it can be further simplified/improved.
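One possible simplification, as a sketch only: since \w never matches a plus or any of the delimiters, the explicit delimiter checks on both sides can be dropped and the pattern can just look for the first run of word characters joined by plus signs. The helper name extract_plus_run is made up for this example, and it returns None when a URL contains no such run, where the loop above would raise an AttributeError.

```python
import re

# First run of word characters joined by single plus signs, e.g. "a+b+c+d".
PLUS_RUN = re.compile(r"\w+(?:\+\w+)+")

def extract_plus_run(url):
    """Return the first plus-joined run with the plus signs removed, or None."""
    match = PLUS_RUN.search(url)
    if match is None:
        return None
    return match.group(0).replace("+", "")

urls = [
    "http://hostname.com/part1/part2/part3/a+b+c+d",
    "http://m.hostname.com/part3.html?nk!e+f+g+h&_junk",
    "http://hostname.com/as/ck$st=f+g+h+k+i/",
    "http://www.hostname.com/p-l-k?wod=q+w+e+r+t africa",
]

for url in urls:
    print(extract_plus_run(url))
```

This is not an exact equivalent of the pattern above: it requires at least one word character on each side of every plus, so runs with leading, trailing or doubled plus signs are handled differently. On the four sample URLs both approaches give the same result.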