Python 3.5: Web-scraping with Stripping html codes

- August 15, 2015

I am scraping web content and am stuck on a problem. After a series of processing steps to narrow the result down to the part I want, I cannot strip the HTML code from the items in the list to get plain text. I have tried replace, re.compile, and join (to turn the list into text before stripping). Either they don't work, because they are designed for strings, or they raise errors when the code runs.

Could someone give me a hint on how to do that? For example, I want the output of the following code to change from

<p class="course-d-title">instructor</p>

to instructor.

    import tkinter as tk
    import re

    def test():
        from bs4 import BeautifulSoup
        import urllib.request
        from urllib.parse import urljoin

        '''for layer 0'''
        url_text = 'http://www.scs.cuhk.edu.hk/en/part-time/accounting-and-finance/accounting-and-finance/fundamental-accounting/162-610441-01'
        resp = urllib.request.urlopen(url_text)
        soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))
        a = soup.find_all('p')

        # drop everything before the 'instructor' paragraph
        k=0
        for item in a[:]:
            if 'instructor' in item:
                a=a[k:]
                break
            k+=1

        # drop everything from just before the 'enquiries' paragraph onward
        j=0
        for item in a[:]:
            if 'enquiries' in item:
                a=a[:j-1]
                break
            j+=1

        # prints each remaining element with its HTML tags still attached
        for i in range(0,a.__len__()):
            print (a[i])

    if __name__ == '__main__':
        test()

Use .text to extract the text from a bs4 element:

    >>> a = soup.find_all('p')
    >>> data = [item for item in a if 'instructor' in item]
    >>> data
    [<p class="course-d-title">instructor</p>]
    >>> data[0].text
    'instructor'
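
To apply the same idea to the whole list rather than a single element, here is a minimal sketch. It assumes the paragraphs of interest sit between an 'instructor' heading and an 'enquiries' heading, as in the question. The SAMPLE_HTML string, the name "Jane Doe", and the helper paragraphs_between are made up for illustration so the example runs without a network request; they are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the course page so the example runs offline.
SAMPLE_HTML = """
<p class="intro">Fundamental Accounting</p>
<p class="course-d-title">instructor</p>
<p>Jane Doe</p>
<p class="course-d-title">enquiries</p>
<p>Tel: 0000 0000</p>
"""


def paragraphs_between(soup, start, stop):
    """Return the plain text of each <p> from `start` up to, but not including, `stop`.

    Hypothetical helper: raises ValueError if either marker text is missing.
    """
    # get_text(strip=True) returns the tag's text with surrounding whitespace removed
    texts = [p.get_text(strip=True) for p in soup.find_all('p')]
    begin = texts.index(start)
    end = texts.index(stop, begin)
    return texts[begin:end]


# 'html.parser' is the parser bundled with Python, so no extra parser is needed
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
lines = paragraphs_between(soup, 'instructor', 'enquiries')
print(lines)  # ['instructor', 'Jane Doe']
```

Converting every element to text first means the slicing works on ordinary strings, which is also where replace or re would work if further clean-up were needed.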
