python - Parse hierarchical XML tags

- August 15, 2012

I need to parse hierarchical tags from XML and get each tag's value in the desired output.

input

<doc>
<pid id="231">
    <label key="">electronics</label>
    <desc/>
    <cid id="122">
        <label key="">tv</label>
    </cid>
    <desc/>
    <cid id="123">
        <label key="">computers</label>
        <cid id="12433">
            <label key="">lenovo</label>
        </cid>
        <desc/>
        <cid id="12434">
            <label key="">ibm</label>
            <desc/>
        </cid>
        <cid id="12435">
            <label key="">mac</label>
        </cid>
        <desc/>
    </cid>
</pid>
<pid id="7764">
    <label key="">music</label>
    <desc/>
    <cid id="1224">
        <label key="">play</label>
        <desc/>
        <cid id="341">
            <label key="">pqr</label>
        </cid>
        <desc/>
    </cid>
    <cid id="221">
        <label key="">itunes</label>
        <cid id="341">
            <label key="">xyz</label>
        </cid>
        <desc/>
        <cid id="515">
            <label key="">abc</label>
        </cid>
        <desc/>
    </cid>
</pid>
</doc>

output

electronics/
electronics/tv
electronics/computers/lenovo
electronics/computers/ibm
electronics/computers/mac
music/
music/play/pqr
music/itunes/xyz
music/itunes/abc

What I have tried (in Python):

import xml.etree.ElementTree as et
import os
import sys
import string

def perf_func(elem, func, level=0):
    # Apply func to this element, then recurse into its children
    func(elem, level)
    for child in elem:
        perf_func(child, func, level + 1)

def print_level(elem, level):
    # Print the tag name prefixed by one dash per nesting level
    print('-' * level + elem.tag)

tree = et.parse('products.xml')
perf_func(tree.getroot(), print_level)

# added find logic
root = tree.getroot()

# root is <doc>, so the first-level labels sit under its <pid> children
for n in root.findall('pid'):
    l = n.find('label').text
    print(l)

With the above code I am able to get the nodes and their levels (just the tag, not the value), and also the first level of labels. I need a suggestion (Perl or Python) on how to proceed to get the hierarchical structure in the format shown in the output.

We are going to use three pieces: find all of the elements in the order in which they occur, get the depth of each one, and build a breadcrumb based on the depth and the order.

from lxml import etree

# xml_str is assumed to hold the input XML shown in the question
xml = etree.fromstring(xml_str)
elems = xml.xpath(r'//label')  # XPath expression to find all '<label ...>' elements

# Counts the number of parents up to the root element
def get_depth(element):
    depth = 0
    parent = element.getparent()
    while parent is not None:
        depth += 1
        parent = parent.getparent()
    return depth

# Build up the breadcrumbs by tracking the depth.
# When a new element is entered, it replaces the value in the list
# at that level and drops all values to the right.
def reduce_by_depth(element_list):
    crumbs = []
    depth = 0
    elem_crumb = [''] * 10  # supports up to 10 levels of nesting
    for elem in element_list:
        depth = get_depth(elem)
        elem_crumb[depth] = elem.text
        elem_crumb[depth+1:] = [''] * (10 - depth - 1)
        # Join all the non-empty strings to get the breadcrumb
        crumbs.append('/'.join([e for e in elem_crumb if e]))
    return crumbs

reduce_by_depth(elems)

# Output:
# ['electronics',
#  'electronics/tv',
#  'electronics/computers',
#  'electronics/computers/lenovo',
#  'electronics/computers/ibm',
#  'electronics/computers/mac',
#  'music',
#  'music/play',
#  'music/play/pqr',
#  'music/itunes',
#  'music/itunes/xyz',
#  'music/itunes/abc']
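
If lxml is not available, the same breadcrumbs can probably be built with the standard library alone. The sketch below is an alternative to the depth-tracking approach, not the answer's method: it walks the tree recursively and carries the path down as it goes, so no getparent() call is needed (the standard ElementTree elements do not have one). The names SAMPLE_XML, walk and breadcrumbs are made up for illustration, and the sample is a trimmed version of the input from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed copy of the question's input.
SAMPLE_XML = """
<doc>
  <pid id="231">
    <label key="">electronics</label>
    <desc/>
    <cid id="122"><label key="">tv</label></cid>
    <cid id="123">
      <label key="">computers</label>
      <cid id="12433"><label key="">lenovo</label></cid>
    </cid>
  </pid>
  <pid id="7764">
    <label key="">music</label>
    <cid id="1224">
      <label key="">play</label>
      <cid id="341"><label key="">pqr</label></cid>
    </cid>
  </pid>
</doc>
"""

def walk(node, path=()):
    """Yield one '/'-joined breadcrumb per labelled node, in document order."""
    label = node.find('label')  # looks at direct children only
    if label is not None and label.text:
        path = path + (label.text,)
        yield '/'.join(path)
    # Assumption: only <pid> and <cid> elements carry nested categories.
    for child in node:
        if child.tag in ('pid', 'cid'):
            yield from walk(child, path)

def breadcrumbs(xml_text):
    """Parse the XML text and return the list of breadcrumbs."""
    root = ET.fromstring(xml_text)
    return list(walk(root))

crumbs = breadcrumbs(SAMPLE_XML)
print(crumbs)
```

Because the path is passed down as an immutable tuple, each branch gets its own copy and there is no fixed limit on nesting depth, unlike the 10-slot list above.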
