python - Parse hierarchical XML tags
I need to parse hierarchical tags in XML and get each tag's value in the desired output format.
Input:

    <doc>
      <pid id="231">
        <label key="">electronics</label>
        <desc/>
        <cid id="122">
          <label key="">tv</label>
        </cid>
        <desc/>
        <cid id="123">
          <label key="">computers</label>
          <cid id="12433">
            <label key="">lenovo</label>
          </cid>
          <desc/>
          <cid id="12434">
            <label key="">ibm</label>
            <desc/>
          </cid>
          <cid id="12435">
            <label key="">mac</label>
          </cid>
          <desc/>
        </cid>
      </pid>
      <pid id="7764">
        <label key="">music</label>
        <desc/>
        <cid id="1224">
          <label key="">play</label>
          <desc/>
          <cid id="341">
            <label key="">pqr</label>
          </cid>
          <desc/>
        </cid>
        <cid id="221">
          <label key="">itunes</label>
          <cid id="341">
            <label key="">xyz</label>
          </cid>
          <desc/>
          <cid id="515">
            <label key="">abc</label>
          </cid>
          <desc/>
        </cid>
      </pid>
    </doc>
Output:

    electronics/
    electronics/tv
    electronics/computers/lenovo
    electronics/computers/ibm
    electronics/computers/mac
    music/
    music/play/pqr
    music/itunes/xyz
    music/itunes/abc
What I have tried (in Python):

    import xml.etree.ElementTree as ET
    import os
    import sys
    import string

    def perf_func(elem, func, level=0):
        # Apply func to this element, then recurse into its children.
        func(elem, level)
        for child in elem:
            perf_func(child, func, level + 1)

    def print_level(elem, level):
        # Print the tag name, indented by one dash per nesting level.
        print('-' * level + elem.tag)

    tree = ET.parse('products.xml')
    perf_func(tree.getroot(), print_level)

    # added find logic
    root = tree.getroot()
    for n in root.findall('doc'):
        l = n.find('label').text
        print(l)
With the above code, I am able to get the nodes and their levels (just the tag, not the value), and the first level of labels. I need a suggestion (Perl/Python) on how to proceed to get the hierarchical structure in the format mentioned in the output.
We are going to use three pieces: find all of the elements in the order in which they occur, get the depth of each one, and build a breadcrumb based on the depth and order.
    from lxml import etree

    # xml_str is assumed to hold the XML input shown in the question
    xml = etree.fromstring(xml_str)
    elems = xml.xpath(r'//label')  # XPath expression to find all '<label ...>' elements

    # counts the number of parents up to the root element
    def get_depth(element):
        depth = 0
        parent = element.getparent()
        while parent is not None:
            depth += 1
            parent = parent.getparent()
        return depth

    # build up the breadcrumbs by tracking the depth
    # when a new element is entered, it replaces the value in the list
    # at that level and drops all values to the right
    def reduce_by_depth(element_list):
        crumbs = []
        depth = 0
        elem_crumb = [''] * 10
        for elem in element_list:
            depth = get_depth(elem)
            elem_crumb[depth] = elem.text
            elem_crumb[depth + 1:] = [''] * (10 - depth - 1)
            # join all the non-empty strings to get the breadcrumb
            crumbs.append('/'.join([e for e in elem_crumb if e]))
        return crumbs

    reduce_by_depth(elems)
    # output:
    # ['electronics', 'electronics/tv', 'electronics/computers',
    #  'electronics/computers/lenovo', 'electronics/computers/ibm',
    #  'electronics/computers/mac', 'music', 'music/play', 'music/play/pqr',
    #  'music/itunes', 'music/itunes/xyz', 'music/itunes/abc']
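Note that the breadcrumb list above also contains intermediate nodes such as electronics/computers and music/play, which the output requested in the question leaves out. The following is a minimal sketch of one possible alternative, not part of the original answer, that uses only the standard library and recursion. It assumes every pid and cid carries a direct label child, that a top-level pid should be printed with a trailing slash, and that only cid elements without nested cid children should produce a path. The names leaf_paths, category_paths and sample are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def leaf_paths(node, trail):
    """Yield a '/'-joined label path for each <cid> with no nested <cid>."""
    # Extend the path with this node's own label (direct child only).
    trail = trail + [node.findtext('label', default='')]
    children = node.findall('cid')  # direct <cid> children only
    if not children:
        yield '/'.join(trail)
    for child in children:
        yield from leaf_paths(child, trail)


def category_paths(xml_text):
    """Return 'top/' for each <pid>, followed by the paths of its leaf <cid>s."""
    root = ET.fromstring(xml_text)
    paths = []
    for pid in root.findall('pid'):
        top = pid.findtext('label', default='')
        paths.append(top + '/')
        for cid in pid.findall('cid'):
            paths.extend(leaf_paths(cid, [top]))
    return paths


# A shortened version of the input from the question (assumption: same layout).
sample = """<doc><pid id="231"><label key="">electronics</label><desc/>
<cid id="122"><label key="">tv</label></cid><desc/>
<cid id="123"><label key="">computers</label>
<cid id="12433"><label key="">lenovo</label></cid><desc/>
<cid id="12434"><label key="">ibm</label><desc/></cid>
</cid></pid></doc>"""

if __name__ == '__main__':
    print('\n'.join(category_paths(sample)))
```

Because findall('cid') and findtext('label') only look at direct children, the empty desc elements are skipped without any special handling, and the recursion depth follows the nesting of the document instead of a fixed list size.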