Most efficient way in C# to parse a large XML string (to expand DTD references, add new lines, etc.)
I have an interface that provides large XML strings. They are valid XML but may not be in a standard form (say, missing a prefix where a default namespace is specified), may lack line endings, or may need expansion of entities defined in an in-lined DTD. I need to parse these strings with a standard XML parser that can handle in-lined DTD definitions. The string data can be anywhere from a few characters to gigabytes.
At present I am using the following code (and such simple parsing seems to be able to fix the issues mentioned above):
    XDocument doc = XDocument.Parse(largeXmlString);
    var settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = Encoding.Unicode;
    // more settings
    StringBuilder parsedOutput = new StringBuilder();
    using (XmlWriter xmlWriter = XmlWriter.Create(parsedOutput, settings))
    {
        doc.WriteTo(xmlWriter);
    }
While this is easy to use, I am not sure how good or bad it is compared to using other .NET XML parsing classes such as XmlReader/XmlTextReader or XmlDocument.
What is the best/most efficient way of doing this using the classes supported by .NET/C# (possibly without writing a lot of new code)?
Thanks for the help.
`<?xml version="1.0" encoding="utf-8"?><catalogue xmlns="http://www.somewhere.org/bookcatalogue" xmlns:cat="http://www.somewhere.org/bookcatalogue" xmlns:html="http://www.somewhere.org/htmlcatalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somewhere.org/bookcatalogue txjsgen14.txt"><cat:magazine><title>natural health</title><author>october</author><date>december, 1999</date><volume>12</volume>.....`
gets converted to

    <?xml version="1.0" encoding="utf-8"?>
    <cat:catalogue xmlns="http://www.somewhere.org/bookcatalogue" xmlns:cat="http://www.somewhere.org/bookcatalogue" xmlns:html="http://www.somewhere.org/htmlcatalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somewhere.org/bookcatalogue txjsgen14.txt">
      <cat:magazine>
        <cat:title>natural health</cat:title>
        <cat:author>october</cat:author>
        <cat:date>december, 1999</cat:date>
        <cat:volume>12</cat:volume>
        <cat:htmltable>.....

Note the addition of the cat prefix to title and the other elements, based on the namespace declarations.
Thank you for the responses.
@Enigmativity Sorry for the confusion. Actually, I need a string-to-string conversion. The first string holds not-so-proper XML: not formatted, DTD entities not expanded, no line delimiters, possibly missing prefixes, etc. The second string should have all of these things fixed.
If one component (say XmlReader) can take the first string as an argument, turn it into canonical/properly formatted/expanded XML, and return that as a string, then I need only one component. In the example above, the parsing is done by XDocument and the formatting is done by XmlWriter. I am not sure which of them does the expansion of entities, the parser or the writer.
For the time being I will try to use a combination of XmlReader and XmlWriter, where the XmlReader reads the first string and the XmlWriter writes the formatted one (as specified by the XmlWriterSettings used for the XmlWriter). Let me know if there is a better approach.
You can, as in your example, have the XmlWriter write directly from an XmlReader:
    XmlReader xmlReader = ...;
    using (XmlWriter xmlWriter = ...)
    {
        xmlWriter.WriteNode(xmlReader, true);
    }
This is an efficient way: it streams the document node by node, rather than reading the entire thing into memory before writing it out.
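For reference, here is a minimal sketch of how the reader-to-writer streaming could be wired up end to end. The class name `XmlNormalizer` and the method `Normalize` are made up for illustration and do not come from the question. The sketch assumes the DTD is fully in-lined (so no external resolver is needed) and that the input is trusted, since enabling DTD processing on untrusted input is risky. As far as I know, readers created through `XmlReader.Create` expand general entities once DTD processing is enabled, but it is worth verifying on your own data.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original question or answer.
static class XmlNormalizer
{
    // Streams XML from 'input' to 'output' node by node, indenting as it goes.
    public static void Normalize(TextReader input, TextWriter output)
    {
        var readerSettings = new XmlReaderSettings
        {
            // DTDs are prohibited by default; Parse allows the in-lined DTD.
            DtdProcessing = DtdProcessing.Parse,
            // Assumption: nothing external has to be fetched for the DTD.
            XmlResolver = null,
            // Drop insignificant whitespace so the writer controls the layout.
            IgnoreWhitespace = true
        };
        var writerSettings = new XmlWriterSettings { Indent = true };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            // Copies every node from the reader's current position to the end.
            writer.WriteNode(reader, true);
        }
    }
}
```

For a string-to-string conversion you could pass a `StringReader` over the input and a `StringWriter` for the result. For gigabyte-sized data it is probably better to pass a `StreamWriter` over a file, so the formatted output never has to be held in memory as a single string.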