Most efficient way in C# to parse a large XML string (to expand DTD references, add new lines, etc.)

- May 15, 2015

I have an interface that provides large XML strings. They are valid XML but may not be in standard form (say, missing a prefix when a default namespace is specified), may have no line endings, or may need expansion of entities defined in an in-lined DTD. I need to parse these strings with a standard XML parser that can handle in-lined DTD definitions. The string data can be anywhere from a few characters to gigabytes.

At present I am using the following code (and such simple parsing seems to be able to fix the issues mentioned above):

    XDocument doc = XDocument.Parse(largeXmlString);

    var settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = Encoding.Unicode;
    // more settings

    StringBuilder parsedOutput = new StringBuilder();
    using (XmlWriter xmlWriter = XmlWriter.Create(parsedOutput, settings))
    {
        doc.WriteTo(xmlWriter);
    }

While this is easy to use, I am not sure how good or bad it is compared to using other .NET XML parsing classes such as XmlReader/XmlTextReader or XmlDocument.

What is the best/most efficient way of doing this using .NET/C# supported classes (possibly without writing a lot of new code)?

Thanks for the help.

`<?xml version="1.0" encoding="utf-8"?><catalogue    xmlns="http://www.somewhere.org/bookcatalogue" xmlns:cat="http://www.somewhere.org/bookcatalogue" xmlns:html="http://www.somewhere.org/htmlcatalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somewhere.org/bookcatalogue                         txjsgen14.txt"><cat:magazine><title>natural health</title><author>october</author><date>december, 1999</date><volume>12</volume>.....`

gets converted to

`<?xml version="1.0" encoding="utf-8"?> <cat:catalogue xmlns="http://www.somewhere.org/bookcatalogue" xmlns:cat="http://www.somewhere.org/bookcatalogue" xmlns:html="http://www.somewhere.org/htmlcatalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somewhere.org/bookcatalogue                         txjsgen14.txt">   <cat:magazine>     <cat:title>natural health</cat:title>     <cat:author>october</cat:author>     <cat:date>december, 1999</cat:date>     <cat:volume>12</cat:volume>     <cat:htmltable>.....`

Note the addition of the cat prefix to title and the other elements, based on the namespace declarations.

Thank you for the responses.

@Enigmativity, sorry for the confusion. Actually, I need a string-to-string conversion. The first string is not-so-proper XML: it is not formatted, its DTD entities are not expanded, it has no line delimiters, and it may be missing prefixes, etc. The second string should have all of these things fixed.
If one component (say XmlReader) can take the first string as an argument, turn it into canonical/properly formatted/expanded XML, and return it as a string, then I need only that one component. In the example above, parsing is done by XDocument and formatting is done by XmlWriter. I am not sure which of them does the expansion of entities, the parser or the XmlWriter; perhaps the writer.
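
On the question of which component expands entities: in .NET it is the parsing side. An `XmlReader` created with `DtdProcessing.Parse` reads the in-lined DTD and hands out entity references as already-expanded text, and the `XmlWriter` only writes what it receives. A minimal sketch to check this; the sample document, the class name `EntityExpansionDemo`, and the helper `ReadText` are made up for illustration and are not from the original post:

```csharp
using System;
using System.IO;
using System.Xml;

static class EntityExpansionDemo
{
    // Hypothetical sample: an in-lined DTD that declares one internal entity.
    public const string Sample =
        "<!DOCTYPE note [<!ENTITY pub \"Example Press\">]><note>&pub;</note>";

    public static string ReadText(string xml)
    {
        // DTD processing must be enabled explicitly; XmlReader.Create
        // prohibits DTDs by default.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();                      // skip the DOCTYPE, stop at <note>
            return reader.ReadElementContentAsString();  // text with &pub; already expanded
        }
    }

    static void Main()
    {
        Console.WriteLine(ReadText(Sample));
    }
}
```

No writer is involved here, so whatever text comes back has been expanded by the reader alone.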

For the time being I will try to use a combination of XmlReader and XmlWriter, where the XmlReader reads the first string and the XmlWriter writes the formatted one (as specified by the XmlWriterSettings used by the XmlWriter). Let me know if there is a better approach.

You can do this, as in your example, with an XmlReader:

    XmlReader xmlReader = ...;

    using (XmlWriter xmlWriter = ...)
    {
        xmlWriter.WriteNode(xmlReader, true);
    }

This is the most efficient way: it streams the document node by node, versus reading the entire thing into memory before writing it out.
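
For completeness, here is one way the elided parts could be filled in for the string-to-string case. This is only a sketch under assumptions: the class name `XmlNormalizer`, the method name `Normalize`, and the entity-size cap are illustrative choices, not something the original answer specifies.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlNormalizer
{
    // Hypothetical helper: reads XML from a string and returns an indented
    // copy with DTD-declared entities expanded in the content.
    public static string Normalize(string xml)
    {
        var readerSettings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,      // accept the in-lined DTD
            MaxCharactersFromEntities = 10000000      // assumed cap on entity expansion
        };
        var writerSettings = new XmlWriterSettings { Indent = true };

        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            // The reader is in its initial state, so this copies every node
            // through to the end of the input, one node at a time.
            writer.WriteNode(reader, true);
        }
        return output.ToString();
    }

    static void Main()
    {
        string input =
            "<!DOCTYPE catalogue [<!ENTITY pub \"Example Press\">]>" +
            "<catalogue><title>&pub;</title></catalogue>";
        Console.WriteLine(Normalize(input));
    }
}
```

Two caveats. `WriteNode` copies element names as the reader reports them, so it should not be expected to add prefixes such as `cat:` the way the XDocument round trip in the question does. And although the copy itself is streamed, both the input and the result are still held in memory as strings here; for very large data, passing a `Stream` or `TextReader` in and writing to a `Stream` or `TextWriter` avoids that.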
