I've just discovered that XQuery can process comment nodes. Ideally this should never be necessary: if you take part in designing your data formats, you should simply store valuable data in plain XML. But I have to deal with an OntoML data source that uses a somewhat peculiar format when exporting to XML: some data fields are stored inside XML comments. So here is an example of how to solve this problem.
XML example
This is a stub of one real XML file with irrelevant data omitted. Several thousand XML files like this are stored in a Sedna XML DB collection. In the end, I need to extract a list of pairs for the whole collection: the identifier (e.g. SOT1209) and the saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).
<?xml version="1.0" standalone="yes"?>
<!--EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10-->
<!--File saved on: 2012-12-12 23:58:13.118 GMT-->
<!--XML Schema used: V099-->
<cat:catalogue xmlns:cat="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:id="urn:iso:std:iso:ts:29002:-5:ed-1:tech:xml-schema:identifier"
    xmlns:val="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:value"
    xmlns:bas="urn:iso:std:iso:ts:29002:-4:ed-1:tech:xml-schema:basic"
    xsi:type="cat:catalogue_Type"
    xsi:schemaLocation="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue http://www.paradine.net/schema/dictionary/1.0/ontoML/ISO29002/catalogue.xsd">
  <!--SOT1209-->
  ...
</cat:catalogue>

XQuery
First, I have to use a namespace declaration because the XML uses the cat namespace. Second, the task becomes trivial once you know the comment() node test, an XPath expression that matches XML comment nodes.
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

for $file in collection("ontoML/packages")
return concat($file/cat:catalogue/comment(), ',', $file/comment()[2]/substring-after(., ': '))

Results
This is the CSV output of the query above:
SOT1209,2012-12-12 23:58:13.118 GMT
SOT120A,2012-12-12 23:58:18.665 GMT
SOT1210,2012-12-12 23:58:22.517 GMT
...
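To try the same path expressions without a Sedna collection, here is a minimal sketch that builds one document in memory and runs the extraction against it. The in-memory document is only a hypothetical stand-in for a file from the collection, and the [1] predicate is my assumption that the identifier is always the first comment inside cat:catalogue; it guards against concat() failing if the element holds more than one comment.

```xquery
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

(: Hypothetical stand-in for one file of the collection:
   three document-level comments followed by the root element. :)
let $file := document {
  comment { "EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10" },
  comment { "File saved on: 2012-12-12 23:58:13.118 GMT" },
  comment { "XML Schema used: V099" },
  <cat:catalogue><!--SOT1209--></cat:catalogue>
}
return concat(
  (: first comment child of the root element: the identifier :)
  $file/cat:catalogue/comment()[1],
  ',',
  (: second document-level comment, text after the first ": " :)
  substring-after($file/comment()[2], ': ')
)
```

This should return the same kind of line as in the results above, SOT1209,2012-12-12 23:58:13.118 GMT. Note that comment()[2] depends on the order of the header comments, so it is only as stable as the export program's output.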
P.S. If you're dealing with the opposite task of creating XML comments from XQuery, there is an XML comment constructor.
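As a rough illustration of that constructor, the sketch below shows both forms XQuery offers: a direct constructor written literally as a comment, and a computed constructor whose content comes from an expression. The element name catalogue and the variable $id are hypothetical and used only for this example.

```xquery
let $id := "SOT1209"  (: hypothetical identifier value :)
return
  <catalogue>
    <!--written with a direct comment constructor-->
    { comment { concat("id: ", $id) } }
  </catalogue>
```

The computed form is the one to use when the comment text is not known in advance. Keep in mind that comment content must not contain two adjacent hyphens or end with a hyphen, so values coming from data may need to be checked first.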