Extracting XML comments with XQuery

I've just discovered that XQuery can process comment nodes. Ideally you should never need this: if you take part in designing your data formats, simply store valuable data as plain XML rather than in comments. However, I have to deal with an OntoML data source that uses a somewhat peculiar format when exporting to XML: some data fields are stored inside XML comments. Here is an example of how to solve this problem.

XML example

This is a stub of a real XML file with irrelevant data omitted. Several thousand XML files like this are stored in a Sedna XML DB collection. In the end, I need to extract a list of pairs for the complete collection: the identifier (e.g. SOT1209) and the saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).

<?xml version="1.0" standalone="yes"?>

<!--EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10-->

<!--File saved on: 2012-12-12 23:58:13.118 GMT-->

<!--XML Schema used: V099-->
<cat:catalogue xmlns:cat="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:id="urn:iso:std:iso:ts:29002:-5:ed-1:tech:xml-schema:identifier" xmlns:val="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:value" xmlns:bas="urn:iso:std:iso:ts:29002:-4:ed-1:tech:xml-schema:basic" xsi:type="cat:catalogue_Type" xsi:schemaLocation="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue http://www.paradine.net/schema/dictionary/1.0/ontoML/ISO29002/catalogue.xsd">
<!--SOT1209-->
...
</cat:catalogue>

XQuery

First, I have to declare a namespace because the XML uses the cat prefix. Second, the task becomes trivial once you know the comment() node test, which matches XML comment nodes in a path expression.

declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

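for $file in collection("ontoML/packages")
(: the comment inside the root element holds the identifier :)
return concat($file/cat:catalogue/comment(), ',',
              (: the second top-level comment holds the 'File saved on' timestamp :)
              $file/comment()[2]/substring-after(., ': '))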
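
The query above relies on the timestamp always being the second top-level comment and on the root element containing exactly one comment. As a rough sketch of a more defensive variant, the comments can be selected by their text instead. It assumes that every file uses the same "File saved on: " prefix and that the identifier is the first comment inside cat:catalogue; neither assumption has been checked against the full collection.

```xquery
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

for $file in collection("ontoML/packages")
(: pick the timestamp comment by its text instead of its position :)
let $saved := $file/comment()[starts-with(., 'File saved on: ')][1]
(: take only the first comment inside the root element (assumed to be the identifier) :)
let $id := $file/cat:catalogue/comment()[1]
return concat(normalize-space($id), ',',
              substring-after($saved, 'File saved on: '))
```

If a file has no matching comment, the corresponding field is simply left empty rather than raising an error, because both concat and substring-after treat an empty sequence as an empty string.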

Results

This is the CSV output of the query above:

SOT1209,2012-12-12 23:58:13.118 GMT
SOT120A,2012-12-12 23:58:18.665 GMT
SOT1210,2012-12-12 23:58:22.517 GMT
...

P.S. If you're dealing with the opposite task of creating XML comments from XQuery, there is an XML comment constructor.
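
For illustration, here is a minimal sketch of both forms of the comment constructor: the computed form, comment { ... }, and the direct form written as a literal XML comment. The catalogue element and the $id variable are made up for this example and are not part of the OntoML export.

```xquery
(: hypothetical identifier value, used only for this example :)
let $id := "SOT1209"
return
  <catalogue>
    { comment { $id } }
    <!-- direct constructor: this literal text is copied into the output -->
  </catalogue>
```

Keep in mind that the content of a computed comment must not contain "--" or end with "-", otherwise the query raises a dynamic error.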

Ivan Lagunov's Blog
