
Extracting XML comments with XQuery

I've just discovered that it's possible to process comment nodes using XQuery. Ideally this should never be necessary: if you take part in designing your data formats, simply store valuable data in regular XML elements or attributes. But I have to deal with an OntoML data source that uses a somewhat peculiar format when exporting to XML: some data fields are stored inside XML comments. So here is an example of how to solve this problem.

XML example
This is a stub of a real XML file with irrelevant data omitted. Several thousand files like this are stored in a Sedna XML DB collection. In the end, I need to extract a list of pairs for the whole collection: the identifier (e.g. SOT1209) and the saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).
<?xml version="1.0" standalone="yes"?>

<!--EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10-->

<!--File saved on: 2012-12-12 23:58:13.118 GMT-->

<!--XML Schema used: V099-->
<cat:catalogue xmlns:cat="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:id="urn:iso:std:iso:ts:29002:-5:ed-1:tech:xml-schema:identifier" xmlns:val="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:value" xmlns:bas="urn:iso:std:iso:ts:29002:-4:ed-1:tech:xml-schema:basic" xsi:type="cat:catalogue_Type" xsi:schemaLocation="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue http://www.paradine.net/schema/dictionary/1.0/ontoML/ISO29002/catalogue.xsd">
<!--SOT1209-->
...
</cat:catalogue>
XQuery
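First, I have to use a namespace declaration because of the cat namespace used in the XML. Second, the task becomes trivial once you know the comment() node test in XPath, which matches XML comment nodes. The identifier is the comment inside cat:catalogue, and the timestamp is the second comment at the top level of the document, after the "File saved on: " prefix.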
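declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

(: identifier: the comment inside the root element;
   timestamp: the second top-level comment, text after ': ' :)
for $file in collection("ontoML/packages")
return concat($file/cat:catalogue/comment(), ',',
              $file/comment()[2]/substring-after(., ': '))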
Results
This is the CSV output of the query above:
SOT1209,2012-12-12 23:58:13.118 GMT
SOT120A,2012-12-12 23:58:18.665 GMT
SOT1210,2012-12-12 23:58:22.517 GMT
...
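The query above picks the timestamp by position (comment()[2]), so it would return the wrong comment if the exporter ever added or reordered the header comments. A slightly more defensive sketch selects the comment by its text instead. To stay self-contained it runs against an inline document built to mirror the example, so $doc and $prefix are my own names and $doc is only a stand-in for one file of the collection; the namespace and comment texts are copied from the sample above.

```xquery
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

(: $doc is a hypothetical stand-in for one document of the collection :)
let $doc := document {
  comment { "EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10" },
  comment { "File saved on: 2012-12-12 23:58:13.118 GMT" },
  comment { "XML Schema used: V099" },
  <cat:catalogue><!--SOT1209--></cat:catalogue>
}
let $prefix := "File saved on: "
(: [1] guards against several matching comments, which would make concat() fail :)
let $id    := $doc/cat:catalogue/comment()[1]
let $saved := $doc/comment()[starts-with(., $prefix)][1]
return concat($id, ",", substring-after($saved, $prefix))
```

For the inline document this returns SOT1209,2012-12-12 23:58:13.118 GMT. To run it over the real data, the let $doc binding would be replaced by for $doc in collection("ontoML/packages").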
P.S. If you're dealing with the opposite task of creating XML comments from XQuery, there is an XML comment constructor.
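As a minimal sketch of that opposite direction: XQuery has a computed constructor, comment { ... }, and a direct one written as a literal <!-- ... -->. The package element and the $id variable below are made up for illustration and are not part of the OntoML format.

```xquery
(: $id and the <package> element are hypothetical, chosen for illustration :)
let $id := "SOT1209"
return
  <package>
    { comment { $id } }
    <!--written as a direct comment constructor-->
  </package>
```

The result is a package element containing two comment nodes, the first holding SOT1209. Note that the computed form raises a dynamic error (XQDY0072) if the content contains two adjacent hyphens or ends with a hyphen, since such text is not allowed inside an XML comment.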
