
Extracting XML comments with XQuery

I've just discovered that XQuery can process comment nodes. Ideally this should never be necessary: if you take part in designing your data formats, simply store valuable data in plain XML. But I have to deal with an OntoML data source whose XML export format is a bit peculiar: some data fields are stored inside XML comments. Here is an example of how to solve this problem.

XML example
This is a stub of one real XML file with irrelevant data omitted. Several thousand files like this are stored in a Sedna XML DB collection. The goal is to extract a list of pairs for the whole collection: the identifier (e.g. SOT1209) and the saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).
<?xml version="1.0" standalone="yes"?>

<!--EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10-->

<!--File saved on: 2012-12-12 23:58:13.118 GMT-->

<!--XML Schema used: V099-->
<cat:catalogue xmlns:cat="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:id="urn:iso:std:iso:ts:29002:-5:ed-1:tech:xml-schema:identifier" xmlns:val="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:value" xmlns:bas="urn:iso:std:iso:ts:29002:-4:ed-1:tech:xml-schema:basic" xsi:type="cat:catalogue_Type" xsi:schemaLocation="urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue http://www.paradine.net/schema/dictionary/1.0/ontoML/ISO29002/catalogue.xsd">
<!--SOT1209-->
...
</cat:catalogue>
XQuery
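First, I need a namespace declaration because the XML uses the cat namespace. Second, the task becomes trivial once you know comment(), the XPath node test that matches XML comment nodes. The identifier is the comment inside the cat:catalogue root element, and the timestamp is the second comment at the top level of the document; substring-after strips its "File saved on: " prefix.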
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

for $file in collection("ontoML/packages")
return concat($file/cat:catalogue/comment(), ',',
              $file/comment()[2]/substring-after(., ': '))
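The positional predicate [2] in the query above relies on the timestamp always being the second top-level comment. Here is a minimal sketch of a variant that selects the comment by its text instead. It assumes that the "File saved on: " prefix is the same in every file and that the identifier is the first comment inside the root element; neither has been checked against the whole collection.

```xquery
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";

for $file in collection("ontoML/packages")
(: Assumption: the identifier is the first comment child of the root element. :)
let $id := $file/cat:catalogue/comment()[1]
(: Assumption: the timestamp comment starts with this exact prefix. :)
let $saved := $file/comment()[starts-with(., "File saved on: ")][1]
return concat($id, ",", substring-after($saved, ": "))
```

For files shaped like the example it should return the same pairs. The [1] predicates make sure concat never receives more than one node per argument, which would otherwise raise a type error if a file contained extra comments.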
Results
This is the CSV output of the query above:
SOT1209,2012-12-12 23:58:13.118 GMT
SOT120A,2012-12-12 23:58:18.665 GMT
SOT1210,2012-12-12 23:58:22.517 GMT
...
P.S. If you're dealing with the opposite task of creating XML comments from XQuery, there is an XML comment constructor.
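As a rough sketch of the opposite direction, XQuery offers two forms of comment constructor: a direct one, written literally, and a computed one, whose text comes from an expression. The element name and variable below are made up for illustration.

```xquery
(: $id is a hypothetical value; in practice it would come from your data. :)
let $id := "SOT1209"
return
  <catalogue>
    <!--written literally with a direct constructor-->
    { comment { $id } }
  </catalogue>
```

The computed form is the one to use when the comment text is not known in advance. Note that the text of a comment must not contain two adjacent hyphens or end with a hyphen.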
