Skip to main content

Handling data issues with XQuery

This post is about the good practice on how to handle the data issues with XQuery. We store a huge amount of xml data in Sedna XML database, and there happen to be synchronization issues with external systems resulting in xml data issues. One of possible consequences is that an XQuery function may receive an input parameter of an unexpected type. The limitation of the problem is that Sedna supports only XQuery 1.0 and not XQuery 3.0. As a result, try/catch expressions are not yet available for Sedna that makes handling issues harder and nastier. BTW, a while ago I created a feature request for Sedna to add support of XQuery 3.0 - you're welcome to upvote it!

Issue description
Here is the initial XQuery that simply returns the subtitle tag value.
declare function vp:getMapSubtitle($vp as element(value-proposition)?) as xs:string? {
    data($vp/topicmeta/subtitle)
};
We faced a data issue when there appeared two value propositions as an input parameter $vp. It resulted in the following Java exception and Sedna error message:
org.xmldb.api.base.XMLDBException: SEDNA Message: ERROR XPTY0004
It is a type error if, during the static analysis phase, an expression is found to 
have a static type that is not appropriate for the context in which the expression 
occurs, or during the dynamic evaluation phase, the dynamic type of a value does 
not match a required type as specified by the matching rules in 2.5.4 SequenceType 
Matching.
Details: Error in function call. Return value does not match the required type. 
Expected type is [element(value-proposition)?], more than one item is given.
Query line: 7, column:1
It's a pretty good message, contains all the required details to locate the failing code, including the query line. However, it still takes time to navigate there, to make sure that it was indeed a data issue and finally to sort it out. Also it'd be unwise to fix the input parameter type to accept any number of elements as it would result in ignoring the data issues. So the problem here is to reduce the amount of time wasted on unnecessary (and probably repeated) investigations.

Solution
That's why it makes sense to catch the error using if condition as a workaround and provide an appropriate error message with a sufficient data issue description. Here is the new code:
declare function vp:getMapSubtitle($vp as element(value-proposition)*) as xs:string? {
    if (count($vp) < 2)
    then data($vp/topicmeta/subtitle)
    else
      let $vp_ids := for $single_vp in $vp return data($single_vp/@id)
      return error(xs:QName("vp_getMapSubtitle"), 
        concat("More than one VP provided: ", string-join($vp_ids, ", ")))
};
As a result, in case of the frequent data issue with multiple VP’s, we’ll see the following message in Java application logs:
org.xmldb.api.base.XMLDBException: SEDNA Message: ERROR vp_getMapSubtitle
More than one VP provided: vp_ACTT4S-800E, vp_BT136B-600E
This approach saves plenty of time, allowing to resolve the data issue right away. Still it's far from perfect as the message is hidden in the application logs. So the next step will be to show a proper message to the end user and set up a log checker system to notify responsible parties about the data issue.

Comments

Popular posts from this blog

Connection to Amazon Neptune endpoint from EKS during development

This small article will describe how to connect to Amazon Neptune database endpoint from your PC during development. Amazon Neptune is a fully managed graph database service from Amazon. Due to security reasons direct connections to Neptune are not allowed, so it's impossible to attach a public IP address or load balancer to that service. Instead access is restricted to the same VPC where Neptune is set up, so applications should be deployed in the same VPC to be able to access the database. That's a great idea for Production however it makes it very difficult to develop, debug and test applications locally. The instructions below will help you to create a tunnel towards Neptune endpoint considering you use Amazon EKS - a managed Kubernetes service from Amazon. As a side note, if you don't use EKS, the same idea of creating a tunnel can be implemented using a Bastion server . In Kubernetes we'll create a dedicated proxying pod. Prerequisites. Setting up a tunnel.

Notes on upgrade to JSF 2.1, Servlet 3.0, Spring 4.0, RichFaces 4.3

This article is devoted to an upgrade of a common JSF Spring application. Time flies and there is already Java EE 7 platform out and widely used. It's sometimes said that Spring framework has become legacy with appearance of Java EE 6. But it's out of scope of this post. Here I'm going to provide notes about the minimal changes that I found required for the upgrade of the application from JSF 1.2 to 2.1, from JSTL 1.1.2 to 1.2, from Servlet 2.4 to 3.0, from Spring 3.1.3 to 4.0.5, from RichFaces 3.3.3 to 4.3.7. It must be mentioned that the latest final RichFaces release 4.3.7 depends on JSF 2.1, JSTL 1.2 and Servlet 3.0.1 that dictated those versions. This post should not be considered as comprehensive but rather showing how I did the upgrade. See the links for more details. Jetty & Tomcat. JSTL. JSF & Facelets. Servlet. Spring framework. RichFaces. Jetty & Tomcat First, I upgraded the application to run with the latest servlet container versio

Extracting XML comments with XQuery

I've just discovered that it's possible to process comment nodes using XQuery. Ideally it should not be the case if you take part in designing your data formats, then you should simply store valuable data in plain xml. But I have to deal with OntoML data source that uses a bit peculiar format while export to XML, i.e. some data fields are stored inside XML comments. So here is an example how to solve this problem. XML example This is an example stub of one real xml with irrelevant data omitted. There are several thousands of xmls like this stored in Sedna XML DB collection. Finally, I need to extract the list of pairs for the complete collection: identifier (i.e. SOT1209 ) and saved timestamp (i.e. 2012-12-12 23:58:13.118 GMT ). <?xml version="1.0" standalone="yes"?> <!--EXPORT_PROGRAM:=eptos-iso29002-10-Export-V10--> <!--File saved on: 2012-12-12 23:58:13.118 GMT--> <!--XML Schema used: V099--> <cat:catalogue xmlns:cat=