Ivan Lagunov's Blog

Showing posts with the label Sedna XML DB

Sedna XML DB and RelWithDebugInfo mode

We once had a severe issue with Sedna hanging regularly. It was caused by indexes that had been broken by an upgrade. The issue was quite a nightmare and cost us a lot of time until we solved it together with the Sedna developers. Since then it has been very important for us to be able to look into what is happening inside Sedna at any particular moment. Fortunately, there is a suitable way to do that, although it is not properly documented on the Sedna website. All you need is to build Sedna from source in a special CMake build mode, RelWithDebInfo.

Contents: CMake build modes; Using gdb; Using netstat.

CMake build modes

CMake has several build modes, with Release and Debug the obvious ones among them. Another mode that can be very useful is called RelWithDebInfo. There is a perfect explanation of it on the mailing list: "The difference between Debug and RelwithDebInfo is that RelwithDebInfo is quite similar to Release mode. It produces fully optimised code, but also builds the program database, and in..."
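As a rough sketch of how such a build is selected, the snippet below assembles the CMake configure command for a chosen build mode. `CMAKE_BUILD_TYPE` and the `RelWithDebInfo` value are standard CMake; the helper name `configure_cmd` and the source path `../sedna` are assumptions made for illustration, and Sedna's own build instructions may require further options.

```shell
#!/bin/bash
# Sketch: print the CMake configure command for a given build mode.
# configure_cmd is a hypothetical helper; ../sedna is an assumed source path.
configure_cmd() {
    local build_type="$1"
    local src_dir="${2:-../sedna}"
    printf 'cmake -DCMAKE_BUILD_TYPE=%s %s\n' "$build_type" "$src_dir"
}

# Show the command for an optimised build that keeps debug symbols.
# Run the printed command from an empty build directory, then run make.
configure_cmd RelWithDebInfo
```

With debug symbols in the binaries, attaching gdb to a running process (`gdb -p <pid>`) should show readable function names in stack traces.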

Handling data issues with XQuery

This post describes a good practice for handling data issues with XQuery. We store a huge amount of XML data in the Sedna XML database, and synchronization problems with external systems sometimes result in XML data issues. One possible consequence is that an XQuery function receives an input parameter of an unexpected type. The complication is that Sedna supports only XQuery 1.0 and not XQuery 3.0. As a result, try/catch expressions are not yet available in Sedna, which makes handling such issues harder and nastier. BTW, a while ago I created a feature request for Sedna to add support for XQuery 3.0; you're welcome to upvote it!

Issue description

Here is the initial XQuery function that simply returns the value of the subtitle tag.

    declare function vp:getMapSubtitle($vp as element(value-proposition)?) as xs:string? {
        data($vp/topicmeta/subtitle)
    };

We faced a data issue when two value propositions were passed as the input parameter $vp. It result...

Extracting XML comments with XQuery

I've just discovered that it's possible to process comment nodes using XQuery. Ideally this should never be necessary: if you take part in designing your data formats, you should simply store valuable data in plain XML. But I have to deal with an OntoML data source that uses a somewhat peculiar format when exporting to XML, i.e. some data fields are stored inside XML comments. So here is an example of how to solve this problem.

XML example

This is a stub of one real XML document with irrelevant data omitted. Several thousand XML documents like this are stored in a Sedna XML DB collection. In the end, I need to extract a list of pairs for the complete collection: identifier (e.g. SOT1209) and saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).

    <?xml version="1.0" standalone="yes"?>
    <cat:catalogue xmlns:cat=...

Extracting collection from Sedna XML DB

This post is actually based on a kind of epic fail story. Initially the task was just to rename a collection in Sedna XML DB. The solution is as simple as using the RENAME COLLECTION statement of the Sedna Data Definition Language. But I'm probably too enthusiastic about writing Bash scripts in Linux, so I missed the single-statement solution and wrote a bunch of scripts to perform the same task via an extract-and-load procedure. Anyway, the scripts can still be quite valuable for more complex tasks like moving a collection between XML DB installations (e.g. from the Production to the Test environment) or merging collections. So my solution follows below.

Extracting a single file

It's always wise to modularize the code and divide a task into smaller parts. First, we need a script for extracting a single file. It needs to be parametrized with a file name and a collection name. I also address another essential problem here: the safety of file names. It's not a common problem but we do...
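A minimal sketch of the single-file step, under stated assumptions and not the script from the post: `safe_file_name` and `extract_cmd` are hypothetical helpers, the replace-with-underscore rule is one possible way to make a file name safe, `testdb` is a placeholder database name, and the `se_term` options `-query` and `-output` should be checked against your Sedna version. The command is only printed, not executed.

```shell
#!/bin/bash
# Sketch: build a safe local file name and the se_term call for one document.
# safe_file_name and extract_cmd are hypothetical helpers, not the post's script.

# Replace every character outside [A-Za-z0-9._-] with an underscore.
safe_file_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Print (do not run) the se_term command that would save one document to a file.
# Arguments: document name, collection name, database name.
extract_cmd() {
    local doc="$1" coll="$2" db="$3"
    printf "se_term -query 'doc(\"%s\", \"%s\")' -output '%s' %s\n" \
        "$doc" "$coll" "$(safe_file_name "$doc")" "$db"
}

# Example with placeholder names.
extract_cmd "my file.xml" "legacyBasicTypes" "testdb"
```

Keeping the name sanitizing in its own function makes it easy to swap in a stricter or looser rule without touching the extraction logic.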

Bulk loading files into Sedna XML DB - part 2

In part 1 of this article I used scripts to generate a bulk load file with LOAD instructions. But that approach has several drawbacks: existing files are not overwritten, and it is hard to track the progress of a long-running operation when there is a huge number of files. I've written a better script to solve those issues.

Bash script for loading files

The following Linux Bash script uploads files one by one using separate LOAD instructions. It also tries to remove each file first using a DROP DOCUMENT instruction. As a result, existing files are overwritten. After every 100 files loaded, you get a message with a timestamp, which helps to predict the end time of the operation.

    #!/bin/bash

    # This function writes a status message to both stdout and $OUTPUT_FILE
    function print_status {
        echo ">>> Loaded $counter files, time: $(date)" | tee -a "$OUTPUT_FILE"
    }

    OUTPUT_FILE=load_files.log
    COLLECTION_NAME=legacyBasicTypes

    echo "" > "$OUTPUT_FILE"
    counter=0...
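The drop-then-load idea with periodic progress messages can be sketched as follows. This is not the original script: `load_files`, `RUN_QUERY` and `REPORT_EVERY` are hypothetical names, and the statement syntax is an assumption based on Sedna's DROP DOCUMENT and LOAD statements. By default the statements are only printed, so the sketch is safe to try without a database.

```shell
#!/bin/bash
# Sketch of a drop-then-load loop with progress messages (not the post's script).
# RUN_QUERY is a hypothetical hook: by default statements are only printed.
RUN_QUERY="${RUN_QUERY:-echo}"
COLLECTION_NAME="${COLLECTION_NAME:-legacyBasicTypes}"
REPORT_EVERY="${REPORT_EVERY:-100}"

# Argument: directory that contains the *.xml files to load.
load_files() {
    local dir="$1" counter=0 file name
    for file in "$dir"/*.xml; do
        [ -e "$file" ] || continue   # skip when the glob matches nothing
        name="$(basename "$file")"
        # Drop first so that an existing document is replaced by the load.
        $RUN_QUERY "DROP DOCUMENT \"$name\" IN COLLECTION \"$COLLECTION_NAME\""
        $RUN_QUERY "LOAD \"$file\" \"$name\" \"$COLLECTION_NAME\""
        counter=$((counter + 1))
        # Report progress on stderr after every REPORT_EVERY files.
        if [ $((counter % REPORT_EVERY)) -eq 0 ]; then
            echo ">>> Loaded $counter files, time: $(date)" >&2
        fi
    done
    echo ">>> Finished: $counter files" >&2
}
```

To execute the statements for real, `RUN_QUERY` could be set to the name of a small wrapper function that passes its argument to the Sedna terminal utility.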

Bulk loading files into Sedna XML DB

The problem is how to upload a large number of files into Sedna XML DB. How would you do this? If it is a recurring task, it's logical to create an application for it. This is quite easy using the Sedna XML:DB Java API. Actually we've already done so, but this article addresses another case. The problem with the Java API is performance: it always brings overhead compared to the embedded terminal utility (I got about 2 seconds per file with a remote Sedna installation). Now I have several thousand files and I want to upload them fast, so let's write some useful scripts to automate it.

Generate bulk load file

First we need to generate an XQuery file with LOAD instructions, which are supported by the Sedna terminal utility. Let's do this with another simple script. I had to do this under both Linux and Windows, so you'll find two scripts below. First comes the Linux shell script:

    #!/bin/sh
    OUTPUT_FILE=bulk_load.xque...
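A hedged sketch of such a generator, not the original script: `generate_bulk_load` is a hypothetical function, and both the LOAD argument order (file path, document name, collection name) and the `&` separator line between statements are assumptions to verify against the Sedna documentation for your version.

```shell
#!/bin/bash
# Sketch: write one LOAD statement per XML file into a script for se_term.
# The separator between statements is an assumption and can be overridden.
SEPARATOR="${SEPARATOR:-&}"

# Arguments: source directory, collection name, output file.
generate_bulk_load() {
    local src_dir="$1" collection="$2" output_file="$3"
    local file name first=1
    : > "$output_file"   # start with an empty output file
    for file in "$src_dir"/*.xml; do
        [ -e "$file" ] || continue   # skip when the glob matches nothing
        name="$(basename "$file")"
        # Put the separator between statements, not after the last one.
        [ "$first" -eq 1 ] || printf '%s\n' "$SEPARATOR" >> "$output_file"
        first=0
        printf 'LOAD "%s" "%s" "%s"\n' "$file" "$name" "$collection" >> "$output_file"
    done
}
```

The generated file would then be passed to the terminal utility through its `-file` option, with the database name as the last argument.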