Ivan Lagunov's Blog

Posts

Handling data issues with XQuery

This post describes a good practice for handling data issues with XQuery. We store a huge amount of XML data in the Sedna XML database, and synchronization issues with external systems sometimes cause XML data issues. One possible consequence is that an XQuery function receives an input parameter of an unexpected type. A further constraint is that Sedna supports only XQuery 1.0 and not XQuery 3.0. As a result, try/catch expressions are not yet available in Sedna, which makes handling such issues harder and nastier. BTW, a while ago I created a feature request for Sedna to add support for XQuery 3.0 - you're welcome to upvote it!
Issue description
Here is the initial XQuery that simply returns the subtitle tag value.
declare function vp:getMapSubtitle($vp as element(value-proposition)?) as xs:string? {
    data($vp/topicmeta/subtitle)
};
We faced a data issue when two value propositions were passed as the input parameter $vp. It result

Extracting XML comments with XQuery

I've just discovered that it's possible to process comment nodes using XQuery. Ideally this should not be necessary: if you take part in designing your data formats, you should simply store valuable data in plain XML. But I have to deal with an OntoML data source that uses a somewhat peculiar format when exporting to XML, i.e. some data fields are stored inside XML comments. So here is an example of how to solve this problem.
XML example
This is an example stub of one real XML document with irrelevant data omitted. There are several thousand XML documents like this stored in a Sedna XML DB collection. In the end, I need to extract a list of pairs for the complete collection: identifier (e.g. SOT1209) and saved timestamp (e.g. 2012-12-12 23:58:13.118 GMT).
<?xml version="1.0" standalone="yes"?>
<cat:catalogue xmlns:cat=

IntelliJ IDEA Compiler Excludes issue with generated sources

I recently got a fresh licensed copy of IntelliJ IDEA 12 and was very glad about it until I suddenly stumbled upon a strange issue. A Java project that had built successfully in previous versions of IDEA failed to build this time. In short, the solution was hidden under IDEA Settings Compiler.Excludes, where the JAXB generated sources directory had been excluded for some unknown reason. Below are the details and the screenshots.
Symptoms of issue
Whenever a sources directory is excluded from compilation, it's marked with cross signs. See generated directory below: This issue results in numerous "cannot find symbol" errors during compilation:
Settings Compiler.Excludes
Here is the screenshot with the solution for this issue. You just need to delete the item with the excluded sources and they will magically appear in the classpath again.
Update - the root cause found
After a while I real

Linux command line tips and tricks

This post lists a number of useful tips and tricks from my daily Linux experience. I mostly deal with RHEL, but I believe these commands are largely independent of the Linux distribution (or can be adapted).
Network commands
Here are some network commands.
Basic net utils:
# Who is listening on port:
netstat -lp | grep <port>
# Show all connections with numeric addresses and proc IDs:
netstat -anp
# Listen on port (to check connectivity from the other side):
netcat -l -p <port>
# -or-
nc -l -p <port>
SSH tunnel:
# Tunnel to remote_ip:remote_port via proxy_ip with known login/password
# The remote_ip:remote_port is being redirected to localhost:local_port
ssh -L local_port:remote_ip:remote_port login@proxy_ip
# Real-world example of tunnel to remote Sedna XML DB:
ssh -L 5050:134.27.100.67:5050 pxqa1@134.27.100.67
Download via HTTP proxy with wget:
# Download resource from internet from behind a proxy:
http_proxy=http://host:port ; export http_proxy ; w
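To make the argument order of the SSH tunnel syntax above easier to see, here is a minimal sketch that assembles the ssh -L command from its parts and prints it instead of running it. The function name tunnel_cmd and the sample addresses are hypothetical and not taken from the original post.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) an "ssh -L" tunnel command.
# Arguments: local_port remote_ip remote_port login proxy_ip
tunnel_cmd() {
    if [ "$#" -ne 5 ]; then
        echo "usage: tunnel_cmd local_port remote_ip remote_port login proxy_ip" >&2
        return 1
    fi
    # local_port on localhost is forwarded to remote_ip:remote_port via proxy_ip
    printf 'ssh -L %s:%s:%s %s@%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example with placeholder values (documentation-range addresses):
tunnel_cmd 5050 192.0.2.10 5050 user 192.0.2.1
```

Printing the command first is a cheap way to double-check which port ends up on which side before actually opening the tunnel.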

Extracting collection from Sedna XML DB

This post is actually based on something of an epic fail story. Initially the task was just to rename a collection in Sedna XML DB. The solution is as simple as using the RENAME COLLECTION statement of the Sedna Data Definition Language. But I'm probably too enthusiastic about writing Bash scripts in Linux, so I missed the single-statement solution and wrote a bunch of scripts to perform the same task via an extract-and-load procedure. Anyway, it can still be quite valuable for more complex tasks like moving a collection between XML DB installations (e.g. from the Production to the Test environment) or merging collections. So my solution follows below.
Extracting a single file
It's always wise to modularize the code and divide a task into smaller parts. First, we need a script for extracting a single file. It needs to be parametrized with a file name and a collection name. I also address another essential problem here: the safety of file names. It's not a common problem but we do

Using JavaScript hashCode to enable Cocoon caching of POST requests

I've just faced an issue with Cocoon caching related to POST requests. Let me describe the use case here. We use a custom XQueryGenerator to execute XQuery code against the Sedna XML Database and then process the XML results in the Cocoon pipeline. For the sake of performance, I configured pipeline caching with an expiration timeout of 60 seconds for all XQuery invocations:
<map:pipeline id="cached-services" type="expires" internal-only="true">
    <map:parameter name="cache-expires" value="60"/>
    <map:parameter name="cache-key" value="{request:sitemapURI}?{request:queryString}"/>
    <map:match pattern="cached-internal-xquery/**">
        <map:generate src="cocoon:/xquery-macro/{1}" type="queryStringXquery">
            <map:parameter name="contextPath" value="{request:contextPath}"/>
        </map:generate>

Bulk loading files into Sedna XML DB - part 2

In part 1 of the article I used scripts to generate a bulk load file with LOAD instructions. But that approach has several drawbacks: existing files are not overwritten; it's hard to track the progress of a long-running operation when there is a huge number of files. I've written a better script to solve those issues.
Bash script for loading files
The following Linux Bash script uploads files one by one using separate LOAD instructions. It also tries to remove each file first using the DROP DOCUMENT instruction. As a result, existing files are overwritten. After every 100 files loaded, you get a message with a timestamp, which helps to predict the end time of the operation.
#!/bin/bash
# This function writes a status message to both stdout and $OUTPUT_FILE
function print_status {
    echo ">>> Loaded $counter files, time: `date`" | tee -a $OUTPUT_FILE
}
OUTPUT_FILE=load_files.log
COLLECTION_NAME=legacyBasicTypes
echo "" > $OUTPUT_FILE
counter=0
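The progress reporting described above can be sketched roughly as follows. This is not the original script: load_one is a hypothetical stand-in for the real DROP DOCUMENT and LOAD step, load_all and its step argument are names assumed for illustration, and the log file handling is left out to keep the sketch short.

```shell
#!/bin/bash
# Hypothetical stand-in for the real per-file step (DROP DOCUMENT + LOAD).
load_one() {
    : # placeholder: the actual load of "$1" would happen here
}

# Process the given files one by one and report progress every <step> files.
# Usage: load_all <step> <file>...
load_all() {
    local step=$1
    shift
    local counter=0 file
    for file in "$@"; do
        load_one "$file"
        counter=$((counter + 1))
        if [ $((counter % step)) -eq 0 ]; then
            echo ">>> Loaded $counter files, time: $(date)"
        fi
    done
    echo ">>> Done: $counter files"
}

# Example with placeholder file names and a step of 2:
load_all 2 a.xml b.xml c.xml d.xml e.xml
```

With a step of 100 and real file names, the timestamps of consecutive status lines give the time per 100 files, from which the remaining time can be estimated.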