Skip to main content

Posts

Showing posts from August, 2012

Bulk loading files into Sedna XML DB - part 2

In the part 1 of the article I've used scripts to generate bulk load file with LOAD instructions. But that approach has several drawbacks: existing files are not overwritten; hard to track the progress of long-term operation in case of huge number of files. I've written a better script to solve those issues. Bash script for loading files The following Linux Bash script uploads files one by one using separate LOAD instructions . Also it tries to remove the file first using DROP DOCUMENT instruction . As a result, existing files are overwritten. After each 100 of files being loaded, you get a message with a timestamp. It helps to predict the end time of the operation. #!/bin/bash # This function writes a status message to both stdout and $OUTPUT_FILE function print_status { echo ">>> Loaded $counter files, time: `date`" | tee -a $OUTPUT_FILE } OUTPUT_FILE=load_files.log COLLECTION_NAME=legacyBasicTypes echo "" > $OUTPUT_FILE counter=0

Bulk loading files into Sedna XML DB

The problem is to upload plenty of files into Sedna XML DB . How would you do this? If it is a repeated action, it's logical to create an application for this. This is quite easy using Sedna XML:DB Java API . Actually we've already done so but this article addresses another case. There is a problem using Java API that is the performance. Using Java API always brings overhead compared to using embedded terminal utility (I got the performance of 2 seconds per file with the remote Sedna installation). Now I have several thousands of files and I want to upload them fast so let's turn to writing some useful scripts to automate it. Generate bulk load file First we need to generate an xquery file with LOAD instructions that are supported by Sedna terminal utility. Let's do this with another simple script. I had to do this under both Linux and Windows systems so you'll find two scripts below. First comes the Linux shell script: #!/bin/sh OUTPUT_FILE=bulk_load.xque