Bulk loading files into Sedna XML DB - part 2

In part 1 of this article I used scripts to generate a bulk load file with LOAD instructions. That approach has two drawbacks: existing files are not overwritten, and it is hard to track the progress of a long-running operation when the number of files is huge. I've written a better script to solve those issues.

Bash script for loading files
The following Linux Bash script uploads files one by one using separate LOAD instructions. Before each load it tries to remove the document with a DROP DOCUMENT instruction, so existing files are overwritten. After every 100 loaded files it prints a message with a timestamp, which helps to predict when the operation will finish.
#!/bin/bash

# Write a status message to both stdout and $OUTPUT_FILE
print_status() {
  echo ">>> Loaded $counter files, time: $(date)" | tee -a "$OUTPUT_FILE"
}

OUTPUT_FILE=load_files.log
COLLECTION_NAME=legacyBasicTypes

# Start with a fresh log file
echo "" > "$OUTPUT_FILE"

counter=0
print_status

for file in products/*
do
  # Strip the directory part to get the document name
  shortname=${file##*/}
  # Drop the document first so that an existing copy is replaced;
  # for a new document this only reports an SE2006 error
  /appl/sedna/bin/se_term -query "DROP DOCUMENT '$shortname' IN COLLECTION '$COLLECTION_NAME'" mydatabase >> "$OUTPUT_FILE"
  /appl/sedna/bin/se_term -query "LOAD '$file' '$shortname' '$COLLECTION_NAME'" mydatabase >> "$OUTPUT_FILE"

  counter=$((counter + 1))
  # Report progress after every 100 files
  if (( counter % 100 == 0 )); then
    print_status
  fi
done

print_status
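
The timestamps in the status lines allow a rough estimate of the remaining time: divide the elapsed seconds by the number of files loaded so far and multiply by the number of files left. Here is a minimal sketch of that arithmetic, assuming a roughly constant load rate. The function name estimate_remaining and the total of 2000 files are hypothetical; the figure of 500 files in 493 seconds is taken from the sample output later in this post (15:52:23 to 16:00:36).

```shell
#!/bin/bash

# estimate_remaining LOADED TOTAL ELAPSED_SECONDS
# Prints the estimated number of seconds left, assuming a constant load rate.
# (Hypothetical helper, not part of the loading script.)
estimate_remaining() {
  local loaded=$1 total=$2 elapsed=$3
  if (( loaded <= 0 )); then
    # No files loaded yet, so there is no rate to extrapolate from
    echo "unknown"
    return 0
  fi
  echo $(( elapsed * (total - loaded) / loaded ))
}

# 500 of an assumed 2000 files loaded after 493 seconds
estimate_remaining 500 2000 493   # prints 1479
```

The result is in whole seconds (integer arithmetic), which is precise enough for an operation that runs for minutes or hours.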

Launching script and tracking progress
The following command launches the script in the background and keeps it running after the terminal is closed:
nohup ./load_files.sh &
The status and error messages that the script writes to stdout and stderr appear on the screen when it runs in the foreground; when it is started with nohup from a terminal, they are appended to nohup.out instead. Here is an example:
>>> Loaded 0 files, time: Fri Aug 24 15:52:23 CEST 2012

SEDNA Message: ERROR SE2006
No document with this name.
Details: 74ABT126D.xml

DROP DOCUMENT '74ABT126D.xml' IN COLLECTION 'legacyBasicTypes'>>> Loaded 100 files, time: Fri Aug 24 15:54:01 CEST 2012
>>> Loaded 200 files, time: Fri Aug 24 15:55:57 CEST 2012
>>> Loaded 300 files, time: Fri Aug 24 15:57:55 CEST 2012
>>> Loaded 400 files, time: Fri Aug 24 15:59:43 CEST 2012
>>> Loaded 500 files, time: Fri Aug 24 16:00:36 CEST 2012
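
With thousands of files, the error messages in this output are easier to review as a summary. Below is a small sketch that counts the occurrences of each Sedna error code, assuming the output has been captured in a file (nohup usually writes nohup.out when started from a terminal). The function name summarize_errors is hypothetical, and the pattern relies only on the "ERROR SE..." format visible in the example above.

```shell
#!/bin/bash

# summarize_errors FILE
# Prints each Sedna error code found in FILE followed by its number of occurrences.
# (Hypothetical helper; it only matches the "ERROR SE<digits>" pattern.)
summarize_errors() {
  grep -o 'ERROR SE[0-9]*' "$1" | sort | uniq -c | awk '{print $3, $1}'
}
```

For the excerpt above, which contains a single SE2006 message, `summarize_errors nohup.out` would print `SE2006 1`. A large SE2006 count is expected on the first run, because DROP DOCUMENT fails for every document that is not in the collection yet.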
Finally, you can also watch the log file as it grows using the following command:
tail -f load_files.log
This lets you see the result of each instruction. The following example shows the output of successful DROP and LOAD instructions:
UPDATE is executed successfully
Bulk load succeeded
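
Since every successful instruction leaves one of these lines in the log, counting them gives a quick check of how much has been done. This is a minimal sketch, assuming the two messages appear exactly as shown above; the function name log_summary is hypothetical.

```shell
#!/bin/bash

# log_summary LOGFILE
# Prints how many DROP and LOAD instructions have reported success so far.
# (Hypothetical helper; it matches the two messages shown above verbatim.)
log_summary() {
  local log=$1
  local dropped loaded
  # grep -c prints 0 but returns a non-zero status when nothing matches,
  # so "|| true" keeps the function usable in scripts running with "set -e"
  dropped=$(grep -c 'UPDATE is executed successfully' "$log" || true)
  loaded=$(grep -c 'Bulk load succeeded' "$log" || true)
  echo "dropped=$dropped loaded=$loaded"
}
```

Running `log_summary load_files.log` once the script has finished and comparing the loaded count with the number of files in products/ shows whether any LOAD instruction failed.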
