Bulk loading files into Sedna XML DB

The task is to load a large number of files into Sedna XML DB. How would you do this? If it is a recurring task, it makes sense to write an application for it, which is quite easy with the Sedna XML:DB Java API. We have already done that, but this article covers a different case. The problem with the Java API is performance: it always adds overhead compared with the bundled terminal utility (I measured about 2 seconds per file against a remote Sedna installation). I now have several thousand files to load quickly, so let's write a few scripts to automate the job.

Generate bulk load file

First we need to generate an XQuery file containing the LOAD statements that the Sedna terminal utility supports. A simple script can do this. I needed it on both Linux and Windows, so both versions are given below.
First comes the Linux shell script:

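#!/bin/sh

OUTPUT_FILE=bulk_load.xquery
COLLECTION_NAME=products
FILES_DIRECTORY=/home/ilagunov/files

# Start with an empty output file
: > "$OUTPUT_FILE"

for file in "$FILES_DIRECTORY"/*
do
  # Skip subdirectories (and the unexpanded pattern if the directory is empty)
  [ -f "$file" ] || continue
  # Use the file name without its path as the document name
  shortname=$(basename "$file")
  printf 'LOAD "%s" "%s" "%s"&\n' "$file" "$shortname" "$COLLECTION_NAME" >> "$OUTPUT_FILE"
done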

Here is the Windows Batch script:

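@echo off

set OUTPUT_FILE=bulk_load.xquery
set COLLECTION_NAME=products
set FILES_DIRECTORY=c:\files

rem Remove the output of a previous run, if any
if exist "%OUTPUT_FILE%" del "%OUTPUT_FILE%"

rem "delims=" keeps file names with spaces intact; /a-d skips subdirectories
for /f "delims=" %%i in ('dir /b /a-d "%FILES_DIRECTORY%"') do (
  echo LOAD "%FILES_DIRECTORY%\%%i" "%%i" "%COLLECTION_NAME%"^& >>"%OUTPUT_FILE%"
)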

Set the variables at the top of the script to match your environment and you will get a bulk_load.xquery file like this:

LOAD "c:\files\1075.xml" "1075.xml" "products"& 
LOAD "c:\files\1076.xml" "1076.xml" "products"& 
LOAD "c:\files\1078.xml" "1078.xml" "products"&
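
Before running the load, it may be worth checking that the generated file contains one LOAD statement per source file. The following is only a minimal sketch for the Linux case; the function name check_bulk_file is hypothetical and not part of Sedna, and it assumes the bulk load file is stored outside the files directory.

```shell
#!/bin/sh
# Sketch: verify that the bulk load file has one LOAD line per source file.
# check_bulk_file is a hypothetical helper name, not a Sedna utility.
check_bulk_file() {
  files_dir=$1
  bulk_file=$2

  # Count the regular files directly inside the directory.
  expected=0
  for f in "$files_dir"/*; do
    [ -f "$f" ] && expected=$((expected + 1))
  done

  # Count the generated LOAD statements (grep -c prints 0 when none match).
  actual=$(grep -c '^LOAD ' "$bulk_file")

  if [ "$expected" -eq "$actual" ]; then
    echo "OK: $actual LOAD statements"
  else
    echo "MISMATCH: $expected files, $actual LOAD statements" >&2
    return 1
  fi
}
```

It could be called as check_bulk_file /home/ilagunov/files bulk_load.xquery; a non-zero return status means the counts differ and the generator should be re-run.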

Execute generated file

Now locate the Sedna terminal utility, se_term, and run the following command (use absolute paths where needed), replacing db-name with the name of your database:

se_term -file bulk_load.xquery -output bulk_load.log db-name
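
If the load is run repeatedly, the command can be wrapped in a small script. This is only a sketch: the names SE_TERM and run_bulk_load are illustrative, and it assumes se_term is on the PATH unless SE_TERM points elsewhere. Whether se_term returns a non-zero status when an individual LOAD statement fails is an assumption that depends on its settings, so the log file should be reviewed either way.

```shell
#!/bin/sh
# Sketch: run the bulk load through se_term and report the outcome.
# SE_TERM and run_bulk_load are illustrative names, not part of Sedna.
SE_TERM=${SE_TERM:-se_term}

run_bulk_load() {
  bulk_file=$1
  log_file=$2
  db_name=$3

  # Same invocation as the command above, with the paths passed as arguments.
  "$SE_TERM" -file "$bulk_file" -output "$log_file" "$db_name"
  status=$?

  if [ "$status" -eq 0 ]; then
    echo "se_term finished, see $log_file"
  else
    echo "se_term exited with status $status, see $log_file" >&2
    return "$status"
  fi
}
```

For example, run_bulk_load bulk_load.xquery bulk_load.log db-name performs the same load as the command above and points to the log afterwards.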

Comments

RaM, July 24, 2014 at 2:42 PM
Very useful. Thank you....

Ivan Lagunov's Blog