In part 1 of this article I used scripts to generate a bulk load file with LOAD instructions. That approach has two drawbacks: existing files are not overwritten, and it is hard to track the progress of a long-running operation when the number of files is huge. I've written a better script to solve both issues.
Bash script for loading files
The following Linux Bash script loads files one by one using separate LOAD instructions. Before each load it tries to remove the document with a DROP DOCUMENT instruction, so existing documents are overwritten. After every 100 files loaded, it prints a message with a timestamp, which helps to estimate when the operation will finish.
#!/bin/bash

# This function writes a status message to both stdout and $OUTPUT_FILE
function print_status {
    echo ">>> Loaded $counter files, time: `date`" | tee -a $OUTPUT_FILE
}

OUTPUT_FILE=load_files.log
COLLECTION_NAME=legacyBasicTypes

echo "" > $OUTPUT_FILE
counter=0
print_status

for file in products/*
do
    shortname=`echo $file | sed "s/.*\///"`
    /appl/sedna/bin/se_term -query "DROP DOCUMENT '$shortname' IN COLLECTION '$COLLECTION_NAME'" mydatabase >> $OUTPUT_FILE
    /appl/sedna/bin/se_term -query "LOAD '$file' '$shortname' '$COLLECTION_NAME'" mydatabase >> $OUTPUT_FILE
    let "counter = $counter + 1"
    if (( $counter % 100 == 0 )); then
        print_status
    fi
done

print_status
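The timestamps only help if you do the arithmetic yourself, so here is a minimal sketch of how the remaining time could be estimated. The function estimate_remaining and its three arguments are hypothetical and not part of the script above. It assumes the average time per file stays roughly constant.

```bash
#!/bin/bash

# Hypothetical helper, not part of the original script.
# Arguments: files loaded so far, total number of files, seconds elapsed.
# Prints the estimated number of seconds still needed.
estimate_remaining() {
    est_loaded=$1
    est_total=$2
    est_elapsed=$3
    if [ "$est_loaded" -eq 0 ]; then
        # Nothing loaded yet, so there is no average to extrapolate from
        echo "unknown"
        return
    fi
    # Integer arithmetic: average seconds per file times the files left
    echo $(( est_elapsed * (est_total - est_loaded) / est_loaded ))
}

# Example: 100 of 500 files loaded after 98 seconds
estimate_remaining 100 500 98    # prints 392
```

Inside the loading script, the total could be taken from the number of entries in the products directory, and the elapsed time from Bash's built-in SECONDS variable.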
Launching script and tracking progress
The following command launches the script in the background and keeps it running after the terminal is closed:
nohup ./load_files.sh &
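The script writes status messages to stdout and Sedna error messages to stderr. When the script is started directly from a terminal, both appear on the screen; when it is started with nohup from a terminal, nohup appends them to the file nohup.out instead. Here is an example: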
>>> Loaded 0 files, time: Fri Aug 24 15:52:23 CEST 2012
SEDNA Message: ERROR SE2006 No document with this name. Details: 74ABT126D.xml DROP DOCUMENT '74ABT126D.xml' IN COLLECTION 'legacyBasicTypes'
>>> Loaded 100 files, time: Fri Aug 24 15:54:01 CEST 2012
>>> Loaded 200 files, time: Fri Aug 24 15:55:57 CEST 2012
>>> Loaded 300 files, time: Fri Aug 24 15:57:55 CEST 2012
>>> Loaded 400 files, time: Fri Aug 24 15:59:43 CEST 2012
>>> Loaded 500 files, time: Fri Aug 24 16:00:36 CEST 2012
You can also follow the log file as it grows using the following command:
tail -f load_files.log
This will allow you to see the results of each instruction. The following example shows the output for successful DROP and LOAD instructions:
UPDATE is executed successfully
Bulk load succeeded
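Once the run has finished, the same messages can be counted to get a quick summary. This is only a sketch: count_message is a hypothetical helper, and it assumes each success message is written on its own line of the log, as in the example above.

```bash
#!/bin/bash

# Hypothetical helper: prints the number of lines in file $2
# that contain the fixed string $1.
count_message() {
    # -c counts matching lines, -F treats the pattern as a literal string
    grep -c -F -- "$1" "$2"
}

# Assumed log file name, taken from OUTPUT_FILE in the loading script
LOG_FILE=${LOG_FILE:-load_files.log}

if [ -f "$LOG_FILE" ]; then
    echo "Documents loaded:  $(count_message 'Bulk load succeeded' "$LOG_FILE")"
    echo "Documents dropped: $(count_message 'UPDATE is executed successfully' "$LOG_FILE")"
fi
```

If the number of loaded documents is lower than the number of files in the products directory, some LOAD instructions failed and the error messages on stderr are worth checking.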