We once had a severe issue with Sedna hanging regularly. It turned out to be caused by indexes that had been broken during an upgrade. The issue was quite a nightmare and cost us a lot of time until we solved it together with the Sedna devs. Since then it has been very important for us to be able to look into what is happening inside Sedna at any particular moment. Fortunately, there is a suitable way to do that, although it's not documented properly on the Sedna website. All you need is to build Sedna from source with a special build type, RelWithDebInfo.
CMake build modes
CMake has several build modes, with Release and Debug obviously among them. Another mode that can be very useful is called RelWithDebInfo. There is a perfect explanation of it on the mailing list:
"The difference between Debug and RelwithDebInfo is that RelwithDebInfo is quite similar to Release mode. It produces fully optimised code, but also builds the program database, and inserts debug line information to give a debugger a good chance at guessing where in the code you are at any time."
So the Sedna devs suggested building the database from source with this flag:
-DCMAKE_BUILD_TYPE=RelWithDebInfo
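For reference, a full build could look roughly like the sketch below. It assumes a conventional out-of-source CMake build. The function name and the directory paths are hypothetical, and the CMAKE and MAKE overrides exist only to allow a dry run. Note that CMake spells the build type RelWithDebInfo.

```shell
# Sketch: configure and build with optimisations plus debug symbols.
# Assumption: the project supports a standard out-of-source CMake build.
# CMAKE and MAKE can be overridden, e.g. CMAKE=echo for a dry run.
build_with_debug_info() {
    src_dir=$1      # path to the source checkout (hypothetical)
    build_dir=$2    # separate build directory (hypothetical)
    mkdir -p "$build_dir" || return 1
    (
        cd "$build_dir" || exit 1
        # RelWithDebInfo: Release-like optimisation, debug info kept
        "${CMAKE:-cmake}" -DCMAKE_BUILD_TYPE=RelWithDebInfo "$src_dir" &&
        "${MAKE:-make}"
    )
}

# Usage (not executed here; both paths are placeholders):
#   build_with_debug_info "$HOME/src/sedna" "$HOME/src/sedna-build"
```

Keeping the build directory separate from the sources makes it easy to have a Release build and a RelWithDebInfo build side by side.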
Next you'll see how this makes it possible to track the state of open database transactions.
Using gdb
The GNU Project debugger can help you see what is going on inside a running program. Its usage is very simple: you only need to know the process id, and then you run gdb . <pid>. After that you can fetch the list of threads with info threads, switch between threads with thread <number>, and see the backtrace of a thread with the bt command. Here is an example:
lagivan@host:/home/lagivan>gdb . 7108
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5)
--- GDB loading traces have been removed for clearness ---
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffb51fc000
0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
(gdb) info threads
  4 Thread 0x41ec8940 (LWP 7109) 0x0000003fbd40b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x41ee1940 (LWP 7110) 0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
  2 Thread 0x41189940 (LWP 7111) 0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
* 1 Thread 0x2b20394eea80 (LWP 7108) 0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x41ee1940 (LWP 7110))]#0 0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003fbc8d5887 in semop () from /lib64/libc.so.6
#1 0x00000000004678d0 in EventWaitReset ()
#2 0x00000000004679c7 in UEventWait ()
#3 0x000000000042e724 in checkpoint_thread(void*) ()
#4 0x0000003fbd40673d in start_thread () from /lib64/libpthread.so.0
#5 0x0000003fbc8d44bd in clone () from /lib64/libc.so.6
(gdb) quit
The backtrace above shows a thread waiting on a semaphore, which is quite normal. However, when you observe the following backtrace, the hanging Sedna transaction is probably blocked waiting for a client response:
#0 0x00000037858cd722 in select () from /lib64/libc.so.6
#1 0x000000000097bf1c in uselect_read ()
#2 0x00000000005942b1 in socket_client::read_msg(msg_struct*) ()
#3 0x000000000058c001 in TRmain ()
#4 0x00000000008fb60e in main ()
This can be a sign that the client has not closed the transaction properly or that there has been a network interruption. Certainly, this tool is most useful if you know the implementation details of the program. In my case the backtraces were analyzed by the Sedna devs together with the Sedna logs.
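When a hang has to be captured quickly, the same information can be collected without an interactive session. The sketch below relies on gdb's -p, -batch and -ex options and on the thread apply all bt command. The function name and the GDB override (useful for a dry run) are my own additions for illustration.

```shell
# Sketch: dump every thread's backtrace from a running process in one go.
# GDB can be overridden, e.g. GDB=echo to print the arguments only.
dump_thread_backtraces() {
    pid=$1
    # -p attaches to the process, -batch exits once the commands finish,
    # and each -ex runs one gdb command, in the order given.
    "${GDB:-gdb}" -p "$pid" -batch \
        -ex "info threads" \
        -ex "thread apply all bt"
}

# Usage (not executed here; 7108 is the pid from the session above):
#   dump_thread_backtraces 7108 > backtraces.txt
```

Saving the output to a file makes it easy to attach the backtraces to a bug report or to compare several snapshots taken a few seconds apart.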
Using netstat
Further investigation can be done with the netstat tool to find out which client behaves badly and makes the database wait. The following command returns the list of established connections together with the ids of the processes that own them:
netstat -nap | grep ESTABLISHED
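To narrow the list down to a single process, the output can be filtered by pid as well. This is a minimal sketch that assumes the usual Linux netstat column layout for TCP lines, where the sixth field is the state and the seventh is "PID/Program name". The function name is hypothetical. Keep in mind that netstat typically shows the pids of other users' processes only when run as root.

```shell
# Sketch: keep only established connections owned by one process.
# Reads `netstat -nap` style output on stdin.
# Assumption: field 6 is the state, field 7 is "PID/Program name".
established_for_pid() {
    # index(...) == 1 requires field 7 to start with "<pid>/",
    # so pid 710 does not accidentally match 7108.
    awk -v pid="$1" '$6 == "ESTABLISHED" && index($7, pid "/") == 1'
}

# Usage (not executed here; 7108 is the pid from the gdb example):
#   netstat -nap | established_for_pid 7108
```

The foreign address column of the remaining lines points at the client host and port.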
This information should be sufficient to identify the client. Then it's a matter of analyzing its logs and deciding whether it's a client issue or a Sedna bug.