I recently had to deal with an application connectivity issue (details on Stack Overflow) that appeared after the application was migrated to a new server. In certain cases it resulted in "Connection timed out" Java exceptions. The answer was on the surface, but I didn't know exactly where to look, so I had to investigate with network sniffing tools such as tcpdump and Wireshark. Here I'd like to share my experience with that network analysis.
The issue and the cause
The following exception was thrown by the Saxon XSLT processor when the document function was invoked:
Caused by: org.apache.commons.lang.exception.NestableRuntimeException: net.sf.saxon.trans.DynamicError: net.sf.saxon.trans.DynamicError: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.Socket.connect(Socket.java:519)
    at java.net.Socket.connect(Socket.java:469)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
It clearly shows that some resource was not accessible, even though I could easily access the URL that was passed as the argument of the document function. It eventually turned out that the target XML file contained a DOCTYPE declaration pointing to a DTD resource, so Saxon apparently failed when it could not access that DTD to perform the validation. To find this root cause I had to use network tools.
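To illustrate why a DOCTYPE declaration alone can cause an extra network request, here is a minimal sketch. It uses the JDK's built-in DOM parser rather than Saxon, so it only approximates what happened in my case. The class name DtdFetchDemo, the helper requestedEntities and the URL http://dtd.example.invalid/report.dtd are made up for illustration. An EntityResolver intercepts the lookup, so nothing is actually fetched over the network; it just records what the parser asked for.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class DtdFetchDemo {

    // Parses the given XML and returns the system IDs of all external
    // entities (such as a DTD) that the parser tried to resolve.
    // Hypothetical helper, not part of Saxon or the original application.
    public static List<String> requestedEntities(String xml) throws Exception {
        List<String> requested = new ArrayList<>();

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // even a non-validating parse asks for the DTD

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Hand back an empty DTD so that no network connection is opened.
            return new InputSource(new StringReader(""));
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder document; the DTD URL is an assumption for the demo.
        String xml = "<!DOCTYPE report SYSTEM \"http://dtd.example.invalid/report.dtd\">"
                   + "<report/>";
        // Prints the list of system IDs the parser wanted to load.
        System.out.println(requestedEntities(xml));
    }
}
```

Without the resolver, the parser would try to open that URL itself, and on a host that cannot reach it the attempt may end in a connection timeout much like the one in the stack trace above.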
Wireshark
First of all, I tried to reproduce the issue on my Windows-based development laptop but could not, which indicated a server configuration issue. However, before debugging on the QA environment I analyzed the application's network activity locally with Wireshark, which appears to be the most popular network packet analyzer with a GUI. The tool provides numerous filtering options, so you can locate whatever you want on the network. It helped me capture the HTTP request sent by Saxon and see all of its request headers. Afterwards I replayed the identical request, with the same headers, using wget and curl on the QA environment. That alone did not help, though, so I had to continue the investigation on the QA environment itself.
Tcpdump
Our QA environment runs RHEL, so tcpdump appeared to be the best fit. It is another very popular network packet analyzer, but command-line only. To investigate the issue further, I used tcpdump to record the host's network activity in two scenarios: sending the suspicious HTTP request with curl, and sending it via the application itself. Curl worked fine, so the issue proved to be application-related. Indeed, curl did not send a second request to fetch the DTD file, while Saxon did. This is what the curl command looked like:
curl -v -H "Pragma: no-cache" -H "User-Agent: Java/1.6.0_21" -H "Cache-Control: no-cache" -H "Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2" -H "Connection: keep-alive" http://suspicious-request-URL-here
This is what the tcpdump command looked like:
sudo /usr/sbin/tcpdump -i eth2 -s 512 -l -A host 134.27.100.153
It prints all packets going to and from the specified host. Here -i eth2 selects the network interface, -s 512 limits the capture to the first 512 bytes of each packet, -l makes the output line-buffered, and -A prints the packet contents as ASCII. I'm not pasting the results here as they are too big. But having compared the tcpdump outputs for the two scenarios, I found that the application sent additional requests. The capture did not reveal the exact URL, but it helped me guess the root cause and locate that DTD declaration.
To conclude, network packet analysis can be very useful for debugging. I'm pretty confident with the client-side tools built into browsers (e.g. Firebug). As for the server side, I'm still not an expert, and it may be possible to discover much more detail with the tools mentioned above.