Debugging Production Performance Issues With The Power Of The Thread Dump Using jstack

Until recently I had never had to do much performance tuning on production systems, and whenever I did, I was always able to replicate the problem in the development environment. Recently I ran across a problem that was happening in one production environment but not in another. You see, we have our live production site and our backup site. Our backup site used to be our production site, but we moved to the new environment a few months ago. Users started to get timeouts instead of 401s when they tried to access pages that they were not authorised to view.

 
I was at a loss on how to debug this issue. However, one of the other developers showed me how to use thread dumps to see what was going on on the server at the time the problem was happening. jstack comes with the Java SDK (the JDK). It prints out a stack trace for every thread in a particular JVM. We went to one of the offending pages, and while we were waiting for the page to load we ran jstack repeatedly and sent the output to different files, which we then compared to see if any of our code was causing the problem.
 
Our servers run Linux, so we ran the following command to find our application server's process ID.
 
ps -ef | grep java
 
The output was something like this:
 
developers 3456 1 41 07:11 pts/0 00:00:16 /usr/lib/jvm/java-6-sun-1.6.0.26/bin/java -cp /home/developers/glassfish3/glassfish/modules/glassfish.jar -XX:+UnlockDiagnosticVMOptions -XX:MaxPermSize=192m -XX:NewRatio=2 -Xmx512m -javaagent:/home/developers/glassfish3/glassfish/lib/monitor/flashlight-agent.jar -client -Dfelix.fileinstall.disableConfigSave=false -Djavax.net.ssl.keyStore=/home/developers/glass....
 
Here 3456 is the process ID. We then used jstack to take a bunch of thread dumps like so:
 
jstack -l 3456 > ~/tdump1.txt
jstack -l 3456 > ~/tdump2.txt
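
Typing the command by hand while a page is hanging gets fiddly, so the same idea can be wrapped in a small loop. The following is only a sketch: take_dumps is a hypothetical helper name (it is not part of the JDK), and the default count, interval and output directory are assumptions, not what we actually used.

```shell
#!/bin/sh
# Sketch: capture several thread dumps a few seconds apart.
# take_dumps is a hypothetical helper, not something shipped with the JDK.
# Usage: take_dumps PID [COUNT] [INTERVAL_SECONDS] [OUTPUT_DIR]
take_dumps() {
    pid=$1
    count=${2:-5}        # assumed default: 5 dumps
    interval=${3:-2}     # assumed default: 2 seconds between dumps
    outdir=${4:-$HOME}   # assumed default: home directory, as in the commands above
    i=1
    while [ "$i" -le "$count" ]; do
        # -l asks jstack to include lock information in each dump
        jstack -l "$pid" > "$outdir/tdump$i.txt"
        i=$((i + 1))
        # pause between dumps, but not after the last one
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, take_dumps 3456 5 2 would write tdump1.txt through tdump5.txt to the home directory, two seconds apart.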
 
We took more than two dumps, and from them we were able to see where our code was stuck. We were doing something silly: catching the exception caused by the 401 and emailing it. We changed the code to write to a log file instead. This problem was part of the reason some of our pages disappeared from the Google index.
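
We compared the dump files by eye, but the comparison can be roughed out on the command line too. The sketch below lists stack frames from a given package that show up in every dump, which is a reasonable hint that a thread is stuck there. frames_in_all_dumps is a hypothetical helper and com.example is a placeholder package prefix, not our real one; it also assumes the usual jstack layout where each frame is printed on its own line starting with "at".

```shell
#!/bin/sh
# Sketch: print stack frames under a package prefix that appear in EVERY dump file.
# frames_in_all_dumps is a hypothetical helper; the prefix is whatever
# package your own application code lives in.
# Usage: frames_in_all_dumps PREFIX FILE...
frames_in_all_dumps() {
    prefix=$1
    shift
    total=$#
    for f in "$@"; do
        # keep each matching frame once per file, with leading whitespace removed
        grep -F "at $prefix" "$f" | sed 's/^[[:space:]]*//' | sort -u
    done |
        # count how many files each frame appeared in
        sort | uniq -c |
        # keep only the frames seen in all files, dropping the count column
        awk -v n="$total" '$1 == n { $1 = ""; sub(/^ /, ""); print }'
}
```

Something like frames_in_all_dumps com.example ~/tdump1.txt ~/tdump2.txt would then print only the frames that both dumps share. A frame that keeps appearing is not proof of a hang on its own, so it is still worth reading the surrounding stack in the dump files.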

