Note: this was originally posted April 22, 2009 on my old blog.
We are using WebLogic with the JRockit JVM and noticed tons of issues after a few weeks of uptime: spurious errors and non-responsiveness. The cause? Too many open file handles. Well, actually it's not so much that there are too many open file handles; it's that the OS (SUSE Linux Enterprise Server 10) has a very low limit by default. The system-wide maximum is over 400,000 file handles on our production server (it's pretty high-end), but each process is limited to only 1,024.
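To check both numbers on a Linux box, here is a small sketch (the helper names are made up for illustration). /proc/sys/fs/file-max holds the system-wide ceiling, while bash's ulimit builtin reports the per-process soft and hard limits that anything started from that shell inherits. Raising the soft limit this way only lasts for the current shell and its children; a permanent change is typically made through pam_limits in /etc/security/limits.conf.

```shell
#!/bin/bash
# Hypothetical helpers; Linux-specific (/proc) and bash-specific (ulimit flags).

# Print the system-wide maximum and this shell's per-process limits.
show_fd_limits() {
  echo "system-wide file-max: $(cat /proc/sys/fs/file-max)"
  echo "per-process soft limit: $(ulimit -Sn)"
  echo "per-process hard limit: $(ulimit -Hn)"
}

# Raise the soft limit up to the hard limit for this shell and any process
# it starts afterwards. An unprivileged user cannot go above the hard limit.
raise_fd_soft_limit() {
  ulimit -Sn "$(ulimit -Hn)"
}

show_fd_limits
```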
After googling around a bit to figure out how to up the limit, I also put together this script to log the number of file handles being used by each of our WebLogic servers.
for i in `ps x | grep '[j]rockit' | cut -c1-5 | tr -d " "` ; do lsof -p "$i" | wc -l >> handles-count; done
Here's the breakdown:
ps x -- this lists all processes owned by the current user, including ones without a controlling terminal (such as background daemons).
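grep [j]rockit -- this filters the list of processes down to those containing the text "jrockit". Note that I use "[j]rockit", which is a neato trick to prevent the grep command from listing itself. To grep, the brackets form a character class that matches only "j", so the pattern still matches "jrockit". But grep's own entry in the process list shows the literal text "[j]rockit", which the pattern doesn't match, so grep never finds itself. (Quoting the pattern, as in '[j]rockit', is a good idea: if a file named jrockit happens to exist in the current directory, the shell would expand the unquoted pattern to plain jrockit and the trick would stop working.)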
cut -c1-5 -- keep only characters 1 through 5 of each line, which is the column where ps x prints the PID (this assumes PIDs have at most five digits)
tr -d " " -- delete the spaces; ps right-justifies the PIDs, so shorter ones are padded with leading spaces
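If PIDs ever grow past five digits (possible on systems where kernel.pid_max is set above 99999), a fixed cut -c1-5 will chop them. Here is a sketch of the same lookup that doesn't depend on column widths, assuming a Linux procps ps that accepts the BSD-style x option together with -o: ask ps for just the PID and the full command line, and let awk do both the matching and the PID extraction. The bracket trick is still needed, since awk's own argument list would otherwise contain the pattern. The function name pids_matching is hypothetical.

```shell
#!/bin/bash
# Print the PIDs of the current user's processes whose full command line
# matches the extended regular expression given as $1.
# Assumes procps ps (Linux); the function name is hypothetical.
pids_matching() {
  # "pid=" and "args=" suppress the header; args is the full command line.
  ps x -o pid= -o args= | awk -v pat="$1" '$0 ~ pat { print $1 }'
}

# The bracketed pattern matches the JRockit processes but not the literal
# text "[j]rockit" in awk's own arguments.
pids_matching '[j]rockit'
```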
The four commands in that pipeline are wrapped in backquotes (the key to the left of the 1 on a US keyboard). This is command substitution: the shell runs the enclosed commands and substitutes their output into the rest of the statement. In this case, the result is a list of process IDs, which the for-loop iterates over. The body of the for-loop is this part:
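lsof -p $i -- lsof is the "list open files" command; with the -p option it lists the files opened by the given process ID. Its output also includes a header line and entries such as the current directory and memory-mapped libraries, so the count runs somewhat higher than the number of file descriptors the per-process limit applies to.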
wc -l -- the venerable "word count" program, although I'm just using the -l option to count lines
>> handles-count -- this appends the output to a file named handles-count (>> adds to the file rather than overwriting it, so each run extends the log)
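To count only file descriptors, the thing the per-process limit actually caps, one option on Linux is to count the entries in /proc/PID/fd. The sketch below does that and also writes a timestamp and the PID on each log line, so you can tell which server is growing. The function name log_fd_counts and the log format are assumptions for illustration.

```shell
#!/bin/bash
# Append one line per PID to the log file: timestamp, PID, open descriptors.
# Usage: log_fd_counts LOGFILE PID...   (hypothetical helper; Linux-only)
log_fd_counts() {
  local log="$1" pid n
  shift
  for pid in "$@"; do
    # Skip processes that have exited or whose fd table we cannot read.
    [ -r "/proc/$pid/fd" ] || continue
    # Each entry in /proc/PID/fd is one open file descriptor.
    n=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    printf '%s %s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$pid" "$n" >> "$log"
  done
}
```

In the cron script it would take the place of the original loop, receiving the PIDs from the same ps/grep/cut/tr pipeline as its arguments.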
The whole one-liner is then saved in a shell script file and run from a cron job every 15 minutes to collect usage info.
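For reference, a crontab entry (added with crontab -e) that runs such a script every 15 minutes could look like the line below; the script path is hypothetical. Keep in mind that cron starts jobs in the user's home directory with a minimal PATH, so the relative handles-count file will end up in the home directory, and commands like lsof may need their full path.

```
# min   hour  day-of-month  month  day-of-week  command (hypothetical path)
*/15    *     *             *      *            /home/weblogic/bin/count-handles.sh
```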