Thursday, April 26, 2007

Java JVM high memory usage problems

Though I have some three or four years of experience in C++, I have
not had much opportunity to work in Java. Recently I was doing some
analysis for a very simple CLI and was surprised to run into a memory
restriction. I found that if I use Java to develop the CLI
application, I will exhaust the memory of our Application Server
(AS). To describe the architecture briefly: users (telecom operators)
use a MetaFrame server client (something like remote desktop) to log
in to the AS and then open a GUI to work on it. With Java I can serve
only about 15 clients with the available memory. I looked into the
reason for this resource crunch and found that the AS already serves
many Java-based GUIs, each taking some 25 MB or more. My first thought
was that adding more RAM would solve this (though that is not an easy
option). Then I learned that a 32-bit system can give applications
only about 3 GB of RAM, and our AS already had 4 GB with 1.5 GB of
virtual memory (VM) configured as well. Then I thought there might be
ways to make the JVM shareable. But no. I also tried the flags for
fine-tuning the JVM. No go there either. However low you set the
limit, it applies only to the heap allocated to the application and
not to the private bytes of the JVM.

-----------------------------------------
Data from Jconsole for the Java application

Memory
Current Heap Size 4.3 MB
Max heap size 12.2 MB (set by flag)
Committed memory 5 MB
Operating System
Committed VM - 26.7 MB

Data from Perfmon (for java.exe)

Private Bytes - 27.7 MB
----------------------------------------
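
The heap figures above can also be read from inside the application. The following is a minimal sketch, assuming a Java 5 or later JVM with the standard java.lang.management API; the class name HeapReport and its helper methods are made up for illustration. It prints heap and non-heap usage side by side, which helps show that a heap flag such as -Xmx bounds only the first of the two. The process's private bytes additionally include thread stacks and the JVM's own native allocations, which this API does not report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of the original application.
final class HeapReport {

    // Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes).
    static double toMb(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // Format one MemoryUsage snapshot as a single line of text.
    static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when no maximum is defined for the area.
        String max = usage.getMax() < 0
                ? "undefined"
                : String.format("%.1f MB", toMb(usage.getMax()));
        return String.format("%s: used %.1f MB, committed %.1f MB, max %s",
                label, toMb(usage.getUsed()), toMb(usage.getCommitted()), max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // The heap is the area limited by -Xmx.
        System.out.println(describe("Heap", memory.getHeapMemoryUsage()));
        // Non-heap covers JVM-managed areas outside the heap, such as loaded classes.
        System.out.println(describe("Non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Running it with a small heap limit (for example java -Xmx12m HeapReport) and comparing the output with Private Bytes in Perfmon should show the same kind of gap as in the data above.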

I am part of a large team that develops and maintains a telecom
network management software system. This software is used all around
the globe, in around 190 countries, usually by telecom service
providers. So this is a very real problem that I am speaking about. I
am getting to love the simplicity of Java and of the great IDE Eclipse
as much as I love the power of STL; but I am coming face to face with
the main constraint of Java, its memory footprint, and I guess
thousands of others like me must have faced similar problems.

If so, the next question is what Sun is doing about this. I could not
find it in their top 25 RFEs or top 25 bugs
(http://bugs.sun.com/bugdatabase/top25_bugs.do
http://bugs.sun.com/bugdatabase/top25_rfes.do).

1) Is this really not a problem in the outside world, then?

2) Or is it that with the introduction of 64-bit hardware and servers
this will be of no significance?

Anyway, I doubt that companies like ours will adopt 64-bit hardware
and servers, because of the costs involved (note that this is just a
logical guess; I have no experience in such decisions).

Anyway, I am now forced to use C++ for the client and Java on the
server side. Fortunately most of the Java on the server side is done
as EJBs, all contained in the JBoss server and thus consuming only one
JVM there.
-------
I posted this in a popular Java group and did not get any meaningful replies. So I guess this is one limitation that people are living with; or that J2EE frameworks amalgamate everything into one large chunk of VM.


Monday, April 23, 2007

WinDbg and ADPlus for debugging a process hang

1. Set the symbol path, especially the Windows symbols
.......26_Merged\UNCRelease;\\blrm2fsp\SCSD-Common\Sym\WinSym2K3SP1

2. Set the source and image paths

3. Load the extension dll ( for !locks command ) if not already loaded

.load C:\Program Files\Debugging Tools for Windows\winxp\kdexts.dll

4. Use the !ntsdexts.locks (or !locks) command to see a list of critical sections

:004> !ntsdexts.locks

CritSec ntdll!LdrpLoaderLock+0 at 7C889D94
LockCount -6902
RecursionCount 1
OwningThread 2b3c
EntryCount 0
ContentionCount 28f3
*** Locked

CritSec +12f9b4 at 0012F9B4
LockCount 0
RecursionCount 0
OwningThread 0
*** Locked

CritSec +12f70c at 0012F70C
LockCount 0
RecursionCount 0
OwningThread 0
*** Locked

CritSec +128202c at 0128202C
LockCount -2
RecursionCount 1
OwningThread 3d9c
EntryCount 0
ContentionCount 0
*** Locked
-----------

A 'LockCount' other than 0 means that threads are waiting on this critical section, which makes it the suspect

6. Do a ~* kb (print the stack trace for all threads)

and search the stack output for the critsection address (7C889D94) got from step 5.

Usually you will find many threads waiting on the same one (in this case thread 163)

163 Id: 4d8.3c78 Suspend: 0 Teb: 7fe8e000 Unfrozen
ChildEBP RetAddr Args to Child
225dfbc8 7c822124 7c83970f 000001e4 00000000 ntdll!KiFastSystemCallRet
225dfbcc 7c83970f 000001e4 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
225dfc08 7c839620 00000000 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x19c
225dfc28 7c81a86c 7c889d94 00000000 7ffde000 ntdll!RtlEnterCriticalSection+0xa8
225dfcb8 7c81b22d 225dfd28 225dfd28 00000000 ntdll!LdrpInitializeThread+0x68
225dfd14 7c82ec2d 225dfd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

(the third column of the RtlEnterCriticalSection frame, which is its first argument, holds the critical section address)


5. !critsec 7c889d94

CritSec ntdll!LdrpLoaderLock+0 at 7C889D94
LockCount -6902
RecursionCount 1
OwningThread 2b3c
EntryCount 0
ContentionCount 28f3
*** Locked

6. Use ~ to get the thread list and check whether you can find the owning thread (OwningThread 2b3c here) in the dump.

If not, then you have two things to do. First, Google this locked critsection, assuming it is a Windows one; that gives you a hint about the suspect parts of the code (in this case, TerminateThread calls from somewhere in your process).

Second, you can attach ADPlus (part of the Windows debugging package) to see whether any exception occurred.
ADPlus gives you a dump of the process at the time the exception is thrown.

Analysing each first- and second-chance exception is
important, as these can also lead to memory corruption, which can manifest later in many ways

7. ADPlus settings: F:\\Tools\ADPlus>adplus -pn processname.exe -crash -FullonFirst -quiet

This will write a dump whenever an exception happens (even if you are 'handling' it in the code using catch(…))

Run the !analyze -v command on the dumps (after setting the symbols correctly) and it will give you the stack trace of the crashing code.

Correct these in the source and run with ADPlus again.

Sooner or later you will find the problem area. And of course the solution.
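
Searching a long ~* kb listing by eye gets tedious when the process has hundreds of threads. The following is a rough sketch, assuming the debugger output has been saved to a text file (for example with .logopen) and that thread headers look like the "163 Id: 4d8.3c78 ..." line above. The class CritSecWaiters and its method names are made up for illustration and are not part of any debugging tool. It lists the threads whose stack mentions a given address, and looks up an OwningThread id in the thread headers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning saved "~* kb" output.
final class CritSecWaiters {

    // Assumed header shape: optional '.' or '#' marker, thread number,
    // then "Id: <process id>.<thread id>" in hex.
    private static final Pattern THREAD_HEADER = Pattern.compile(
            "^\\s*[.#]?\\s*(\\d+)\\s+Id:\\s+([0-9a-fA-F]+)\\.([0-9a-fA-F]+)\\b.*");

    // Return one entry per thread whose stack lines contain the address.
    static List<String> threadsMentioning(String stackDump, String address) {
        String needle = address.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        String current = null;      // thread whose stack is being read
        boolean recorded = false;   // avoid listing the same thread twice
        for (String line : stackDump.split("\\r?\\n")) {
            Matcher m = THREAD_HEADER.matcher(line);
            if (m.matches()) {
                current = m.group(1) + " (Id " + m.group(2) + "." + m.group(3) + ")";
                recorded = false;
            } else if (current != null && !recorded
                    && line.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(current);
                recorded = true;
            }
        }
        return hits;
    }

    // Return the debugger thread number for an OS thread id, or null if absent.
    static String threadNumberForId(String threadList, String osThreadId) {
        for (String line : threadList.split("\\r?\\n")) {
            Matcher m = THREAD_HEADER.matcher(line);
            if (m.matches() && m.group(3).equalsIgnoreCase(osThreadId)) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                " 163  Id: 4d8.3c78 Suspend: 0 Teb: 7fe8e000 Unfrozen",
                "225dfc28 7c81a86c 7c889d94 00000000 7ffde000 ntdll!RtlEnterCriticalSection+0xa8",
                // The second thread is invented for the example.
                " 164  Id: 4d8.1111 Suspend: 0 Teb: 7fe8d000 Unfrozen",
                "0012fbc8 7c822124 7c83970f 000001e4 00000000 ntdll!KiFastSystemCallRet");
        System.out.println(threadsMentioning(sample, "7C889D94")); // [163 (Id 4d8.3c78)]
        System.out.println(threadNumberForId(sample, "2b3c"));     // null: owner not in this dump
    }
}
```

A plain text match can also hit frames where the address merely appears as some other argument, so each reported thread still needs a look at its actual stack.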
