Sunday, February 05, 2017

Programming with Apache Spark and Cassandra -draft


Putting together the knowledge gained so far, along with frequent questions that many may ask and that we have asked ourselves.
Spark gives you horizontal scalability in a programmer-friendly way.
There are other options as well. I have listed them below, which describes and highlights Spark's place in the architecture:

Type: Load Balancer (nginx, haproxy)
Granularity: Request level (usually HTTP requests)
Description: Works well for request-response type client-server protocols, and also in the context of microservices on the application side. However, it is inadequate for scaling the processing inside the application programs.

Type: Task Managers (Celery, other MQ-based)
Granularity: Task level
Description: Helps to scale processing in the application program and takes care of task handling. However, the onus is on the developer to split the application logic into independent tasks, and usually only the simplest things are really split into tasks. Combining the outputs is an equally hard problem.

Type: Cluster Computing (Apache Spark, Hadoop)
Granularity: Application level, function level
Description: Helps to scale processing in the application layer across nodes and takes care of all of the above. The onus is still on the developer to use this properly; however, if the few core APIs* (map, foreach, reduce and groupBy/partitionBy) are used, the program can be written as if it were running in a single node, in a single thread (see the sketch after this table). The system manages shared RAM across multiple nodes, shared cores, task scheduling, multithreading etc. (*P.S. - Spark has an extensive library for machine learning as well, which could be the gateway for the future.)

Type: Multithreading
Granularity: Function level
Description: Helps to scale the processing inside a single node across cores. Usually has to be done with care to avoid the complexity of threading-related problems, which many programmers are unaware of.

Type: Green Threads
Granularity: Function level / stack level
Description: For example, greenlets in Python. Good for switching stacks in IO-bound applications, e.g. a socket server. Not really parallel, but the wait time in one stack frame can be used by other stacks waiting to execute. Rather too specific for general-purpose usage.
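To make the Cluster Computing row concrete, here is a small sketch of my own (not from the original post; the file paths are made up) of how a Spark program written against the Java map/reduce-style API reads like single-threaded code while the framework distributes the work:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wordcount").setMaster("local[*]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // Reads like a single-threaded pipeline; Spark splits the file into
            // partitions and runs each step on whatever nodes/cores are available
            JavaPairRDD<String, Integer> counts = jsc.textFile("hdfs:///data/input.txt")
                    .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum); // cluster-wide shuffle, hidden from us
            counts.saveAsTextFile("hdfs:///data/output");
        }
    }
}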

It takes less than 10 minutes to download and set up Spark; for software this capable, there are surprisingly few steps. Please follow the steps in Working with the EE Cloud.


How stable are Apache Spark and Apache Cassandra?
Speaking from our limited experience in running the prototype, all of the Spark and Cassandra JVMs survived 20 days of load runs, network problems and application exceptions we threw at them, and that too in a low-end EE cloud lab. They look to be well written.
Data modelling is connected with primary key and partition key design. It is important to design your primary key and partition key so that writes are distributed and reads are faster. This is explained well by the Cassandra experts here -> http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
The hash of the partition key is used by Cassandra to identify the node on which to store the row. So choosing a partition key that distributes the load equally among nodes prevents write hotspots. An example can be seen in the performance run page; a sketch of such a schema follows.
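As a hedged illustration (the keyspace, table and column names are made up; this uses the DataStax Java driver and assumes a keyspace named demo already exists):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SchemaSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // (sensor_id, day) is a composite partition key: one hot sensor's writes
            // are spread across a partition per day instead of piling onto one node.
            // ts is a clustering column, so reads within a partition come back ordered.
            session.execute(
                "CREATE TABLE IF NOT EXISTS demo.readings ("
                + " sensor_id text, day text, ts timestamp, value double,"
                + " PRIMARY KEY ((sensor_id, day), ts))");
        }
    }
}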
P.S. - There are a few trivial but important things, like writing the commit log and the data (SSTables) to different disks. This link gives basic info about the write path.
I have not come across a single most important thing as such, but here are a couple of pointers:
1. Avoid doing any major work in the Spark driver; rdd.collect(), or the somewhat better rdd.toLocalIterator(), are not good ideas and don't scale; you will get OOM errors soon.
2. There is no way to share state like counters between the driver and the workers, though in the code it may seem so. The only way is via accumulators, and even those the workers cannot read; see the sketch after this list.
3. The way you partition the RDD may be important for performance, especially for operations like groupBy; I need to test and understand this better.
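To illustrate pointer 2, here is a minimal accumulator sketch of my own (names are illustrative): the workers can only add to the accumulator, and the driver reads the total after the action completes.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;
import java.util.Arrays;

public class AccumulatorExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("accumulator-demo").setMaster("local[2]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            // The only supported way to aggregate a counter from workers to the driver
            LongAccumulator badRecords = jsc.sc().longAccumulator("badRecords");
            jsc.parallelize(Arrays.asList("1", "2", "oops", "4"))
               .foreach(s -> {
                   try {
                       Integer.parseInt(s);
                   } catch (NumberFormatException e) {
                       badRecords.add(1); // workers add; they cannot read the value
                   }
               });
            // Safe to read only here in the driver, after the action has completed
            System.out.println("Bad records seen by workers: " + badRecords.value());
        }
    }
}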


Saturday, February 04, 2017

Compiling OpenCV with CUDA (GPU) using Visual Studio


I have a tendency to choose the exact wrong thing every time when given a choice; I have sometimes wondered why. A good thing about doing things almost wrong is that you get to learn about things. I have a feeling that doing things wrong, getting feedback and correcting is somehow fundamental to the way the learning process happens.

I usually start learning a technology or language by jumping right in, doing things wrong and learning on the way; if you are like me, then this will save you some time and some hair-pulling.

Before we start, just a very short introduction into the why part; trust me, just the bare essentials.

OpenCV operates on images, which in computers (at least the ones we have now) are stored as pixel matrices. The various algorithms that OpenCV provides, for example for object detection, do a lot of matrix operations. These operations are 'embarrassingly parallel' (data parallel) and can be sped up if executed on the GPU.

Now, NVIDIA GPUs have a parallel programming API called CUDA which can help in speeding up matrix multiplication, and OpenCV has support for it; to use it, however, you need to compile OpenCV with CUDA. CUDA is NVIDIA proprietary and works only with NVIDIA GPUs.

There is an open API which should work with different types of GPU cards, and that is OpenCL. However, it may not be as tuned for a particular card. OpenCV has support for OpenCL too; for now, though, we will use CUDA.

Finally, one more thing: CUDA uses BLAS libraries, and the CUDA SDK provided by NVIDIA has the cuBLAS libraries for this. Don't ask me why I chose to compile OpenBLAS for it; as I said before, CMake gives a lot of choices, and if you don't know even as much as the above, you are sure to do some totally unnecessary but very instructive things.

Okay, now to the how, on Windows.

First, check if your PC or laptop has an NVIDIA card. The easiest way to do this is via the dxdiag Windows utility.


Now see if your card supports CUDA. There is a good utility from TechPowerUp, GPU-Z, that will show this information among others like GPU load, which will help later to see if the programs are really using the GPU. Or you could check the NVIDIA website for the card and see if it supports CUDA; I guess most cards do. Or you could check the very detailed page on Wikipedia which lists the various generations of the processors: https://en.wikipedia.org/wiki/CUDA



The next step is to download the CUDA SDK from NVIDIA: https://developer.nvidia.com/cuda-downloads. If you have a 64-bit system, download the 64-bit SDK. Choose the defaults and install it.

Then download the OpenCV source code from Git, and download the CMake tool. You also need to download MS Visual Studio Community edition for the C++ compiler.

The main thing for correct compilation is to choose the right settings in CMake. First, these are the minimum WITH_* variables that need to be configured:



Miss a few, or mess with a few, and you will have a lot of errors coming. I tried guessing, removed some, and got a lot of errors while running the program. WITH_CUDA is mandatory; if you need to see the video images in a GUI, make sure to select WIN32UI and FFMPEG. I am still not sure if some are needed or why they are there. Please don't feel appalled; I learn this way: I have no clue initially and learn to figure it out the hard way. It is something to do with being stupid. Why I removed the defaults was to cut down the compile time from the better half of a day to something more reasonable.

Then I found that the best way to reduce the compile time was to limit the architecture to the number I thought the GPU card was supporting. In my case, for the GeForce GT 720M card, the CUDA wiki page gave the architecture code name as Fermi and the compute capability as 2.1. That did not work; so I gave 2.0, and I found the compile time decreased considerably.
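For reference, the settings above can also be passed to CMake on the command line. A hedged sketch (the generator name and source path are assumptions for a 64-bit VS2015 setup, not my exact configuration):

cmake -G "Visual Studio 14 2015 Win64" ^
      -D WITH_CUDA=ON -D WITH_WIN32UI=ON -D WITH_FFMPEG=ON ^
      -D CUDA_ARCH_BIN=2.0 -D CMAKE_BUILD_TYPE=Release ^
      D:\opencv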


After that you Configure; make sure you select the 64-bit Visual Studio compiler. Select 32-bit, or make some other mistake, and you will be led to a lot of configuration errors.



If you select the 64-bit compiler, CMake will automatically select the 64-bit libraries from the CUDA SDK. Else it will try to take the 32-bit libraries, and you may get a configuration error about BLAS.



With that you should be able to compile your OpenCV program. Note that when I used the default ARCH_BIN setting, which goes all the way from 1 to 5, I got some linker errors:

Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol __cudaRegisterLinkedBinary_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89 referenced in function "void __cdecl __sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89(void)" (?__sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89@@YAXXZ) opencv_core D:\build\opencv2\modules\core\cuda_compile_generated_gpu_mat.cu.obj 1


For your program using the OpenCV built above, usually most of the libraries given below are needed. If your build of OpenCV is proper, you will get this many DLLs in the output folder. If some are missing, try to build them from Visual Studio.


opencv_calib3d320.lib
opencv_core320.lib
opencv_features2d320.lib
opencv_flann320.lib
opencv_highgui320.lib
opencv_imgcodecs320.lib
opencv_imgproc320.lib
opencv_ml320.lib
opencv_objdetect320.lib
opencv_shape320.lib
opencv_ts320.lib
opencv_video320.lib
opencv_videoio320.lib
opencv_cudaimgproc320.lib
opencv_cudaarithm320.lib
opencv_cudabgsegm320.lib
opencv_cudacodec320.lib
opencv_cudalegacy320.lib
opencv_cudaobjdetect320.lib
opencv_cudawarping320.lib
opencv_cudev320.lib
opencv_cudafilters320.lib

If you get include errors, see this link:

http://answers.opencv.org/question/29885/does-opencv_moduleshpp-exist-in-opencv248/

Finally, check with GPU-Z to see if running the program really uses the GPU.


Note: for building your own OpenCV solutions (1) using these libs, the following include directories and libraries have to be added.

(1) People detection example - https://gist.github.com/alexcpn/aeb8a4b8304639d8f91cc2fbc0c1c7df

Include Directories

C/C++ --> General --> Additional Include Directories

- D:\opencv\modules\calib3d\include;D:\opencv\modules\videoio\include;D:\opencv\modules\video\include;D:\opencv\modules\imgcodecs\include;D:\opencv\modules\cudaoptflow\include;D:\opencv\modules\cudastereo\include;D:\build\opencv4;d:\opencv\modules\core\include;D:\opencv\modules\cudawarping\include;D:\opencv\include;D:\opencv\modules\cudaobjdetect\include;D:\opencv\modules\cudaimgproc\include;D:\opencv\modules\imgproc\include;D:\opencv\modules\highgui\include;D:\opencv\modules\objdetect\include;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include

Note: opencv2/opencv_modules.hpp is from the OpenCV build folder d:\build\opencv4\opencv2 and not from the OpenCV git source; this directory should be in the include path (d:\build\opencv4\ is the output directory specified in CMake).




Libs
Linker-->Input--> Additional Dependencies --> opencv_calib3d320.lib;opencv_core320.lib;opencv_features2d320.lib;opencv_flann320.lib;opencv_highgui320.lib;opencv_imgcodecs320.lib;opencv_imgproc320.lib;opencv_ml320.lib;opencv_objdetect320.lib;opencv_shape320.lib;opencv_ts320.lib;opencv_video320.lib;opencv_videoio320.lib;opencv_cudaimgproc320.lib;opencv_cudaarithm320.lib;opencv_cudabgsegm320.lib;opencv_cudacodec320.lib;opencv_cudalegacy320.lib;opencv_cudaobjdetect320.lib;opencv_cudawarping320.lib;opencv_cudev320.lib;opencv_cudafilters320.lib;%(AdditionalDependencies)

Lib Directories : - D:\build\opencv4\lib\Release

Here is what I did to install the latest OpenCV on an x86 64-bit machine running Ubuntu:

     sudo apt-get install -y build-essential cmake
// video codecs; this many are not listed on the opencv site, but I got this from some other blog; I am not sure what the bare minimum is
     sudo apt-get install -y libdc1394-22-dev libavcodec-dev libavformat-dev libswscale-dev libtheora-dev libvorbis-dev libxvidcore-dev libx264-dev yasm libopencore-amrnb-dev libopencore-amrwb-dev libv4l-dev libxine2-dev
     sudo apt-get install -y libtbb-dev libeigen3-dev
     sudo apt-get install libavformat-dev libswscale-dev

    //  The below should be done at the beginning; I did not do this and got some broken-package errors above, so I did it then; you learn the hard way :)
     sudo apt-get -y update  
     sudo apt-get -y upgrade
     sudo apt-get -y autoremove

     sudo apt-get install -y libtbb-dev libeigen3-dev
     sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
     sudo apt-get install libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
     sudo apt-get install -y qt5-default
     sudo apt-get install -y zlib1g-dev libjpeg-dev libwebp-dev libpng-dev libtiff5-dev libjasper-dev libopenexr-dev libgdal-dev




Monday, January 23, 2017

Best practices - Selenium WebDriver / Java

Intermittent failures in Selenium test cases?
After clicking the drill-down, sometimes web elements are not found, causing all summary-table test cases to fail.
Common Root Causes
1) Prefer selection By.id or By.className, then By.cssSelector, and only if all else fails use By.xpath
Selection by CSS (By.cssSelector) should be preferred over XPath, as it is more stable and natively supported by the browser. XPath is an abstraction provided by Selenium and not as performant. If XPath is used, make sure that you have hand-written the XPath and that it is performant, and not so generic that it has to brute-force search through the entire DOM to find your element.
Wherever possible use By.id, else By.cssSelector, and when there is no other option use XPath, after proper testing.
You can use the FireFinder plugin for Firefox (first add Firebug) to test your CSS or XPath (if there is no way you can select by CSS).
For example, this XPath for finding the drill-down element
//*[@id='scTableTest_Site-PLMN-PLMNMRBTS-255-sitecreation_netact1']/td/div/img [@src='/SiteCreation-Table-portlet/images/openDrillDown.png']
can be reduced to this more efficient CSS - #scTableTokyo-PLMN-PLMNMRBTS-400-sitecreation_netact1 > td  > div  > img +img
2. WebElement.click may not click if the element is not visible in the browser viewport
If an element is not visible in the browser viewport, clicking on it should not be possible and the test case should fail. Earlier versions of WebDriver used to do implicit scrolling; however, this is not consistent and is being debated, so it is better not to rely on it. One way to make sure the element is clicked is to use the Selenium feature of directly invoking JS in the browser:
if (imageName.contains("openDrillDown.png")) {
    // a plain rowElement.click() may silently fail when the element is outside
    // the viewport, so invoke the click directly via JavaScript instead
    ((JavascriptExecutor) driver).executeScript("arguments[0].click();", rowElement);
    return;
}

3. Design and model your Selenium Java code
More often than not, there is no structure or design applied to unit-test classes. This may be okay for JUnit tests of Java classes, as the design of the Java class is reflected in the test cases. But when we write integration test cases or GUI test cases with JUnit or TestNG using Selenium WebDriver, writing like this leads to unmaintainable and very brittle code. The test application should be logically structured; this way there is no code duplication and no code bloat, which otherwise keeps on growing, with IDs, CSS paths or XPaths everywhere.
Some good links - the PageObject pattern
4. Don't sleep for long; if you need to, sleep for a short time, wake up, check and retry (retry pattern)
Understand and use Selenium implicit waits (common for the whole WebDriver instance) or explicit waits.
Example -
WebElement rowElement = new WebDriverWait(driver, 10)
        .until(ExpectedConditions.presenceOfElementLocated(
                By.cssSelector(cssPath)));
Note - the wait-retry pattern is very important for stability; all finds should be retried at least three times as a rule of thumb. It also depends on your test case and modelling context.
Implicit and explicit waits: check these links.
In case you have to sleep, create a helper that sleeps for, say, 100 ms, checks, and sleeps again (a while loop with a retry count); 50 * 100 ms = 5 seconds, so that if the element appears earlier, time can be saved. A minimal sketch of such a helper follows.
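This is my own illustration (the class and method names are made up), not a Selenium API:

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public final class WaitUtil {
    private WaitUtil() { }

    // Polls every 100 ms up to maxRetries times instead of one long sleep,
    // so an element that appears early costs almost no waiting time.
    public static WebElement findWithRetry(WebDriver driver, By locator, int maxRetries)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            List<WebElement> found = driver.findElements(locator);
            if (!found.isEmpty()) {
                return found.get(0);
            }
            Thread.sleep(100);
        }
        throw new NoSuchElementException("Not found after " + maxRetries + " retries: " + locator);
    }
}

Usage: WebElement row = WaitUtil.findWithRetry(driver, By.cssSelector(cssPath), 50); // up to ~5 seconds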

Python Profiling - Some hints



No time to compose this fully; here are some links which helped me with CPU profiling.


Python Profilers

http://www.vrplumber.com/programming/runsnakerun/

 python -m cProfile -o profile_out ANR_4G_IRAT.py

 pacman -S kdesdk-kcachegrind

 http://thirld.com/blog/2014/11/30/visualizing-the-results-of-profiling-python-code/

 runsnakerun http://wiki.wxpython.org/How%20to%20install%20wxPython

 pstats - deciphering cProfile output
 https://pymotw.com/2/profile/
 https://docs.python.org/2/library/profile.html

 yappi https://code.google.com/archive/p/yappi/

 pyprof2calltree -k -i myscript.cprof
 https://julien.danjou.info/blog/2015/guide-to-python-profiling-cprofile-concrete-case-carbonara


 performance tips python
 https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

Wednesday, May 18, 2016

JavaScript Development Guidelines


The aim of this page is to give practical and widely adopted industry best practices and guidelines for all phases of the JavaScript development lifecycle. All the necessary tools and frameworks, like code analysis and test frameworks, are already integrated into our CI system, ready for use, and already in use by teams.

Part 1: Sensitizing with JavaScript language

Persons coming from other languages may not appreciate certain coding guidelines that are recommended and set in the SONAR JS analysis, and that are also checked by popular tools like JSHint and JSLint.
For example, the function below will return 'undefined' and not 'Hello World':
function main() {
    return
        'Hello, World!';
}
main(); // returns 'undefined'
This is because in JS semicolons are not mandatory, and JS adds semicolons automatically during interpretation; so it adds one right after the return, making the function return nothing. Such quirks, and others like them, are present in the language, which entails the need for mandatory code-analysis integration.
If not already done, kindly go through the web technology competence development page - Competence Building in Web Technology. This will give you an idea about JS, its flexibility and power, and the way to use it effectively.

Part 2: JavaScript Basic Development Guidelines 

Code Structuring 

The NEED - There is no public/private concept in JS; everything is attached to the global window namespace. So if you add some global variables or functions, and a JS library that you include does the same, this will cause namespace collisions, which basically means your code may work in undefined ways. With portlets it becomes worse, as the JS in one portlet can have the same function/object names as the JS in another; it is a pretty common problem.
Solution: the Module pattern wherever possible, or nested namespaces where you need to create new objects.

Option 1: Module pattern --> see below; it basically simulates private and public accessors via JavaScript closures.
var ContenPackModuleName = (function () {
    // module object; this is what gets returned, see below
    var my = {};
    // private variables
    var map = null;
    var maploaded = false;
    var markersArray = {};
    // private methods
    var panToSelection = function (sitename) {
        var cachedsite = markersArray[sitename];
        if (!cachedsite) {
            console.log("Site not in map yet");
            return;
        }
        map.panTo(cachedsite.latlong);
    };
    // public method/s
    // Getting the browser width and height
    var getbrowserWidthandheight = function () {
        var winW = 630, winH = 460;
        if (document.body && document.body.offsetWidth) {
            winW = document.body.offsetWidth;
            winH = document.body.offsetHeight;
        }
        if (document.compatMode == 'CSS1Compat' && document.documentElement &&
                document.documentElement.offsetWidth) {
            winW = document.documentElement.offsetWidth;
            winH = document.documentElement.offsetHeight;
        }
        if (window.innerWidth && window.innerHeight) {
            winW = window.innerWidth;
            winH = window.innerHeight;
        }
        return { winWidth: winW, winHeight: winH };
    };
    // Attach the methods that you want to be visible outside the module
    // (the public methods) to the object returned from the IIFE
    my.getbrowserWidthandheight = getbrowserWidthandheight; // public method
    return my;
})();

Invoke from another JS, JSP or HTML:
ContenPackModuleName.getbrowserWidthandheight();
>> Example output --> Object { winHeight: 513, winWidth: 1218 }

More details regarding this pattern 
Option 2: Using namespaces - use this if you need to create new objects, which is not possible with closures/the module pattern.

var KKR = KKR || {}; // check if a variable named KKR exists; if not, create a new empty object
KKR.SiteCreation = KKR.SiteCreation || {};
KKR.SiteCreation.statusTable = KKR.SiteCreation.statusTable || {};
// Declaring a "global" function inside the namespace
KKR.SiteCreation.statusTable.setId = function (id) {
    KKR.Portal.Page.id = id;
};
// This function can be accessed from any JS like: KKR.SiteCreation.statusTable.setId(123);

Do not put logic in JSP

(Scriptlets are the Java code you type away between <% %> tags in a JSP; this is a very common mistake made by almost everybody starting out from a Java background; BEWARE.)
It is a bad practice and a common beginner's error; neither should you generate the HTML page in the Servlet (which is okay for HelloWorld but not for anything more complicated).
Second: use JSP minimally and only if really needed.
So what is the way? Here is the gist --> Instead of using a JSP, GSP, or ERB to assemble a page server-side and send back HTML, have the server send back just the dynamic data as JSON and have the page assembled in the browser. You can leave it at that, or read the rest of the LinkedIn story.
When using JSPs (minimally and when needed), use them with the already available taglibs (JSTL) or the Liferay-provided ones (mandatory rule).

Use JSON for Servlet to JS communication

JSON is wonderful. On the Servlet side you can use Google's Gson library to create JSON objects, and JS can read a JSON object as-is, since it is a JavaScript object. If it comes encoded in a string, use a library (or JSON.parse). A hedged sketch of the servlet side is below.
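This is my own minimal example (the POJO and servlet names are made up), using Gson's toJson:

import com.google.gson.Gson;
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StatusServlet extends HttpServlet {
    // Plain POJO; Gson serializes its fields by name
    static class Status {
        String site = "Tokyo";
        int progress = 80;
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("application/json");
        new Gson().toJson(new Status(), resp.getWriter()); // writes {"site":"Tokyo","progress":80}
    }
}

On the JS side the response arrives as a ready-to-use object (e.g. jQuery's $.getJSON parses it for you).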

Use JavaScript libraries for DOM manipulation

Browser implementations of JavaScript methods vary; it is highly recommended to use a high-level JavaScript library like jQuery for DOM manipulation.
Make sure that the library used supports all popular A-grade browsers, and not just some browsers or only the latest ones. Hence the recommendation for jQuery and Alloy UI.

Code Analysis - mandatory prerequisite for code review

The NEED - As illustrated at the beginning, there are certain conventions that mandatorily need to be followed in JS, unlike in other languages. Since the rules are too many to write here, it is best that code-analysis tools catch these.
During development - JSHint Eclipse plugin.
This will help in development and also show issues at a glance in code review. Kindly install it.
Snapshot from the Eclipse IDE on one file where it can dynamically check:
 
JS SONAR plugin for monitoring in CI
JavaScript: The Good Parts - Doug Crockford: video, book, blog and his tool JSLint, which is the basis of most of today's rules and tools.
(Doug Crockford created JSLint and popularized JSON, and his advice is pretty good; but we should be pragmatic about it.)
Unit testing (wherever JS logic needs to be tested)
Unit testing JavaScript - JsTestDriver
JavaScript does DOM manipulation as well as client-side logic. The logic part should be kept as testable as possible and not intermingled with DOM manipulation everywhere. Wherever the logic is trivial there is no need to unit test it; if there is some logic that needs to be tested, JsTestDriver can be used. Jasmine is another popular framework, which I have not used.
I will add details regarding this in another post.

External Links

Integration Testing - Selenium WebDriver

Selenium 2 / Selenium WebDriver-based test cases with TestNG (JUnit does not have a test-case tagging feature, else it could be used too)
Selenium WebDriver can support all grade-A browsers, including iOS and Android browsers.
Note: it is very easy to create fragile test cases if you don't use the technology right; I have added a post describing this.

Part 3: JavaScript Advanced Development Guidelines (optional)

Minify and optimize your JavaScript 

Use Google's Closure Compiler to minify, and optionally to optimize, the JS code. This should be mandatory if the JS code size is huge, and it is a practice followed in the industry. Be careful with the advanced optimization levels, as they need the JS code to be structured according to Google's JavaScript guidelines (and can produce bugs in the generated code).
Closure Compiler compilation levels (recommended - SIMPLE_OPTIMIZATIONS, from experience with trying it out only)
Minifying can be done as a Java jar command or via the Maven plugin for it.
Online generation: http://closure-compiler.appspot.com/home for trying out.
Note - minifying makes it difficult to debug. One way out, supported by Firefox, Chrome etc., is using the source-map compiler option of the Closure Compiler when generating the minified code; with this, the original source code can be seen while debugging via the browser tools. A sketch of the command line is below.
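For reference, a minimal sketch of the command line (the file names are made up; the flags are from the Closure Compiler documentation):

java -jar closure-compiler.jar --compilation_level SIMPLE_OPTIMIZATIONS \
     --js app.js --js_output_file app.min.js \
     --create_source_map app.min.js.map

Then append a //# sourceMappingURL=app.min.js.map comment to the minified file so the browser dev tools can load the map.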
Performance-Driven Development
Web page tuning - browser plugins
Tuning your pages the easy way - PageSpeed Insights (by Google)
Other options
PET runs / GUI measurement - JMeter with the WebDriver plugin



Tuesday, June 23, 2015

Long-running Java process: resource consumption monitoring, leak detection and GC tuning


For easily monitoring JVM metrics there is no better tool than Java VisualVM or its older counterpart JConsole; these two tools come with the Java JDK. So it is absolutely necessary that you copy a JDK to some temp directory on the node where your server runs. (Make sure that you do not install the JDK or put the java executable in the path.)

Before going in JVM monitoring, it is essential to understand a little about Java's memory model.

Note - This article is written with the HotSpot JVM in mind, which is one of the most commonly used ones; it is implemented by OpenJDK and Oracle (formerly Sun). It is not the only one, JRockit being another notable example, and some details would change with a different JVM.

When you run a Java process on a server, you allocate memory for the Java Virtual Machine. This memory is split into the process heap (what you give with the -Xmx and -Xms options), which is subdivided on a generational basis (Young, Old), and the PermGen, which is being phased out (it is removed entirely in Java 8). Assume that you have not specified the stack size here.


Have a look at the common parameters usually given for a JVM running in server mode below, and find the relevance of each. Here you see that the Java heap is given as ~31 GB plus 1 GB of PermGen, so the total JVM memory given is about 32 GB.


java -D[Server:testserver] -XX:PermSize=1024m -XX:MaxPermSize=1024m -Xms31232m -Xmx31232m -server -Xloggc:/opt/gclogs/testserver_gc.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:+PrintGCCause -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintClassHistogramAfterFullGC -XX:G1HeapRegionSize=32m -XX:HeapDumpPath=/opt/testserver/heapdump -XX:+HeapDumpOnOutOfMemoryError -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:StringTableSize=3000000 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintAdaptiveSizePolicy -XX:+ParallelRefProcEnabled -XX:GCHeapFreeLimit=15 -XX:GCTimeLimit=80 -Djavax.net.ssl.keyStore=/opt/certs/keystore.jks -Djava.net.preferIPv4Stack=true  -Djavax.net.ssl.trustStore=/opt/certs/trustore.jks -Djava.io.tmpdir=/opt/process/tmp -Djava.util.prefs.systemRoot=/opt/oodee/jboss-eap/.java/.prefs

Output of top for the above

PID   USER  PR  NI  VIRT  RES  SHR  S %CPU %MEM    TIME+  COMMAND
13168 root  20   0  41.0g  33g  65m S 97.7 78.5   8124:53 java

The JVM itself needs memory for managing the Java process: threads, garbage collection etc. So in the top output for the above-mentioned Java process you can see the resident memory as 33g and the virtual memory as 41g. You need not be overly concerned about the virtual memory; Linux is good at managing virtual memory. You need to keep a check only on the resident memory.

So the JVM process is asking Linux for 41g of address space to do its work and is really using (resident memory) about 33g currently.

The resident memory is the RAM occupied by objects in the heap (the biggest part), JVM internal data, class metadata and things like the StringTable (PermGen), and the thread stacks. You cannot map the resident memory directly to just the heap usage of the application.

We can dig deeper and see how much heap is used by using tools like JConsole or JVisualVM.
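The same numbers those GUI tools plot can also be read from inside the process via the standard MXBeans; a minimal sketch of my own (not part of the original post):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapPeek {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();       // Young + Old generations
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // PermGen/Metaspace, code cache etc.
        System.out.printf("Heap     used=%dm committed=%dm max=%dm%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("Non-heap used=%dm committed=%dm%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
    }
}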

There are also command-line utilities in the JDK with which you can get the same information. However, these do not plot the values over time, so it is hard to get a trend from them; but they are very useful for JVM heap and garbage-collection introspection.

You can also use nice utilities like JvmTop to get this information.
 JvmTop 0.8.0 alpha - 11:35:39,  amd64, 12 cpus, Linux 2.6.32-35, load avg 0.46

 http://code.google.com/p/jvmtop

 PID MAIN-CLASS      HPCUR HPMAX NHCUR NHMAX    CPU     GC    VM      USERNAME   #T DL
13168 oss-modules.jar 10172m 31232m  627m        1072m  8.80%  0.00% O7U71  root  1343

Currently the process is using only 10g of the 31g allocated heap. So the rest of the resident memory is PermGen + (threads * stack size) + the JVM's internal data.

As you can see, the JVM is no lightweight process. Once upon a time, when RAM was dear, this would have been an issue; not now, at least not unless you start a lot of JVMs on one node.

The same or more in-depth information can be got from the JDK tools.

These are jstack, to get stack traces for debugging CPU-consuming threads or thread locks; jmap, to get a heap dump or a heap histogram; and the jstat utility, which gives the generational usage and also the live GC events as they happen in the JVM.


./jstat -gccapacity 13168 20000

 NGCMN    NGCMX     NGC    S0C   S1C       EC      OGCMN      OGCMX       OGC         OC      PGCMN    PGCMX     PGC       PC     YGC    FGC
0.0 31981568.0 19333120.0  0.0 655360.0 18677760.0    0.0 31981568.0 12648448.0 12648448.0 1048576.0 1048576.0 1048576.0 1048576.0   1745     1
     0.0 31981568.0 19333120.0    0.0 655360.0 18677760.0        0.0 31981568.0 12648448.0 12648448.0 1048576.0 1048576.0 1048576.0 1048576.0   1745     1

./jstat -gccause 13168 20000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00 100.00  30.70  67.18  58.30   1745 1513.719     1   20.213 1533.932 G1 Evacuation Pause  No GC
  0.00 100.00  32.46  67.18  58.30   1745 1513.719     1   20.213 1533.932 G1 Evacuation Pause  No GC
./jstat -gcutil 13168 20000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  34.56  67.18  58.30   1745 1513.719     1   20.213 1533.932

Now moving on to the common use of these tools for other purposes. For heap dumps:

su processuser
/bin/jmap -dump:format=b,file=/tmp/2930javaheap.hprof 2930   (2930 is the pid of the process)
For dumping a big heap, make sure you have that much disk space, efficient FTP and a powerful machine to analyze it; else anything over maybe 12 to 16 GB is useless. Eclipse MAT is a good tool to analyze the heap dump. Also, if you cannot transfer the dump, you can run jhat on the heap on the same node and use a web browser to browse the generated results. Depending on the heap size and the host system resources, these tools take some time to analyze.

/jdk/jdk1.6.0_38/bin/jhat -J-d64  -stack false -refs false -port 9191 /tmp/45133heap.bin
/jdk/jdk1.6.0_38/bin/jhat -J-d64  -port 9191 /tmp/45133heap.bin
A lighter version of this is the class histogram. With the :live option it triggers a full GC and collects the class histogram. So, after all tasks are done, take a histogram, repeat it for one or two cycles, and see which instance counts go up; this can be very helpful in identifying leaks (see below for a Python script which can help you compare two histograms).

/jdk/jdk1.6.0_38/bin/jmap -histo:live 60030 > /tmp/60030istolive1330.txt

Output will be of the form

 015-06-07T10:58:00.653+0300: 179686.836: [Class Histogram (after full gc):
 num     #instances         #bytes  class name
----------------------------------------------
   1:       6591148     2747828352  [C
   2:        298130      237694584  [B
   3:       6572509      157740216  java.lang.String
   4:       5097003      151094560  [Ljava.lang.Object;
   5:       3810491      121935712  java.util.HashMap$Entry
   6:        617568       89067520  
   7:        617568       79061744  


Another important tool, which will help you see the GC process ongoing in a live JRE, is jstat; it also has very low overhead.

jboss-as@backendnode root]$ /jdk/jdk1.6.0_38/bin/jstat -gccause 60030 5000   # 60030 = jboss server pid; repeats every 5 seconds
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00   0.00   9.54  15.05  46.86   4076  668.310    29  164.007  832.317 Heap Inspection Initiated GC No GC

Further, certain runtime parameters can be added to the JRE to get more information; some of these can be used in production, GC logs being one such. Currently it is configured to print to /opt/gclogs; do an ls -lrt, find the last modified file and tail it to see the GC logging happening. Note - the JDK also has the -Xrunhprof option, which I have been trying out in standalone clients; this prints its output after the application stops or after Ctrl+Break.

If you have any queries, please do post them. Here is a Python script to compare two histogram dumps:
histogramparser.py
__author__ = 'acp'

import fileinput
import operator

objectschanged = {}

def create_object_list(line2, mapofObjects, instance):
    """Parse one histogram line; record the per-class delta between the two files."""
    container = line2.split()
    # container[1] = #instances, container[2] = #bytes, container[3] = class name
    if len(container) < 4:
        return
    try:
        count = int(container[instance])
    except ValueError:
        return  # skip header/timestamp lines that happen to contain ':'
    if container[3] in mapofObjects:
        val = mapofObjects[container[3]]
        objectschanged[container[3]] = count - val
    else:
        mapofObjects[container[3]] = count


print("A Python script to parse jmap-generated histograms")
print("Author - Alex.Punnen")
print("Usage: histogramparser.py filename1 filename2")

instance = 1  # use 1 for the instance-count difference
#instance = 2  # use 2 for the memory (bytes) difference

mapofObjects = {}
content = []
for line in fileinput.input():
    content.append(line)

# pop() walks the lines in reverse, so the second file is read first
while len(content) > 0:
    line = content.pop()
    if ':' in line:
        create_object_list(line, mapofObjects, instance)

print("---------------------Objects Changed Between Heaps---------------------")
sorted_x = sorted(objectschanged.items(), key=operator.itemgetter(1))
for (x, y) in sorted_x:
    print(x, y)

