isenberg
I'm new here

Scheduled defragmentation of large Java heaps

The default FirstSpirit Server configuration selects the Concurrent Mark Sweep Garbage Collector (GMC GC) of the Java Virtual Machine (JVM) as default. The CMS GC offers the best tradeoff between application throughput, memory usage, CPU usage and application pauses in settings with large Java Heaps around 20 GByte for FirstSpirit. With more than 12 CPU cores, even larger heaps of 60 GByte are well usable with some few customers systems even reaching 120 GByte.

Normal operation of the CMS GC uses these three different modes of operation:

  1. Minor GC: Cleaning the memory pool reserved for temporary objects (NewGen) and moving the remaining tenured objects to the memory pool reserved for long living objects (OldGen). This processing stops the application for a few 1/100 of a second in intervals of every few seconds, imperceptible for the user.
  2. Major GC: Cleaning the memory pool reserved for long living objects (OldGen) by removing non-referenced objects. This process is done by "marking and sweeping" over the OldGen in parallel to the normal application operation for a few seconds in intervals of minutes to hours. As the processing is done in parallel no noticable impact to the application is detectable if enough CPU cores are present.
  3. Full GC: Defragmentation of the OldGen memory pool. Over long application uptimes of days to weeks, the OldGen becomes fragmented as the CMS GC is not compacting used blocks of memory which are separated by empty blocks of memory. It is the same effect as observed on some file systems. As the CMS GC needs at least free memory blocks the size of the block of objects to be moved from NewGen to OldGen, it will stop in this situation and triggering a Full GC. A Full GC stops the application for a longer time as the complete Java heap is inspected and substantial amount of data is moved around. With a heap size of 20 GByte, pauses of 20 seconds and more occur.

There are other modes of operation of the CMS GC regarding other memory pools like CodeCache and PermGen or Metaspace. These will be described in a separate article and also the situation of overloading the CMS GC, which triggers a Full GC, will be considered in the seperate article.

The GC log file of the FirstSpirit Server at firstspirit5/log/fs-gc.log provides information, which type of GC operation occured:

  1. Minor GC, causing an imperceptible application pause of only 0.03s:
    2016-03-15T08:30:17.806+0100: 3395.882:
    [GC (Allocation Failure) [ParNew: 1500718K->110462K(2730688K), 0.0295321 secs]
    1500718K->110462K(8874688K), 0.0297294 secs]
    [Times: user=0.33 sys=0.01, real=0.03 secs]
  2. Major GC, running in parallel (concurrent) to the application for about 5.57s:
    2016-04-01T16:42:37.901+0200: 170194.033: [CMS-concurrent-mark: 3.874/5.571 secs]
    [Times: user=8.27 sys=0.20, real=5.57 secs]
  3. Full GC, causing a long application pause of 9.34s:
    2016-04-01T17:02:12.104+0200: 171368.235:
    [Full GC: [CMS: 7995731K->8700441K(10121216K), 9.3396140 secs]
    11910824K->8700441K(14558592K), [CMS Perm : 76703K->76570K(1048576K)], 9.3397690 secs]
    [Times: user=9.29 sys=0.00, real=9.34 secs]
    The Full GC is also visible in the FirstSpirit Wrapper log file firstspirit5/log/fs-wrapper.log:
    INFO   | jvm 1    | 2016/04/01 17:02:21 | WrapperManager:
    The timer fell behind the system clock by 9.800 ms.

The time interval between the Full GCs is usually large in the magnitude of hours or days, but it cannot be avoided as it is a normal mode of operation of the CMS GC. However, it can be controlled in a way to mitigate disturbance of FirstSpirit as application, by triggering it externally and thereby enforcing a defragmentation at a preferred time, for instance outside of office hours.

For triggering the Full GC externally, beyond other, the following methods exists:

  • Invoking the utilit "jmap", which is provided by the Oracle JDK on the process ID of the FirstSpirit Server's JVM process:
    jmap -histo:live JAVAPID > /dev/null
    This method can be used as crontab entry on JVMs not controlled by the FirstSpirit Java Wrapper, for instance Tomcat. The JAVAPID can also be listed by the "jps" utility.
  • Using the FirstSpirit Java Wrapper's timer. This is preferred for FirstSpirit and usable since FirstSpirit 5.1.

For triggering the Full GC and thereby defragment the Java heap, some modifications of the FirstSpirit Java Wrapper configuration in file firstspirit5/conf/fs-wrapper.conf is required. Before changing this file, shut down the FirstSpirit Server.

Add the following lines at any location, preferred after the #*** comment header.
If a different time than daily 1am local server time is preffered, change it.

# controlled heap defragmentation to prevent Full GC during office hours

wrapper.timer.1.action=GC

wrapper.timer.1.interval=day-of-week=*; hour=1

Modify the following line

wrapper.java.additional.19=-XX:+DisableExplicitGC

by removing the flag DisableExplicitGC:

wrapper.java.additional.19=

Modify the following lines

wrapper.java.additional.26=-Djava.rmi.dgc.leaseValue=3600000

wrapper.java.additional.27=

wrapper.java.additional.28=

to

# disable automatic Full GC every minute caused by the RMI/JMX subsystem:

wrapper.java.additional.26=-Djava.rmi.dgc.leaseValue=9223372036854775807

wrapper.java.additional.27=-Dsun.rmi.dgc.server.gcInterval=9223372036854775807

wrapper.java.additional.28=-Dsun.rmi.dgc.client.gcInterval=9223372036854775807

Now the FirstSpirit Server can be started again.

The triggering will be written to firstspirit5/log/fs-wrapper.log as follows:

STATUS | wrapper  | 2016/04/02 01:00:00 | Timer #1.  Requesting GC...

With the Full GC shown in firstspirit5/log/fs-gc.log:

2016-04-02T01:00:00.047+0200: 1527978.123: [Full GC (System.gc())
[CMS:
2705398K->3094791K(6144000K), 1.2200197 secs] 4073662K->3094791K(8874688K),
[Metaspace: 86716K->86716K(647168K)],
1.2207459 secs]
[Times: user=1.23 sys=0.00, real=1.22 secs]

When increasing the JVM heapsize to more than 32 GByte you should skip the range between 32 GByte and 50 GByte as it even reduces the usable application memory! For details about the "turbo lag" see https://community.e-spirit.com/docs/DOC-1965

Labels (2)
Comments

With FirstSpirit 5.2R3 and 5.1R5 and older, the file fs-wrapper.conf needs to be changed directly on the file system with an external text editor as FirstSpirit ServerMonitoring rejects one of the new parameters with the following error message:

unexpected configuration property key 'wrapper.event.jvm_prelaunch.command.block.action'