Garbage collection is the single most important factor when tuning a JVM for long-running, server-side applications. An improperly tuned garbage collector, or an application that creates unnecessarily large numbers of objects, can significantly reduce the efficiency of your application. It is not uncommon to find that garbage collection consumes a significant amount of the overall processing time in a server-side Java application. Proper tuning can substantially reduce the garbage collector’s processing time and, therefore, improve your application’s throughput.
If you are looking for command-line parameters, see the Java command line options for JVM performance tuning thread for more information.
Understanding Garbage Collection
Garbage collection (GC) is the technique a JVM uses to free memory occupied by objects that are no longer being used by the application. The Java Language Specification does not require a JVM to have a garbage collector, nor does it specify how a garbage collector should work. Nevertheless, all of the commonly used JVMs have garbage collectors, and most garbage collectors use similar algorithms to manage their memory and perform collection operations.
Just as it is important to understand the workload of your application to tune your overall system properly, it is also important to understand how your JVM performs garbage collection so that you can tune it. Once you have a solid understanding of garbage collection algorithms and implementations, it is possible to tune application and garbage collection behavior to maximize performance. Some garbage collection schemes are more appropriate for applications with specific requirements. For example, near-real-time applications care more about avoiding garbage collection pauses whereas most OLTP applications care more about overall throughput. Once you have an understanding of the workload of the application and the different garbage collection algorithms your JVM supports, you can optimize the garbage collector configuration.
The purpose of the garbage collection in a JVM is to clean up objects that are no longer being used. Garbage collectors determine whether an object is eligible for collection by determining whether objects are being referenced by any active objects in the system. The garbage collector must first identify the objects eligible for collection. The two general approaches for this are reference counting and object reference traversal. Reference counting involves storing a count of all of the references to a particular object. This means that the JVM must properly increment and decrement the reference count as the application creates references and as the references go out of scope. When an object’s reference count goes to zero, it is eligible for garbage collection.
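The bookkeeping that reference counting requires can be illustrated with a small sketch. This is only a toy model, not how any particular JVM implements it; the Cell class and the addReference and dropReference methods are hypothetical names introduced for illustration.

```java
/** Toy model of reference counting. All names here are hypothetical. */
final class RefCountDemo {
    static final class Cell {
        final String name;
        int refCount;              // number of references currently pointing at this cell
        Cell(String name) { this.name = name; }
    }

    /** Called whenever the application creates a new reference to the cell. */
    static void addReference(Cell cell) {
        cell.refCount++;
    }

    /** Called when a reference goes out of scope; returns true once the cell is collectible. */
    static boolean dropReference(Cell cell) {
        cell.refCount--;
        return cell.refCount == 0;
    }

    public static void main(String[] args) {
        Cell order = new Cell("order");
        addReference(order);                        // count = 1
        addReference(order);                        // count = 2
        System.out.println(dropReference(order));   // prints false (count = 1)
        System.out.println(dropReference(order));   // prints true  (count = 0)
    }
}
```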
Although early JVMs used reference counting, most modern JVMs use object reference traversal. Object reference traversal simply starts with a set of root objects and follows every link recursively through the entire object graph to determine the set of reachable objects. Any object that is not reachable from at least one of these root objects is garbage collected. During this object traversal stage, the garbage collector must remember which objects are reachable so that it can remove those that are not; this is known as marking the object.
The next thing that the garbage collector must do is remove the unreachable objects. When doing this, some garbage collectors simply scan through the heap, removing the unmarked objects and adding their memory location and size to a list of available memory for the JVM to use in creating new objects; this is commonly referred to as sweeping. The problem with this approach is that memory can fragment over time to the point where there are a lot of small segments of memory that are not big enough to use for new objects but yet, when added all together, can make up a significant amount of memory. Therefore, many garbage collectors actually rearrange live objects in memory to compact the live objects, making the available heap space contiguous.
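The marking and sweeping steps described above can be sketched over an explicit object graph. This is a minimal illustration under simplifying assumptions (the "heap" is just a list, and freeing an object means removing it from that list); Node, mark, and sweep are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep over an explicit object graph. All names are hypothetical. */
final class MarkSweepDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();  // outgoing references
        boolean marked;                             // set during the mark phase
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;                 // already visited, so cycles terminate
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: remove unmarked nodes from the heap and clear marks on survivors. */
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;                   // reset for the next collection
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);                              // c and d reference each other, but no root reaches them
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a));
        System.out.println(sweep(heap));            // prints [c, d]
    }
}
```

Note that the unreachable pair c and d is collected even though each still holds a reference to the other, which is something a pure reference count would not detect.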
In order to do their jobs, garbage collectors usually have to stop all other activity for some portion of the garbage collection process. This stop-the-world approach means all application-related work stops while the garbage collector runs. As a result, any in-flight requests will experience an increase in their response time by the amount of time taken by the garbage collector. Other, more sophisticated collectors run either incrementally or truly concurrently to reduce or eliminate the application pauses. Some garbage collectors use a single thread to do their work; others employ multiple threads to increase their efficiency on multi-CPU machines. Let’s look at a few of the garbage collectors used by modern JVMs. If you are interested in GC tuning parameters, see the Java GC Tuning using GarbageCat Tool thread for more information.
A mark-and-sweep collector first traverses the object graph and marks reachable objects. It then scans the heap for unmarked objects and adds their memory to a list of available memory segments. This collector typically uses a single thread to do its work and is a stop-the-world collector.
A mark-and-compact collector, sometimes known as a mark-sweep-compact collector, uses the same marking phase as a mark-and-sweep collector. During the second phase, it compacts the heap by copying marked objects to a new area of the heap. These collectors are also stop-the-world collectors.
A copying collector divides the heap into two areas, commonly known as semi-spaces. It uses only one semi-space at a time; the JVM creates all new objects in one semi-space. When the garbage collector runs, it copies any reachable objects it finds to the other semi-space as it finds them, thus compacting the heap as it copies live objects. All dead objects are left behind. This algorithm works well for short-lived objects, but the expense of continually copying long-lived objects makes it less efficient. Again, this is a stop-the-world collector.
Incremental collectors basically divide the heap into multiple areas and collect garbage from only one area at a time. This can create much smaller, though more frequent, pauses in your application. Numerous approaches exist for defining how the actual collection is handled, from traditional mark-and-sweep to algorithms designed explicitly for use with multiple smaller areas, like the train algorithm. See “Incremental Mature Garbage Collection Using the Train Algorithm” by Jacob Seligmann and Steffen Grarup for more information.
A generational collector divides the heap into two or more areas that it uses to store objects with different lifetimes. The JVM generally creates all new objects in one of these areas. Over time, the objects that continue to exist get tenure and move into another area for longer-lived objects. Generational collectors often use different algorithms for the different areas to optimize performance.
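The semi-space idea can be sketched as follows. This is a simplified model rather than a real allocator: the to-space is an ordinary list, and a forwarding map (an assumption of this sketch) records which objects have already been copied. Obj and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy semi-space copying collector. All names are hypothetical. */
final class CopyingDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Copies everything reachable from the roots into a fresh to-space and returns it. */
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();   // from-space object -> its copy
        for (Obj root : roots) {
            copy(root, toSpace, forwarded);
        }
        // The to-space doubles as the work queue: scan each copy and fix up its references.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj current = toSpace.get(scan);
            for (int i = 0; i < current.refs.size(); i++) {
                current.refs.set(i, copy(current.refs.get(i), toSpace, forwarded));
            }
        }
        return toSpace;
    }

    /** Copies one object unless it was copied already; returns the to-space copy. */
    private static Obj copy(Obj old, List<Obj> toSpace, Map<Obj, Obj> forwarded) {
        Obj existing = forwarded.get(old);
        if (existing != null) return existing;
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs);     // still points into from-space; fixed during the scan
        forwarded.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), dead = new Obj("dead");
        a.refs.add(b);
        b.refs.add(a);                   // a cycle is fine: the forwarding map prevents double copies
        dead.refs.add(a);                // nothing reaches 'dead', so it is simply left behind
        for (Obj o : collect(List.of(a))) {
            System.out.println(o.name);  // prints a, then b
        }
    }
}
```

Because live objects are appended to the to-space one after another, the surviving data ends up contiguous, which is the compaction side effect described above.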
Concurrent collectors run concurrently with the application, typically as one or more background threads. These collectors typically have to stop-the-world at some point to complete certain tasks, but the amount of time they halt all processing is significantly reduced because of their other background work.
Parallel collectors typically use one of the traditional algorithms but use multiple threads to parallelize their work. On multi-CPU machines, this can dramatically improve the scalability of a Java application.
Tuning the Sun HotSpot JVM Heap Size
Sun Microsystems’ HotSpot JVM uses a generational collector that partitions the heap into three main areas: the new generation area, the old generation area, and the permanent generation area. The JVM creates all new objects in the new generation area. Once an object survives a certain number of garbage collection cycles in the new generation area, it gets promoted, or tenured, to the old generation area. The JVM stores Class and Method objects for the classes it loads in the permanent generation area. Although the collector manages this area along with the rest of the heap, from a configuration perspective the permanent generation area in the Sun HotSpot JVM is a separate area that is not considered part of the heap.
You can use the -Xms and -Xmx flags to control the initial and maximum size of the entire heap, respectively. For example, the following command sets the initial size of the entire heap to 128 megabytes (MBs) and the maximum size to 256 MBs:
java -Xms128m -Xmx256m ...
To control the size of the new generation area, you can use the -XX:NewRatio flag to set the proportion of the overall heap that is set aside for the new generation area. For example, the following command sets the overall heap size to 128 MBs and sets the new ratio to 3. This means that the ratio of the new area to the old area is 1:3; the new area is one-fourth of the overall heap space, or 32 MBs, and the old area is three-fourths of the overall heap space, or 96 MBs.
java -Xms128m -Xmx128m -XX:NewRatio=3 ...
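The arithmetic behind the NewRatio example can be checked with a few lines of code. This sketch only reproduces the proportions described above; a real JVM may round the resulting sizes, and the class and method names are hypothetical.

```java
/** Splits a heap into new and old areas for a given NewRatio. Names are hypothetical. */
final class NewRatioDemo {
    /** Returns {newAreaMb, oldAreaMb} for a heap where new:old = 1:newRatio. */
    static long[] split(long heapMb, int newRatio) {
        long newArea = heapMb / (newRatio + 1);   // one share out of (newRatio + 1)
        return new long[] { newArea, heapMb - newArea };
    }

    public static void main(String[] args) {
        long[] areas = split(128, 3);
        System.out.println("new=" + areas[0] + " MB, old=" + areas[1] + " MB"); // prints new=32 MB, old=96 MB
    }
}
```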
The initial and maximum sizes for the new area can be set explicitly using the -XX:NewSize and -XX:MaxNewSize flags or the -Xmn flag. For example, the command shown here sets the initial and maximum size to 64 MBs:
java -Xms256m -Xmx256m -Xmn64m ...
Configuration-wise, the permanent area is not considered part of the heap. By default, the initial size of the permanent area is 4 MBs. As your application loads and runs, the JVM will resize the permanent area as needed up to the maximum size for this area. Every time it resizes the permanent area, the JVM does a full garbage collection of the entire heap (and the permanent area). By default, the maximum size is 32 MBs. Use the -XX:MaxPermSize flag to increase the maximum size of the permanent area. When loading large numbers of classes in your WebLogic Server application, it is not uncommon to need to increase the maximum size of this area. The number of objects stored in the permanent area will grow quickly while the JVM loads classes, and it may force the JVM to resize the permanent area frequently. To prevent this resizing, set the initial size of the permanent area using the -XX:PermSize flag. For example, here we have set the initial size to 64 MBs and the maximum size to 128 MBs:
java -Xms512m -Xmx512m -Xmn128m -XX:PermSize=64m -XX:MaxPermSize=128m ...
By default, HotSpot uses a copying collector for the new generation area. This area is actually subdivided into three partitions. The first partition, known as Eden, is where all new objects are created. The other two are semi-spaces, also called survivor spaces. When Eden fills up, the collector stops the application and copies all reachable objects from Eden and from the current from survivor space into the current to survivor space. At that point, the from and to survivor spaces switch roles so that the current to space becomes the new from space and vice versa. Objects that continue to live are copied between survivor spaces until they achieve tenure, at which point they are moved into the old generation area.
Use the -XX:SurvivorRatio flag to control the size of these subpartitions. Like the NewRatio, the SurvivorRatio is a ratio between two areas: it specifies the ratio of the size of the Eden space to the size of one of the survivor spaces. For example, the following command sets the new area size to 64 MBs, Eden to 32 MBs, and each of the two survivor spaces to 16 MBs:
java -Xms256m -Xmx256m -Xmn64m -XX:SurvivorRatio=2 ...
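The sizes in this example follow from treating the new area as SurvivorRatio shares for Eden plus one share for each of the two survivor spaces. The sketch below reproduces that calculation; it ignores any rounding a real JVM applies, and the names are hypothetical.

```java
/** Splits the new area into Eden and two survivor spaces. Names are hypothetical. */
final class SurvivorRatioDemo {
    /** Returns {edenMb, survivorMb}, where Eden is survivorRatio times one survivor space. */
    static long[] split(long newAreaMb, int survivorRatio) {
        long survivor = newAreaMb / (survivorRatio + 2);   // two survivor spaces plus Eden's shares
        long eden = newAreaMb - 2 * survivor;
        return new long[] { eden, survivor };
    }

    public static void main(String[] args) {
        long[] parts = split(64, 2);
        System.out.println("eden=" + parts[0] + " MB, each survivor=" + parts[1] + " MB");
        // prints eden=32 MB, each survivor=16 MB
    }
}
```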
Figure: Overview of the HotSpot JVM heap layout.
As we discussed previously, HotSpot defaults to using a copying collector for the new area and a mark-sweep-compact collector for the old area. Using a copying collector for the new area makes sense because the majority of objects created by an application are short-lived. In an ideal situation, all transient objects would be collected before making it out of the Eden space. If we were able to achieve this, and all objects that made it out of the Eden space were long-lived objects, then ideally we would immediately tenure them into the old space to avoid copying them back and forth in the survivor spaces.
Unfortunately, applications do not necessarily fit cleanly into this ideal model because they tend also to have a small number of intermediate-lived objects. It is typically better to keep these intermediate-lived objects in the new area because copying a small number of objects is generally less expensive than compacting the old heap when they have to be garbage collected in the old heap. To control the copying of objects in the new area, use the -XX:TargetSurvivorRatio flag to control the desired survivor space occupancy after a collection. Don’t be misled by the name; this value is a percentage.
By default, the value is set to 50. When using large heaps in conjunction with a low SurvivorRatio, you should probably increase this value to somewhere in the neighborhood of 80 to 90 to better utilize the survivor space.
Use the -XX:MaxTenuringThreshold flag to control the upper threshold the copying collector uses before promoting an object. If you want to prevent all copying and automatically promote objects directly from Eden to the old area, set the value of MaxTenuringThreshold to 0. If you do this, you will in effect be skipping the use of the survivor spaces, so you will want to set the SurvivorRatio to a large number to maximize the size of the Eden area, as shown here:
java ... -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=50000 ...
The -verbose:gc switch gives you basic information about what the garbage collector is doing. By turning this switch on, you will get information about when major and minor collections occur, what the memory size before and after the collection was, and how much time the collection took. Here is some sample output from this switch.
[Full GC 21924K->13258K(63936K), 0.3854772 secs]
[GC 26432K->13984K(63936K), 0.0168988 secs]
[GC 27168K->13763K(63936K), 0.0068799 secs]
[GC 26937K->14196K(63936K), 0.0139437 secs]
The first entry, which starts with Full GC, is a major collection of the entire heap. The other three entries are minor collections of the new area. The number before the arrow indicates how much of the heap was in use before the collection, and the number after the arrow shows how much was in use after the collection. The number in parentheses is the total size of the heap, and the time values indicate the amount of time the collection took.
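When you have many of these entries, it can help to pull the numbers out programmatically. The following sketch parses one entry in the basic format shown above with a regular expression; it assumes exactly that format (other switches and JVM versions change the layout), and the class and field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one basic -verbose:gc entry. Class and field names are hypothetical. */
final class VerboseGcEntry {
    // Matches entries such as "[GC 26432K->13984K(63936K), 0.0168988 secs]"
    private static final Pattern ENTRY = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean fullCollection;
    final long beforeKb;
    final long afterKb;
    final long heapKb;
    final double seconds;

    private VerboseGcEntry(boolean fullCollection, long beforeKb, long afterKb,
                           long heapKb, double seconds) {
        this.fullCollection = fullCollection;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
        this.seconds = seconds;
    }

    static VerboseGcEntry parse(String entry) {
        Matcher m = ENTRY.matcher(entry.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized GC entry: " + entry);
        }
        return new VerboseGcEntry(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    /** Memory freed by this collection, in KB. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        VerboseGcEntry e = parse("[Full GC 21924K->13258K(63936K), 0.3854772 secs]");
        System.out.println(e.fullCollection + " " + e.reclaimedKb() + "K"); // prints true 8666K
    }
}
```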
By turning on the -XX:+PrintGCDetails switch, you can get a little more information about what is happening in the garbage collector. Output from this switch looks like this.
[Full GC [Tenured: 11904K->13228K(49152K), 0.4044939 secs] 21931K->13228K(63936K), 0.4047285 secs]
[GC [DefNew: 13184K->473K(14784K), 0.0213737 secs] 36349K->23638K(63936K), 0.0215022 secs]
As with the standard garbage collection output, the Full GC label indicates a full collection. Tenured indicates that the mark-sweep-compact collector has run on the old generation; the old area usage went from 11904K to 13228K, and the total old area size is 49152K. The old area grew because the new area is automatically purged of all objects during a full collection, so its surviving objects end up in the old area. The second set of numbers associated with the first entry represents the before, after, and total size of the entire heap. This full collection took 0.4047285 seconds. In the second entry, the GC label indicates a partial collection, and DefNew means that the collection took place in the new area; all of the statistics have similar meanings to the first except that they pertain to the new area rather than the old area.
If you add the -XX:+PrintGCTimeStamps switch, the JVM also reports when these garbage collection cycles occur. The time is measured in seconds since the JVM started and appears as a prefix on each entry, as shown here.
21.8441: [GC 21.8443: [DefNew: 13183K->871K(14784K), 0.0203224 secs] 20535K->8222K(63936K), 0.0205780 secs]
Finally, you can add the -XX:+PrintHeapAtGC switch to get even more detailed information. This switch dumps a snapshot of the heap as a whole.
To get more information on what is going on in the new area, you can print the object tenuring statistics by adding the -XX:+PrintTenuringDistribution switch, in addition to the -verbose:gc switch, to the JVM command line. The output that follows shows objects being promoted through the ages on their way to being tenured to the old generation.
java -Xms64m -Xmx64m -XX:NewRatio=3 -verbose:gc -XX:+PrintTenuringDistribution ...
[GC
Desired survivor size 819200 bytes, new threshold 31 (max 31)
- age 1: 285824 bytes, 285824 total
34956K->22048K(63936K), 0.2182682 secs]
[GC
Desired survivor size 819200 bytes, new threshold 31 (max 31)
- age 1: 371520 bytes, 371520 total
- age 2: 263472 bytes, 634992 total
35231K->22389K(63936K), 0.0413801 secs]
[GC
Desired survivor size 819200 bytes, new threshold 3 (max 31)
- age 1: 436480 bytes, 436480 total
- age 2: 203952 bytes, 640432 total
- age 3: 263232 bytes, 903664 total
35573K->22652K(63936K), 0.0432329 secs]
Notice the desired survivor size of 819200 bytes. Why is that? Well, let’s do the math. If the overall heap is 64 MBs and the NewRatio is 3, the new area is one-fourth of the total heap, or 16 MBs. Because we are using the client JVM, the default value of the SurvivorRatio is 8, which means that each survivor space is one-eighth the size of the Eden space. Because there are two survivor spaces, each survivor space is one-tenth of the overall new area size, or about 1.6 MBs. Because the default TargetSurvivorRatio is 50 percent, the desired survivor size is about 800 KBs.
You will also notice that the maximum threshold is always 31. The threshold is the number of times the JVM will copy the object between the to and from spaces before promoting it to the old space. As described in the TargetSurvivorRatio discussion, the garbage collector always tries to keep the survivor space at or below the target size. It will try to age (copy) objects up to 31 times before promoting them into the old area, but it recalculates the actual promotion threshold after each garbage collection. Remember, any full garbage collection cycle will immediately tenure all reachable objects, so always try to tune the garbage collector, especially the PermSize, to prevent full garbage collection cycles from occurring.
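The chain of divisions in this paragraph can be written out as a small calculation. One point is an assumption of this sketch rather than something stated in the text: to land exactly on the logged 819200 bytes, the survivor space is rounded down to a 64 KB boundary, a value chosen here only because it reproduces the logged figure. Without that rounding, the same arithmetic gives roughly 838,860 bytes. All names are hypothetical.

```java
/** Reproduces the desired-survivor-size arithmetic. Names and the 64 KB alignment are assumptions. */
final class DesiredSurvivorSize {
    // Assumed rounding granularity; picked because it matches the logged 819200 bytes.
    static final long ASSUMED_ALIGNMENT_BYTES = 64 * 1024;

    static long desiredSurvivorBytes(long heapBytes, int newRatio,
                                     int survivorRatio, int targetSurvivorPercent) {
        long newArea = heapBytes / (newRatio + 1);        // 64 MB heap, NewRatio 3 -> 16 MB
        long survivor = newArea / (survivorRatio + 2);    // one-tenth of the new area
        survivor -= survivor % ASSUMED_ALIGNMENT_BYTES;   // round down (assumption)
        return survivor * targetSurvivorPercent / 100;    // TargetSurvivorRatio is a percentage
    }

    public static void main(String[] args) {
        long desired = desiredSurvivorBytes(64L * 1024 * 1024, 3, 8, 50);
        System.out.println(desired + " bytes"); // prints 819200 bytes
    }
}
```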
In the last entry, you will notice that the garbage collector changed the threshold from the default of 31 to 3. This happened because the garbage collector is attempting to keep the occupancy of the survivor space at its desired survivor size. By adding the size of the objects in all three age categories you will get 903664 bytes, which exceeds the desired survivor size; therefore, the garbage collector reset the threshold for the next garbage collection cycle.
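The threshold adjustment described here can be sketched as a running total over the age table: walk the ages in order, and the first age at which the cumulative size exceeds the desired survivor size becomes the new threshold. This is a simplified model of the rule as described above, not the JVM's actual source, and the names are hypothetical.

```java
/** Sketch of how the tenuring threshold could be derived from the age table. Names are hypothetical. */
final class TenuringThreshold {
    /**
     * @param bytesByAge           bytesByAge[i] is the size of surviving objects of age i + 1
     * @param desiredSurvivorBytes target survivor space occupancy
     * @param maxThreshold         upper limit (31 in the output above)
     */
    static int compute(long[] bytesByAge, long desiredSurvivorBytes, int maxThreshold) {
        long total = 0;
        for (int age = 1; age <= bytesByAge.length; age++) {
            total += bytesByAge[age - 1];
            if (total > desiredSurvivorBytes) {
                return Math.min(age, maxThreshold);   // survivors no longer fit: lower the threshold
            }
        }
        return maxThreshold;                          // everything fits: keep the maximum
    }

    public static void main(String[] args) {
        long desired = 819200;
        // Second entry in the output: 634992 bytes total, below the desired size.
        System.out.println(compute(new long[] { 371520, 263472 }, desired, 31));         // prints 31
        // Third entry: 903664 bytes total, exceeded at age 3.
        System.out.println(compute(new long[] { 436480, 203952, 263232 }, desired, 31)); // prints 3
    }
}
```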
Sun’s JVM comes with several garbage collectors that allow you to optimize the garbage collector based on your application requirements.
Using Oracle JRockit JVM
The Oracle JRockit JVM was designed from the ground up to be a server-side JVM. Instead of lazily compiling the Java byte code into native code as HotSpot does, it precompiles every class as it loads. JRockit also provides more in-depth instrumentation to give you more insight into what is going on inside the JVM at runtime. It does this through Oracle JRockit Mission Control, which provides a standalone GUI console but can also be integrated within your Eclipse IDE.
JRockit supports both dynamic and static garbage collection modes. By default, JRockit dynamically selects a garbage collection strategy to optimize application throughput. Dynamic garbage collection supports three modes:
throughput: Optimizes for maximum throughput
pausetime: Optimizes for short and even pause times
deterministic: Optimizes for very short and deterministic pause times (requires Oracle JRockit Real Time)
JRockit also supports four static garbage collection models:
Single-spaced Parallel Collector
This collector stops the world but uses multiple threads to speed the collection process. It does not segment the heap into multiple areas. Though it will cause longer pauses than the rest, it generally provides better memory utilization and better throughput for applications that don’t allocate large numbers of short-lived objects.
Generational Parallel Collector
This collector stops the world but uses multiple threads to speed the collection process. It segments the heap into a nursery and an old area. New objects are allocated in the nursery and only promoted to the old area after two collection cycles in the nursery area. Though it will cause longer pauses than the concurrent collectors, it generally provides better memory utilization and better throughput for applications that allocate large numbers of short-lived objects.
Single-spaced Mostly Concurrent Collector
This collector uses the entire heap and does its work concurrently using a background thread. Though this collector can virtually eliminate pauses, you are trading memory and throughput for pause-less collection because it will generally take the collector longer to find dead objects and the collector is constantly running during application processing. If this collector cannot keep up with the amount of garbage the application creates, it will stop the application threads while it finishes its collection cycle.
Generational Mostly Concurrent Collector
This collector uses a stop-the-world parallel collector on the nursery area and a concurrent collector on the old area. Because this collector has more frequent pauses than the single-spaced concurrent collector, it should require less memory and provide more throughput for applications that can tolerate short pauses. Remember that an undersized nursery area can cause large numbers of temporary objects to be promoted to the old area. This will cause the concurrent collector to work harder and may cause it to fall behind to the point where it has to stop the world to complete its cycle.
By default, JRockit uses the dynamic garbage collection strategy to optimize for throughput. To change to one of the other dynamic strategies, use the -XgcPrio:<mode> flag, where valid mode values are throughput and pausetime. JRockit Real Time adds a third value to this list: deterministic. To specify the collector statically, use the -Xgc:<gc_name> flag, where the valid values for the four collectors are singlepar, genpar, singlecon, and gencon, respectively. You can set the initial and maximum heap sizes using the same -Xms and -Xmx flags as you do for the HotSpot JVM. To set the nursery size, use the -Xns flag.
java -jrockit -Xms512m -Xmx512m -Xgc:gencon -Xns128m ...
Although JRockit recognizes the -verbose:gc switch, the information it prints will vary depending on which garbage collector you are using. JRockit also supports verbose output options of memory (same as gc), load, and codegen. Using the default dynamic throughput collector, the -verbose:memory output provides information on both nursery area (nursery GC) and old area (GC) collections, as shown here.
[INFO ][memory ] Running with 32 bit heap and compressed references.
[INFO ][memory ] GC mode: Garbage collection optimized for throughput, initial strategy: Generational Parallel Mark & Sweep
[INFO ][memory ] heap size: 262144K, maximal heap size: 524288K, nursery size: 131072K
[INFO ][memory ] <s>-<end>: GC <before>K-><after>K (<heap>K), <pause> ms
[INFO ][memory ] <s/start> - start time of collection (seconds since jvm start)
[INFO ][memory ] <end> - end time of collection (seconds since jvm start)
[INFO ][memory ] <before> - memory used by objects before collection (KB)
[INFO ][memory ] <after> - memory used by objects after collection (KB)
[INFO ][memory ] <heap> - size of heap after collection (KB)
...
[INFO ][memory ] 6.924: parallel nursery GC 159174K->64518K (262144K), 34.992 ms
...
[INFO ][memory ] 48.953-49.041: GC 262144K->81910K (262144K), 88.137 ms
Using the -XgcPause switch will cause JRockit to print output each time the JVM has to pause other threads to complete garbage collection. The output looks like this.
[INFO ][memory ] 28.787: parallel nursery GC 201290K->105482K (262144K), 30.931 ms
[INFO ][gcpause] nursery collection pause time: 30.930677 ms
[INFO ][memory ] 29.726: parallel nursery GC 223427K->130499K (262144K), 38.595 ms
[INFO ][gcpause] nursery collection pause time: 38.594919 ms
[INFO ][memory ] 30.297: parallel nursery GC 244085K->145013K (262144K), 22.180 ms
[INFO ][gcpause] nursery collection pause time: 22.180263 ms
[INFO ][memory ] 30.822: parallel nursery GC 258605K->159341K (262144K), 21.630 ms
[INFO ][gcpause] nursery collection pause time: 21.629774 ms
[INFO ][gcpause] Threads waited for memory 61.151 ms starting at 31.922 s
[INFO ][gcpause] old collection phase 1-0 pause time: 69.134904 ms, (start time: 31.922 s)
[INFO ][gcpause] (pause includes compaction: 3.539 ms (external), update ref: 9.769 ms)
[INFO ][memory ] 31.922-31.991: GC 262144K->76156K (262144K), 69.135 ms
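To see how much total time the application spent paused, you can add up the gcpause entries. The sketch below does this for lines in the format shown above; it assumes each pause is reported as "pause time: <n> ms" on a [gcpause] line, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sums the pause times reported on [gcpause] lines. Names are hypothetical. */
final class GcPauseTotals {
    // Captures the number in fragments such as "pause time: 30.930677 ms"
    private static final Pattern PAUSE = Pattern.compile("pause time: ([0-9.]+) ms");

    static double totalPauseMs(List<String> logLines) {
        double total = 0.0;
        for (String line : logLines) {
            if (!line.contains("[gcpause]")) {
                continue;                       // skip [memory ] and other modules
            }
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "[INFO ][memory ] 30.822: parallel nursery GC 258605K->159341K (262144K), 21.630 ms",
                "[INFO ][gcpause] nursery collection pause time: 21.629774 ms",
                "[INFO ][gcpause] Threads waited for memory 61.151 ms starting at 31.922 s",
                "[INFO ][gcpause] old collection phase 1-0 pause time: 69.134904 ms, (start time: 31.922 s)");
        System.out.println(String.format(Locale.ROOT, "total pause: %.3f ms", totalPauseMs(lines)));
    }
}
```

Lines such as "Threads waited for memory" and the compaction breakdown are deliberately left out of the total because they do not match the "pause time:" fragment.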
As we discussed, even the concurrent collector occasionally has to stop the application to do certain phases of its work. If you use the -XgcReport switch, JRockit will print out a summary of the garbage collection activity before it exits.
[INFO ][memory ]
[INFO ][memory ] Memory usage report
[INFO ][memory ]
[INFO ][memory ] young collections
[INFO ][memory ] number of collections = 10
[INFO ][memory ] total promoted = 2473233 (size 129116408)
[INFO ][memory ] max promoted = 551062 (size 31540352)
[INFO ][memory ] total GC time = 0.415 s
[INFO ][memory ] mean GC time = 41.500 ms
[INFO ][memory ] maximum GC Pauses = 54.765 , 58.583, 64.630 ms
[INFO ][memory ]
[INFO ][memory ] old collections
[INFO ][memory ] number of collections = 2
[INFO ][memory ] total promoted = 0 (size 0)
[INFO ][memory ] max promoted = 0 (size 0)
[INFO ][memory ] total GC time = 0.142 s (pause 0.142 s)
[INFO ][memory ] mean GC time = 71.009 ms (pause 71.007 ms)
[INFO ][memory ] maximum GC Pauses = 0.000 , 69.135, 72.878 ms
[INFO ][memory ]
[INFO ][memory ] number of parallel mark phases = 2
[INFO ][memory ] number of parallel sweep phases = 2
What really makes the JRockit JVM so compelling is JRockit Mission Control. JRockit Mission Control is a management console for the JRockit JVM that contains the following tools.
JRockit Management Console: This provides a real-time view into the JVM’s operation by capturing and displaying live data on the CPU and memory usage, as well as garbage collection pauses. This console also gives you control over CPU affinity, garbage collection strategy, and memory pool sizes so that you can adjust settings without restarting the JVM.
JRockit Runtime Analyzer (JRA): The runtime analyzer allows you to make low overhead recordings of detailed information about what is happening inside the JVM. Those recordings can then be analyzed offline to get detailed information on garbage collection, object usage, method and lock profiling, and latency statistics.
JRockit Latency Analyzer: Using a JRA recording, the latency analyzer graphically shows you all the latency events occurring in your application. Through this tool, you can easily identify areas of contention where your application threads are blocked waiting on locks, database I/O, and any other type of event that may cause latency.
JRockit Memory Leak Detector: This tool allows you to find memory leaks in production applications with very low overhead without needing to restart the JVM. It can track down even the smallest memory leaks and presents the information in a style that simplifies the task of determining the exact cause of the leak.