The Serial GC
The configuration of the Serial GC is a young generation that operates as described earlier, over an old generation managed by a sliding compacting mark-sweep, also known as a mark-compact garbage collector.
Both minor and full garbage collections take place in a stop-the-world fashion (i.e., the application is stopped while a collection is taking place). Only after the garbage collection has finished is the application restarted.
The mark-compact garbage collector first identifies which objects are still live in the old generation. It then slides them toward the beginning of the heap, leaving any free space in a single contiguous chunk at the end of the heap. This allows any future allocations into the old generation, which will most likely take place as objects are being promoted from the young generation, to use the fast bump-the-pointer technique.
In below figure Objects marked with a gray X are assumed to be garbage. The shaded area at the end of the compacted space denotes reclaimed (e.g., free) space.
The Serial GC is the garbage collector of choice for most applications that do not have low pause time requirements and run on client-style machines. It takes advantage of only a single virtual processor for garbage collection work (hence, its name). Still, on today’s hardware, the Serial GC can efficiently manage a lot of nontrivial applications with a few 100MBs of Java heap, with relatively short worst-case pauses (around a couple of seconds for full garbage collections). Another popular use for the Serial GC is in environments where a high number of JVMs are run on the same machine (in some cases, more JVMs than available processors!). In such environments when a JVM does a garbage collection it is better to use only one processor to minimize the interference on the remaining JVMs, even if the garbage collection might last longer. And the Serial GC fits this trade-off nicely.
The Parallel GC: Throughput Matters!
These days, a lot of important Java applications run on (sometimes dedicated) servers with a lot of physical memory and multiple processors. Ideally, the garbage collector should take advantage of all available processing resources and not leave most of them idle while it is doing garbage collection work. To decrease garbage collection overhead and hence increase application throughput on server-style machines, the HotSpot VM includes the Parallel GC, also called the Throughput GC. Its operation is similar to that of the Serial GC (i.e., it is a stopthe-world GC with a copying young generation over a mark-compact old generation).
However, both the minor and full garbage collections take place in parallel, using all available processing resources, as illustrated in Figure above(Parallel GC (b) ).
Note that earlier version of this garbage collector actually performed old collections serially. This has been rectified since the introduction of the Parallel Old GC. Applications that can benefit from the Parallel GC are those that require high throughput and have pause time requirements that can be met by the worst-case stop-the-world induced full garbage collection durations along with being run on machines with more than one processor. Applications such as batch processing
engines, scientific computing, and so on are well suited for Parallel GC. The Parallel GC, compared to the Serial GC, improves overall garbage collection efficiency, and as a result also improves application throughput.
The Mostly-Concurrent GC: Latency Matters!
For a number of applications, end-to-end throughput is not as important as rapid response time. In the stop-the-world garbage collection model, when a garbage collection is taking place, the application threads are not running, and external requests will not be satisfied until the application threads are restarted at the end of a garbage collection. Minor garbage collections do not typically cause long pauses. However, full garbage collections or compacting garbage collections, even though infrequent, can impose long pauses, especially when large Java heaps are involved.
To deal with this, the HotSpot VM includes the Mostly-Concurrent GC, also known as the Concurrent Mark-Sweep GC (CMS). It manages its young generation the same way the Parallel and Serial GCs do. Its old generation, however, is managed by an algorithm that performs most of its work concurrently, imposing only two short pauses per garbage collection cycle. Figure 3-8a illustrates how a garbage collection cycle works in CMS. It starts with a short pause, called the initial mark, that identifies the set of objects that are immediately reachable from outside the old generation. Then, during the concurrent marking phase, it marks all live objects that are transitively reachable from this set.
Because the application is running and it might be updating reference fields (hence, modifying the object graph) while the marking phase is taking place, not all live objects are guaranteed to be marked at the end of the concurrent marking phase. To deal with this, the application is stopped again for a second pause, called the remark pause, which finalizes the marking information by revisiting any objects that were modified during the concurrent marking phase. The card table data structure is reused to also keep track of modified objects. Because the remark pause is more substantial than the initial mark, it is parallelized to increase its efficiency.
To reduce further the amount of work the remark pause has to do, the concurrent pre-cleaning phase was introduced. As above Figure “(a)” shows, it takes place after the concurrent marking phase and before the remark pause and does some of the work that would have been done during the remark pause, i.e., revisiting objects that were modified concurrently with the marking phase. Even though there is still a need for the remark pause to finalize marking (given that the application might update more objects during the pre-cleaning phase), the use of pre-cleaning can reduce, sometimes dramatically, the number of objects that need to be visited during the remark pause, and, as a result, it is very effective in reducing the duration of the remark pause.
At the end of the remark pause, all live objects in the Java heap are guaranteed to have been marked. Since revisiting objects during the pre-cleaning and remark phases increases the amount of work the garbage collector has to do (as compared to, say, the Parallel GC that only visits objects once during marking), the overall overhead of CMS also increases accordingly. This is a typical trade-off for most garbage collectors that attempt to reduce pause times.
Having identified all live objects in the old generation, the final phase of the garbage collection cycle is the concurrent sweeping phase, which sweeps over the Java heap, deallocating garbage objects without relocating the live ones. Previous figure illustrates the operation of the sweeping phase. Again, objects marked with a gray X are assumed to be garbage, and the shaded areas in the post-sweep space denote free space. In this case, free space is not contiguous, and the garbage collector needs to employ a data structure (free lists, in the case of the HotSpot VM) that records which parts of the heap contain free space. As a result, allocation into the old generation is more expensive, as allocation from free lists is not as efficient as the bump-the-pointer approach. This imposes extra overhead to minor garbage collections, as most allocations in the old generation take place when objects are promoted during minor garbage collections.
Another disadvantage that CMS has, that the previous two don’t, is that it typically has larger Java heap requirements. There are a few reasons for this. First, a concurrent marking cycle lasts much longer than that of a stop-the-world garbage collection. And it is only during the sweeping phase that space is actually reclaimed. Given that the application is allowed to run during the marking phase, it is also allowed to allocate memory, hence the occupancy of the old generation potentially increases during the marking phase and decreases only during the sweeping phase. Additionally, despite the garbage collector’s guarantee to identify all live objects during the marking phase, it doesn’t actually guarantee that it will identify all objects that are garbage. Objects that become garbage during the marking phase may or may not be reclaimed during the cycle. If they are not, then they will be reclaimed during the next cycle. Garbage objects that are not identified during a garbage collection are usually referred to as floating garbage.
Compared to the Parallel GC, CMS decreases old-generation pauses—sometimes dramatically—at the expense of slightly longer young generation pauses, some reduction in throughput, and extra heap size requirements. Due to its concurrency, it also takes CPU cycles away from the application during a garbage collection cycle. Applications that can benefit from it are ones that require rapid response times (such as data-tracking servers, Web servers, and so on), and it is in fact widely used in this context.
The Garbage-First GC: CMS Replacement
The Garbage-First GC (aka G1) is a parallel, concurrent, and incrementally compacting low-pause garbage collector intended to be the long-term replacement of CMS. G1 uses a drastically different Java heap layout to the other garbage collectors in the HotSpot VM. It splits the Java heap into equal-sized chunks called regions. Even though G1 is generational, it does not have physically separate spaces for the young and old generations. Instead, each generation is a set of (maybe noncontiguous) regions. This allows G1 to resize the young generation in a flexible way.
All space reclamation in G1 takes place by evacuating the surviving objects from one set of regions to another and then reclaiming the initial (and typically larger) set of regions. Most of the time such garbage collections collect only young regions (which make up G1’s young generation), and they are the equivalent of minor garbage collections. G1 also periodically performs concurrent marking cycles that identify which non-young regions are either empty or mostly empty. These are the regions that are the most efficient to collect (i.e., G1 gets back the most free space for the least amount of work), and they are scheduled for garbage collection in favor to the rest of the regions. This is where G1 gets its name from: It goes after regions with the most garbage objects in them.
For more information on G1, please listen to the talk located at http://developers.sun.com/learning/javaoneonline/j1sessn.jsp?sessn=TS-5419&yr=2008&track=javase
Comparison of Garbage Collectors
Typically garbage collection overhead can be reduced by reducing one or more of the preceding metrics. However, sometimes this is either impossible (e.g., it might not be possible to compress further the data that needs to be loaded into the Java heap; or it is difficult to write a useful application that does not update references at all), or even undesirable (reusing objects can reduce the allocation rate, but it is also more time-consuming to implement and maybe more error-prone too). But by avoiding some bad programming practices, it is possible to find a good balance between having low garbage collection overhead, as well as well-written, easily maintained code. Bad programming practices to avoid include object pooling (pooled objects are long-lived, hence they increase the live data size of the old generation and initializing writes to them can also increase the number of reference updates in the old generation), sloppy sizing of array-based data structures (e.g., if an ArrayList is initially sized too small, its backing array might subsequently need to be resized several times, causing unnecessary allocation), and so on.