Java Garbage Collection

A Definition of Java Garbage Collection

Java garbage collection is the process by which Java programs perform automatic memory management. Java programs compile to bytecode that can be run on a Java Virtual Machine, or JVM for short. When Java programs run on the JVM, objects are created on the heap, which is a portion of memory dedicated to the program. Eventually, some objects will no longer be needed. The garbage collector finds these unused objects and deletes them to free up memory.
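As a rough illustration of the heap described above, the following sketch queries the running JVM for its heap size through the standard java.lang.Runtime API. The class name HeapSnapshot and the helper usedBytes() are made up for this example, and the numbers printed will differ from machine to machine.

```java
/** Minimal sketch: inspect the heap of the running JVM. Names are illustrative. */
final class HeapSnapshot {
    /** Heap bytes currently occupied, including garbage not yet collected. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory: the most heap the JVM will try to use.
        System.out.println("max heap:       " + rt.maxMemory() / mb + " MB");
        // totalMemory: heap currently reserved by the JVM.
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("used heap:      " + usedBytes() / mb + " MB");
    }
}
```

The "used" figure can drop between two calls without any action from the program, which is the garbage collector reclaiming unused objects.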

How Java Garbage Collection Works

Java garbage collection is an automatic process. The programmer does not need to explicitly mark objects to be deleted. The garbage collection implementation lives in the JVM. Each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification. Although there are many JVMs, Oracle’s HotSpot is by far the most common. It offers a robust and mature set of garbage collection options.

While HotSpot has multiple garbage collectors that are optimized for various use cases, all its garbage collectors follow the same basic process. In the first step, unreferenced objects are identified and marked as ready for garbage collection. In the second step, marked objects are deleted. Optionally, memory can be compacted after the garbage collector deletes objects, so remaining objects are in a contiguous block at the start of the heap. The compaction process makes it easier to allocate memory to new objects sequentially after the block of memory allocated to existing objects.
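The mark, delete, and compact steps can be sketched with a toy heap. This is a simplified model for illustration only, not how HotSpot is implemented: the ToyHeap class, its slot array, and all method names are hypothetical. The sketch identifies unreferenced objects the way tracing collectors usually do, by marking everything reachable from a set of roots and treating whatever is left unmarked as garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of mark, sweep, and compact. All names are illustrative. */
final class ToyHeap {
    /** A simulated object: a name, outgoing references, and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Slots of the simulated heap; null means the slot is free. */
    final Obj[] slots;

    ToyHeap(int size) { slots = new Obj[size]; }

    /** Allocates into the first free slot. */
    Obj allocate(String name) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = new Obj(name);
                return slots[i];
            }
        }
        throw new IllegalStateException("toy heap is full");
    }

    /** Step 1: mark every object reachable from the roots. */
    void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Step 2: free unmarked slots and reset marks; returns the number freed. */
    int sweep() {
        int freed = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) continue;
            if (slots[i].marked) {
                slots[i].marked = false;
            } else {
                slots[i] = null;
                freed++;
            }
        }
        return freed;
    }

    /** Optional step 3: slide survivors to the start; returns the first free index. */
    int compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                Obj o = slots[i];
                slots[i] = null;
                slots[next++] = o;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(4);
        Obj a = heap.allocate("a");
        heap.allocate("b");            // never referenced: garbage
        Obj c = heap.allocate("c");
        a.refs.add(c);                 // a -> c keeps c alive
        heap.mark(List.of(a));
        System.out.println("freed: " + heap.sweep());            // freed: 1
        System.out.println("next free slot: " + heap.compact()); // next free slot: 2
    }
}
```

After compaction the survivors sit in slots 0 and 1, so the next allocation can simply take the slot returned by compact(), which mirrors the sequential allocation benefit described above.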

The HotSpot garbage collectors described here all implement a generational garbage collection strategy that categorizes objects by age. The rationale behind generational garbage collection is that most objects are short-lived and will be ready for garbage collection soon after creation.

The heap is divided into three sections:

  • Young Generation: Newly created objects start in the Young Generation. The Young Generation is further subdivided into an Eden space, where all new objects start, and two Survivor spaces, where objects are moved from Eden after surviving one garbage collection cycle. When objects are garbage collected from the Young Generation, it is a minor garbage collection event.
  • Old Generation: Objects that are long-lived are eventually moved from the Young Generation to the Old Generation. When objects are garbage collected from the Old Generation, it is a major garbage collection event.
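  • Permanent Generation: Metadata such as classes and methods are stored in the Permanent Generation. Classes that are no longer in use may be garbage collected from the Permanent Generation. This section exists only in Java 7 and earlier; since Java 8, HotSpot keeps class metadata in Metaspace, which is allocated from native memory rather than the heap.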

During a full garbage collection event, unused objects in all generations are garbage collected.
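One way to see these sections on a live JVM is to list its memory pools through the standard java.lang.management API. The sketch below is only an illustration; the class name MemoryPools and the helper poolNames are made up here. The pool names it prints depend on the collector and the JVM version. With G1 on a recent JDK, the heap pools are typically named along the lines of "G1 Eden Space", "G1 Survivor Space", and "G1 Old Gen", with "Metaspace" listed among the non-heap pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: list the memory pools of the running JVM. Names are illustrative. */
final class MemoryPools {
    /** Names of the pools of the given type, as reported by the running JVM. */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by collector and JVM version.
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```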

HotSpot's four classic garbage collectors are:

  • Serial: All garbage collection events are conducted serially in one thread. Compaction is executed after each garbage collection.
  • Parallel: Multiple threads are used for minor garbage collection. A single thread is used for major garbage collection and Old Generation compaction. Alternatively, the Parallel Old variant uses multiple threads for major garbage collection and Old Generation compaction.
  • CMS (Concurrent Mark Sweep): Multiple threads are used for minor garbage collection using the same algorithm as Parallel. Major garbage collection is multi-threaded, like Parallel Old, but CMS runs concurrently alongside application threads to minimize “stop the world” events (i.e. pauses in which the garbage collector halts the application). No compaction is performed. CMS was deprecated in JDK 9 and removed in JDK 14.
  • G1 (Garbage First): G1 is intended as a replacement for CMS and has been the default collector since JDK 9. It is parallel and concurrent like CMS, but it works quite differently under the hood compared to the older garbage collectors.
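To check which collector a running JVM actually uses, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector component. The following is a small sketch under that assumption; the class name CollectorReport and the helper lines() are hypothetical. The reported names vary with the collector and JVM version, so no particular output should be expected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: report the collectors active in the running JVM. */
final class CollectorReport {
    /** One summary line per collector bean reported by the JVM. */
    static List<String> lines() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()   // -1 if undefined
                    + ", time=" + gc.getCollectionTime() + " ms"   // accumulated, approximate
                    + ", pools=" + String.join(", ", gc.getMemoryPoolNames()));
        }
        return out;
    }

    public static void main(String[] args) {
        lines().forEach(System.out::println);
    }
}
```

A JVM usually reports separate beans for young and old collections, which corresponds to the minor and major garbage collection events described earlier.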

Benefits of Java Garbage Collection

The biggest benefit of Java garbage collection is that it automatically handles deletion of unused objects or objects that are out of reach to free up vital memory resources. Programmers working in languages without garbage collection (like C and C++) must implement manual memory management in their code.

Despite the extra work required, some programmers argue in favor of manual memory management over garbage collection, primarily for reasons of control and performance. While the debate over memory management approaches continues to rage on, garbage collection is now a standard component of many popular programming languages. For scenarios in which the garbage collector is negatively impacting performance, Java offers many options for tuning the garbage collector to improve its efficiency.

Java Garbage Collection Best Practices

For many simple applications, Java garbage collection is not something that a programmer needs to consciously consider. However, for programmers who want to advance their Java skills, it is important to understand how Java garbage collection works and the ways in which it can be tuned.

Besides the basic mechanisms of garbage collection, one of the most important points to understand about garbage collection in Java is that it is non-deterministic, and there is no way to predict when garbage collection will occur at runtime. It is possible to include a hint in the code to run the garbage collector with the System.gc() or Runtime.getRuntime().gc() methods, but they provide no guarantee that the garbage collector will actually run.
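The hint can be observed, though not relied upon, with a weak reference. The sketch below is illustrative only; the class name GcHint and the helper clearedAfterHint are made up for this example. It drops the only strong reference to an object, requests a collection, and reports whether the object was reclaimed. Because System.gc() is only a request, either answer is possible.

```java
import java.lang.ref.WeakReference;

/** Minimal sketch: System.gc() is a request, not a command. Names are illustrative. */
final class GcHint {
    /** Requests a collection and reports whether the referent has been reclaimed. */
    static boolean clearedAfterHint(WeakReference<?> ref) {
        System.gc(); // a hint only; the JVM may ignore it
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null; // drop the only strong reference
        // May print true or false depending on the JVM and its settings.
        System.out.println("cleared after hint: " + clearedAfterHint(ref));
        // Runtime.getRuntime().gc() is the equivalent instance-method form.
    }
}
```

The one thing that is guaranteed is the opposite case: an object that is still strongly referenced is never reclaimed, no matter how often a collection is requested.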

The best approach to tuning Java garbage collection is setting flags on the JVM. Flags can adjust the garbage collector to be used (e.g. Serial, G1, etc.), the initial and maximum size of the heap, the size of the heap sections (e.g. Young Generation, Old Generation), and more. The nature of the application being tuned is a good initial guide to settings. For example, the Parallel garbage collector is efficient but will frequently cause “stop the world” events, making it better suited for backend processing where long pauses for garbage collection are acceptable.

On the other hand, the CMS garbage collector is designed to minimize pauses, making it ideal for GUI applications where responsiveness is important. Additional fine-tuning can be accomplished by changing the size of the heap or its sections and measuring garbage collection efficiency using a tool like jstat.
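As a sketch of what flag-based tuning looks like in practice, the program below prints the garbage-collection-related flags the current JVM was started with. The flags named in the comments (-XX:+UseG1GC, -XX:+UseSerialGC, -XX:+UseParallelGC, -Xms, -Xmx, -Xmn, -XX:NewRatio) are standard HotSpot options, but the values shown are arbitrary examples rather than recommendations. The class name TuningFlags and its filter rule are hypothetical and cover only the handful of flags discussed here.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Example launch (values are illustrative, not recommendations):
//   java -XX:+UseG1GC -Xms512m -Xmx2g TuningFlags.java
// Other collector selectors: -XX:+UseSerialGC, -XX:+UseParallelGC.
// Young Generation sizing: -Xmn256m or -XX:NewRatio=2.
final class TuningFlags {
    /** Rough filter for the few GC-related flags mentioned above. */
    static boolean isGcRelated(String arg) {
        return arg.startsWith("-Xms")            // initial heap size
                || arg.startsWith("-Xmx")        // maximum heap size
                || arg.startsWith("-Xmn")        // Young Generation size
                || arg.startsWith("-XX:NewRatio=")
                || (arg.startsWith("-XX:+Use") && arg.endsWith("GC"));
    }

    /** Keeps only the GC-related entries, preserving their order. */
    static List<String> gcFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(TuningFlags::isGcRelated)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC-related flags: " + gcFlags(jvmArgs));
    }
}
```

To measure the effect of a change, running jstat -gcutil <pid> 1000 against the process prints the utilization of each heap section once per second, which makes it possible to compare settings under the same workload.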