How Java Garbage Collection WorksBy Justin Poirier
In the Classroom306 article Details of C/C++ Dereferences and C++ Calls to delete, it is explained that a C/C++ program's heap is contained in a contiguous block of memory, and calls to allocation and deallocation functions/operators (e.g. "new") are solely responsible for managing the heap's contents (dynamically-allocated variables). This article will build on the contents of that article, to explain how memory management works in Java.
The most basic function of the Java Virtual Machine (JVM) is to convert java bytecode to native machine code at run-time. This may be accomplished using interpreting, by which each line of bytecode is converted to machine code just before being executed; or just-in-time compilation, by which conversion is done on several lines of bytecode at a time, and such a block may be reused later in the execution without re-compiling. With either system, the JVM, as opposed to code written by the Java programmer, is frequently in control of the CPU. During certain bouts of control, the JVM performs the additional task of deleting items on the heap that are no longer referenced. This process is called garbage collection, and it does the work that calls to the deallocation functions/operators do in C/C++, making such calls unnecessary in Java.
If garbage collection is implemented with reference counting, the JVM is continuously at work monitoring each heap item, before the time comes to actually delete it. A count is kept for each heap item, of the number of references to it at a given time. The JVM must increment this count whenever a variable in the Java code is set equal to the item, and decrement the count when a variable that previously referenced the item is set equal to null or some other item, or goes out of scope. When this count becomes zero, the item can be deleted.
In modern JVM's, garbage collection is more commonly implemented using tracing. Instead of keeping track of the number of references to an item over time, a tracing collector periodically identifies all references that have been created by the Java program and might be used again (live references), and deletes all items that are not the object of such a reference. The first phase is called marking because referenced objects must be marked in some way--for example, by setting a bit flag in the actual heap space used to store an object. The majority of live references will typically be found in the form of variables local to some active function (ie. existing on the call stack), or as member variables of class instances that are themselves objects of other live references. Therefore a JVM must search for references in both these areas of memory. To do so it must have knowledge of how the raw bits within each function's typical stack frame and each class's typical instance are divided up into variables. However, a JVM may not always distinguish between the types of variables it encounters, and may therefore ignore some live references in favour of discerning whether they are references or other primitive types.
The second phase of tracing, where unreferenced objects are deleted, is called sweeping. As with C/C++, a Java program's heap may suffer from fragmentation after objects are allocated and then deleted. JVM's may try to reduce this effect. In addition to combining adjacent free blocks of memory like C/C++ systems, JVM's may combine non-adjacent free blocks by moving allocated items around using compacting or copying. In order to avoid having to update every reference to an item that has been moved, Java is typically implemented such that a reference points not to the allocated block of an object itself, but to an entry in a lookup table that contains the address of each object referenced. Each access to an object implicitly uses this table. This is similar to the technique used in the dynamically-allocated memory system presented in the Classroom306 article A System to Dynamically-Allocate Memory With Minimal Fragmentation.