Automatic Garbage Collection and Finalization

In languages like C and C++, the program is required to manage the object "lifecycle," from allocating memory for objects to deleting them when they're no longer needed. TADS 3 lets you create objects dynamically, but eliminates the need for you to manage their memory after that point by introducing automatic "garbage collection."

The T3 VM automatically keeps track of which objects can still be used and which have become inaccessible, and from time to time deletes the inaccessible objects, making the memory they were using available for re-use. (We refer to inaccessible objects as "garbage," because they're just taking up memory without being of any further use to the program; and we refer to the process of recognizing and deleting these inaccessible objects as "garbage collection.")

For the most part, the garbage collector is invisible, so you can ignore it when writing your program. However, in some cases you might wish to be notified when the garbage collector is about to delete one of your objects. When you want such notification, you can use a "finalizer."

A finalizer is a special method, whose name is always finalize(); this method takes no arguments. When the garbage collector determines that an object has become unreachable, it checks to see if the object has a finalize() method. If the object does not have a finalize() method, the garbage collector can simply delete the object at any subsequent time. If the object does have a finalize() method, the garbage collector marks the object as "finalizable." Once an object is marked finalizable, the garbage collector can call the object's finalize() method at any subsequent time. Once this method returns, the garbage collector marks the object as "finalized." Once the object is marked as finalized, the garbage collector re-considers the object's reachability; at any subsequent time that the garbage collector determines that the object is unreachable, the collector can delete the object.

Note that the garbage collector must determine that an object with a finalize() method is unreachable twice before it can actually delete the object: the object must become unreachable once before the finalize() method can be called, and then must either remain unreachable or once again become unreachable before it can be deleted. The reason for the second reachability check is that the finalize() method could potentially make the object reachable again. Consider this example:

MyGlobals: object
  finalizedList = []
;

class MyClass: object
  finalize()
  {
    MyGlobals.finalizedList += self;
  }
;

When an instance of MyClass becomes unreachable, the garbage collector will at some point call the instance's finalize() method, which adds a reference to the instance to MyGlobals.finalizedList. Since MyGlobals is a named object, it's always reachable, hence anything that MyGlobals.finalizedList refers to is reachable - this means that the instance being finalized once again becomes reachable. So, the garbage collector cannot actually delete this object until the reference is removed from MyGlobals.finalizedList, at which point the instance once again becomes unreachable (assuming it hasn't been referenced anywhere else in the meantime).

The garbage collector calls an object's finalizer only once, even if the object becomes reachable again while it's being finalized. The single finalizer call is enforced by the state transitions: an object is initially unfinalized; after the garbage collector first notices that it is unreachable it becomes finalizable; after the garbage collector calls the finalizer the object becomes finalized. Once an object is finalized, it is deleted as soon as the collector notices that the object is unreachable. The garbage collector can only call the finalizer on a finalizable object, and once the finalizer is called the object becomes finalized; it cannot return to the finalizable state from the finalized state.

Note that the garbage collector does not run continuously, but only at certain times; exactly when the collector will run is unpredictable, because it depends on what memory operations the program performs, but it's also not usually important, since the program can largely ignore the collector's operation. Because of the unpredictable timing of garbage collection, the timeline descriptions above are intentionally a little vague; the only thing that's certain is the order of events, not their exact timing. So, an object might be finalized very quickly after it becomes unreachable, or it might sit in memory for a long time before the garbage collector gets around to finalizing the object.

Note also that you can explicitly invoke the garbage collector with the t3RunGC() function in the t3vm function set.

firstObj()/nextObj()

The firstObj() and nextObj() functions let you visit all of the objects currently in memory. This can have the sometimes surprising effect of retrieving objects that aren't currently reachable by any other means. Unreachable objects are only removed when the garbage collector runs, which only happens intermittently. Between runs, "dead" objects remain in memory, and TADS doesn't even know they're dead, because that determination is only made when the garbage collector runs. As a result, firstObj() and nextObj() simply visit all objects currently in memory, whether they're reachable or not.

If you want to ensure that existing dead objects are removed from memory before visiting objects with firstObj() and nextObj(), you can call t3RunGC() just before your object loop. This will ensure that any objects that are unreachable when the loop is about to start will be removed from memory and thus excluded from the object loop. This won't absolutely guarantee that you won't retrieve any dead objects, though, since other objects could become newly unreachable in the course of the loop itself.

Implementation Details

For those interested in academic details, the T3 VM implementation in TADS 3 uses a synchronous tracing garbage collector.

A tracing collector traverses the entire set of accessible objects, starting with the "root set." The root set is the set of objects that are directly reachable to the program, such as local variables and static objects defined in the source code. The garbage collector marks each root set object as reachable, then marks as reachable each object to which a root set object refers, then marks as reachable each object to which those objects refer, and so on. This process continues until the collector has marked every object that can be reached directly or indirectly through references from root set objects. Any objects not marked during this tracing process are unreachable, and hence can be deleted.

A synchronous collector runs in the same thread as the rest of the program, meaning that other operations halt while the collector runs.

It's possible in principle for the TADS garbage collector to in the background while waiting for input through the user interface, such as reading a command line or awaiting a mouse click. The VM is blocked waiting for the UI anyway in these cases, so running the garbage collector during UI waits is consistent with the synchronous design. The advantage of this approach is that UI waits are generally idle times for the CPU, so we're essentially using "free" CPU time. It's up to individual VMs to take advantage of this, and whether they do or not makes no difference to the program.