A question on the GLASS mailing list raised some interesting questions about garbage collection and persistent executable blocks. Some objects, such as instances of SortedCollection, may contain a reference to an executable block and this persistent reference may be difficult to update and may hold references to other objects that otherwise could be removed by the garbage collection process.

Simple vs. Complex Blocks

The first issue raised by the question is the distinction between a “simple block” and a “complex block.” In Smalltalk a block may reference variables outside the block and, if it does so, then the virtual machine needs to create a more complex data structure to capture the referenced objects (making it a “complex” block). If the block does not reference any variables outside its immediate scope (beyond the block arguments and block temporaries), then the virtual machine can typically use a much simpler implementation (making it a “simple” block).

Because of the extra overhead, complex blocks tend to take more room and be slower to execute. Thus, a common performance tuning activity is replacing complex blocks with simple blocks when possible, and this can occur in application code as well as vendor-supplied libraries. For example, in GemStone/S 64 Bit version 2.x, the implementation of SortedCollection>>#’addAll:’ is essentially the following:

aCollection do: [:each | self add: each].

The problem with this implementation is that the block references self, making it a complex block. In version 3.x the implementation changed (influenced in part by Paul Baumann’s work), and the code is essentially the following:

aCollection accompaniedBy: self do: [:me :each | me add: each].

The block in this code is a simple block since it does not reference self (or anything outside the block), and performance improved.

While this is all very interesting (and in some cases quite useful), I think that the underlying garbage collection problem is not really with simple vs. complex blocks, but with something else.

Class References in Blocks

A code block in a method will contain a reference to the method and thus to a class (or metaclass for a class-side method). Once the code block is persistent, there is a hard reference to the class (and its class variables, class instance variables, and pool variables). Changes to the class schema will create a new “version” of the class but the old version will remain and be referenced by the code block. Even if the old class version is removed from the ClassHistory and all instances of the old version are migrated to the new version, until the code block is replaced in the persistent object, there will be a hard reference to the old class version and its space cannot be reclaimed by the garbage collection process.

To see how this works consider the method SortedCollection>>#’_defaultBlock’:

| block |
block := SortedCollection new _defaultBlock. "anExecBlock2"
block _debugSourceForBlock. "'[ :a :b | a <= b ]'"
block method. "aGsNMethod"
block method isMethodForBlock. "true"
block method inClass. "SortedCollection"

Note that this situation exists whether or not the block is simple or complex and whether or not the block comes from an instance-side method or a class-side method. Thus, this is an orthogonal discussion or a unique concern.

Alternatively, note how an equivalent block can be created by evaluating a String, but does not contain a reference to a Class:

| block |
block := '[ :a :b | a <= b ]' evaluate. "anExecBlock"
block _debugSourceForBlock. "'[ :a :b | a <= b ]'"
block method. "aGsNMethod"
block method isMethodForBlock. "true"
block method inClass. "nil"

So, as suggested one could strip the class reference with the following code:

^ aBlock _sourceString evaluate

Alternatively, one could avoid the private method (#’_sourceString’) by making the String explicit:

^ '[ :a :b | a <= b ]' evaluate

While this avoids the private method, it hides the code in a string making it much more difficult for tools to recognize that this is code (and catch compile errors, support senders/implementors, provide for refactorings, etc.). Further, each of these approaches will create a new instance each time it is called, creating unnecessary objects.

As Dale noted in the email chain referenced above, we could modify the compiler so that a block did not know the class in which it was created. This might improved the garbage collection situation, but would make it more difficult to debug code and track a block back to its source. And, of course, you would need to wait till that feature was added to the product.

One way to address these issues is to create a new class with no instance variables (so that the schema would never change) and use class-side methods to return the desired blocks. The method that would otherwise hold the code block would use a level of indirection to return the actual block. If a code block needed to change, you would create a new class-side method in the special class and edit the indirecting method (so that references to the old code block could be found and modified). The original method would remain as long as there were references to it.

At the cost of a level of indirection (and reduced encapsulation), this gives you actual source code that can be recognized by tools, a single instance, and no stray references from the class (since there is only one class version).

[Note: I found this in “Drafts” and decided to publish it even though the original question was a couple years ago.]

 

Advertisements