Remove is_live in favor of is_reachable. #1275

Open
@wks

TL;DR: MMTk has both is_live and is_reachable, and the only difference is that is_live always returns true for objects in ImmortalSpace. This is wrong, and has consequences. We should remove is_live.

Definition

According to the GC Handbook, the precise definition of live is that "an object is live if it will be used by a mutator". Because liveness in this sense is undecidable, GC systems use reachability instead, i.e. "an object is reachable if there is a path from roots to that object following references". In GC, "live" and "reachable" are therefore used interchangeably, and "dead" is a synonym of "unreachable".

Actual behavior

However, in MMTk, objects in the ImmortalSpace are erroneously considered "immortal", i.e. always live. The object.is_live() function always returns true if the object is in the ImmortalSpace. Historically, this behavior can be traced back to the very first commit in JikesRVM when the ImmortalSpace was introduced. A subsequent commit in JikesRVM introduced isReachable, which checks the mark bit for ImmortalSpace and therefore really checks reachability. Since then, JikesRVM has had both isLive and isReachable. Both were ported to the Rust MMTk we have today, and they behave just as they do in JikesRVM.

This behavior contradicts the definitions of both "live" and "reachable". Objects in the ImmortalSpace can become both unused by the mutator and unreachable from roots. Such objects are not "live" in either sense, yet is_live() still returns true.
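The difference can be sketched with a toy model. This is not MMTk's actual code: `ToyObject`, `Space`, `toy_is_live` and `toy_is_reachable` are hypothetical names, and the only assumption carried over from the description above is that "live" is unconditionally true in the immortal space while "reachable" reads the mark bit.

```rust
/// Toy stand-in for the space an object lives in (hypothetical, not MMTk's types).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Space {
    Immortal,
    Other,
}

/// Toy object state: which space it is in and whether the last trace marked it.
struct ToyObject {
    space: Space,
    marked: bool,
}

/// Models the reachability check described above: it only consults the mark bit.
fn toy_is_reachable(obj: &ToyObject) -> bool {
    obj.marked
}

/// Models the problematic check: anything in the immortal space is "live",
/// whether or not the trace reached it.
fn toy_is_live(obj: &ToyObject) -> bool {
    match obj.space {
        Space::Immortal => true,
        Space::Other => obj.marked,
    }
}
```

For an unmarked object in the immortal space the two functions disagree; for every other combination in this model they return the same answer.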

Consequences

If an object is not traced during tracing, it will not be scanned, and its fields will not be forwarded. If the GC is a copying GC (such as SemiSpace), the object will then contain dangling references. This is OK if the object dies (in which case the VM will never touch the object again). But since ImmortalSpace erroneously considers such objects "live", the VM may be handed such an object again and follow its dangling references.
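To make the dangling-reference point concrete, here is a minimal sketch of field forwarding in a copying collector. Everything in it is an illustrative assumption: `ToyObj`, `forward_fields` and the address-to-address forwarding table are hypothetical and far simpler than a real SemiSpace implementation.

```rust
use std::collections::HashMap;

/// Toy object with a single reference field, stored as a raw address.
struct ToyObj {
    field: usize,
    traced: bool,
}

/// Update the field of every traced object using the forwarding table
/// (old address -> new address). Untraced objects are never scanned,
/// so their fields keep pointing at the old, evacuated location.
fn forward_fields(objs: &mut [ToyObj], forwarding: &HashMap<usize, usize>) {
    for obj in objs.iter_mut().filter(|o| o.traced) {
        if let Some(&new_addr) = forwarding.get(&obj.field) {
            obj.field = new_addr;
        }
    }
}
```

If two objects refer to the same address and only one of them is traced, only the traced one ends up with the new address after `forward_fields`; the other still holds the stale one.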

Weak reference processing

In the built-in ReferenceProcessor and FinalizableProcessor, is_live() is used to test whether the referent of a weak reference, or a finalizable object, has been reached through stronger references.

Suppose in a JVM there is a WeakReference named a which refers to an object b in the ImmortalSpace. In a GC, a is traced, but b is not traced, and b contains references to objects that have been moved. In the WeakRefClosure stage, ReferenceProcessor will inspect a. is_live() will show that b is "live", and it will retain the weak reference from a to b. After GC, the mutator can call a.get() and upgrade the weak reference to a strong reference. Now the strong reference points to an object b that contains dangling references. When the mutator attempts to follow those references, it will crash.

The same problem arises if the VM binding implements Scanning::process_weak_ref and uses is_live to test whether the referent is live.

The is_reachable method is unaffected because it actually checks the mark bit. Unreachable objects will be unmarked, and is_reachable will return false.
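The scenario with a and b can be replayed in a small model. This is only a sketch under stated assumptions: `ToyReferent`, `ToyWeakRef`, `toy_process_weak_refs` and the two predicates are hypothetical and do not mirror the real ReferenceProcessor; the model only captures the rule "clear the weak reference unless the predicate says the referent survives".

```rust
/// Toy referent state (hypothetical): where it lives and whether it was marked.
#[derive(Clone, Copy)]
struct ToyReferent {
    in_immortal_space: bool,
    marked: bool,
}

/// Toy weak reference holding an index into the toy heap, or None once cleared.
struct ToyWeakRef {
    referent: Option<usize>,
}

/// Models the is_live behavior: immortal-space objects always pass.
fn live_predicate(obj: &ToyReferent) -> bool {
    obj.in_immortal_space || obj.marked
}

/// Models the is_reachable behavior: only the mark bit counts.
fn reachable_predicate(obj: &ToyReferent) -> bool {
    obj.marked
}

/// Clear every weak reference whose referent fails the `keep` predicate.
fn toy_process_weak_refs(
    heap: &[ToyReferent],
    refs: &mut [ToyWeakRef],
    keep: impl Fn(&ToyReferent) -> bool,
) {
    for weak in refs.iter_mut() {
        if let Some(index) = weak.referent {
            if !keep(&heap[index]) {
                weak.referent = None;
            }
        }
    }
}
```

With an unmarked referent in the immortal space, passing `live_predicate` leaves the weak reference intact (the problematic outcome), while passing `reachable_predicate` clears it.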

VO bits

This is not related to the is_live method, but the VO bits in ImmortalSpace are never cleared, as if the objects never die. This mainly impacts conservative stack scanning. There are several possible solutions, which I have elaborated on in #1274.

What should we do?

We should just remove is_live. It's simply wrong.

We should use is_reachable in places where is_live is used. Particularly, is_reachable should be used when processing weak references.

We should also clarify that is_reachable does not report exact reachability from roots, because that is too expensive to compute. It returns true if the current plan/space considers the object reachable at the time it is called. It will return true if the object is marked and/or forwarded, and the same holds for ImmortalSpace. It will also return true for objects in the mature space and objects in the nursery that are reachable from the remembered set.
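One possible reading of these semantics, written as a toy function. The names `ToySpace`, `ToyHeader` and `toy_is_reachable` are hypothetical, and the `nursery_gc` flag is an assumption of this sketch: it models a generational plan in which mature objects are not traced during a nursery collection and are therefore treated as reachable.

```rust
/// Toy space kinds (hypothetical, not MMTk's space types).
#[derive(Clone, Copy)]
enum ToySpace {
    Nursery,
    Mature,
    Immortal,
}

/// Toy per-object GC state.
struct ToyHeader {
    space: ToySpace,
    marked: bool,
    forwarded: bool,
}

/// Returns what the current collection believes about the object,
/// not exact reachability from roots.
fn toy_is_reachable(obj: &ToyHeader, nursery_gc: bool) -> bool {
    match obj.space {
        // Assumption: a nursery GC does not trace the mature space,
        // so every mature object is conservatively considered reachable.
        ToySpace::Mature if nursery_gc => true,
        // Otherwise the answer comes from the mark/forwarding state.
        // Nursery objects reached from the remembered set would have been
        // marked or forwarded by the trace; immortal objects get no special case.
        ToySpace::Nursery | ToySpace::Mature | ToySpace::Immortal => {
            obj.marked || obj.forwarded
        }
    }
}
```

The point of the sketch is that the immortal space falls into the same arm as any other traced space: an unmarked immortal object is reported as unreachable.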

Related issues

VO bits and Immortal spaces: #1274

I previously considered renaming is_reachable and is_live in #1271. But after discussion, I think we don't need to rename is_reachable; we just need to clarify its semantics. More discussion can be found in the comments of #1271.
