Skip to content

Formula graph design and gc#3118

Draft
kumavis wants to merge 1 commit intollmfrom
cursor/formula-graph-design-and-gc-e885
Draft

Formula graph design and gc#3118
kumavis wants to merge 1 commit intollmfrom
cursor/formula-graph-design-and-gc-e885

Conversation

@kumavis
Copy link
Member

@kumavis kumavis commented Mar 6, 2026

Description

This PR introduces a new markdown document, FORMULA-GRAPH.md, to the packages/daemon directory. This document provides a comprehensive design overview of the Endo daemon's formula graph, covering:

  • What the Graph Encodes: Nodes (formula types), identifiers, static and dynamic dependency edges, union-find groups, permanent roots, and persistence layout.
  • Operations: preformulate, formulate, provide/provideController, cancelValue, pet name remove, and restart recovery.
  • Concurrency: The formulaGraphJobs mutex, withFormulaGraphLock, synchronous controller creation, and per-resolver serial queues.
  • Garbage Collection: The full 12-step collection algorithm, mechanisms for keeping formulas alive, transient root pinning, locking interactions, and promise resolution ordering.

This document aims to provide a clear understanding of the core design principles and operational mechanics of the daemon's formula graph.

Security Considerations

This change is purely documentation and does not introduce new code, dependencies, or authorities, thus having no direct security implications.

Scaling Considerations

This change is purely documentation and does not impact resource consumption.

Documentation Considerations

This PR significantly enhances the internal documentation of the Endo daemon by detailing a critical component: the formula graph. It provides a foundational understanding for developers working on or debugging the daemon. There are no backwards compatibility issues or upgrade instructions needed for users.

Testing Considerations

This change is purely documentation and does not introduce new functionality requiring unit tests.

Compatibility Considerations

This change is purely documentation and does not affect prior usage patterns or introduce breaking changes.

Upgrade Considerations

This change is purely documentation and has no impact on upgrading live production systems.


Open in Web Open in Cursor 

Covers the formula graph's structure (nodes, edges, identifiers,
union-find groups), operations (formulate, provide, cancel, remove),
concurrency model (serial mutex, re-entrancy, controller creation
ordering), and garbage collection (group-level reference-counting
sweep, transient roots, withCollection wrapper, promise resolution
ordering, residence tracking, cancellation cascading).

Co-authored-by: kumavis <kumavis@users.noreply.github.com>
@cursor
Copy link

cursor bot commented Mar 6, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

@changeset-bot
Copy link

changeset-bot bot commented Mar 6, 2026

⚠️ No Changeset found

Latest commit: 57f5ff1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@kumavis
Copy link
Member Author

kumavis commented Mar 6, 2026

the intern (non-human) said:

Correctness errors

  1. Section 3.3 — provideController is synchronous, formulate is not.

The document says:

When formulate or provideController creates a controller, it does so synchronously — the controller (with its promise) is placed in controllerForId before any await.

This is accurate for provideController, which is a plain (non-async) function — it creates the controller and calls resolve(evaluateFormulaForId(...)) all synchronously. But formulate is async and the controller creation happens after two awaits (disk write, then lock acquisition). The invariant is narrower: there is no await between controller registration and the start of evaluation. The controller is not created "before any await" — it is created before any subsequent await after the lock releases. Proposed fix: rewrite to say that the controller is registered and evaluation begins in the same synchronous segment (same microtask), with no intervening await, so that recursive provide calls from evaluation always find the controller already in place.

  1. Section 3.5 — Concurrent provide-or-create does not produce an assertion error.

The document says:

at worst one caller will fail with an assertion error rather than creating a duplicate.

This is wrong. Two concurrent callers that both see identify return undefined for the same pet name will each call formulate with different random formula numbers, producing different formula identifiers. The assertion in formulate (formulaForId.has(id)) checks for an ID collision, not a logical duplicate — so both succeed. The second petStore.write overwrites the first's mapping, orphaning the first formula. The orphan is then collected by GC. Proposed fix: replace the sentence with an explanation that both callers succeed, both formulas are created, the second pet name write wins, and the orphaned formula is collected by GC. No assertion error occurs.

  1. Section 4.2 — revivePins runs outside the critical section.

Steps 1–11 run inside the formulaGraphJobs.enqueue callback. Step 12 (revivePins) runs after the enqueue callback returns, outside the mutex. The document presents all 12 steps as "the collection algorithm" running "entirely under formulaGraphJobs.enqueue", which is inaccurate. Proposed fix: separate pin revival from the numbered algorithm steps and note that it runs after the lock is released.

  1. Section 4.8 — disconnectRetainersHolding only acts on worker retainers.

The document says it "close[s] CapTP connections and terminate[s] workers that were exporting collected formulas." Looking at the code:

packages/daemon/src/residence.js
Lines 142-167

  const disconnectRetainersHolding = ids => {
    const collected = new Set(ids);
    for (const [retainerId, retainees] of retaineesByRetainer.entries()) {
      for (const id of retainees.keys()) {
        if (collected.has(id)) {
          const workerId = workerForRetainer.get(retainerId);
          if (!workerId) {
            break;
          }
          const formula = getFormula(id);
          if (!formula || formula.type === 'invitation') {
            break;
          }
          // ... close and terminate ...
          break;
        }
      }
    }
  };

If the retainer is not a worker (!workerId), the inner loop breaks and nothing happens. And invitation-type formulas are explicitly skipped. So non-worker CapTP connections holding collected formulas are left alone. Proposed fix: qualify the statement — only worker retainers are disconnected and terminated, and invitation formulas are exempt.

Imprecisions worth correcting
5. Section 3.2 — "ancestor frame" is misleading.

The document says collectIfDirty bypasses withFormulaGraphLock because the depth counter could be "incremented by an ancestor frame." The formulaGraphLockDepth counter is module-scoped, not stack-scoped. The actual risk is that a concurrent operation (interleaved at an await boundary) holds the lock, incrementing the global counter. Any other operation that resumes after its await null would see depth > 0 and skip the mutex. This is not call-stack ancestry — it is async interleaving. Proposed fix: replace "ancestor frame" with a description of the actual mechanism: a concurrent operation holds the lock across an await, and an interleaved caller sees the global depth counter > 0.

  1. Section 4.9 — Context deps and graph deps are different edge systems flowing in opposite directions.

The document mentions both but does not clearly distinguish them. Graph dependency edges go from a formula to its dependencies (eval → worker: "eval depends on worker"). They determine GC reachability. Context cancellation edges, registered via thisDiesIfThatDies, go in the opposite direction (worker → eval: "if worker dies, eval dies"). They determine runtime cancellation cascading. When GC cancels a collected formula, the cancellation can cascade through context deps to formulas that are NOT themselves in the collected set — for example, an eval whose worker is collected would be cancelled even if the eval itself was reachable (if it is reachable through some other path but its worker is not, this scenario is impossible by the graph structure, but the distinction is still worth making explicit). Proposed fix: add a sentence noting the directional difference and that context deps are a runtime mechanism independent of the structural graph deps.

  1. Section 4.10 — "correct" overstates things; the algorithm cannot collect cycles.

The document says the full-pass approach is "simple and correct." The algorithm is Kahn's algorithm (topological sort of zero-in-degree nodes). It correctly identifies unreachable groups in a DAG but cannot collect cycles of mutually-unreachable groups. Static dependency edges are acyclic by construction (a formula can only reference already-existing formulas). But pet store dynamic edges can in principle create group-level cycles — e.g., a child host's pet store names a formula that (transitively) depends back on the child host's group. In practice such cycles are unlikely because guests depend on their parent host's handle (hostHandle/hostAgent), not on the creating host. But the limitation exists. Proposed fix: replace "correct" with "correct for acyclic group-level graphs" and add a note about the theoretical cycle limitation.

Minor observations (no correction needed but worth noting)
8. Union-find is never cleaned up. onFormulaRemoved deletes entries from formulaDeps and promiseResolverByStore but does not remove entries from the union-find's parent and size maps. These grow monotonically over the daemon's lifetime. The practical impact is small (bounded by total formulas ever created), but it is a structural observation about the graph implementation.

  1. The withCollection-excluded methods list includes reverseIdentify, which is a read-only operation. The document's explanation (async resolver hasn't written the pet store entry yet) applies to the formula-creating methods in the list but not to reverseIdentify. It appears to be excluded simply because it does not modify the graph and gains nothing from triggering collection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants