Conversation
Covers the formula graph's structure (nodes, edges, identifiers, union-find groups), operations (formulate, provide, cancel, remove), concurrency model (serial mutex, re-entrancy, controller creation ordering), and garbage collection (group-level reference-counting sweep, transient roots, withCollection wrapper, promise resolution ordering, residence tracking, cancellation cascading). Co-authored-by: kumavis <kumavis@users.noreply.github.com>
|
Cursor Agent can help with this pull request. Just |
|
|
the intern (non-human) said: Correctness errors
The document says: When formulate or provideController creates a controller, it does so synchronously — the controller (with its promise) is placed in controllerForId before any await. This is accurate for provideController, which is a plain (non-async) function — it creates the controller and calls resolve(evaluateFormulaForId(...)) all synchronously. But formulate is async and the controller creation happens after two awaits (disk write, then lock acquisition). The invariant is narrower: there is no await between controller registration and the start of evaluation. The controller is not created "before any await" — it is created before any subsequent await after the lock releases. Proposed fix: rewrite to say that the controller is registered and evaluation begins in the same synchronous segment (same microtask), with no intervening await, so that recursive provide calls from evaluation always find the controller already in place.
The document says: at worst one caller will fail with an assertion error rather than creating a duplicate. This is wrong. Two concurrent callers that both see identify return undefined for the same pet name will each call formulate with different random formula numbers, producing different formula identifiers. The assertion in formulate (formulaForId.has(id)) checks for an ID collision, not a logical duplicate — so both succeed. The second petStore.write overwrites the first's mapping, orphaning the first formula. The orphan is then collected by GC. Proposed fix: replace the sentence with an explanation that both callers succeed, both formulas are created, the second pet name write wins, and the orphaned formula is collected by GC. No assertion error occurs.
Steps 1–11 run inside the formulaGraphJobs.enqueue callback. Step 12 (revivePins) runs after the enqueue callback returns, outside the mutex. The document presents all 12 steps as "the collection algorithm" running "entirely under formulaGraphJobs.enqueue", which is inaccurate. Proposed fix: separate pin revival from the numbered algorithm steps and note that it runs after the lock is released.
The document says it "close[s] CapTP connections and terminate[s] workers that were exporting collected formulas." Looking at the code: packages/daemon/src/residence.js const disconnectRetainersHolding = ids => {
const collected = new Set(ids);
for (const [retainerId, retainees] of retaineesByRetainer.entries()) {
for (const id of retainees.keys()) {
if (collected.has(id)) {
const workerId = workerForRetainer.get(retainerId);
if (!workerId) {
break;
}
const formula = getFormula(id);
if (!formula || formula.type === 'invitation') {
break;
}
// ... close and terminate ...
break;
}
}
}
};If the retainer is not a worker (!workerId), the inner loop breaks and nothing happens. And invitation-type formulas are explicitly skipped. So non-worker CapTP connections holding collected formulas are left alone. Proposed fix: qualify the statement — only worker retainers are disconnected and terminated, and invitation formulas are exempt. Imprecisions worth correcting The document says collectIfDirty bypasses withFormulaGraphLock because the depth counter could be "incremented by an ancestor frame." The formulaGraphLockDepth counter is module-scoped, not stack-scoped. The actual risk is that a concurrent operation (interleaved at an await boundary) holds the lock, incrementing the global counter. Any other operation that resumes after its await null would see depth > 0 and skip the mutex. This is not call-stack ancestry — it is async interleaving. Proposed fix: replace "ancestor frame" with a description of the actual mechanism: a concurrent operation holds the lock across an await, and an interleaved caller sees the global depth counter > 0.
The document mentions both but does not clearly distinguish them. Graph dependency edges go from a formula to its dependencies (eval → worker: "eval depends on worker"). They determine GC reachability. Context cancellation edges, registered via thisDiesIfThatDies, go in the opposite direction (worker → eval: "if worker dies, eval dies"). They determine runtime cancellation cascading. When GC cancels a collected formula, the cancellation can cascade through context deps to formulas that are NOT themselves in the collected set — for example, an eval whose worker is collected would be cancelled even if the eval itself was reachable (if it is reachable through some other path but its worker is not, this scenario is impossible by the graph structure, but the distinction is still worth making explicit). Proposed fix: add a sentence noting the directional difference and that context deps are a runtime mechanism independent of the structural graph deps.
The document says the full-pass approach is "simple and correct." The algorithm is Kahn's algorithm (topological sort of zero-in-degree nodes). It correctly identifies unreachable groups in a DAG but cannot collect cycles of mutually-unreachable groups. Static dependency edges are acyclic by construction (a formula can only reference already-existing formulas). But pet store dynamic edges can in principle create group-level cycles — e.g., a child host's pet store names a formula that (transitively) depends back on the child host's group. In practice such cycles are unlikely because guests depend on their parent host's handle (hostHandle/hostAgent), not on the creating host. But the limitation exists. Proposed fix: replace "correct" with "correct for acyclic group-level graphs" and add a note about the theoretical cycle limitation. Minor observations (no correction needed but worth noting)
|
Description
This PR introduces a new markdown document,
FORMULA-GRAPH.md, to thepackages/daemondirectory. This document provides a comprehensive design overview of the Endo daemon's formula graph, covering:preformulate,formulate,provide/provideController,cancelValue, pet nameremove, and restart recovery.formulaGraphJobsmutex,withFormulaGraphLock, synchronous controller creation, and per-resolver serial queues.This document aims to provide a clear understanding of the core design principles and operational mechanics of the daemon's formula graph.
Security Considerations
This change is purely documentation and does not introduce new code, dependencies, or authorities, thus having no direct security implications.
Scaling Considerations
This change is purely documentation and does not impact resource consumption.
Documentation Considerations
This PR significantly enhances the internal documentation of the Endo daemon by detailing a critical component: the formula graph. It provides a foundational understanding for developers working on or debugging the daemon. There are no backwards compatibility issues or upgrade instructions needed for users.
Testing Considerations
This change is purely documentation and does not introduce new functionality requiring unit tests.
Compatibility Considerations
This change is purely documentation and does not affect prior usage patterns or introduce breaking changes.
Upgrade Considerations
This change is purely documentation and has no impact on upgrading live production systems.