refactor(workspace): component loader rewrite — diff harness + spike + first seam#10349
Draft
davidfirst wants to merge 19 commits intomasterfrom
Draft
refactor(workspace): component loader rewrite — diff harness + spike + first seam#10349davidfirst wants to merge 19 commits intomasterfrom
davidfirst wants to merge 19 commits intomasterfrom
Conversation
Builds the safety net before rewriting the component loader: a deterministic component snapshot + a wrapper that runs two loader instances in parallel and logs diffs to JSONL. Off by default, opt-in via BIT_LOADER_DIFF=1.
The env↔component "recursion" is a DAG topological-ordering problem, not a true cycle. PR #10086 failed because it collapsed V1's two-pass load into a single inline pipeline and lost the ordering guarantee. Recorded in D-001 with file:line evidence.
Behavior-preserving extraction of regroupEnvsIdsFromTheList into groupEnvsByDepLayer with unit tests. Sets up the seam to upgrade to a proper recursive topological sort in a follow-up — see D-001.
Replaces V1's one-level partition with a proper topological sort over the env-of-env DAG. Cycles fall back to a single layer defensively. Per D-001, this is the load-order primitive the rewrite depends on.
Adds BIT_LOADER_DIFF_SAMPLE=N to skip the partner on most calls. Lets the harness run on workspaces big enough that V1+V1 in parallel would OOM the default 4GB heap. Adds a bit-show e2e (the previous "bit show isn't covered" report turned out to be a stale-compile artifact). Documents actual command coverage in SNAPSHOT-CONTRACT.md.
…uildLoadGroups D-001 predicted that the recursive env-DAG sort would make the prepended 'always load core-aspect-env first' workaround unnecessary. Verified: 311-component bit7 workspace (the workspace the special case was written for) loads cleanly without it.
Extracts a generic topo-sort helper (topoLayerByDeps) and uses it for both envs (refactored to share) and extensions. regroupExtIdsFromTheList was a documented no-op TODO; this implements it. Same shape fix as #10350 for envs, applied to ext-of-ext chains.
…onfig only) Adds extensions[].config and aspects[].config to the diff snapshot, sorted by id with stable JSON canonicalization. Excludes data because it depends on cache warmth — the partner's cold cache eagerly computes deps that the primary's warm cache short-circuits, producing false positives in sampled mode. Documented the trade-off in SNAPSHOT-CONTRACT.md.
Pulls the canonical group-ordering logic out of WorkspaceComponentLoader as buildLoadPlanGroups in load-plan.ts. The loader retains the side-effecting parts (warming the extensions cache iteratively, registering newly-discovered extension comp ids) and calls into the pure function for the actual ordering. Removes the now-redundant regroupEnvsIdsFromTheList / regroupExtIdsFromTheList wrappers — load-plan.ts calls groupEnvsByDepLayer / groupExtsByDepLayer directly.
…re function Pulls the workspace-vs-scope partitioning logic out of groupAndUpdateIds as classifyIds in discovery.ts, with 7 unit tests covering ws/scope mixes, version resolution, and append-to-existing-result. Side effect: the workspace's id surface (listIds + locallyDeletedIds) is now fetched once per call instead of once per id — V1 fanned this out via Promise.all but the result was always identical for every parallel call within one invocation, so caching upfront is behavior-preserving.
…ments The original RFC called the loader's caching "the caching nightmare". After tracing the actual behavior, the rules are coherent — they were just undocumented. D-002 enumerates the four caches, the cache-key composition (four flags that genuinely produce a different Component), and the "exact-match-or-fully-loaded" lookup rule. Inline comments at the saveInCache / getFromCache / createComponentCacheKey touchpoints point back to D-002 so a reader of the source can find the invariants. No behavior change.
Single env var: BIT_LOADER_DIFF=N. Unset/0 = off. 1 = on, every call. N>1 = sample every Nth call. Drops BIT_LOADER_DIFF_SAMPLE (folded into BIT_LOADER_DIFF) and the user-facing BIT_LOADER_DIFF_OUT (kept internal-only for e2e tests). The harness prints the log path to stderr at startup so the user can find the output without consulting docs.
The flag tracked "we attempted to load this as an aspect" to suppress retry of failures. AspectLoaderMain.isAspectLoaded already returns true for both successfully-loaded aspects (via harmony.get) and previously-failed ones (via the failedAspects registry populated in requireAspects). The local memoization flag was redundant — removed, along with its references in clearCache, clearComponentCache, the constructor, and loadCompsAsAspects.
…instance
Two of the loader's caches (scopeComponentsCache, componentsExtensionsCache)
were used as scratch state for one load operation but lived as instance
fields, accumulating across calls. Now they're a per-operation LoadScratch
value created at the top of getAndLoadSlotOrdered and threaded through
buildLoadGroups → getAndLoadSlot → getComponentsWithoutLoadExtensions →
this.get → loadOne. Reads fall back to direct workspace lookups when scratch
is absent (single-component get() path), so the public surface is unchanged.
Net effect on the class:
Before: 4 instance caches, instance state outliving any single operation.
After: 1 instance cache (componentsCache, the actual output cache).
clearCache touches one thing. Scratch lifetime is visible at every
call site instead of hidden in instance fields.
…ession) The earlier "drop core-aspect-env special case" claim was wrong. The recursive env-DAG sort handles load *ordering* correctly, but the special case was also marking these envs with `core: true` — the flag that `getAndLoadSlot` checks before triggering `loadCompsAsAspects`. Without it, `bit status` on bit7 prints "environment with ID: teambit.harmony/envs/core-aspect-env was not loaded (run 'bit install')" because the env aspect wasn't registered with Harmony when components configured to use it were loaded. Restored in load-plan.ts (Step 7b) and pinned by a unit test. D-001 updated with a correction noting the misprediction. The diff harness didn't catch this because V1-vs-V1 mode compares the modified V1 against itself — they're equally wrong, so they match. Real coverage gap.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Take 2 of the component-loading rewrite (PR #10086 was abandoned). This PR sets up the safety net and extracts the first seam.
Why this is a fresh start, not a rebase of #10086
See `docs/rfcs/component-loading-rewrite/README.md` and `DECISIONS.md` for the full analysis. Short version: the previous attempt collapsed V1's two-pass loading into a single inline pipeline and lost the ordering guarantee that prevents the env↔component recursion. Stubbing out Enrichment to "fix" it just deferred the work that needed to happen.
What's in this PR
1. Diff harness (`BIT_LOADER_DIFF=1`) — wraps `workspace.componentLoader` so two loader instances run side-by-side and any difference in their output is logged to JSONL. Off by default. Snapshot fields are deliberately small at the start; expand as the rewrite proceeds. e2e test verifies V1-vs-V1 produces zero diffs on a 2-component workspace.
2. Recursion root-cause spike (`DECISIONS.md` D-001) — traces the env↔component "recursion" through three call paths, identifies the one real DAG-ordering problem, and rules out the approaches the previous PR took (single-pass pipeline, lazy env binding, cycle detection).
3. First seam: env-DAG layering extracted as a pure function (`env-dag-sort.ts`). Behavior-preserving — the existing one-level limitation is pinned by a unit test. Recursive upgrade is in a follow-up PR.
Verification
Known limitations (for follow-up)