Skip to content

Commit 7f0ab87

Browse files
magyargergoclaude
andauthored
fix(embeddings): resolve onnxruntime-common under pnpm-strict / pnpm dlx (#307) (#2139)
* fix(embeddings): resolve onnxruntime-common under pnpm-strict / pnpm dlx (#307) `@huggingface/transformers` does a bare `import 'onnxruntime-common'` from its shipped `dist/transformers.node.mjs`, but never declares onnxruntime-common in its own `dependencies`. npm's flat node_modules (and pnpm with hoisting) place it on transformers' resolution path by accident; pnpm's isolated store only links a package's declared deps into its scope, so under pnpm-strict / `pnpm dlx` / `pnpx` the import dies with ERR_MODULE_NOT_FOUND before `analyze --embeddings` can run. Declaring onnxruntime-common in gitnexus' own deps (#2074) does not fix this under pnpm: Node resolves the bare specifier from transformers' module scope, not ours, and overrides/resolutions can only re-version an existing edge, never add the missing one (verified against a real `hoist=false` install — the declaration only changes which version wins the hoist, never whether the import resolves). Fix: install a synchronous, in-thread ESM resolution hook (`module.registerHooks`) right before the lazy transformers import that redirects `onnxruntime-common` to the copy gitnexus depends on — but only when the default resolver fails. On npm / hoisted layouts the default resolver succeeds first and the hook never fires, so working setups are unchanged. The hook only intercepts the exact `onnxruntime-common` specifier on failure, so it can never mask an unrelated resolution error; onnxruntime-node's native binding still loads normally from transformers' own scope. `registerHooks` (sync, in-thread, single inline closure) is preferred over the older `module.register` (async, off-thread, now deprecated — DEP0205, removed in Node 26): the redirect is a one-line conditional that needs no worker thread, no separate hook module, and no `data` marshalling. It is available on Node >= 22.15; on older runtimes the helper is a graceful no-op (the gitnexus engines floor is >= 22.0.0, and the import still resolves on hoisted layouts there). Chosen over bundling transformers (the build is tsc-only, and transformers carries native onnxruntime-node + WASM onnxruntime-web assets that bundle poorly). Installation is idempotent, best-effort, and lazy — only on the local-embedding path, so it never affects analysis, the parse workers, or HTTP embedding mode. Validated end-to-end: the compiled resolver fixes a real pnpm `hoist=false` transformers install (ERR_MODULE_NOT_FOUND -> resolved). The separate `@ladybugdb/core` native-binary path under pure `pnpm dlx` is unchanged (#1967 handles that gracefully). Refs #307, #2069 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(embeddings): version-match the onnxruntime-common redirect target (#307) Prefer the onnxruntime-common that onnxruntime-node (the native binding transformers actually loads) depends on, so the redirected copy is version- matched to that binding even under `pnpm dlx` — where gitnexus' npm-style `overrides` block does not apply, because it is honoured only from a root manifest and gitnexus is a transitive dependency there. The walk resolves transformers' main entry (not its `exports`-blocked package.json) -> onnxruntime-node -> its onnxruntime-common, and falls back to gitnexus' own direct dependency when the chain can't be walked. Also corrects the doc comment that claimed the gitnexus copy was already "version-aligned". Addresses a PR #2139 tri-review finding (P2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(embeddings): narrow the onnxruntime-common resolve fallback to absence errors (#307) The resolve closure's `catch` swallowed every error from `nextResolve` and redirected, which would silently paper over a genuinely present-but-broken onnxruntime-common install. Only substitute gitnexus' copy when the specifier is actually absent (ERR_MODULE_NOT_FOUND, or ERR_PACKAGE_PATH_NOT_EXPORTED for an exports-broken copy); rethrow anything else. Adds a test that an unrelated error code rethrows. Addresses a PR #2139 tri-review finding (P3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(embeddings): cover the onnxruntime-common resolver best-effort swallow path (#307) The outer try/catch in ensureOnnxRuntimeCommonResolvable() was untested. A throwing registerHooks spy drives it; the call must not throw (initEmbedder does not guard the return, so a throw would break `analyze --embeddings`). The vitest quirk that surfaced an earlier attempt applies to throwing mock factories, not a throwing spy implementation, so this is testable cleanly. Addresses a PR #2139 tri-review finding (P2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(embeddings): tighten the onnxruntime-common redirect-URL assertion (#307) `/^file:\/\/.*onnxruntime-common/` matched the substring anywhere, so a lookalike path (e.g. `/x/onnxruntime-common-fake/`) would pass. Require an actual `/node_modules/onnxruntime-common/...js` segment so the assertion proves the redirect resolves to the real package, not just a string match. Addresses a PR #2139 tri-review finding (P3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(embeddings): drop the no-op __resetOnnxRuntimeCommonResolverForTests seam (#307) The test helper reloads the resolver via vi.resetModules() + a fresh import(), which already re-initialises the module-level one-shot `attempted` flag to false. The __reset export it then called was therefore a no-op. Remove the test-only export and its call; isolation now rests solely on vi.resetModules(). Addresses a PR #2139 tri-review finding (P3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(embeddings): correct the onnxruntime-common resolver isolation comment (#307) The doc comment claimed the hook "never affects other tools' resolution". Once installed, `module.registerHooks` is process-global and its resolve closure runs for every subsequent resolution — it passes them all through untouched and only substitutes the exact `onnxruntime-common` specifier on genuine absence, at a cost of one string comparison. Also note `registerHooks` is @experimental and requires Node >= 22.15 (graceful no-op below that). Comment-only. Addresses a PR #2139 tri-review finding (P3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ae5ec94 commit 7f0ab87

4 files changed

Lines changed: 273 additions & 0 deletions

File tree

gitnexus/src/core/embeddings/embedder.ts

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ import { isHttpMode, getHttpDimensions, httpEmbed } from './http-client.js';
2828
import { resolveEmbeddingConfig } from './config.js';
2929
import { applyHfEnvOverrides, isHfDownloadFailure, withHfDownloadRetry } from './hf-env.js';
3030
import { getLocalEmbeddingRuntimeBlocker } from './runtime-support.js';
31+
import { ensureOnnxRuntimeCommonResolvable } from './onnxruntime-common-resolver.js';
3132
import { logger } from '../logger.js';
3233

3334
/**
@@ -179,6 +180,9 @@ export const initEmbedder = async (
179180
try {
180181
// Lazy-load transformers.js only after the runtime guard has passed, so
181182
// unsupported platforms never reach the native ONNX import (#1515).
183+
// Under pnpm-strict / `pnpm dlx`, transformers' phantom `onnxruntime-common`
184+
// import is unresolvable; register the fallback resolver first (#307).
185+
ensureOnnxRuntimeCommonResolvable();
182186
const { pipeline, env } = await import('@huggingface/transformers');
183187

184188
// Configure transformers.js environment
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
/**
2+
* Make `@huggingface/transformers`' phantom `onnxruntime-common` import
3+
* resolvable under strict package-manager layouts (#307, #2069).
4+
*
5+
* ## Why
6+
* transformers' shipped `dist/transformers.node.mjs` does a bare
7+
* `import 'onnxruntime-common'`, but transformers' `package.json` never declares
8+
* onnxruntime-common (it lists onnxruntime-node / onnxruntime-web / sharp). With
9+
* npm's flat `node_modules` — or pnpm with hoisting — the package is hoisted to
10+
* a directory on transformers' resolution path and the import resolves by
11+
* accident. Under pnpm's isolated store (and therefore `pnpm dlx` / `pnpx`), a
12+
* package only sees its *declared* deps, so the import dies with
13+
* `ERR_MODULE_NOT_FOUND` before `analyze --embeddings` can run.
14+
*
15+
* Declaring onnxruntime-common in gitnexus' own dependencies (#2074) does NOT
16+
* fix this under pnpm: Node resolves the bare specifier from *transformers'*
17+
* module scope, not ours, and overrides/resolutions can only re-version an
18+
* existing edge, never add the missing one.
19+
*
20+
* ## What this does
21+
* Install a synchronous, in-thread ESM resolution hook (`module.registerHooks`,
22+
* Node >= 22.15) that redirects `onnxruntime-common` to a copy gitnexus can
23+
* resolve — but only when the default resolver fails. The redirect target is
24+
* preferentially the `onnxruntime-common` that `onnxruntime-node` (the native
25+
* binding transformers actually loads) itself depends on, so the redirected copy
26+
* is version-matched to that binding even under `pnpm dlx` — where gitnexus'
27+
* npm-style `overrides` block does NOT apply, because it is honoured only from a
28+
* root manifest and gitnexus is a transitive dependency there. It falls back to
29+
* gitnexus' own direct `onnxruntime-common` dependency when that chain can't be
30+
* walked. onnxruntime-common is a stable, pure-JS package whose `Tensor` surface
31+
* is unchanged across 1.24–1.26, so either target is API-compatible. On working
32+
* layouts the default resolver succeeds first and the hook never fires, so
33+
* behaviour is unchanged.
34+
*
35+
* `registerHooks` (synchronous, in-thread) is preferred over the older
36+
* `module.register` (async, off-thread, now deprecated — DEP0205, removed in
37+
* Node 26): the redirect is a one-line conditional that needs no worker thread,
38+
* no separate hook module, and no `data` marshalling.
39+
*
40+
* ## Safety
41+
* Best-effort and idempotent. The hook is installed lazily, only on the
42+
* local-embedding code path (after parsing), so it is never registered during
43+
* analysis, in the parse workers, or in HTTP embedding mode. Once installed it
44+
* is process-global: its resolve closure runs for every subsequent module
45+
* resolution, but it passes all of them through untouched and only substitutes a
46+
* result for the exact `onnxruntime-common` specifier when that specifier is
47+
* genuinely absent — so it cannot mask an unrelated resolution error, and the
48+
* per-resolution cost is a single string comparison.
49+
*
50+
* `module.registerHooks` is marked `@experimental` and requires Node >= 22.15
51+
* (the gitnexus engines floor is >= 22.0.0). On older runtimes it is absent and
52+
* this is a graceful no-op: embeddings then resolve onnxruntime-common exactly
53+
* as before — fine on hoisted layouts. Any failure during installation is
54+
* swallowed.
55+
*/
56+
import { registerHooks, createRequire } from 'node:module';
57+
import { pathToFileURL } from 'node:url';
58+
import { logger } from '../logger.js';
59+
60+
let attempted = false;
61+
62+
/**
63+
* Compute the file: URL the hook redirects `onnxruntime-common` to.
64+
*
65+
* Prefer the copy `onnxruntime-node` (the native binding transformers loads)
66+
* depends on, so the redirected module is version-matched to the binding even
67+
* under `pnpm dlx`, where transformers keeps its own pinned onnxruntime-node.
68+
* The walk resolves transformers' MAIN entry — NOT `@huggingface/transformers/
69+
* package.json`, which transformers' `exports` map blocks
70+
* (`ERR_PACKAGE_PATH_NOT_EXPORTED`) — then onnxruntime-node, then its
71+
* onnxruntime-common. Falls back to gitnexus' own direct dependency (always
72+
* resolvable from our scope) when any step fails.
73+
*/
74+
const resolveOnnxRuntimeCommonUrl = (): string => {
75+
const require = createRequire(import.meta.url);
76+
try {
77+
const transformersMain = require.resolve('@huggingface/transformers');
78+
const ortNodePkg = createRequire(transformersMain).resolve('onnxruntime-node/package.json');
79+
const common = createRequire(ortNodePkg).resolve('onnxruntime-common');
80+
return pathToFileURL(common).href;
81+
} catch {
82+
return pathToFileURL(require.resolve('onnxruntime-common')).href;
83+
}
84+
};
85+
86+
/**
87+
* Idempotently install the onnxruntime-common resolution fallback. Call once
88+
* immediately before the dynamic `import('@huggingface/transformers')` on the
89+
* local-embedding path.
90+
*/
91+
export const ensureOnnxRuntimeCommonResolvable = (): void => {
92+
if (attempted) return;
93+
// Mark attempted up-front: a failed attempt must not retry on every
94+
// initEmbedder() call, and the hook is process-global — once is enough.
95+
attempted = true;
96+
97+
try {
98+
// Node < 22.15 (the gitnexus engines floor is >= 22.0.0): no synchronous
99+
// hooks API. Degrade gracefully — the import still works on hoisted layouts.
100+
if (typeof registerHooks !== 'function') return;
101+
102+
const redirectUrl = resolveOnnxRuntimeCommonUrl();
103+
104+
registerHooks({
105+
resolve(specifier, context, nextResolve) {
106+
if (specifier !== 'onnxruntime-common') return nextResolve(specifier, context);
107+
// Honour a real, package-manager-provided copy when one is on the path
108+
// (npm / hoisted pnpm); only substitute ours when the specifier is
109+
// genuinely absent.
110+
try {
111+
return nextResolve(specifier, context);
112+
} catch (err) {
113+
// The phantom import surfaces as ERR_MODULE_NOT_FOUND (or, for a
114+
// present-but-exports-broken copy, ERR_PACKAGE_PATH_NOT_EXPORTED).
115+
// Rethrow anything else so a genuinely broken install is not masked.
116+
const code = (err as { code?: string } | null | undefined)?.code;
117+
if (code === 'ERR_MODULE_NOT_FOUND' || code === 'ERR_PACKAGE_PATH_NOT_EXPORTED') {
118+
return { url: redirectUrl, shortCircuit: true };
119+
}
120+
throw err;
121+
}
122+
},
123+
});
124+
logger.debug({ redirectUrl }, 'Installed onnxruntime-common resolution fallback (#307)');
125+
} catch (err) {
126+
// Never block embeddings on the fallback. On layouts where the package
127+
// manager already resolves onnxruntime-common this is unnecessary anyway.
128+
logger.debug(
129+
{ err: err instanceof Error ? err.message : String(err) },
130+
'onnxruntime-common resolution fallback not installed',
131+
);
132+
}
133+
};

gitnexus/src/mcp/core/embedder.ts

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ import {
2222
withHfDownloadRetry,
2323
} from '../../core/embeddings/hf-env.js';
2424
import { getLocalEmbeddingRuntimeBlocker } from '../../core/embeddings/runtime-support.js';
25+
import { ensureOnnxRuntimeCommonResolvable } from '../../core/embeddings/onnxruntime-common-resolver.js';
2526
import { silenceStdout, restoreStdout, realStderrWrite } from '../../core/lbug/pool-adapter.js';
2627

2728
import { logger } from '../../core/logger.js';
@@ -65,6 +66,9 @@ export const initEmbedder = async (): Promise<FeatureExtractionPipeline> => {
6566
try {
6667
// Lazy-load transformers.js only after the runtime guard has passed, so
6768
// unsupported platforms never reach the native ONNX import (#1515).
69+
// Under pnpm-strict / `pnpm dlx`, transformers' phantom `onnxruntime-common`
70+
// import is unresolvable; register the fallback resolver first (#307).
71+
ensureOnnxRuntimeCommonResolvable();
6872
const { pipeline, env } = await import('@huggingface/transformers');
6973

7074
env.allowLocalModels = false;
Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
import { describe, it, expect, vi, afterEach } from 'vitest';
2+
3+
/**
4+
* Tests for the #307 pnpm-strict / `pnpm dlx` fix: a synchronous in-thread ESM
5+
* resolution hook (`module.registerHooks`) that redirects @huggingface/
6+
* transformers' phantom `onnxruntime-common` import to gitnexus' own copy.
7+
*
8+
* Each test mocks `node:module` with a chosen `registerHooks` (a spy, or
9+
* `undefined` to simulate Node < 22.15) so we can assert one-shot installation,
10+
* graceful degradation, and the redirect/passthrough/rethrow logic of the
11+
* resolve closure — without mutating the real process loader.
12+
*/
13+
14+
const RESOLVER = '../../src/core/embeddings/onnxruntime-common-resolver.js';
15+
16+
/**
17+
* (Re)load the resolver with a chosen `registerHooks` mocked into node:module.
18+
* `vi.resetModules()` + the fresh `import()` re-initialises the module-level
19+
* one-shot guard, so each test gets a pristine resolver with no shared state.
20+
*/
21+
async function loadResolver(registerHooks: unknown) {
22+
vi.resetModules();
23+
vi.doMock('node:module', async (importOriginal) => {
24+
const orig = await importOriginal<typeof import('node:module')>();
25+
return { ...orig, registerHooks };
26+
});
27+
return import(RESOLVER);
28+
}
29+
30+
const ctx = { conditions: [], importAttributes: {} } as never;
31+
const moduleNotFound = (): Error => {
32+
const e = new Error("Cannot find package 'onnxruntime-common'") as Error & { code: string };
33+
e.code = 'ERR_MODULE_NOT_FOUND';
34+
return e;
35+
};
36+
37+
afterEach(() => {
38+
vi.doUnmock('node:module');
39+
});
40+
41+
describe('ensureOnnxRuntimeCommonResolvable — installation', () => {
42+
it('installs the resolve hook exactly once (idempotent)', async () => {
43+
const spy = vi.fn();
44+
const mod = await loadResolver(spy);
45+
46+
mod.ensureOnnxRuntimeCommonResolvable();
47+
mod.ensureOnnxRuntimeCommonResolvable(); // second call is a no-op
48+
49+
expect(spy).toHaveBeenCalledTimes(1);
50+
expect(typeof spy.mock.calls[0][0].resolve).toBe('function');
51+
});
52+
53+
it('no-ops gracefully when registerHooks is unavailable (Node < 22.15)', async () => {
54+
const mod = await loadResolver(undefined);
55+
// Must not throw even though there is no synchronous-hooks API to call.
56+
expect(() => mod.ensureOnnxRuntimeCommonResolvable()).not.toThrow();
57+
});
58+
59+
it('is best-effort: swallows a registerHooks() failure instead of throwing into the embedder', async () => {
60+
const mod = await loadResolver(
61+
vi.fn(() => {
62+
throw new Error('hook-install-failed');
63+
}),
64+
);
65+
// The call site (initEmbedder) does not guard the return; a throw here would
66+
// break `analyze --embeddings`. The outer try/catch must absorb it.
67+
expect(() => mod.ensureOnnxRuntimeCommonResolvable()).not.toThrow();
68+
});
69+
});
70+
71+
describe('ensureOnnxRuntimeCommonResolvable — resolve hook behaviour', () => {
72+
/** Install the fallback and return the resolve closure handed to registerHooks. */
73+
async function captureResolve() {
74+
const spy = vi.fn();
75+
const mod = await loadResolver(spy);
76+
mod.ensureOnnxRuntimeCommonResolvable();
77+
return spy.mock.calls[0][0].resolve as (
78+
s: string,
79+
c: never,
80+
n: (s: string, c: never) => unknown,
81+
) => unknown;
82+
}
83+
84+
it('passes a successful default resolution through unchanged (no-op on hoisted layouts)', async () => {
85+
const resolve = await captureResolve();
86+
const real = { url: 'file:///real/onnxruntime-common/index.js', shortCircuit: true };
87+
const next = vi.fn(() => real);
88+
89+
const res = resolve('onnxruntime-common', ctx, next);
90+
91+
expect(next).toHaveBeenCalledTimes(1);
92+
expect(res).toBe(real); // the real resolution, NOT a redirect
93+
});
94+
95+
it('redirects onnxruntime-common to the gitnexus copy when default resolution fails', async () => {
96+
const resolve = await captureResolve();
97+
const next = vi.fn(() => {
98+
throw moduleNotFound();
99+
});
100+
101+
const res = resolve('onnxruntime-common', ctx, next) as { url: string; shortCircuit: boolean };
102+
103+
expect(res.shortCircuit).toBe(true);
104+
// The real resolved onnxruntime-common in node_modules (require.resolve runs
105+
// for real here) — not just any path containing the substring.
106+
expect(res.url).toMatch(/^file:\/\/.*\/node_modules\/onnxruntime-common\/.*\.js$/);
107+
});
108+
109+
it('never masks an unrelated resolution failure (other specifiers rethrow)', async () => {
110+
const resolve = await captureResolve();
111+
const err = moduleNotFound();
112+
const next = vi.fn(() => {
113+
throw err;
114+
});
115+
116+
expect(() => resolve('some-other-package', ctx, next)).toThrow(err);
117+
});
118+
119+
it('rethrows when onnxruntime-common fails for a non-absence reason', async () => {
120+
const resolve = await captureResolve();
121+
// A present-but-otherwise-broken resolution (not a missing package) must
122+
// surface, not be silently papered over with gitnexus' copy.
123+
const err = Object.assign(new Error('bad specifier'), {
124+
code: 'ERR_INVALID_MODULE_SPECIFIER',
125+
});
126+
const next = vi.fn(() => {
127+
throw err;
128+
});
129+
130+
expect(() => resolve('onnxruntime-common', ctx, next)).toThrow(err);
131+
});
132+
});

0 commit comments

Comments
 (0)