security: harden input validation#2208
Conversation
- analyze-job: sanitize update objects to prevent prototype pollution - validation: add length limits to escapeRegExp for ReDoS protection - api: validate node labels before Cypher query construction - local-backend: add NodeLabelValidator and query templates
|
Someone is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
magyargergo
left a comment
There was a problem hiding this comment.
PR #2208 Tri-Review — "security: harden input validation"
Methods: 3 — GitNexus swarm (risk, security-boundary, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) + Codex. That's 7 Claude lanes + Codex (Codex is the one independent engine, and it was live). The headline finding was reached independently by Codex and all 7 Claude lanes — the strongest corroboration available here. Two-thirds of the lanes are Claude, so Claude-only agreement is "consistent across personas," not independent confirmation.
Bottom line: No regression and no new vulnerability is introduced, and several suspected vulns were probed and refuted (good). But for a PR titled "harden input validation," a large share of the added code is dead / no-op: the entire new block in local-backend.ts never executes, and api.ts ships two unused security artifacts. The one genuinely-live hardening (api.ts /api/search) is functionally equivalent to a whitelist that already existed. Net security gain is small; the main cost is misleading code + missing tests. Recommend changes before merge.
Headline — local-backend.ts hardening is entirely dead code [P2 · code-read · Codex + 7 Claude lanes]
NodeLabelValidator, FROZEN_VALID_NODE_LABELS, and CYPHER_TEMPLATES (lines 163–258) are referenced only by each other — CYPHER_TEMPLATES has zero external call sites. The real Cypher query sites in this file (lines 2044, 2496) are untouched and still rely on the pre-existing VALID_NODE_LABELS.has(label) whitelist + backtick-quoting. The MCP path gets no hardening from this PR even though the new infrastructure implies it does. Latent extra hazard if it's ever wired up: this file's copied VALID_NODE_LABELS is missing Section/Variable/BasicBlock that canonical NODE_TABLES has — it would silently under-validate.
Scope note: this is the local-backend.ts copy only.
api.tshas its ownCYPHER_TEMPLATES(line 87) that is live — called at 1283/1286/1289 (see Credit). Don't read this as the whole pattern being inert.
Inline findings (anchored on their lines, each with fix + merge-impact)
- [P2]
local-backend.ts:209— deadCYPHER_TEMPLATES/validator block. Fix: delete it, or wire sites 2044/2496 through it. Should block merge — the PR's stated MCP hardening isn't delivered. - [P3]
api.ts:50—DANGEROUS_KEYSdeclared, never referenced in api.ts (live copy is in analyze-job.ts). Fix: delete. Non-blocking hygiene. - [P3]
api.ts:1355—GREP_CONFIG.TIMEOUT_MS // 30 second timeoutis never read; the grep scan loop has no timeout/AbortController. Fix: implement the deadline or drop field+comment. Non-blocking, but this PR introduces the misleading constant. - [P3]
validation.ts:105—MAX_REPLACEMENTS = 10000branch is unreachable (sole caller pre-caps length at 200). Fix: drop it or tie the cap tomaxLength. Non-blocking.
Credit where due (validated, not just asserted)
- The
api.ts/api/searchenrichment refactor is real and correctly wired:NodeLabelValidator.validate()atapi.ts:1276gates the live path, and api.ts's ownCYPHER_TEMPLATES(line 87) is called at 1283/1286/1289. It's a cleaner, slightly stricter version of the pre-existing inlinevalidLabel()whitelist that lived in the same enrichment block — sameNODE_TABLESsource, identical query bodies → behavior-equivalent, no regression. - No regression in
updateJob:ALLOWED_UPDATE_KEYSmatches the function'sPartial<Pick<…>>type exactly, and all ~20 callers pass literal keys → no legitimate field is dropped. - ReDoS genuinely mitigated at
/api/grep: literal mode is forced andescapeRegExpescapes all metacharacters beforenew RegExp(…,'gim'), with length capped at 200.
Probed and refuted (validation is a feature)
- Prototype pollution via
updateJob— no reachable path: every caller passes statically-typed literal keys; no endpoint spreads request-body /JSON.parsekeys into the update.sanitizeUpdateObjectis harmless defense-in-depth against a non-existent vector — it should not be counted as fixing a live vuln. CYPHER_TEMPLATESthrow escaping.catch(()=>[])and rejectingPromise.all(500) — refuted: the label is already validated atapi.ts:1276, so the template's internal re-validation throw is unreachable.- Cypher injection in api.ts enrichment — refuted: whitelist-guarded, equivalent to pre-PR.
- escapeRegExp throw → 500 — refuted: maps to 400 via the global
BadRequestErrorhandler, and is unreachable behind the 200-char pre-check anyway.
Test coverage — zero test files changed (this is a security-labeled PR)
None of the new behavior is covered: escapeRegExp throw + complexity cap, sanitizeUpdateObject key-dropping, and NodeLabelValidator reject branches are all untested. The existing escapeRegExp tests cover only the happy/escaping path, not the length/complexity-cap branches added by this PR — so a green suite gives false confidence. Besides the headline, this is the most actionable gap.
CI
gh pr checks: Vercel = "Authorization required to deploy" (preview-deploy infra auth, not a code-results gate); Apply conventional label passed; Validate PR title skipped. No unit-test job ran the new code (no tests added).
Coverage note
4-file PR, read in full. The ~6000 untouched lines of local-backend.ts were not exhaustively audited for other label-interpolation sites; the two that exist (2044, 2496) are whitelist-guarded.
Automated multi-tool digest (GitNexus swarm + CE personas + Codex). Verify each item before acting.
| * Pre-defined Cypher query templates for safe query construction | ||
| * Uses template functions that validate labels before interpolation | ||
| */ | ||
| const CYPHER_TEMPLATES = Object.freeze({ |
There was a problem hiding this comment.
[P2 · should block merge · code-read] Dead code — this CYPHER_TEMPLATES block (and NodeLabelValidator / FROZEN_VALID_NODE_LABELS above it, lines 163–258) is never called.
CYPHER_TEMPLATES has zero references outside its own definition, and NodeLabelValidator/FROZEN_VALID_NODE_LABELS are used only inside it. The actual Cypher query sites in this file are untouched and still use the pre-existing guard — e.g. local-backend.ts:2038 if (!VALID_NODE_LABELS.has(label)) continue; then inline MATCH (n:\${label}` {id: $nodeId})` at 2044 (also 2496). So the MCP path gets no hardening from this PR despite the new infrastructure implying it does.
Latent hazard if ever wired up: this file's copied VALID_NODE_LABELS is missing Section/Variable/BasicBlock that the canonical NODE_TABLES has → it would silently under-validate.
Fix: delete lines 163–258, or actually route the query sites (2044, 2496) through CYPHER_TEMPLATES. (Note: the parallel copy in api.ts:87 is live and called — this finding is specific to local-backend.ts.)
Reached independently by Codex + all 7 Claude review lanes.
| /** | ||
| * Dangerous keys that could lead to prototype pollution or injection | ||
| */ | ||
| const DANGEROUS_KEYS = new Set(['__proto__', 'constructor', 'prototype']); |
There was a problem hiding this comment.
[P3 · non-blocking · code-read] DANGEROUS_KEYS is declared here but never referenced anywhere in api.ts — its only occurrence is this definition. The live copy that's actually used by sanitizeUpdateObject lives in analyze-job.ts. As written, this implies api.ts has prototype-pollution protection that doesn't exist.
Fix: delete this unused constant (and its JSDoc).
| const GREP_CONFIG = { | ||
| MAX_PATTERN_LENGTH: 200, | ||
| MAX_RESULTS: 200, | ||
| TIMEOUT_MS: 30000, // 30 second timeout |
There was a problem hiding this comment.
[P3 · non-blocking · code-read] TIMEOUT_MS is defined here with a // 30 second timeout comment, but it's never read — the grep scan loop (≈1424–1446) has no AbortController, setTimeout, or elapsed-time check tied to it. The comment advertises a DoS bound that isn't enforced; a slow/large repo scan still runs unbounded (only the 60 rpm rate limit applies).
Fix: either enforce the deadline in the scan loop, or remove TIMEOUT_MS and the comment so the code doesn't claim protection it lacks. (Not a regression — no timeout existed before — but this PR introduces the misleading constant.)
| throw new BadRequestError(`Pattern too long (max ${maxLength} characters)`); | ||
| } | ||
|
|
||
| const MAX_REPLACEMENTS = 10000; |
There was a problem hiding this comment.
[P3 · non-blocking · code-read] The MAX_REPLACEMENTS = 10000 branch is unreachable. The only caller is api.ts:1395 escapeRegExp(pattern, 200), and api.ts:1387 already rejects pattern.length > 200 before the call. With ≤200 input chars there can be at most 200 metacharacter replacements, far below 10000 — and even at the default maxLength = 1000 it can't reach 10000. The replacement counter logic is correct; the throw just can't fire.
Fix: drop the counter/throw (the length cap already bounds work to O(maxLength)), or express the cap relative to maxLength so its relationship is explicit.
Summary
Input validation hardening for prototype pollution, ReDoS, and Cypher injection.
Changes
Security Fixes
__proto__/constructor/prototypekeys