Skip to content

security: harden input validation#2208

Open
latte233 wants to merge 2 commits into
abhigyanpatwari:mainfrom
latte233:latte233-fix2
Open

security: harden input validation#2208
latte233 wants to merge 2 commits into
abhigyanpatwari:mainfrom
latte233:latte233-fix2

Conversation

@latte233

Copy link
Copy Markdown

Summary

Input validation hardening for prototype pollution, ReDoS, and Cypher injection.

Changes

  • analyze-job.ts: sanitize update objects to prevent prototype pollution
  • validation.ts: add length limits to escapeRegExp for ReDoS protection
  • api.ts: validate node labels before Cypher query construction
  • local-backend.ts: add NodeLabelValidator and query templates

Security Fixes

  • Prototype pollution via __proto__/constructor/prototype keys
  • ReDoS via overly long/complex regex patterns
  • Cypher injection via unvalidated node labels

- analyze-job: sanitize update objects to prevent prototype pollution
- validation: add length limits to escapeRegExp for ReDoS protection
- api: validate node labels before Cypher query construction
- local-backend: add NodeLabelValidator and query templates
@latte233 latte233 requested a review from magyargergo as a code owner June 15, 2026 10:08
@vercel

vercel Bot commented Jun 15, 2026

Copy link
Copy Markdown

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #2208 Tri-Review — "security: harden input validation"

Methods: 3 — GitNexus swarm (risk, security-boundary, test/CI) + Compound-Engineering personas (correctness, adversarial, maintainability, testing) + Codex. That's 7 Claude lanes + Codex (Codex is the one independent engine, and it was live). The headline finding was reached independently by Codex and all 7 Claude lanes — the strongest corroboration available here. Two-thirds of the lanes are Claude, so Claude-only agreement is "consistent across personas," not independent confirmation.

Bottom line: No regression and no new vulnerability is introduced, and several suspected vulns were probed and refuted (good). But for a PR titled "harden input validation," a large share of the added code is dead / no-op: the entire new block in local-backend.ts never executes, and api.ts ships two unused security artifacts. The one genuinely-live hardening (api.ts /api/search) is functionally equivalent to a whitelist that already existed. Net security gain is small; the main cost is misleading code + missing tests. Recommend changes before merge.

Headline — local-backend.ts hardening is entirely dead code [P2 · code-read · Codex + 7 Claude lanes]

NodeLabelValidator, FROZEN_VALID_NODE_LABELS, and CYPHER_TEMPLATES (lines 163–258) are referenced only by each otherCYPHER_TEMPLATES has zero external call sites. The real Cypher query sites in this file (lines 2044, 2496) are untouched and still rely on the pre-existing VALID_NODE_LABELS.has(label) whitelist + backtick-quoting. The MCP path gets no hardening from this PR even though the new infrastructure implies it does. Latent extra hazard if it's ever wired up: this file's copied VALID_NODE_LABELS is missing Section/Variable/BasicBlock that canonical NODE_TABLES has — it would silently under-validate.

Scope note: this is the local-backend.ts copy only. api.ts has its own CYPHER_TEMPLATES (line 87) that is live — called at 1283/1286/1289 (see Credit). Don't read this as the whole pattern being inert.

Inline findings (anchored on their lines, each with fix + merge-impact)

  • [P2] local-backend.ts:209 — dead CYPHER_TEMPLATES/validator block. Fix: delete it, or wire sites 2044/2496 through it. Should block merge — the PR's stated MCP hardening isn't delivered.
  • [P3] api.ts:50DANGEROUS_KEYS declared, never referenced in api.ts (live copy is in analyze-job.ts). Fix: delete. Non-blocking hygiene.
  • [P3] api.ts:1355GREP_CONFIG.TIMEOUT_MS // 30 second timeout is never read; the grep scan loop has no timeout/AbortController. Fix: implement the deadline or drop field+comment. Non-blocking, but this PR introduces the misleading constant.
  • [P3] validation.ts:105MAX_REPLACEMENTS = 10000 branch is unreachable (sole caller pre-caps length at 200). Fix: drop it or tie the cap to maxLength. Non-blocking.

Credit where due (validated, not just asserted)

  • The api.ts /api/search enrichment refactor is real and correctly wired: NodeLabelValidator.validate() at api.ts:1276 gates the live path, and api.ts's own CYPHER_TEMPLATES (line 87) is called at 1283/1286/1289. It's a cleaner, slightly stricter version of the pre-existing inline validLabel() whitelist that lived in the same enrichment block — same NODE_TABLES source, identical query bodies → behavior-equivalent, no regression.
  • No regression in updateJob: ALLOWED_UPDATE_KEYS matches the function's Partial<Pick<…>> type exactly, and all ~20 callers pass literal keys → no legitimate field is dropped.
  • ReDoS genuinely mitigated at /api/grep: literal mode is forced and escapeRegExp escapes all metacharacters before new RegExp(…,'gim'), with length capped at 200.

Probed and refuted (validation is a feature)

  • Prototype pollution via updateJob — no reachable path: every caller passes statically-typed literal keys; no endpoint spreads request-body / JSON.parse keys into the update. sanitizeUpdateObject is harmless defense-in-depth against a non-existent vector — it should not be counted as fixing a live vuln.
  • CYPHER_TEMPLATES throw escaping .catch(()=>[]) and rejecting Promise.all (500) — refuted: the label is already validated at api.ts:1276, so the template's internal re-validation throw is unreachable.
  • Cypher injection in api.ts enrichment — refuted: whitelist-guarded, equivalent to pre-PR.
  • escapeRegExp throw → 500 — refuted: maps to 400 via the global BadRequestError handler, and is unreachable behind the 200-char pre-check anyway.

Test coverage — zero test files changed (this is a security-labeled PR)

None of the new behavior is covered: escapeRegExp throw + complexity cap, sanitizeUpdateObject key-dropping, and NodeLabelValidator reject branches are all untested. The existing escapeRegExp tests cover only the happy/escaping path, not the length/complexity-cap branches added by this PR — so a green suite gives false confidence. Besides the headline, this is the most actionable gap.

CI

gh pr checks: Vercel = "Authorization required to deploy" (preview-deploy infra auth, not a code-results gate); Apply conventional label passed; Validate PR title skipped. No unit-test job ran the new code (no tests added).

Coverage note

4-file PR, read in full. The ~6000 untouched lines of local-backend.ts were not exhaustively audited for other label-interpolation sites; the two that exist (2044, 2496) are whitelist-guarded.


Automated multi-tool digest (GitNexus swarm + CE personas + Codex). Verify each item before acting.

* Pre-defined Cypher query templates for safe query construction
* Uses template functions that validate labels before interpolation
*/
const CYPHER_TEMPLATES = Object.freeze({

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2 · should block merge · code-read] Dead code — this CYPHER_TEMPLATES block (and NodeLabelValidator / FROZEN_VALID_NODE_LABELS above it, lines 163–258) is never called.

CYPHER_TEMPLATES has zero references outside its own definition, and NodeLabelValidator/FROZEN_VALID_NODE_LABELS are used only inside it. The actual Cypher query sites in this file are untouched and still use the pre-existing guard — e.g. local-backend.ts:2038 if (!VALID_NODE_LABELS.has(label)) continue; then inline MATCH (n:\${label}` {id: $nodeId})` at 2044 (also 2496). So the MCP path gets no hardening from this PR despite the new infrastructure implying it does.

Latent hazard if ever wired up: this file's copied VALID_NODE_LABELS is missing Section/Variable/BasicBlock that the canonical NODE_TABLES has → it would silently under-validate.

Fix: delete lines 163–258, or actually route the query sites (2044, 2496) through CYPHER_TEMPLATES. (Note: the parallel copy in api.ts:87 is live and called — this finding is specific to local-backend.ts.)

Reached independently by Codex + all 7 Claude review lanes.

/**
* Dangerous keys that could lead to prototype pollution or injection
*/
const DANGEROUS_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · non-blocking · code-read] DANGEROUS_KEYS is declared here but never referenced anywhere in api.ts — its only occurrence is this definition. The live copy that's actually used by sanitizeUpdateObject lives in analyze-job.ts. As written, this implies api.ts has prototype-pollution protection that doesn't exist.

Fix: delete this unused constant (and its JSDoc).

const GREP_CONFIG = {
MAX_PATTERN_LENGTH: 200,
MAX_RESULTS: 200,
TIMEOUT_MS: 30000, // 30 second timeout

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · non-blocking · code-read] TIMEOUT_MS is defined here with a // 30 second timeout comment, but it's never read — the grep scan loop (≈1424–1446) has no AbortController, setTimeout, or elapsed-time check tied to it. The comment advertises a DoS bound that isn't enforced; a slow/large repo scan still runs unbounded (only the 60 rpm rate limit applies).

Fix: either enforce the deadline in the scan loop, or remove TIMEOUT_MS and the comment so the code doesn't claim protection it lacks. (Not a regression — no timeout existed before — but this PR introduces the misleading constant.)

throw new BadRequestError(`Pattern too long (max ${maxLength} characters)`);
}

const MAX_REPLACEMENTS = 10000;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P3 · non-blocking · code-read] The MAX_REPLACEMENTS = 10000 branch is unreachable. The only caller is api.ts:1395 escapeRegExp(pattern, 200), and api.ts:1387 already rejects pattern.length > 200 before the call. With ≤200 input chars there can be at most 200 metacharacter replacements, far below 10000 — and even at the default maxLength = 1000 it can't reach 10000. The replacement counter logic is correct; the throw just can't fire.

Fix: drop the counter/throw (the length cap already bounds work to O(maxLength)), or express the cap relative to maxLength so its relationship is explicit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants