Skip to content

sync: upstream just-bash@2.14.5#10

Merged
subsetpark merged 6 commits into
flowglad-mainfrom
sync-just-bash@2.14.5
May 8, 2026
Merged

sync: upstream just-bash@2.14.5#10
subsetpark merged 6 commits into
flowglad-mainfrom
sync-just-bash@2.14.5

Conversation

@github-actions

@github-actions github-actions Bot commented May 7, 2026

Copy link
Copy Markdown

Automated upstream sync to just-bash@2.14.5 has merge conflicts.

Check out this branch, resolve conflicts at TypeScript-source level, then run:

pnpm install
pnpm --filter just-bash build
pnpm --filter just-bash typecheck
pnpm --filter just-bash exec vitest run src/commands/sqlite3/sqlite3.test.ts src/commands/python3/python3.optin.test.ts
git add -f packages/just-bash/dist

Commit and push to update this PR.


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

subsetpark and others added 5 commits May 5, 2026 12:25
* jq: accept literal control chars inside JSON strings

Real jq tolerates JSON strings containing unescaped control characters
(newlines, tabs, carriage returns, etc.) even though RFC 8259 forbids
them. Real-world inputs — webhooks, log lines piped through jq, output
from terminals copy-pasted into a script — frequently contain literal
tabs/newlines inside string values, and rejecting those at the
JSON.parse boundary surprises users who expect parity with jq.

Pre-process the slice we hand to JSON.parse: walk the string tracking
quote/escape state, and when a control char (≤ 0x1f) appears inside a
string, replace it with its JSON escape (\n, \t, \r, \b, \f, or \uXXXX).
Outside strings the input is left untouched, so structural whitespace
keeps its meaning.

Tests cover newline, tab, and CR inside strings; preserved escape
sequences alongside literal control chars; control chars outside
strings (whitespace) leaving compact output unchanged; and concatenated
JSON values where one contains a literal newline.

* add changeset
…er (vercel-labs#218)

* perf(grep): 5-123x speedup via RE2 matcher reuse and literal pre-filter

Addresses three layered hot spots that together produced 700-800ms
per-exec latency for case-insensitive, word-boundary, and alternation
patterns over a ~25k-line corpus.

1. RE2JS Matcher cache (packages/just-bash/src/regex/user-regex.ts)
   regex.test()/exec() each allocated a new RE2 Matcher per call. For a
   recursive grep over 100+ files this meant ~25k Matcher allocations
   per invocation. Cache the Matcher and swap its charSequence in place
   when the input changes. Note: Matcher.resetMatcherInput is unsafe
   with raw strings — the constructor wraps strings via
   MatcherInput.utf16(), but resetMatcherInput assigns its argument
   directly and reset() then calls .length() as a method on the
   assigned value, throwing TypeError on strings. So we mutate the
   cached Utf16MatcherInput.charSequence field directly and reset().

2. Combine test() + exec() in searchContent
   (packages/just-bash/src/commands/search-engine/matcher.ts)
   The fast path called regex.test() to decide if a line matched, then
   regex.exec() again on matched lines for the column position. Replace
   with a single regex.exec() — its return value already encodes both
   the boolean and the position. Saves one RE2.find() per matched line.

3. Substring pre-filter
   (packages/just-bash/src/commands/search-engine/regex.ts)
   Extract must-have literal substrings from the compiled pattern at
   buildRegex time. In the per-line loop, run String.indexOf first; if
   no needle is present, the regex provably cannot match, so RE2 is
   skipped entirely. Handles bare literals, \b...\b wrapped patterns
   (-w), and top-level alternation (-E 'a|b'). For -i, both line and
   needles compare in lowercase. Falls back to the full regex for any
   pattern containing quantifiers, anchors, character classes, or
   other regex metacharacters — false positives are fine, false
   negatives would be bugs.

Benchmark (HTTP-API-fronted virtualfs sandbox backed by Neon Postgres,
recursive grep over 113-file 25,956-line TypeScript source tree, server-
reported ms, p50 of 3 runs after warmup):

  Pattern                  before   after    speedup
  -ri 'async'              680ms    14ms     48x
  -rw 'type'               476ms     5ms     90x
  -rE 'interface|type'     823ms     7ms    123x
  -r  'interface'           44ms     4ms     11x
  -rF 'Promise<'            26ms     3ms      9x
  -rn 'export'              21ms     4ms      5x

Tests:
  - packages/just-bash/src/commands/search-engine/regex.test.ts
                                                   29/29   pass (NEW)
    direct preFilter extractor coverage: 8 happy-path
    (literal / -w wrap / alternation / -F escape / -i / \\n decode),
    16 safety-negative (each quantifier, char class, anchor, dot,
    \\d \\w \\s \\b alone, capture groups, mixed alternation), and
    5 structural (no | split inside groups or [classes], ordering
    preserved, partial \\b not stripped, empty pattern)
  - packages/just-bash/src/regex/user-regex.test.ts (67 tests)
  - packages/just-bash/src/commands/grep/{basic,advanced,perl,exclude,binary}.test.ts
  - packages/just-bash/src/spec-tests/grep/grep-spec.test.ts (332 tests)
  - packages/just-bash/src/comparison-tests/grep.comparison.test.ts
    (validates output against system grep)
  - packages/just-bash/src/commands/{sed,awk,rg}/*.test.ts
    (sister commands sharing UserRegex)

* fix(grep): address Copilot review — hex escape false negatives, stdin preFilter, misleading comment

- Reject \x/\u/\U escapes in literalFromAlternative to prevent false-negative
  pre-filter needles (e.g. \x41 would have produced 'x41' instead of 'A')
- Pass preFilter to searchContent in the stdin code path (was only wired for
  file inputs, missing the perf win on piped input)
- Reword acquireMatcher comment: says resetMatcherInput was used but the code
  explicitly avoids it due to a re2js 1.2.1 bug; clarify charSequence mutation
- Add safety tests for \x and \u escape rejection

* chore: add changeset for grep perf improvement
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@github-actions github-actions Bot added the needs-conflict-resolution Auto-sync PR with merge conflicts label May 7, 2026
@CLAassistant

CLAassistant commented May 7, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 5 committers have signed the CLA.

✅ subsetpark
❌ mamuso
❌ Hazzng
❌ github-actions[bot]
❌ cramforce
You have signed the CLA already but the status is still pending? Let us recheck it.

@subsetpark subsetpark force-pushed the sync-just-bash@2.14.5 branch from 4ff4e40 to f544617 Compare May 8, 2026 19:15

@flowglad-review-service flowglad-review-service Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: 5 comments

this._matcherInput = input;
return this._matcher;
}
if (this._matcherInput !== input) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MUST FIX] packages/just-bash/src/regex/user-regex.ts:96 — The acquireMatcher method mutates this._matcher.matcherInput.charSequence in-place across different UserRegex instances and calls. UserRegex instances are not necessarily single-threaded singletons — if the same UserRegex is reused across parallel command invocations (e.g., two concurrent grep calls sharing the same compiled regex object), two concurrent callers can race on this._matcher and this._matcherInput, leading to incorrect match results or crashes.

More critically, within a single call to searchContent, the firstMatch result from regex.exec(line) at line ~222 is stored, then regex.exec(line) is called again in the onlyMatching/vimgrep loop. Between these calls, regex.lastIndex is advanced by RE2's side effects but the cached Matcher's internal state from the previous exec call is still live. The acquireMatcher path only calls matcher.reset() when the input changes — if the same line is passed twice in a row (first exec to get firstMatch, then loop iterations), reset() is skipped and the Matcher resumes from wherever lastIndex left off. This is by design for global regex iteration, but the reset logic must correctly handle the case where lastIndex was explicitly set to 0 (regex.lastIndex = 0 before the diff's removed calls).

Please verify that: (1) the Matcher's internal position is always consistent with this._lastIndex at the start of acquireMatcher, and (2) concurrent use of a single UserRegex across parallel executions is either impossible or protected.

Reviewed at f544617

// directly and then calls .length() as a method, which throws on a
// raw string). MatcherInput is not exported, so we mutate the existing
// wrapper's charSequence field — Matcher.reset() reads matcherInput.length()
// afterwards, so the new length is picked up correctly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MUST FIX] packages/just-bash/src/regex/user-regex.ts:104 — The comment explains that resetMatcherInput is broken in re2js 1.2.1 and works around it by directly mutating this._matcher.matcherInput.charSequence. This is coupling the implementation to private internals of re2js that could change in any patch release, and it suppresses the type error with as any. If re2js 1.2.1 is confirmed broken, pin it in package.json with an exact version (or at most ~1.2.1) to prevent accidental upgrades that would break this assumption silently. As written, a minor re2js upgrade could change the internal field name or layout, and this code would silently produce wrong matches with no type-level protection.

Reviewed at f544617

for (let i = 0; i < lastIdx; i++) {
const line = lines[i];
// Pre-filter: skip lines that can't contain any required literal.
// For invertMatch=true (-vc), a no-match line still counts, so we

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MUST FIX] packages/just-bash/src/commands/search-engine/matcher.ts:168 — In the onlyMatching branch (line ~230 in the diff), the loop is changed to start from firstMatch instead of a fresh regex.exec(line). This is correct only if regex.lastIndex has already been advanced past the first match before the loop continues with match = regex.exec(line). However, the pre-filter branch that sets firstMatch = null also never calls regex.exec, so regex.lastIndex is 0 at that point — that case is fine. The issue is in the matches && onlyMatching path: the first exec call sets lastIndex past the first match; the loop then calls regex.exec(line) for subsequent matches. This looks correct. But please confirm the case where firstMatch[0].length === 0 (zero-length match): the original code had regex.lastIndex++ inside the loop body — verify that this guard still applies correctly when starting from firstMatch rather than a fresh exec.

Reviewed at f544617

if (core.length === 0) return null;

const alternatives = splitTopLevelAlternation(core);
if (alternatives === null) return null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD FIX] packages/just-bash/src/commands/search-engine/regex.ts:280 — The literalFromAlternative reject-list regex /[dDwWsSbBAZzGQE0-9ckpPNXRxuU]/ uses character ranges 0-9 and A-Z to block numeric and uppercase back-references. However, the range A-Z in a character class covers all uppercase letters, including some that are only used as literal-escape sequences in certain regex flavors (e.g., \C, \F, \H, \I, etc.). This is an overly broad rejection — fine for safety, but the intent comment only mentions \d, \w, \s, \b, \B. More importantly, the range 0-9 would block \0 through \9, which are back-references in BRE/ERE and octal escapes in PCRE, so rejection is correct. Please add a test case or comment that explicitly documents why the full A-Z range is rejected rather than individual characters, to make future maintenance safer.

Reviewed at f544617

padding: "60px 80px",
padding: "60px 200px",
}}
>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD FIX] examples/website/app/opengraph-image.tsx:33 — The <svg> element uses fill-rule and clip-rule as HTML attribute names, but in JSX/React these must be camelCased: fillRule and clipRule. Using hyphenated attribute names in JSX will produce a React warning and the attributes will be ignored in the rendered output (the triangle fill may not render correctly).

Reviewed at f544617

@flowglad-review-service

Copy link
Copy Markdown

Review Summary

Five issues found across the diff:

  1. Must-fix (user-regex.ts): The cached-Matcher optimization in acquireMatcher mutates internal re2js state (matcherInput.charSequence) that is both private/untyped and shared across all calls to the same UserRegex instance. The correctness argument depends on call ordering and single-threaded access; please verify that concurrent grep invocations sharing a UserRegex cannot race on the cached Matcher, and that lastIndex is always consistent with the Matcher's internal seek position when reset() is skipped.

  2. Must-fix (user-regex.ts): The workaround explicitly couples to re2js internals due to a reported bug in 1.2.1. The re2js version should be pinned exactly (or with ~) to prevent a silent breakage if the internal structure changes in a later release.

  3. Must-fix (matcher.ts): The onlyMatching and vimgrep loop refactoring reuses firstMatch as the loop seed. The zero-length-match guard (regex.lastIndex++) must still fire correctly when starting from the cached firstMatch result — please verify this edge case.

  4. Should-fix (regex.ts): The reject-list in literalFromAlternative uses broad character ranges (0-9, A-Z) without a comment explaining why the full ranges are needed. This makes future maintenance risky; a targeted list or explanatory comment is needed.

  5. Should-fix (opengraph-image.tsx): JSX requires camelCase SVG attribute names (fillRule, clipRule), not the hyphenated HTML forms (fill-rule, clip-rule). The triangle may not render correctly.

Severity Count
Must-fix 3
Should-fix 2
Note 0

5 comments posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 23810 in / 1846 out · Cache: 6103 created / 0 read · Context: 2 items · Review mode: one-shot

@subsetpark subsetpark merged commit 718e5fd into flowglad-main May 8, 2026
2 of 3 checks passed
@subsetpark subsetpark deleted the sync-just-bash@2.14.5 branch May 8, 2026 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-conflict-resolution Auto-sync PR with merge conflicts

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants