Commit f1c7934

fix: prevent OOM crash from globby's eager .gitignore scanning in lis… (cline#9917)
* fix: prevent OOM crash from globby's eager .gitignore scanning in listFiles

Replace globby's gitignore:true (which reads ALL .gitignore files in the
entire tree upfront, including inside gitignored directories) with
incremental .gitignore reading during BFS traversal.

In projects with large gitignored vendored dependencies containing many
nested repos, globby collects thousands of patterns, builds a massive
regex, and V8 runs out of memory during regex compilation (~488MB).

The fix reads .gitignore files only from directories the BFS actually
enters. Gitignored directories are never entered, so their .gitignore
files are never parsed and the pattern count stays small.

- Set gitignore:false, handle .gitignore ourselves
- Read root .gitignore in buildIgnorePatterns() to seed initial patterns
- Read subdirectory .gitignore lazily during globbyLevelByLevel BFS
- Accumulate patterns in currentIgnore so deeper levels respect them
- Add 4 tests: root patterns, file patterns, subdirectory .gitignore,
  and OOM-prevention (no reading inside gitignored dirs)

* Code review followup

* Potential fix for code scanning alert no. 147: Incomplete string escaping or encoding

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1 parent 1d42da5 commit f1c7934
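
Reviewer note: the lazy approach described in the commit message can be illustrated without globby or the file system. The sketch below is not the code from this commit. It walks a hypothetical in-memory tree (`Dir`, `toIgnoredNames`, and `listLazily` are names invented for illustration) and records which .gitignore files it reads. The matcher is deliberately simplified and only understands bare names such as "third-party/".

```typescript
// Hypothetical in-memory tree: a directory maps names to children, a file is a string.
// A ".gitignore" entry holds that directory's ignore rules.
type Dir = { [name: string]: Dir | string }

// Simplified matcher (assumption): only bare names such as "third-party/" or
// "secret.env" are supported. Real gitignore syntax is much richer.
function toIgnoredNames(gitignore: string): string[] {
	return gitignore
		.split("\n")
		.map((line) => line.trim())
		.filter((line) => line !== "" && !line.startsWith("#") && !line.startsWith("!"))
		.map((line) => (line.endsWith("/") ? line.slice(0, -1) : line))
}

function listLazily(root: Dir): { files: string[]; gitignoresRead: string[] } {
	const files: string[] = []
	const gitignoresRead: string[] = []
	const ignored = new Set<string>()
	const queue: Array<{ prefix: string; dir: Dir }> = [{ prefix: "", dir: root }]

	while (queue.length > 0) {
		const { prefix, dir } = queue.shift()!
		// Read this directory's .gitignore only once the traversal has entered it.
		const rules = dir[".gitignore"]
		if (typeof rules === "string") {
			gitignoresRead.push(`${prefix}.gitignore`)
			for (const name of toIgnoredNames(rules)) {
				ignored.add(name)
			}
		}
		for (const [name, child] of Object.entries(dir)) {
			if (name === ".gitignore" || ignored.has(name)) {
				continue // ignored directories are never queued, so never read
			}
			if (typeof child === "string") {
				files.push(prefix + name)
			} else {
				queue.push({ prefix: `${prefix}${name}/`, dir: child })
			}
		}
	}
	return { files, gitignoresRead }
}

const result = listLazily({
	".gitignore": "third-party/\n",
	"app.ts": "app",
	"third-party": {
		".gitignore": "*.log\nbuild/\n",
		repo1: { ".gitignore": "dist/\n", "file.ts": "file" },
	},
})
console.log(result.files, result.gitignoresRead)
```

In this sketch only the root .gitignore is read, because "third-party" is filtered out before it is ever queued. The pattern list therefore grows with the number of directories visited rather than with the size of the ignored subtree.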

2 files changed: +247 -10 lines changed

src/services/glob/__tests__/list-files.test.ts

Lines changed: 154 additions & 0 deletions
@@ -37,3 +37,157 @@ describe("listFiles", () => {
 		files.map(normalizeForComparison).should.containEql(normalizeForComparison(nestedFile))
 	})
 })
+
+describe("listFiles gitignore handling", () => {
+	// Each test gets its own isolated subdirectory to avoid cross-test pollution.
+	// Previous version shared a single tmpDir, which meant later tests could
+	// overwrite earlier .gitignore files and pass for the wrong reasons.
+	const baseDir = path.join(os.tmpdir(), `cline-gitignore-test-${Math.random().toString(36).slice(2)}`)
+
+	after(async () => {
+		await fs.rm(baseDir, { recursive: true, force: true }).catch(() => undefined)
+	})
+
+	it("excludes files matching root .gitignore directory patterns", async () => {
+		// Verifies the most common .gitignore use case: a directory pattern like "some-dir/"
+		// at the project root excludes that directory and everything inside it.
+		//
+		//   project/
+		//     .gitignore        → "ignored-dir/"
+		//     visible.ts
+		//     ignored-dir/
+		//       secret.ts       ← should be excluded
+		//     src/
+		//       app.ts
+		const project = path.join(baseDir, "test-root-gitignore")
+		await fs.mkdir(path.join(project, "ignored-dir"), { recursive: true })
+		await fs.mkdir(path.join(project, "src"), { recursive: true })
+		await fs.writeFile(path.join(project, ".gitignore"), "ignored-dir/\n")
+		await fs.writeFile(path.join(project, "visible.ts"), "export const x = 1\n")
+		await fs.writeFile(path.join(project, "ignored-dir", "secret.ts"), "secret\n")
+		await fs.writeFile(path.join(project, "src", "app.ts"), "app\n")
+
+		const [files] = await listFiles(project, true, 200)
+		const normalized = files.map(normalizeForComparison)
+
+		normalized.should.containEql(normalizeForComparison(path.join(project, "visible.ts")))
+		normalized.should.containEql(normalizeForComparison(path.join(project, "src", "app.ts")))
+
+		const hasIgnoredContent = normalized.some((f) => f.includes("ignored-dir"))
+		hasIgnoredContent.should.equal(false, "ignored-dir/ contents should be excluded by root .gitignore")
+	})
+
+	it("excludes files matching .gitignore file patterns (not just directories)", async () => {
+		// The .gitignore parser handles two kinds of patterns in separate branches:
+		// - Directory patterns ending in "/" → trailing "/" stripped, then "**/dir" + "**/dir/**"
+		// - File/glob patterns like "*.log" → converted to "**/*.log" + "**/*.log/**"
+		// This test exercises the file pattern branch.
+		//
+		//   project/
+		//     .gitignore        → "*.log\nsecret.env"
+		//     app.ts
+		//     debug.log         ← should be excluded
+		//     src/
+		//       nested.log      ← should also be excluded (pattern is global)
+		//       secret.env      ← should be excluded
+		//       config.ts
+		const project = path.join(baseDir, "test-file-patterns")
+		await fs.mkdir(path.join(project, "src"), { recursive: true })
+		await fs.writeFile(path.join(project, ".gitignore"), "*.log\nsecret.env\n")
+		await fs.writeFile(path.join(project, "app.ts"), "app\n")
+		await fs.writeFile(path.join(project, "debug.log"), "debug output\n")
+		await fs.writeFile(path.join(project, "src", "nested.log"), "nested log\n")
+		await fs.writeFile(path.join(project, "src", "secret.env"), "API_KEY=xxx\n")
+		await fs.writeFile(path.join(project, "src", "config.ts"), "config\n")
+
+		const [files] = await listFiles(project, true, 200)
+		const normalized = files.map(normalizeForComparison)
+
+		normalized.should.containEql(normalizeForComparison(path.join(project, "app.ts")))
+		normalized.should.containEql(normalizeForComparison(path.join(project, "src", "config.ts")))
+
+		const hasLogFiles = normalized.some((f) => f.endsWith(".log"))
+		hasLogFiles.should.equal(false, "*.log files should be excluded")
+
+		const hasSecretEnv = normalized.some((f) => f.includes("secret.env"))
+		hasSecretEnv.should.equal(false, "secret.env should be excluded")
+	})
+
+	it("reads .gitignore from subdirectories during BFS traversal", async () => {
+		// .gitignore files aren't only at the root — subdirectories can have their own.
+		// During BFS, when we enter a non-ignored directory, we read its .gitignore
+		// and add those patterns to the accumulator for all deeper traversal.
+		//
+		//   project/
+		//     src/
+		//       .gitignore      → "generated/"
+		//       code.ts
+		//       generated/
+		//         output.ts     ← should be excluded by src/.gitignore
+		//     lib/
+		//       util.ts
+		const project = path.join(baseDir, "test-subdirectory-gitignore")
+		const srcDir = path.join(project, "src")
+		const genDir = path.join(srcDir, "generated")
+		const libDir = path.join(project, "lib")
+		await fs.mkdir(genDir, { recursive: true })
+		await fs.mkdir(libDir, { recursive: true })
+		await fs.writeFile(path.join(srcDir, ".gitignore"), "generated/\n")
+		await fs.writeFile(path.join(srcDir, "code.ts"), "code\n")
+		await fs.writeFile(path.join(genDir, "output.ts"), "generated output\n")
+		await fs.writeFile(path.join(libDir, "util.ts"), "util\n")
+
+		const [files] = await listFiles(project, true, 200)
+		const normalized = files.map(normalizeForComparison)
+
+		normalized.should.containEql(normalizeForComparison(path.join(srcDir, "code.ts")))
+		normalized.should.containEql(normalizeForComparison(path.join(libDir, "util.ts")))
+
+		const hasGeneratedContent = normalized.some((f) => f.includes("generated"))
+		hasGeneratedContent.should.equal(false, "src/generated/ should be excluded by src/.gitignore")
+	})
+
+	it("does not read .gitignore from inside gitignored directories", async () => {
+		// This is the core OOM-prevention test.
+		//
+		// The crash scenario: a gitignored directory (e.g., third-party/) contains
+		// hundreds of nested repos, each with their own .gitignore. globby's old
+		// gitignore:true would read ALL of them upfront, build a massive regex,
+		// and OOM during V8 regex compilation.
+		//
+		// With incremental reading, we never enter third-party/ because the root
+		// .gitignore excludes it, so we never read any .gitignore files inside it.
+		//
+		// NOTE: We intentionally use "third-party/" instead of "vendor/" here because
+		// "vendor" is in DEFAULT_IGNORE_DIRECTORIES and would be excluded regardless
+		// of .gitignore. Using a name NOT in that list proves the .gitignore-based
+		// exclusion is actually working.
+		//
+		//   project/
+		//     .gitignore        → "third-party/"
+		//     app.ts
+		//     third-party/
+		//       .gitignore      ← should never be read
+		//       repo1/
+		//         .gitignore    ← should never be read
+		//         file.ts
+		const project = path.join(baseDir, "test-no-read-inside-ignored")
+		const thirdPartyDir = path.join(project, "third-party")
+		const repo1Dir = path.join(thirdPartyDir, "repo1")
+		await fs.mkdir(repo1Dir, { recursive: true })
+		await fs.writeFile(path.join(project, ".gitignore"), "third-party/\n")
+		await fs.writeFile(path.join(project, "app.ts"), "app\n")
+		// These .gitignore files simulate the nested repos that caused OOM
+		await fs.writeFile(path.join(thirdPartyDir, ".gitignore"), "*.log\nbuild/\n")
+		await fs.writeFile(path.join(repo1Dir, ".gitignore"), "dist/\ncoverage/\n")
+		await fs.writeFile(path.join(repo1Dir, "file.ts"), "file\n")
+
+		const [files] = await listFiles(project, true, 200)
+		const normalized = files.map(normalizeForComparison)
+
+		normalized.should.containEql(normalizeForComparison(path.join(project, "app.ts")))
+
+		const hasThirdPartyContent = normalized.some((f) => f.includes("third-party"))
+		hasThirdPartyContent.should.equal(false, "third-party/ contents should be excluded — and its .gitignore files never read")
+	})
+})
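
Reviewer note: the two pattern branches mentioned in the test comments can be tried out in isolation. The sketch below is a pure-function approximation of the per-line conversion, with the file read left out. `gitignoreToGlobs` is a hypothetical name, not a function in this commit. Like the conversion in the diff, it skips comments and negations and does not special-case anchored patterns such as "/build".

```typescript
// Hypothetical pure helper: convert .gitignore content into glob ignore patterns.
// Both directory patterns ("dir/") and file patterns ("*.log") produce two globs:
// one for the entry itself and one for everything beneath it.
function gitignoreToGlobs(content: string): string[] {
	const patterns: string[] = []
	for (const line of content.split("\n")) {
		const trimmed = line.trim()
		// Skip blank lines, comments, and negation patterns.
		if (!trimmed || trimmed.startsWith("#") || trimmed.startsWith("!")) {
			continue
		}
		// Directory patterns lose their trailing slash before conversion.
		const base = trimmed.endsWith("/") ? trimmed.slice(0, -1) : trimmed
		patterns.push(`**/${base}`, `**/${base}/**`)
	}
	return patterns
}

const globs = gitignoreToGlobs("# build output\nignored-dir/\n*.log\n!keep.log\n")
console.log(globs)
// → ["**/ignored-dir", "**/ignored-dir/**", "**/*.log", "**/*.log/**"]
```

Because every pattern is prefixed with `**/`, a rule read from a subdirectory's .gitignore applies to the whole tree from that point on, not only beneath that subdirectory. That matches the accumulator behaviour the third test describes.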

src/services/glob/list-files.ts

Lines changed: 93 additions & 10 deletions
@@ -1,6 +1,7 @@
 import { workspaceResolver } from "@core/workspace"
 import { isDirectory } from "@utils/fs"
 import { arePathsEqual } from "@utils/path"
+import * as fs from "fs/promises"
 import { globby, Options } from "globby"
 import * as os from "os"
 import * as path from "path"
@@ -46,7 +47,56 @@ function isTargetingHiddenDirectory(absolutePath: string): boolean {
 	return dirName.startsWith(".")
 }
 
-function buildIgnorePatterns(absolutePath: string): string[] {
+/**
+ * Read a .gitignore file and convert its patterns to glob ignore patterns.
+ *
+ * We do NOT use globby's built-in `gitignore: true` option because it recursively
+ * reads ALL .gitignore files in the entire directory tree upfront - including those
+ * inside directories that are themselves gitignored. In projects with large gitignored
+ * directories containing many nested repos (each with their own .gitignore), this
+ * causes V8 to run out of memory during regex compilation, crashing the extension host.
+ *
+ * Instead, we read .gitignore files incrementally during BFS traversal: only from
+ * directories we actually enter (which are not ignored), never from ignored directories.
+ */
+async function readGitignorePatterns(dirPath: string): Promise<string[]> {
+	try {
+		const gitignorePath = path.join(dirPath, ".gitignore")
+		const content = await fs.readFile(gitignorePath, "utf8")
+		const patterns: string[] = []
+
+		for (const line of content.split("\n")) {
+			const trimmed = line.trim()
+			// Skip empty lines and comments
+			if (!trimmed || trimmed.startsWith("#")) {
+				continue
+			}
+			// Skip negation patterns - they're complex to convert and rarely
+			// critical for the directory listing use case
+			if (trimmed.startsWith("!")) {
+				continue
+			}
+			// Convert gitignore patterns to glob ignore patterns
+			if (trimmed.endsWith("/")) {
+				// Directory pattern: "ignored-dir/" → match the directory itself and its contents.
+				// Two explicit patterns avoid ambiguity across glob library versions:
+				const dirName = trimmed.slice(0, -1)
+				patterns.push(`**/${dirName}`)
+				patterns.push(`**/${dirName}/**`)
+			} else {
+				// File or ambiguous pattern: "*.log" -> "**/*.log" and "**/*.log/**"
+				patterns.push(`**/${trimmed}`)
+				patterns.push(`**/${trimmed}/**`)
+			}
+		}
+
+		return patterns
+	} catch {
+		return []
+	}
+}
+
+async function buildIgnorePatterns(absolutePath: string): Promise<string[]> {
 	const isTargetHidden = isTargetingHiddenDirectory(absolutePath)
 
 	const patterns = [...DEFAULT_IGNORE_DIRECTORIES]
@@ -56,7 +106,15 @@ function buildIgnorePatterns(absolutePath: string): string[] {
 		patterns.push(".*")
 	}
 
-	return patterns.map((dir) => `**/${dir}/**`)
+	const globPatterns = patterns.map((dir) => `**/${dir}/**`)
+
+	// Read root .gitignore to seed the initial ignore patterns.
+	// Additional .gitignore files from subdirectories are read incrementally
+	// during BFS traversal in globbyLevelByLevel().
+	const gitignorePatterns = await readGitignorePatterns(absolutePath)
+	globPatterns.push(...gitignorePatterns)
+
+	return globPatterns
 }
 
 export async function listFiles(dirPath: string, recursive: boolean, limit: number): Promise<[string[], boolean]> {
@@ -78,8 +136,8 @@ export async function listFiles(dirPath: string, recursive: boolean, limit: numb
 		dot: true, // do not ignore hidden files/directories
 		absolute: true,
 		markDirectories: true, // Append a / on any directories matched
-		gitignore: recursive, // globby ignores any files that are gitignored
-		ignore: recursive ? buildIgnorePatterns(absolutePath) : undefined,
+		gitignore: false, // We handle .gitignore ourselves incrementally during BFS to avoid OOM
+		ignore: recursive ? await buildIgnorePatterns(absolutePath) : undefined,
 		onlyFiles: false, // include directories in results
 		suppressErrors: true,
 	}
@@ -95,6 +153,9 @@ Breadth-first traversal of directory structure level by level up to a limit:
    - Processes directory patterns level by level
    - Captures a representative sample of the directory structure up to the limit
    - Minimizes risk of missing deeply nested files
+   - Reads .gitignore files incrementally from each non-ignored directory entered,
+     avoiding the OOM crash caused by globby's gitignore:true reading ALL nested
+     .gitignore files upfront (including those inside gitignored directories)
 
 - Notes:
    - Relies on globby to mark directories with /
@@ -104,23 +165,45 @@ Breadth-first traversal of directory structure level by level up to a limit:
 async function globbyLevelByLevel(limit: number, options?: Options) {
 	const results: Set<string> = new Set()
 	const queue: string[] = ["*"]
+	// Track all ignore patterns, starting with whatever was passed in options.
+	// We'll add patterns from .gitignore files as we discover non-ignored directories.
+	const currentIgnore: string[] = [...((options?.ignore as string[]) ?? [])]
 
 	const globbingProcess = async () => {
 		while (queue.length > 0 && results.size < limit) {
 			const pattern = queue.shift()!
-			const filesAtLevel = await globby(pattern, options)
+			// Use current accumulated ignore patterns for each globby call
+			const currentOptions = { ...options, ignore: currentIgnore }
+			const filesAtLevel = await globby(pattern, currentOptions)
 
 			for (const file of filesAtLevel) {
 				if (results.size >= limit) {
 					break
 				}
 				results.add(file)
 				if (file.endsWith("/")) {
-					// Escape parentheses in the path to prevent glob pattern interpretation
-					// This is crucial for NextJS folder naming conventions which use parentheses like (auth), (dashboard)
-					// Without escaping, glob treats parentheses as special pattern grouping characters
-					const escapedFile = file.replace(/\(/g, "\\(").replace(/\)/g, "\\)")
-					queue.push(`${escapedFile}*`)
+					// This directory passed the ignore filters, so it's not gitignored.
+					// Read its .gitignore (if any) and add patterns to the ignore list
+					// so deeper traversal respects them.
+					const dirGitignorePatterns = await readGitignorePatterns(file)
+					if (dirGitignorePatterns.length > 0) {
+						currentIgnore.push(...dirGitignorePatterns)
+					}
+
+					// Queue as a RELATIVE path to cwd so that ignore patterns (like **/tmp/**)
+					// are checked against relative entry paths, not absolute ones. Using absolute
+					// patterns causes false matches when the project is under a directory whose
+					// name collides with DEFAULT_IGNORE_DIRECTORIES (e.g., /tmp on Linux).
+					const cwd = options?.cwd?.toString() ?? ""
+					const relativeDir = path.relative(cwd, file)
+					// Escape backslashes and parentheses in the path to prevent glob pattern interpretation.
+					// This is crucial for NextJS folder naming conventions which use parentheses like (auth), (dashboard).
+					// Without escaping, glob treats backslashes as escapes and parentheses as special pattern grouping characters.
+					const escapedDir = relativeDir
+						.replace(/\\/g, "\\\\")
+						.replace(/\(/g, "\\(")
+						.replace(/\)/g, "\\)")
+					queue.push(`${escapedDir}/*`)
 				}
 			}
 		}
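
Reviewer note: the escaping step at the end of this hunk is easy to check on its own. The sketch below wraps the same three replacements in a hypothetical helper (`childPatternFor` is not part of the commit) that takes a directory path already made relative to cwd and returns the glob pattern for its direct children. Only backslashes and parentheses are escaped, as in the diff. Other glob metacharacters such as brackets or braces are left untouched here.

```typescript
// Hypothetical helper: build the "direct children" glob for a directory path that is
// already relative to cwd. Backslashes are escaped first so that the backslashes
// added for parentheses are not escaped a second time.
function childPatternFor(relativeDir: string): string {
	const escaped = relativeDir
		.replace(/\\/g, "\\\\")
		.replace(/\(/g, "\\(")
		.replace(/\)/g, "\\)")
	return `${escaped}/*`
}

// A Next.js style route group: the parentheses must be matched literally.
const pattern = childPatternFor("app/(auth)/login")
console.log(pattern) // prints app/\(auth\)/login/*
```

The order of the replacements matters: escaping parentheses first and backslashes second would double the backslashes that were just inserted, producing a pattern that no longer matches the directory.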
