feat(indexing): respect root .gitignore patterns during indexing #607
Code Review
This pull request introduces support for loading and merging .gitignore patterns alongside .cgrignore patterns to prevent indexing build artifacts and generated files. It refactors the loading logic in config.py and updates references across the CLI, main module, and tests, while adding a new test suite for gitignore patterns. The review feedback highlights an issue with the merging logic: using a simple union (|) fails to correctly respect .cgrignore as the authoritative override channel when conflicting exclude/unignore patterns exist. Actionable suggestions were provided to adjust the merge logic in config.py and update the test assertions accordingly.
Greptile Summary
This PR adds root
Confidence Score: 4/5
The change is narrowly scoped, but the ignore-pattern merge can mis-handle ordered negation cases and index files that Git would ignore. The review is based on the touched ignore-loading and indexing call paths plus the new tests around codebase_rag/config.py
Reviews (3): Last reviewed commit: "fix(indexing): cancel gitignore excludes..."
Codecov Report
✅ All modified and coverable lines are covered by tests.
…ores at load time
@greptile review
@greptile review
    negations = cgr.unignore | git.unignore
    return CgrignorePatterns(
        exclude=cgr.exclude | (git.exclude - negations),
        unignore=negations,
**Preserve gitignore order**
`load_ignore_patterns()` collapses `.gitignore` into sets, so `dist/\n!dist/\ndist/` returns `dist/` only in `unignore` and indexes `dist/`, while Git's documented rule is that within one precedence level the last matching pattern decides the outcome. Repos that re-ignore after a negation will now include generated files that Git ignores; the loader needs to preserve pattern order or compile an ordered `PathSpec` for `.gitignore` instead of subtracting all negations globally.
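The difference this comment describes can be illustrated with a minimal standard-library sketch. The helper names (`parse_rules`, `set_based_excludes`, `ordered_is_ignored`) are hypothetical and not part of codebase_rag, and the matching is deliberately simplified (a directory-prefix check plus `fnmatch`, not full gitwildmatch). A real fix would more likely compile an ordered `PathSpec`, as the comment suggests.

```python
from fnmatch import fnmatch


def parse_rules(text: str) -> list[tuple[str, bool]]:
    """Return (pattern, is_negation) pairs in file order, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        rules.append((line[1:] if negated else line, negated))
    return rules


def set_based_excludes(rules: list[tuple[str, bool]]) -> set[str]:
    """Mimic the flagged approach: subtract every negation from the exclude set."""
    exclude = {pattern for pattern, negated in rules if not negated}
    unignore = {pattern for pattern, negated in rules if negated}
    return exclude - unignore


def _matches(pattern: str, path: str) -> bool:
    # Simplification (assumption): a trailing-slash pattern is a directory prefix;
    # anything else is matched with fnmatch against the whole path.
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return fnmatch(path, pattern)


def ordered_is_ignored(rules: list[tuple[str, bool]], path: str) -> bool:
    """Walk the rules in order; the last matching rule decides the outcome."""
    ignored = False
    for pattern, negated in rules:
        if _matches(pattern, path):
            ignored = not negated
    return ignored


rules = parse_rules("dist/\n!dist/\ndist/")
flattened = set_based_excludes(rules)                   # order is lost: dist/ is cancelled out
ignored = ordered_is_ignored(rules, "dist/bundle.js")   # order is kept: final dist/ re-ignores
```

With the set-based view the final `dist/` line has no effect, whereas the ordered walk treats `dist/bundle.js` as ignored, which is the behavior the comment attributes to Git.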
Summary
Dogfooding `dead-code` on cgr's own repo indexed `evals/results/l3_workspace` -- a GITIGNORED directory of generated eval fixtures -- and reported 30 of its symbols as dead code. cgr honors `.cgrignore` and `--exclude` (#495) but never read `.gitignore`, so build artifacts and generated output pollute the graph for every user.

- `load_ignore_patterns(repo_path)`: merges root `.gitignore` into the exclude/unignore set alongside `.cgrignore` (shared `_load_ignore_file` parser; both files use gitwildmatch semantics, `!` negations map to unignores).
- `.cgrignore` remains the override channel: a `!pattern` there re-includes something `.gitignore` excludes (indexing generated code on purpose).
- `start`, `index`, MCP) switch to the merged loader.

Tests (RED -> GREEN)

4 tests in `test_gitignore_patterns.py`: gitignore excludes loaded, `!` negations map to unignores, cgrignore+gitignore merge with override, empty default. Existing cgrignore/CLI tests updated to the new patch target.

Full suite: 4408 passed, 14 skipped. ruff + format clean; ty at the pre-existing baseline.
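
The merge behavior described above (excludes loaded, `!` negations mapped to unignores, `.cgrignore` overriding `.gitignore`, empty default) could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `IgnorePatterns`, `load_ignore_file`, and `load_merged` are hypothetical stand-ins for `CgrignorePatterns`, `_load_ignore_file`, and `load_ignore_patterns`, and the merge expression follows the lines quoted in the review. Because it is set-based, it shares the ordering limitation the review raises.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IgnorePatterns:
    """Hypothetical stand-in for the project's CgrignorePatterns."""
    exclude: frozenset = frozenset()
    unignore: frozenset = frozenset()


def load_ignore_file(path: Path) -> IgnorePatterns:
    """Parse one ignore file; a missing file yields empty pattern sets."""
    if not path.is_file():
        return IgnorePatterns()
    exclude, unignore = set(), set()
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!"):
            unignore.add(line[1:])  # "!pattern" becomes an unignore entry
        else:
            exclude.add(line)
    return IgnorePatterns(frozenset(exclude), frozenset(unignore))


def load_merged(repo_path: Path) -> IgnorePatterns:
    """Merge root .gitignore with .cgrignore, letting negations cancel git excludes."""
    cgr = load_ignore_file(repo_path / ".cgrignore")
    git = load_ignore_file(repo_path / ".gitignore")
    negations = cgr.unignore | git.unignore
    return IgnorePatterns(
        exclude=cgr.exclude | (git.exclude - negations),
        unignore=negations,
    )


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    empty = load_merged(repo)  # neither file exists yet: empty default
    (repo / ".gitignore").write_text("dist/\ngenerated/\n!keep.log\n", encoding="utf-8")
    (repo / ".cgrignore").write_text("!generated/\nscratch/\n", encoding="utf-8")
    merged = load_merged(repo)  # .cgrignore re-includes generated/
```

Here `generated/` is excluded by `.gitignore` but re-included by the `!generated/` line in `.cgrignore`, so it drops out of the merged exclude set while `dist/` and `scratch/` remain.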