Support negation in excluded_patterns to allow exceptions #1778

@petrarca

Problem

First of all, thank you for building CocoIndex. It's a fantastic piece of engineering, and we're using it extensively as the foundation for our enterprise code search and analysis platform. The performance, the incremental processing, and the Tree-sitter integration are excellent.

One thing we're running into: excluded_patterns always overrides included_patterns (documented behavior). This makes it impossible to express:

"Exclude all dot-directories, but include .github/workflows"

CI/CD workflow files (.github/workflows/*.yml) contain valuable context for code analysis and AI agents. Today, the only way to exclude all other dot-directories (.git, .vscode, .idea, .ruff_cache, etc.) while keeping .github is to enumerate every dot-directory explicitly. That list is currently ~30 patterns and grows with every new tool.

Current workaround

Replace the single **/.* exclusion with an explicit list of every dot-directory to exclude:

excluded_patterns=[
    "**/.git",
    "**/.vscode",
    "**/.idea",
    "**/.ruff_cache",
    "**/.pytest_cache",
    # ... 25+ more entries, grows over time
]

This is fragile and requires updating whenever a new dot-directory convention appears.

Desired behavior

Support a ! negation prefix in excluded_patterns, following .gitignore semantics (patterns are evaluated in order, and the last match wins):

cocoindex.sources.LocalFile(
    path="/path/to/repo",
    included_patterns=["*.py", "*.yml", "*.yaml"],
    excluded_patterns=[
        "**/.*",           # exclude all dot-directories
        "!**/.github/**",  # but allow .github through
    ],
)
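
To make the requested semantics concrete, here is a minimal sketch of last-match-wins evaluation in plain Python. It is not CocoIndex's implementation: `glob_to_regex`, `ancestors_and_self`, and `is_excluded` are hypothetical helpers, the glob dialect is a simplified one that only understands `**/`, `**`, `*`, and `?`, and I'm assuming a pattern applies to a file when it matches the file's path or any of its parent directories.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex (hypothetical helper).

    `**/` matches zero or more leading directories, `**` matches anything,
    `*` and `?` never cross a `/`.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def ancestors_and_self(path: str) -> list[str]:
    """'a/b/c.py' -> ['a', 'a/b', 'a/b/c.py']."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Evaluate patterns in order; the last matching pattern decides."""
    candidates = ancestors_and_self(path)
    excluded = False
    for raw in patterns:
        negated = raw.startswith("!")
        regex = glob_to_regex(raw[1:] if negated else raw)
        if any(regex.fullmatch(c) for c in candidates):
            # A plain pattern excludes; a `!` pattern re-includes.
            excluded = not negated
    return excluded


patterns = ["**/.*", "!**/.github/**"]
for p in [".github/workflows/ci.yml", ".vscode/settings.json", "src/main.py"]:
    print(p, "excluded" if is_excluded(p, patterns) else "kept")
```

One design point worth deciding up front: the sketch tests every file path on its own, so `.github/workflows/ci.yml` is re-included even though `.github` itself matches `**/.*`. Real .gitignore does not allow that; it refuses to re-include a file whose parent directory is excluded. An implementation that prunes excluded directories during traversal would need to keep descending into directories that a later negated pattern could still match.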

Other use cases

This pattern is useful for any "exclude a category but keep specific exceptions" case:

# Exclude all test directories except integration tests
excluded_patterns=["**/test/**", "!**/test/integration/**"]

# Exclude vendor but keep a specific vendored library
excluded_patterns=["**/vendor/**", "!**/vendor/internal-lib/**"]
