Bleve Code Search: Cannot find binary files by full filename #34332

Open
@wbste

Description

Related to: #33828

When using code search, searching for the full or partial filename of binary files (e.g., my document.pdf, image.png, my document) yields no results.

Steps to Reproduce:

  1. Add a binary file (e.g., test report.pdf) to a repository.
  2. Wait for indexing.
  3. Search for test report.pdf or test report.

Actual Behavior:
No search results are returned for the binary file based on its filename.

Expected Behavior:
The search should return the binary file test report.pdf.

Observations:

  • Searching for just the extension (e.g., pdf) does return all PDF files, including test report.pdf.
  • Searching for code files (e.g., README.md) works correctly.
  • Binary file content is correctly excluded from indexing; the issue is specific to matching the filename string.

Possible Cause:
This might stem from the interaction between the filenameIndexerAnalyzer (using unicode tokenizer + path filter) and the bleve.NewPrefixQuery used for filename searching in modules/indexer/code/bleve/bleve.go. The tokenization of filenames containing spaces/dots combined with a prefix query seems to prevent finding these files by their complete name.
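
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It does not use Bleve or Gitea code: `tokenize` and `prefixMatches` are hypothetical helpers, and the whitespace-only tokenizer is a deliberate simplification of the real analyzer (it models neither the unicode tokenizer nor the path filter, so it does not explain why searching for `pdf` alone works). It only shows the general point that a term-level prefix query whose text spans a token boundary cannot match any single indexed term.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a simplified stand-in for an index-time analyzer (assumption):
// it lowercases the filename and splits it on whitespace only.
func tokenize(filename string) []string {
	return strings.Fields(strings.ToLower(filename))
}

// prefixMatches reports whether any single indexed term starts with the
// query string, mimicking a prefix query whose input is not analyzed.
func prefixMatches(terms []string, query string) bool {
	q := strings.ToLower(query)
	for _, t := range terms {
		if strings.HasPrefix(t, q) {
			return true
		}
	}
	return false
}

func main() {
	// Indexed terms for the filename from the reproduction steps.
	terms := tokenize("test report.pdf")
	for _, q := range []string{"test report.pdf", "test report", "test", "report.pdf"} {
		fmt.Printf("%q -> %v\n", q, prefixMatches(terms, q))
	}
}
```

In this toy model the two queries containing a space find nothing, while `test` and `report.pdf` each match a single term. If the real analyzer behaves similarly for filenames with spaces, analyzing the query text before building the filename query (or matching each query token separately) may be worth investigating.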

Relevant Code:

  • modules/indexer/code/bleve/bleve.go (Filename query logic & analyzer definition)
  • modules/indexer/code/bleve/token/path/path.go (Path token filter logic)

Gitea Version

1.24.0rc0

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

binary from windows

Database

SQLite
