Description
Related to: #33828
When using code search, searching for the full or partial filename of a binary file (e.g., `my document.pdf`, `image.png`, `my document`) yields no results.
Steps to Reproduce:
- Add a binary file (e.g., `test report.pdf`) to a repository.
- Wait for indexing.
- Search for `test report.pdf` or `test report`.
Actual Behavior:
No search results are returned for the binary file based on its filename.
Expected Behavior:
The search should return the binary file `test report.pdf`.
Observations:
- Searching for just the extension (e.g., `pdf`) does return all PDF files, including `test report.pdf`.
- Searching for code files (e.g., `README.md`) works correctly.
- Binary file content is correctly not being indexed; the issue is specific to matching the filename string.
Possible Cause:
This might stem from the interaction between the `filenameIndexerAnalyzer` (using the `unicode` tokenizer + `path` filter) and the `bleve.NewPrefixQuery` used for filename searching in `modules/indexer/code/bleve/bleve.go`. The tokenization of filenames containing spaces/dots, combined with a prefix query, seems to prevent finding these files by their complete name.
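To illustrate the suspected mismatch, here is a minimal, self-contained sketch. It is not Gitea's or bleve's code: `tokenize` and `prefixMatch` are hypothetical helpers, and the model assumes the analyzer lowercases the name and splits it on whitespace, while the prefix query compares the whole, unanalyzed query string against each indexed term. The real `unicode` tokenizer and `path` filter behave differently in detail, so this only shows how such a failure could arise, not a confirmed diagnosis.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize is a hypothetical stand-in for the index-time analyzer.
// Assumption: it lowercases the filename and splits it on whitespace.
func tokenize(filename string) []string {
	return strings.Fields(strings.ToLower(filename))
}

// prefixMatch models a prefix query: the query string is not analyzed,
// so it is compared as a whole against every indexed term.
func prefixMatch(terms []string, query string) bool {
	q := strings.ToLower(query)
	for _, t := range terms {
		if strings.HasPrefix(t, q) {
			return true
		}
	}
	return false
}

func main() {
	terms := tokenize("test report.pdf")
	fmt.Println("indexed terms:", terms)

	// Try a single word, the partial name, and the full name.
	for _, q := range []string{"test", "test report", "test report.pdf"} {
		fmt.Printf("%q -> %v\n", q, prefixMatch(terms, q))
	}
}
```

In this model, `test` matches because it is a prefix of a single indexed term, while `test report` and `test report.pdf` match nothing because no single term contains the space. A name without spaces, such as `README.md`, stays one term and is found by its full name.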
Relevant Code:
- `modules/indexer/code/bleve/bleve.go` (Filename query logic & analyzer definition)
- `modules/indexer/code/bleve/token/path/path.go` (Path token filter logic)
Gitea Version
1.24.0rc0
Can you reproduce the bug on the Gitea demo site?
Yes
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
No response
How are you running Gitea?
binary from windows
Database
SQLite