Skip to content

fix(security): prevent binary file injection in text upload pipeline#244

Open
Adar5 wants to merge 4 commits intojenkinsci:mainfrom
Adar5:fix/binary-upload-spoofing
Open

fix(security): prevent binary file injection in text upload pipeline#244
Adar5 wants to merge 4 commits intojenkinsci:mainfrom
Adar5:fix/binary-upload-spoofing

Conversation

@Adar5
Copy link
Contributor

@Adar5 Adar5 commented Mar 5, 2026

Description

Currently, the /upload endpoint relies strictly on the file extension to determine content type. This allows users to bypass validation by renaming binary files (images, compiled executables) to .txt.

When this happens, the backend processes the binary data as text, truncates it, and injects it into the LLM context window. This pollutes the RAG context and wastes LLM tokens/compute.

The Fix

Implemented a byte-level validator in file_service.py that inspects the first 1024 bytes of the uploaded file for null bytes (\x00).

  • If binary signatures are detected, it fast-fails with a 415 Unsupported Media Type.
  • File pointers are safely reset via .seek(0) for valid files to ensure no data loss.

Steps to Reproduce the Bug

  1. Rename a .png file to test.txt.
  2. Upload via the Swagger /upload endpoint.
  3. Observe the API returning 200 OK and injecting binary headers into the chat_service.py context pipeline.

Testing

  • Verified valid .txt files still process correctly.
  • Verified disguised .png files are cleanly rejected with a 415 error.

@Adar5 Adar5 requested a review from a team as a code owner March 5, 2026 08:08
@berviantoleo berviantoleo added the enhancement For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted label Mar 6, 2026
@sharma-sugurthi
Copy link
Contributor

the binary upload fix and reformulation loop fix are unrelated changes. i think spliting this into two PRs makes review and potential reverts much cleaner. @berviantoleo be clarify on this !!

@sharma-sugurthi
Copy link
Contributor

And also content[:1024] only catches null bytes in the first 1KB. a crafted file with a clean ASCII header followed by binary payload would may bypass this. i suggest considering checking the full content @Adar5

@Adar5 Adar5 force-pushed the fix/binary-upload-spoofing branch from 351edb3 to 6fc9971 Compare March 7, 2026 04:35
@Adar5
Copy link
Contributor Author

Adar5 commented Mar 7, 2026

@sharma-sugurthi Thanks for the thorough review!

Splitting the PRs: Good catch on the commits. I had accidentally included the reformulation commit in this branch's history. I've rebased the branch to drop that commit, so this PR is now strictly scoped to the binary upload fix.

1KB Bypass: You are totally right about the polyglot bypass vulnerability. I've updated the logic to check the full content buffer for null bytes instead of slicing it, which closes that loophole safely.

The branch has been force-updated with both of these changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants