fix(security): prevent binary file injection in text upload pipeline #244
Adar5 wants to merge 4 commits into jenkinsci:main from
The binary upload fix and the reformulation loop fix are unrelated changes. I think splitting this into two PRs would make review and any potential revert much cleaner. @berviantoleo, could you clarify this?
Also, content[:1024] only catches null bytes in the first 1 KB. A crafted file with a clean ASCII header followed by a binary payload could bypass this check. I suggest checking the full content instead. @Adar5
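To make the bypass concrete, here is a minimal sketch comparing a prefix-only null-byte scan with a full-buffer scan. The helper names and the payload are illustrative assumptions, not code from the PR.

```python
# Illustrative sketch: why a prefix-only null-byte scan can be bypassed.
# Function names are hypothetical, not taken from file_service.py.

PREFIX_SIZE = 1024  # mirrors the content[:1024] slice discussed above


def has_null_in_prefix(content: bytes, size: int = PREFIX_SIZE) -> bool:
    """Return True if a null byte appears within the first `size` bytes."""
    return b"\x00" in content[:size]


def has_null_anywhere(content: bytes) -> bool:
    """Return True if a null byte appears anywhere in the buffer."""
    return b"\x00" in content


# A crafted payload: a clean ASCII header padded past the 1 KB window,
# followed by binary data (a PNG-style header) that contains null bytes.
clean_header = b"A" * 2048
binary_payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
crafted = clean_header + binary_payload

print(has_null_in_prefix(crafted))  # False: the prefix check is bypassed
print(has_null_anywhere(crafted))   # True: the full scan catches it
```

The full scan is a single linear pass over the buffer, so its cost grows with upload size instead of staying fixed at 1 KB.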
Branch force-pushed from 351edb3 to 6fc9971.
@sharma-sugurthi Thanks for the thorough review!

Splitting the PRs: Good catch on the commits. I had accidentally included the reformulation commit in this branch's history. I've rebased the branch to drop that commit, so this PR is now strictly scoped to the binary upload fix.

1 KB bypass: You are right about the polyglot bypass vulnerability. I've updated the logic to check the full content buffer for null bytes instead of slicing it, which closes that loophole.

The branch has been force-updated with both of these changes.
Description
Currently, the /upload endpoint relies strictly on the file extension to determine content type. This allows users to bypass validation by renaming binary files (images, compiled executables) to .txt. When this happens, the backend processes the binary data as text, truncates it, and injects it into the LLM context window. This pollutes the RAG context and wastes LLM tokens and compute.
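A minimal sketch of the problem, assuming the existing check amounts to a suffix test on the filename (the real endpoint code may differ): a PNG renamed to test.txt passes, even though its bytes begin with the PNG signature and contain null bytes.

```python
# Hypothetical extension-only check approximating the behavior described above.
import os

ALLOWED_TEXT_EXTENSIONS = {".txt"}  # assumption for illustration


def is_text_by_extension(filename: str) -> bool:
    """Decide 'text or not' purely from the filename suffix."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_TEXT_EXTENSIONS


# First bytes of a PNG: the 8-byte signature, then the IHDR chunk length (13)
# encoded as a 4-byte big-endian integer, which contains null bytes.
png_head = b"\x89PNG\r\n\x1a\n" + (13).to_bytes(4, "big") + b"IHDR"

print(is_text_by_extension("test.txt"))  # True: renaming is enough to pass
print(b"\x00" in png_head)               # True: the content is clearly binary
```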
The Fix
Implemented a byte-level validator in file_service.py that inspects the first 1024 bytes of the uploaded file for null bytes (\x00).
- Files containing null bytes are rejected with 415 Unsupported Media Type.
- Valid files are rewound with .seek(0) to ensure no data loss.
Steps to Reproduce the Bug
1. Rename a .png file to test.txt.
2. Upload it to the /upload endpoint.
3. Observe the server returning 200 OK and injecting binary headers into the chat_service.py context pipeline.
Testing
- Standard .txt files still process correctly.
- .png files are cleanly rejected with a 415 error.
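The validator described under The Fix, together with the two checks listed under Testing, can be sketched roughly as follows. This assumes a binary file-like object with read() and seek(). The exception class and function name are hypothetical stand-ins for whatever file_service.py actually uses. Following the review discussion, it scans the full buffer rather than a 1 KB prefix.

```python
# Sketch of a null-byte upload validator; names are hypothetical, not the PR's API.
import io
from typing import BinaryIO


class UnsupportedMediaTypeError(Exception):
    """Hypothetical error that a web layer would translate into HTTP 415."""

    status_code = 415


def validate_text_upload(fileobj: BinaryIO) -> None:
    """Reject binary content disguised as text; rewind valid files."""
    content = fileobj.read()
    if b"\x00" in content:
        raise UnsupportedMediaTypeError("Binary content detected in text upload")
    # Rewind so downstream code reads the file from the start (no data loss).
    fileobj.seek(0)


# Mirrors the Testing section: a plain .txt passes, a renamed .png is rejected.
text_file = io.BytesIO(b"hello jenkins\n")
validate_text_upload(text_file)
print(text_file.read())  # b'hello jenkins\n'

png_as_txt = io.BytesIO(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")
try:
    validate_text_upload(png_as_txt)
except UnsupportedMediaTypeError as err:
    print(err.status_code)  # 415
```

One trade-off of the null-byte heuristic is that UTF-16-encoded text also contains null bytes, so such files would be rejected as well.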