fix: write binary response content as raw bytes (issue #242)#243
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d94cbba329
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """ | ||
|
|
||
| @patch('awscurl.awscurl.__send_request') | ||
| @patch('builtins.open', new_callable=lambda: __builtins__['open']) |
There was a problem hiding this comment.
Remove the broken open patch from the new test
In the CI matrix's normal Python 3.10+ module context, __builtins__ is a module rather than a dict, so this new_callable raises TypeError: 'module' object is not subscriptable as soon as the test's patch context is entered. That makes the newly added pytest case fail before it exercises the response handling; this patch is also unnecessary because the test never calls open.
Useful? React with 👍 / 👎.
|
|
||
| print(response.text) | ||
| # Write response body to stdout as raw bytes (matching curl behavior) | ||
| sys.stdout.buffer.write(response.content) |
There was a problem hiding this comment.
Flush text headers before writing raw body bytes
When --include is used with stdout redirected or piped, the header line is written through sys.stdout's text buffer but the body now bypasses it via sys.stdout.buffer; because the print(response.headers, ...) above is not flushed, Python can emit the raw body before the headers. This regresses the combined header/body output ordering that previously stayed within the text stream, so flush sys.stdout before writing response.content or write both parts through the same binary path.
Useful? React with 👍 / 👎.
Always write response.content as raw bytes to both stdout and -o file, matching curl's behavior. Previously response.text was used which decoded non-UTF-8 bytes as U+FFFD, corrupting binary files (gzip, images, PDFs). The --data-binary flag was incorrectly controlling response encoding — it only affects request body hashing, not response output. Signed-off-by: igor <igor@earendil.works>
d94cbba to
1871ba2
Compare
Add # type: ignore[assignment] for pre-existing CaseInsensitiveDict
reassignment patterns that were flagged by mypy after version update.
Also fix type ignore for resp.headers = {} in test mocks.
These are pre-existing issues in the repo, not introduced by the
binary response fix.
… (PR #243) - Remove unnecessary open patch from test_binary_response_content_preserved_in_output that crashes on Python 3.10+ (__builtins__ is a module, not a dict). The patch is unused since the test never calls open. - Flush sys.stdout before writing response.content to stdout.buffer. This ensures text-mode output (--include headers) is written before the binary body, matching curl's ordering guarantees.
Summary
Binary responses (gzip, images, PDFs, etc.) were corrupted when written via
-oor printed to stdout, becauseresponse.textwas used which decodes non-UTF-8 bytes asU+FFFD(ef bf bd). This makes it impossible to download binary files like gzip archives.Changes
print(response.text)→sys.stdout.buffer.write(response.content)— writes raw bytes-ofile: Alwaysopen("wb")+write(response.content)— no longer tied to--data-binary--data-binaryonly affects request body hashing, not response encoding (its original purpose)Behavior
Matches
curlexactly — always writes raw bytes, no text/binary toggle.Related
-ofile encoding)Testing
TestBinaryResponseOutputclass with 2 unit tests covering gzip magic bytes and binary content preservation