fix: header duplication & JSON parsing #37
Conversation
Refactor error handling for connection issues in auth_capture_proxy.py to provide more specific error messages.
- Fix regex pattern for path normalization: `r"/ +"` should be `r"/+"`.
- Add an instance variable to hold background tasks.
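The regex fix above matters because `r"/ +"` matches a slash followed by spaces, not repeated slashes. A quick sketch of the difference (illustrative only, not code from the PR):

```python
import re

path = "//foo///bar"

# r"/ +" only collapses a slash followed by spaces, so duplicate slashes survive.
print(re.sub(r"/ +", "/", path))  # //foo///bar (unchanged)

# r"/+" collapses any run of slashes into one.
print(re.sub(r"/+", "/", path))   # /foo/bar
```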
Walkthrough

Replaced aiohttp error handling with httpx equivalents, added guarded JSON decoding based on Content-Type, cloned request headers per-request into `req_headers`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Proxy as Proxy\n(auth_capture_proxy)
    participant HTTPX as httpx.AsyncClient
    participant Upstream as Upstream\nHost
    Client->>Proxy: HTTP request
    Proxy->>Proxy: clone headers -> req_headers\ninspect Content-Type
    alt content-type is application/json or endswith +json
        Proxy->>Proxy: attempt req_body.json() (guarded)\non JSONDecodeError/ValueError -> use text/bytes
        Proxy->>HTTPX: forward request with req_headers + JSON/body
    else multipart/form or other
        Proxy->>HTTPX: forward request with req_headers + form/body
    end
    HTTPX->>Upstream: send request
    Upstream-->>HTTPX: response
    HTTPX-->>Proxy: response
    Proxy-->>Client: proxied response
    Note right of Proxy: stop endpoint schedules\nbackground stop task -> tracked in _background_tasks
    Proxy->>Proxy: background task closes session via HTTPX
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
475-504: ⚠️ Potential issue | 🟠 Major

Remove duplicate `TimeoutException` handler (line 491 is unreachable).

`httpx.TimeoutException` is caught at line 479, making the second handler at line 491 dead code. The second handler has more helpful logging and a detailed error message that should be preserved. Remove the first handler to keep the more informative one.
🛠️ Proposed fix
```diff
-        except httpx.TimeoutException as ex:
-            return await self._build_response(
-                text=f"Error connecting to {site}; request timed out: {ex}"
-            )
```
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
326-336: Normalize multipart part content-type before comparison.
Strict equality misses common values like `application/json; charset=utf-8`, so JSON parts can be skipped unintentionally.

♻️ Proposed fix
```diff
-        elif hdrs.CONTENT_TYPE in part.headers:
-            if part.headers[hdrs.CONTENT_TYPE] == "application/json":
+        elif hdrs.CONTENT_TYPE in part.headers:
+            content_type = part.headers.get(hdrs.CONTENT_TYPE, "")
+            if content_type.split(";", 1)[0].strip() == "application/json":
```
alandtse
left a comment
There was a problem hiding this comment.
Your AI is truncating things weirdly, and I'm also not going to trust it given how low-level this is. Please add test cases for test-driven development. The tests should demonstrate the bug you're trying to fix (and should fail without any patches). Once you implement the fix, the tests should then pass.
- 59: Fixed "Args: proxy_url (URL):" truncation generated by AI.
- 326-336: Normalize multipart part content-type before comparison.
- 475-504: Removed duplicate `TimeoutException` handler (line 491 is unreachable).
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
479-504: ⚠️ Potential issue | 🟠 Major

Unreachable `TimeoutException` handler.

`httpx.TimeoutException` is a subclass of `httpx.HTTPError`. The `except httpx.HTTPError` block on line 487 will catch timeout exceptions before they reach line 491, making the `TimeoutException` handler dead code.

🐛 Proposed fix: reorder exception handlers
```diff
         except httpx.ConnectError as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; please retry: {ex}"
             )
         except httpx.TooManyRedirects as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; too many redirects: {ex}"
             )
-        except httpx.HTTPError as ex:
-            return await self._build_response(
-                text=f"Error connecting to {site}: {ex}"
-            )
         except httpx.TimeoutException as ex:
             _LOGGER.warning(
                 "Timeout during proxy request to %s: %s",
                 site,
                 ex.__class__.__name__,
             )
             return await self._build_response(
                 text=(
                     "Timed out while contacting the service during login.\n\n"
                     "This is usually caused by slow or blocked network access. "
                     "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
                     "and that the service endpoint is reachable from this host."
                 )
             )
+        except httpx.HTTPError as ex:
+            return await self._build_response(
+                text=f"Error connecting to {site}: {ex}"
+            )
```
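The ordering issue generalizes: Python matches `except` clauses top to bottom, so a base-class handler shadows any later subclass handler. A minimal stdlib-only sketch (with stand-in exception classes mirroring the httpx hierarchy, not httpx itself):

```python
# Stand-ins: TimeoutException subclasses HTTPError, as in httpx.
class HTTPError(Exception):
    pass

class TimeoutException(HTTPError):
    pass

def handle(first, second):
    # Return the name of whichever handler catches a raised TimeoutException.
    try:
        raise TimeoutException("timed out")
    except first:
        return first.__name__
    except second:
        return second.__name__

# Base class first: the specific handler below it is dead code.
print(handle(HTTPError, TimeoutException))   # HTTPError
# Subclass first: the timeout-specific handler runs as intended.
print(handle(TimeoutException, HTTPError))   # TimeoutException
```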
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
329-344: Inconsistent content-type normalization.

Line 331 correctly normalizes the content-type by stripping parameters, but lines 336 and 339 still use the raw header value, which may include charset or other parameters.

For example, `text/plain; charset=utf-8` would fail the `startswith("text")` check on line 336 if the header included leading whitespace in an unusual format, and the `application/www-urlform-encode` comparison on line 339 would fail if charset is appended.

♻️ Suggested fix for consistency
```diff
         elif hdrs.CONTENT_TYPE in part.headers:
             content_type = part.headers.get(hdrs.CONTENT_TYPE, "")
-            if content_type.split(";", 1)[0].strip() == "application/json":
+            mime_type = content_type.split(";", 1)[0].strip()
+            if mime_type == "application/json":
                 part_data: Optional[
                     Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
                 ] = await part.json()
                 writer.append_json(part_data)
-            elif part.headers[hdrs.CONTENT_TYPE].startswith("text"):
+            elif mime_type.startswith("text"):
                 part_data = await part.text()
                 writer.append(part_data)
-            elif part.headers[hdrs.CONTENT_TYPE] == "application/www-urlform-encode":
+            elif mime_type == "application/x-www-form-urlencoded":
                 part_data = await part.form()
                 writer.append_form(part_data)
```

Note: Line 339 also has a typo — `application/www-urlform-encode` should likely be `application/x-www-form-urlencoded`.
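The normalization the review suggests can be factored into a tiny helper. A sketch (`normalize_mime` is an illustrative name, not code from the PR):

```python
def normalize_mime(content_type: str) -> str:
    # Drop parameters like "; charset=utf-8" plus surrounding whitespace,
    # and lowercase for case-insensitive comparison.
    return content_type.split(";", 1)[0].strip().lower()

print(normalize_mime("application/json; charset=utf-8"))  # application/json
print(normalize_mime(" text/plain; charset=utf-8"))       # text/plain
```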
Will do, thanks.
329-344: Inconsistent content-type normalization
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
480-505: ⚠️ Potential issue | 🟠 Major

Unreachable exception handler: reorder to catch `TimeoutException` before `HTTPError`.

`httpx.TimeoutException` inherits from `httpx.HTTPError` (via `RequestError` → `TransportError`). The `HTTPError` handler at line 488 will catch all timeout exceptions before reaching the dedicated handler at line 492, making that code unreachable. This prevents the timeout-specific logging and detailed error message from ever executing.

Move the `TimeoutException` handler before `HTTPError`:

Proposed fix
```diff
         except httpx.TooManyRedirects as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; too many redirects: {ex}"
             )
+        except httpx.TimeoutException as ex:
+            _LOGGER.warning(
+                "Timeout during proxy request to %s: %s",
+                site,
+                ex.__class__.__name__,
+            )
+            return await self._build_response(
+                text=(
+                    "Timed out while contacting the service during login.\n\n"
+                    "This is usually caused by slow or blocked network access. "
+                    "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
+                    "and that the service endpoint is reachable from this host."
+                )
+            )
         except httpx.HTTPError as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}: {ex}"
             )
-        except httpx.TimeoutException as ex:
-            _LOGGER.warning(
-                "Timeout during proxy request to %s: %s",
-                site,
-                ex.__class__.__name__,
-            )
-            return await self._build_response(
-                text=(
-                    "Timed out while contacting the service during login.\n\n"
-                    "This is usually caused by slow or blocked network access. "
-                    "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
-                    "and that the service endpoint is reachable from this host."
-                )
-            )
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
329-345: ⚠️ Potential issue | 🟡 Minor

Guard multipart JSON parsing against malformed data.

`part.json()` can raise `JSONDecodeError` and `ValueError` when encountering invalid JSON; unhandled exceptions would abort the request. Add error handling with fallback to `part.read()`.

🛡️ Suggested fix
```diff
             if mime_type == "application/json":
+                try:
-                part_data: Optional[
-                    Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
-                ] = await part.json()
-                writer.append_json(part_data)
+                    part_data: Optional[
+                        Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
+                    ] = await part.json()
+                    writer.append_json(part_data)
+                except (JSONDecodeError, ValueError):
+                    part_data = await part.read()
+                    writer.append(part_data)
```
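The guarded-decode pattern, independent of aiohttp's multipart API, looks like this. A standalone sketch with a hypothetical `parse_body` helper, using the text-then-bytes fallback discussed later in the thread:

```python
import json
from typing import Any, Union

def parse_body(raw: bytes) -> Union[Any, str, bytes]:
    # Try JSON first; fall back to text, then to raw bytes, instead of raising.
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw

print(parse_body(b'{"a": 1}'))   # {'a': 1}
print(parse_body(b"not json"))   # not json
print(parse_body(b"\xff\xfe"))   # b'\xff\xfe'
```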
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 132: Fix the typos in the docstring for the setter described by ":setter:
value (Dict[Text, Any])" in auth_capture_proxy.py: change "oof" to "of" and
"dictioary" to "dictionary", and tidy punctuation/spacing so the sentence reads
clearly (e.g., "takes a httpx.Response, a dictionary of post variables, and a
dictionary of query variables and returns a URL or string"). Also apply the same
corrections to the other occurrence referenced around the later docstring (the
one noted at the second occurrence).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
151-151: ⚠️ Potential issue | 🟡 Minor

Docstring typos in modifiers setter.
Tidy “shoud/couroutine” → “should/coroutine” for clarity.
✏️ Proposed fix
```diff
-        :setter: value (Dict[Text, Dict[Text, Callable]): A nested dictionary of modifiers. The key shoud be a MIME type and the value should be a dictionary of modifiers for that MIME type where the key should be the name of the modifier and the value should be a function or couroutine that takes a string and returns a modified string. If parameters are necessary, functools.partial should be used. See :mod:`authcaptureproxy.examples.modifiers` for examples.
+        :setter: value (Dict[Text, Dict[Text, Callable]): A nested dictionary of modifiers. The key should be a MIME type and the value should be a dictionary of modifiers for that MIME type where the key should be the name of the modifier and the value should be a function or coroutine that takes a string and returns a modified string. If parameters are necessary, functools.partial should be used. See :mod:`authcaptureproxy.examples.modifiers` for examples.
```
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 6: Add regression tests that reproduce the header-duplication and
JSON-parse crash: write failing TDD tests first that call the proxy's
request-handling path (e.g., the function/method that builds/forwards
headers—look for functions like handle_request or forward_request in
auth_capture_proxy.py) and assert that repeated proxied requests do not increase
header count or duplicate header entries, and separate tests that pass
invalid/malformed JSON bodies to the JSON parsing logic (the code path using
JSONDecodeError/import) to ensure it is guarded and does not raise—expect a safe
fallback value or empty dict. Implement tests that simulate multiple sequential
requests to validate no header growth and a test passing malformed JSON to the
parse_json_body or equivalent to confirm the JSONDecodeError branch is
exercised.
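The header-isolation property those tests should pin down can be sketched without aiohttp at all (`forward` is a hypothetical stand-in for the proxy's request-handling path):

```python
def forward(shared_headers: dict) -> dict:
    # Copy before mutating so per-request changes never leak back to the caller.
    req_headers = dict(shared_headers)
    req_headers["host"] = "upstream.example"      # hypothetical upstream host
    req_headers.pop("content-length", None)       # per-request sanitization
    return req_headers

base = {"accept": "application/json", "content-length": "42"}
for _ in range(3):
    forward(base)

# Repeated requests leave the shared mapping untouched: no growth, no duplication.
print(base)  # {'accept': 'application/json', 'content-length': '42'}
```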
- Around line 329-343: The JSON parsing call await part.json() inside the
content-type branch (mime_type == "application/json") can raise more than
JSONDecodeError; wrap the await part.json() in a try/except that catches a
broader set (either Exception or the specific additional exceptions
json.JSONDecodeError, UnicodeDecodeError, RuntimeError, binascii.Error) and on
failure fall back to reading the part as text (await part.text()) or a safe
fallback and then call writer.append or writer.append_json with the fallback
value; update the block around part.json(), writer.append_json and mime_type
handling to ensure malformed/encoded JSON does not raise and break processing.
Uses “text first, bytes last” fallback for best match to the stated requirement (and better debugging/user experience).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 106: The annotation for self._background_tasks uses the builtin generic
set[asyncio.Task] which is inconsistent with the file's use of typing generics;
change the annotation to use Set[asyncio.Task] from typing (or alternatively
convert other annotations to builtins consistently), i.e., update the
declaration of self._background_tasks to use Set imported from typing while
keeping the asyncio.Task element type to match the surrounding annotations and
imports.
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
338-345: Broad exception handling is intentional but consider documenting.

Static analysis flags `except Exception` (BLE001). This is intentional here as `part.json()` can raise multiple exception types (JSONDecodeError, UnicodeDecodeError, RuntimeError, binascii.Error). A brief inline comment explaining the rationale would silence future questions.

💡 Optional documentation
```diff
             except Exception:
-                # Best-effort fallback: text, then bytes
+                # Best-effort fallback: text, then bytes
+                # Broad catch needed: part.json() can raise JSONDecodeError,
+                # UnicodeDecodeError, RuntimeError, or binascii.Error
                 try:
                     part_text = await part.text()
                     writer.append(part_text)
                 except Exception:
+                    # Final fallback for any text decoding failures
                     part_data = await part.read()
                     writer.append(part_data)
```
Line 106 used the builtin generic `set[asyncio.Task]`, but the file uses `Dict`, `List`, and other generics from `typing` elsewhere (lines 86-103).
@coderabbitai, please draft the failing test cases and expected fixtures.
Here are the failing test cases and fixtures to demonstrate the bugs.

Task: Add failing test cases for header duplication and JSON parsing bugs

Test File Structure
The test file includes:
Expected Behavior:
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_regression_headers_and_json_parsing.py`:
- Around line 67-113: The proxy fixture has mixed 2-space and 4-space
indentation causing a parser IndentationError; reformat the entire body of the
proxy function to use consistent 4-space indentation (including the nested class
Proxy declaration, instantiation of p, and all monkeypatch.setattr and attribute
assignments) so the function, class Proxy, p = Proxy(...), and subsequent
monkeypatch.setattr(...) / p._tests / p._modifiers lines are indented uniformly
and parse correctly.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_regression_headers_and_json_parsing.py`:
- Around line 36-41: The tests assign to the read-only aiohttp Request.has_body
property (ineffective and may error); in the helper _make_request and in tests
referencing make_mocked_request, remove all manual assignments like req.has_body
= True (and any # type: ignore[attr-defined] comments used to suppress that),
and instead rely on passing the body via the payload parameter to
make_mocked_request so aiohttp computes has_body automatically; if you need to
make body presence explicit, set a Content-Length header in _make_request (e.g.,
when body is non-empty) rather than assigning has_body.
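The suggested alternative — signaling body presence via a Content-Length header rather than assigning the read-only property — can be sketched as follows (`make_headers` is a hypothetical helper, not code from the test file):

```python
def make_headers(body: bytes) -> dict:
    # Set Content-Length only when a body exists; aiohttp derives has_body itself.
    headers = {"Host": "example.com"}
    if body:
        headers["Content-Length"] = str(len(body))
    return headers

print(make_headers(b""))         # {'Host': 'example.com'}
print(make_headers(b'{"a":1}'))  # {'Host': 'example.com', 'Content-Length': '7'}
```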
…s_body is read-only).
`make_mocked_request(..., payload=body_bytes)` is giving aiohttp.web.Request a `_payload` that is literally a bytes object. Then `request.has_body` tries to call:

```
self._payload.at_eof()
```

…but bytes has no `at_eof()`, so you get:

```
AttributeError: 'bytes' object has no attribute 'at_eof'
```

✅ Fix: pass a real aiohttp StreamReader-like payload, not raw bytes. The cleanest way in tests is to create an `aiohttp.streams.StreamReader`, feed it the bytes, and pass that as `payload=`. Because StreamReader needs a running loop, we'll make `_make_request` async and call it with `await`.
… argument: 'limit'

In the aiohttp version used in CI (it downgrades to aiohttp==3.9.0), StreamReader's constructor requires a `limit` argument, so CI raises:

```
TypeError: StreamReader.__init__() missing 1 required positional argument: 'limit'
```

All 5 tests then fail immediately when trying to create the request.
…rules. Added terminating "." to the first line in docstring
```diff
+ from typing import Any
import pytest
import httpx
...
class DummyAsyncClient:
def __init__(self) -> None:
- self.calls = []
+ self.calls: list[dict[str, Any]] = []
self.cookies = type("Cookies", (), {"jar": {}})()
```
Fixes runaway header duplication during auth flow by isolating header mutation and guarding JSON parsing.
This PR adds regression tests covering cross-request header contamination caused by in-place mutation of request headers during JSON POST handling in all_handler(). The tests fail on the previous behavior and pass once headers are copied before sanitization, ensuring per-request isolation and preventing invalid header sets across requests.