fix: header duplication & JSON parsing #37
Conversation
Refactor error handling for connection issues in auth_capture_proxy.py to provide more specific error messages.
- Fix regex pattern for path normalization: `r"/ +"` should be `r"/+"`.
- Add an instance variable to hold background tasks.
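The regex fix above matters because `r"/ +"` matches a slash followed by spaces, not repeated slashes. A quick sketch of the difference (illustrative only, not code from the PR):

```python
import re

path = "//foo///bar"

# r"/ +" only collapses a slash followed by spaces, so duplicate slashes survive.
print(re.sub(r"/ +", "/", path))  # //foo///bar (unchanged)

# r"/+" collapses any run of slashes into one.
print(re.sub(r"/+", "/", path))   # /foo/bar
```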
Walkthrough

Replaced aiohttp error handling with httpx equivalents, added guarded JSON decoding based on Content-Type, cloned request headers per-request into `req_headers`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Proxy as Proxy\n(auth_capture_proxy)
    participant HTTPX as httpx.AsyncClient
    participant Upstream as Upstream\nHost
    Client->>Proxy: HTTP request
    Proxy->>Proxy: clone headers -> req_headers\ninspect Content-Type
    alt content-type is application/json or endswith +json
        Proxy->>Proxy: attempt req_body.json() (guarded)\non JSONDecodeError/ValueError -> use text/bytes
        Proxy->>HTTPX: forward request with req_headers + JSON/body
    else multipart/form or other
        Proxy->>HTTPX: forward request with req_headers + form/body
    end
    HTTPX->>Upstream: send request
    Upstream-->>HTTPX: response
    HTTPX-->>Proxy: response
    Proxy-->>Client: proxied response
    Note right of Proxy: stop endpoint schedules\nbackground stop task -> tracked in _background_tasks
    Proxy->>Proxy: background task closes session via HTTPX
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
475-504: ⚠️ Potential issue | 🟠 Major

Remove duplicate `TimeoutException` handler (line 491 is unreachable).

`httpx.TimeoutException` is caught at line 479, making the second handler at line 491 dead code. The second handler has more helpful logging and a detailed error message that should be preserved. Remove the first handler to keep the more informative one.
🛠️ Proposed fix
```diff
-        except httpx.TimeoutException as ex:
-            return await self._build_response(
-                text=f"Error connecting to {site}; request timed out: {ex}"
-            )
```
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
326-336: Normalize multipart part content-type before comparison.
Strict equality misses common values like `application/json; charset=utf-8`, so JSON parts can be skipped unintentionally.

♻️ Proposed fix
```diff
-        elif hdrs.CONTENT_TYPE in part.headers:
-            if part.headers[hdrs.CONTENT_TYPE] == "application/json":
+        elif hdrs.CONTENT_TYPE in part.headers:
+            content_type = part.headers.get(hdrs.CONTENT_TYPE, "")
+            if content_type.split(";", 1)[0].strip() == "application/json":
```
alandtse
left a comment
There was a problem hiding this comment.
Your AI is truncating things weirdly, and I'm also not going to trust it given how low-level this is. Please add test cases for test-driven development. The tests should demonstrate the bug you're trying to fix (and should fail without any patches). Once you implement the fix, the tests should then pass.
- 59: Fixed "Args: proxy_url (URL):" truncation generated by AI.
- 326-336: Normalize multipart part content-type before comparison.
- 475-504: Removed duplicate `TimeoutException` handler (line 491 is unreachable).
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
479-504: ⚠️ Potential issue | 🟠 Major

Unreachable `TimeoutException` handler.

`httpx.TimeoutException` is a subclass of `httpx.HTTPError`. The `except httpx.HTTPError` block on line 487 will catch timeout exceptions before they reach line 491, making the `TimeoutException` handler dead code.

🐛 Proposed fix: reorder exception handlers
```diff
         except httpx.ConnectError as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; please retry: {ex}"
             )
         except httpx.TooManyRedirects as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; too many redirects: {ex}"
             )
-        except httpx.HTTPError as ex:
-            return await self._build_response(
-                text=f"Error connecting to {site}: {ex}"
-            )
         except httpx.TimeoutException as ex:
             _LOGGER.warning(
                 "Timeout during proxy request to %s: %s",
                 site,
                 ex.__class__.__name__,
             )
             return await self._build_response(
                 text=(
                     "Timed out while contacting the service during login.\n\n"
                     "This is usually caused by slow or blocked network access. "
                     "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
                     "and that the service endpoint is reachable from this host."
                 )
             )
+        except httpx.HTTPError as ex:
+            return await self._build_response(
+                text=f"Error connecting to {site}: {ex}"
+            )
```
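The ordering issue generalizes: Python matches `except` clauses top to bottom, so a base-class handler shadows any later subclass handler. A minimal stdlib-only sketch (with stand-in exception classes mirroring the httpx hierarchy, not httpx itself):

```python
# Stand-ins: TimeoutException subclasses HTTPError, as in httpx.
class HTTPError(Exception):
    pass

class TimeoutException(HTTPError):
    pass

def handle(first, second):
    # Return the name of whichever handler catches a raised TimeoutException.
    try:
        raise TimeoutException("timed out")
    except first:
        return first.__name__
    except second:
        return second.__name__

# Base class first: the specific handler below it is dead code.
print(handle(HTTPError, TimeoutException))   # HTTPError
# Subclass first: the timeout-specific handler runs as intended.
print(handle(TimeoutException, HTTPError))   # TimeoutException
```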
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
329-344: Inconsistent content-type normalization.

Line 331 correctly normalizes the content-type by stripping parameters, but lines 336 and 339 still use the raw header value, which may include charset or other parameters.

For example, `text/plain; charset=utf-8` would fail the `startswith("text")` check on line 336 if the header included leading whitespace in an unusual format, and the `application/www-urlform-encode` comparison on line 339 would fail if charset is appended.

♻️ Suggested fix for consistency
```diff
         elif hdrs.CONTENT_TYPE in part.headers:
             content_type = part.headers.get(hdrs.CONTENT_TYPE, "")
-            if content_type.split(";", 1)[0].strip() == "application/json":
+            mime_type = content_type.split(";", 1)[0].strip()
+            if mime_type == "application/json":
                 part_data: Optional[
                     Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
                 ] = await part.json()
                 writer.append_json(part_data)
-            elif part.headers[hdrs.CONTENT_TYPE].startswith("text"):
+            elif mime_type.startswith("text"):
                 part_data = await part.text()
                 writer.append(part_data)
-            elif part.headers[hdrs.CONTENT_TYPE] == "application/www-urlform-encode":
+            elif mime_type == "application/x-www-form-urlencoded":
                 part_data = await part.form()
                 writer.append_form(part_data)
```

Note: Line 339 also has a typo — `application/www-urlform-encode` should likely be `application/x-www-form-urlencoded`.
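The normalization the review suggests can be factored into a tiny helper. A sketch (`normalize_mime` is an illustrative name, not code from the PR):

```python
def normalize_mime(content_type: str) -> str:
    # Drop parameters like "; charset=utf-8" plus surrounding whitespace,
    # and lowercase for case-insensitive comparison.
    return content_type.split(";", 1)[0].strip().lower()

print(normalize_mime("application/json; charset=utf-8"))  # application/json
print(normalize_mime(" text/plain; charset=utf-8"))       # text/plain
```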
Will do, thanks.
329-344: Inconsistent content-type normalization
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
480-505: ⚠️ Potential issue | 🟠 Major

Unreachable exception handler: reorder to catch `TimeoutException` before `HTTPError`.

`httpx.TimeoutException` inherits from `httpx.HTTPError` (via `RequestError` → `TransportError`). The `HTTPError` handler at line 488 will catch all timeout exceptions before reaching the dedicated handler at line 492, making that code unreachable. This prevents the timeout-specific logging and detailed error message from ever executing.

Move the `TimeoutException` handler before `HTTPError`:

Proposed fix
```diff
         except httpx.TooManyRedirects as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}; too many redirects: {ex}"
             )
+        except httpx.TimeoutException as ex:
+            _LOGGER.warning(
+                "Timeout during proxy request to %s: %s",
+                site,
+                ex.__class__.__name__,
+            )
+            return await self._build_response(
+                text=(
+                    "Timed out while contacting the service during login.\n\n"
+                    "This is usually caused by slow or blocked network access. "
+                    "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
+                    "and that the service endpoint is reachable from this host."
+                )
+            )
         except httpx.HTTPError as ex:
             return await self._build_response(
                 text=f"Error connecting to {site}: {ex}"
             )
-        except httpx.TimeoutException as ex:
-            _LOGGER.warning(
-                "Timeout during proxy request to %s: %s",
-                site,
-                ex.__class__.__name__,
-            )
-            return await self._build_response(
-                text=(
-                    "Timed out while contacting the service during login.\n\n"
-                    "This is usually caused by slow or blocked network access. "
-                    "Please retry, or check DNS resolution, firewall rules, proxy/VPN settings, "
-                    "and that the service endpoint is reachable from this host."
-                )
-            )
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
329-345: ⚠️ Potential issue | 🟡 Minor

Guard multipart JSON parsing against malformed data.

`part.json()` can raise `JSONDecodeError` and `ValueError` when encountering invalid JSON; unhandled exceptions would abort the request. Add error handling with fallback to `part.read()`.

🛡️ Suggested fix
```diff
             if mime_type == "application/json":
+                try:
-                part_data: Optional[
-                    Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
-                ] = await part.json()
-                writer.append_json(part_data)
+                    part_data: Optional[
+                        Union[Text, Dict[Text, Any], List[Tuple[Text, Text]], bytes]
+                    ] = await part.json()
+                    writer.append_json(part_data)
+                except (JSONDecodeError, ValueError):
+                    part_data = await part.read()
+                    writer.append(part_data)
```
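The guarded-decode pattern, independent of aiohttp's multipart API, looks like this. A standalone sketch with a hypothetical `parse_body` helper, using the text-then-bytes fallback discussed later in the thread:

```python
import json
from typing import Any, Union

def parse_body(raw: bytes) -> Union[Any, str, bytes]:
    # Try JSON first; fall back to text, then to raw bytes, instead of raising.
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return raw

print(parse_body(b'{"a": 1}'))   # {'a': 1}
print(parse_body(b"not json"))   # not json
print(parse_body(b"\xff\xfe"))   # b'\xff\xfe'
```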
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 132: Fix the typos in the docstring for the setter described by ":setter:
value (Dict[Text, Any])" in auth_capture_proxy.py: change "oof" to "of" and
"dictioary" to "dictionary", and tidy punctuation/spacing so the sentence reads
clearly (e.g., "takes a httpx.Response, a dictionary of post variables, and a
dictionary of query variables and returns a URL or string"). Also apply the same
corrections to the other occurrence referenced around the later docstring (the
one noted at the second occurrence).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
151-151: ⚠️ Potential issue | 🟡 Minor

Docstring typos in modifiers setter.
Tidy “shoud/couroutine” → “should/coroutine” for clarity.
✏️ Proposed fix
```diff
-        :setter: value (Dict[Text, Dict[Text, Callable]): A nested dictionary of modifiers. The key shoud be a MIME type and the value should be a dictionary of modifiers for that MIME type where the key should be the name of the modifier and the value should be a function or couroutine that takes a string and returns a modified string. If parameters are necessary, functools.partial should be used. See :mod:`authcaptureproxy.examples.modifiers` for examples.
+        :setter: value (Dict[Text, Dict[Text, Callable]): A nested dictionary of modifiers. The key should be a MIME type and the value should be a dictionary of modifiers for that MIME type where the key should be the name of the modifier and the value should be a function or coroutine that takes a string and returns a modified string. If parameters are necessary, functools.partial should be used. See :mod:`authcaptureproxy.examples.modifiers` for examples.
```
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 6: Add regression tests that reproduce the header-duplication and
JSON-parse crash: write failing TDD tests first that call the proxy's
request-handling path (e.g., the function/method that builds/forwards
headers—look for functions like handle_request or forward_request in
auth_capture_proxy.py) and assert that repeated proxied requests do not increase
header count or duplicate header entries, and separate tests that pass
invalid/malformed JSON bodies to the JSON parsing logic (the code path using
JSONDecodeError/import) to ensure it is guarded and does not raise—expect a safe
fallback value or empty dict. Implement tests that simulate multiple sequential
requests to validate no header growth and a test passing malformed JSON to the
parse_json_body or equivalent to confirm the JSONDecodeError branch is
exercised.
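The header-isolation property those tests should pin down can be sketched without aiohttp at all (`forward` is a hypothetical stand-in for the proxy's request-handling path):

```python
def forward(shared_headers: dict) -> dict:
    # Copy before mutating so per-request changes never leak back to the caller.
    req_headers = dict(shared_headers)
    req_headers["host"] = "upstream.example"      # hypothetical upstream host
    req_headers.pop("content-length", None)       # per-request sanitization
    return req_headers

base = {"accept": "application/json", "content-length": "42"}
for _ in range(3):
    forward(base)

# Repeated requests leave the shared mapping untouched: no growth, no duplication.
print(base)  # {'accept': 'application/json', 'content-length': '42'}
```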
- Around line 329-343: The JSON parsing call await part.json() inside the
content-type branch (mime_type == "application/json") can raise more than
JSONDecodeError; wrap the await part.json() in a try/except that catches a
broader set (either Exception or the specific additional exceptions
json.JSONDecodeError, UnicodeDecodeError, RuntimeError, binascii.Error) and on
failure fall back to reading the part as text (await part.text()) or a safe
fallback and then call writer.append or writer.append_json with the fallback
value; update the block around part.json(), writer.append_json and mime_type
handling to ensure malformed/encoded JSON does not raise and break processing.
Uses “text first, bytes last” fallback for best match to the stated requirement (and better debugging/user experience).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@authcaptureproxy/auth_capture_proxy.py`:
- Line 106: The annotation for self._background_tasks uses the builtin generic
set[asyncio.Task] which is inconsistent with the file's use of typing generics;
change the annotation to use Set[asyncio.Task] from typing (or alternatively
convert other annotations to builtins consistently), i.e., update the
declaration of self._background_tasks to use Set imported from typing while
keeping the asyncio.Task element type to match the surrounding annotations and
imports.
🧹 Nitpick comments (1)
authcaptureproxy/auth_capture_proxy.py (1)
338-345: Broad exception handling is intentional but consider documenting.

Static analysis flags `except Exception` (BLE001). This is intentional here as `part.json()` can raise multiple exception types (JSONDecodeError, UnicodeDecodeError, RuntimeError, binascii.Error). A brief inline comment explaining the rationale would silence future questions.

💡 Optional documentation
```diff
             except Exception:
-                # Best-effort fallback: text, then bytes
+                # Best-effort fallback: text, then bytes
+                # Broad catch needed: part.json() can raise JSONDecodeError,
+                # UnicodeDecodeError, RuntimeError, or binascii.Error
                 try:
                     part_text = await part.text()
                     writer.append(part_text)
                 except Exception:
+                    # Final fallback for any text decoding failures
                     part_data = await part.read()
                     writer.append(part_data)
```
Line 106 used the builtin generic `set[asyncio.Task]`, but the file uses `Dict`, `List`, and other generics from `typing` elsewhere (lines 86-103).
@coderabbitai, please draft the failing test cases and expected fixtures.
Here are the failing test cases and fixtures to demonstrate the bugs.

Task: Add failing test cases for header duplication and JSON parsing bugs

Test File Structure
The test file includes:
Expected Behavior:
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_regression_headers_and_json_parsing.py`:
- Around line 67-113: The proxy fixture has mixed 2-space and 4-space
indentation causing a parser IndentationError; reformat the entire body of the
proxy function to use consistent 4-space indentation (including the nested class
Proxy declaration, instantiation of p, and all monkeypatch.setattr and attribute
assignments) so the function, class Proxy, p = Proxy(...), and subsequent
monkeypatch.setattr(...) / p._tests / p._modifiers lines are indented uniformly
and parse correctly.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_regression_headers_and_json_parsing.py`:
- Around line 36-41: The tests assign to the read-only aiohttp Request.has_body
property (ineffective and may error); in the helper _make_request and in tests
referencing make_mocked_request, remove all manual assignments like req.has_body
= True (and any # type: ignore[attr-defined] comments used to suppress that),
and instead rely on passing the body via the payload parameter to
make_mocked_request so aiohttp computes has_body automatically; if you need to
make body presence explicit, set a Content-Length header in _make_request (e.g.,
when body is non-empty) rather than assigning has_body.
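The suggested alternative — signaling body presence via a Content-Length header rather than assigning the read-only property — can be sketched as follows (`make_headers` is a hypothetical helper, not code from the test file):

```python
def make_headers(body: bytes) -> dict:
    # Set Content-Length only when a body exists; aiohttp derives has_body itself.
    headers = {"Host": "example.com"}
    if body:
        headers["Content-Length"] = str(len(body))
    return headers

print(make_headers(b""))         # {'Host': 'example.com'}
print(make_headers(b'{"a":1}'))  # {'Host': 'example.com', 'Content-Length': '7'}
```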
…s_body is read-only).
`make_mocked_request(..., payload=body_bytes)` is giving aiohttp.web.Request a `_payload` that is literally a bytes object. Then `request.has_body` tries to call:

```
self._payload.at_eof()
```

…but bytes has no `at_eof()`, so you get:

```
AttributeError: 'bytes' object has no attribute 'at_eof'
```

✅ Fix: pass a real aiohttp StreamReader-like payload, not raw bytes. The cleanest way in tests is to create an `aiohttp.streams.StreamReader`, feed it the bytes, and pass that as `payload=`. Because StreamReader needs a running loop, we'll make `_make_request` async and call it with `await`.
… argument: 'limit'

In the aiohttp version used in CI (it downgrades to aiohttp==3.9.0), StreamReader's constructor requires a `limit` argument, so CI raises:

```
TypeError: StreamReader.__init__() missing 1 required positional argument: 'limit'
```

All 5 tests then fail immediately when trying to create the request.
…rules. Added terminating "." to the first line in docstring
```diff
+ from typing import Any
import pytest
import httpx
...
class DummyAsyncClient:
def __init__(self) -> None:
- self.calls = []
+ self.calls: list[dict[str, Any]] = []
self.cookies = type("Cookies", (), {"jar": {}})()
```
Fixes runaway header duplication during auth flow by isolating header mutation and guarding JSON parsing.
This PR adds regression tests covering cross-request header contamination caused by in-place mutation of request headers during JSON POST handling in all_handler(). The tests fail on the previous behavior and pass once headers are copied before sanitization, ensuring per-request isolation and preventing invalid header sets across requests.