fix(oauth): parse form-encoded responses in refresh_token() (#4259)

kimsehwan96 · jonpspri · web-flow · commit d78588208670 · 2026-04-17T08:42:49.000+01:00
* fix(oauth): parse form-encoded responses in refresh_token()

Token endpoints that respond with application/x-www-form-urlencoded
(e.g. GitHub's /login/oauth/access_token) caused response.json() to
raise JSONDecodeError, silently failing token refresh and driving
gateways offline after access-token expiry.
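Here is a minimal reproduction of the failure using only the standard
library. The body is an assumed, illustrative form-encoded token
response, not one captured from GitHub. json.loads rejects it the same
way response.json() did, before any access_token lookup could run.

```python
import json

# Assumed example body from a form-encoded token endpoint (illustrative values).
FORM_BODY = "access_token=gho_example&scope=repo&token_type=bearer"


def try_json(body: str):
    """Return the parsed JSON, or the exception that parsing raised."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Pre-fix, refresh_token() called response.json() directly, so a body
        # like this made the refresh fail instead of yielding a token.
        return exc


result = try_json(FORM_BODY)
print(type(result).__name__)  # JSONDecodeError
```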

Apply the same content-type branch used in the four other token-fetching
paths in this module (_client_credentials_flow, _password_flow, and the
two authorization-code exchanges) so refresh_token() handles both JSON
and form-encoded responses.
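The shared branch can be sketched as a pure function over the header
value and body text. parse_token_body is a hypothetical name. The logic
approximates the pre-refactor branch (case-sensitive substring match on
the header, manual "&"/"=" splitting with no URL-decoding), except that
the JSON fallback here catches ValueError rather than bare Exception.

```python
import json
from typing import Any, Dict


def parse_token_body(content_type: str, body: str) -> Dict[str, Any]:
    """Hypothetical helper: branch on content-type like the pre-refactor flows."""
    if "application/x-www-form-urlencoded" in content_type:
        # Split "k1=v1&k2=v2" into a dict; pairs without "=" are skipped.
        # Values are NOT URL-decoded and the header match is case-sensitive.
        token_response: Dict[str, Any] = {}
        for pair in body.split("&"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                token_response[key] = value
        return token_response
    try:
        return json.loads(body)
    except ValueError:
        # Capture the raw body so the caller can report what the provider sent.
        return {"raw_response": body}


print(parse_token_body("application/x-www-form-urlencoded", "access_token=abc&token_type=bearer"))
# {'access_token': 'abc', 'token_type': 'bearer'}
```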

Add unit tests covering:
- happy path for form-encoded responses (asserts response.json() is
  not invoked to prove the form-encoded branch is actually taken)
- negative path for unexpected content-type (raw fallback then
  OAuthError for missing access_token)
- regression guard: set content-type on the existing success test

Signed-off-by: kimsehwan96 <sktpghks138@gmail.com>

* refactor(oauth): consolidate token-response parsing and prevent secret leaks

Pull the JSON / form-encoded response parsing duplicated across five
token-fetching paths (_client_credentials_flow, _password_flow, both
authorization-code exchanges, and refresh_token) into a single
_parse_token_response() helper, and add a _redact_token_response()
helper that all five "No access_token" OAuthError sites and the
refresh_token 4xx path use to bound what reaches logs and exception
messages.

Parsing improvements (apply to all five paths via the helper):

- Treat the content-type header case-insensitively per RFC 7231 §3.1.1.1
  so providers sending "Application/X-WWW-Form-Urlencoded" no longer
  fall through to the JSON branch.
- URL-decode form-encoded values via urllib.parse.parse_qsl, so the pair
  "scope=repo%3Astatus" is delivered to callers as scope="repo:status".
- Narrow the parse-failure except clause from bare Exception to
  ValueError (covers JSONDecodeError and UnicodeDecodeError) so that
  unrelated failures such as httpx.ResponseNotRead surface instead of
  being silently captured as raw_response.
- Log JSON parse failures with diagnostic context (status, content-type,
  body length in bytes) and exc_info=True to aid operator debugging.
- Detect garbage form bodies: parse_qsl runs without keep_blank_values
  so an HTML page with no "=" parses to {} and falls through to the
  raw_response capture, and any non-empty parse whose keys aren't
  OAuth-parameter shaped (e.g. <meta charset=...>) is rejected the
  same way.
- Tolerate undecodable bodies via a new _safe_response_text() helper
  that returns "&lt;undecodable body, N bytes&gt;" when response.text raises
  UnicodeDecodeError or LookupError.
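The parsing rules above can be approximated as follows. This sketch
works on a raw header string and body bytes instead of an
httpx.Response, and safe_text / parse_token_body are hypothetical names.
It decodes as UTF-8 by default, whereas response.text in httpx uses the
response's own encoding.

```python
import json
import re
from typing import Any, Dict
from urllib.parse import parse_qsl

# Same key shape the commit uses to reject garbage parsed out of HTML bodies.
OAUTH_KEY_RE = re.compile(r"^[A-Za-z0-9_.\-]+$")


def safe_text(body: bytes, encoding: str = "utf-8") -> str:
    """Decode body bytes, or return a placeholder if they can't be decoded."""
    try:
        return body.decode(encoding)
    except (ValueError, LookupError):  # UnicodeDecodeError subclasses ValueError
        return f"<undecodable body, {len(body)} bytes>"


def parse_token_body(content_type: str, body: bytes) -> Dict[str, Any]:
    """Sketch of the consolidated parser over a raw header and body bytes."""
    # Media types are case-insensitive, so normalize before matching.
    if "application/x-www-form-urlencoded" in content_type.lower():
        text = safe_text(body)
        parsed = dict(parse_qsl(text))  # URL-decodes; drops pairs without "="
        if parsed and not all(OAUTH_KEY_RE.match(k) for k in parsed):
            return {"raw_response": text}  # e.g. '<meta charset="utf-8">'
        if not parsed and text:
            return {"raw_response": text}  # e.g. an HTML page with no "="
        return parsed
    try:
        return json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return {"raw_response": safe_text(body)}
```

The garbage-key check is what catches <meta charset="utf-8">: parse_qsl
yields the key "<meta charset", which fails the OAuth key pattern.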

Secret-leak prevention (everywhere a token-shaped dict reaches an
OAuthError or log line):

- Redact known credential-bearing keys (access_token, refresh_token,
  id_token, client_secret, password) to "[REDACTED]".
- Scrub URL/form-style "&lt;key&gt;=&lt;value&gt;" patterns inline so that secrets
  embedded in HTML hrefs, form actions, or stack traces are neutered
  even when the surrounding string fits inside the truncation window.
- Cap any string value at 256 chars with a "... [truncated, N chars
  total]" marker so HTML error pages and verbose stack traces don't
  swamp logs.
- Route the refresh_token 4xx error path through the same parse +
  redact pipeline (it previously surfaced raw response.text in both
  the OAuthError and the failure-status warning).
- refresh_token's "No access_token" OAuthError now matches the sibling
  flows by echoing the (redacted) parsed payload for diagnostics.
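The three redaction layers can be sketched as a standalone function. The
key set, the 256-character cap, and the scrub pattern are copied from the
diff below; redact and the module-level constant names are hypothetical.

```python
import re
from typing import Any, Dict

SENSITIVE_KEYS = frozenset({"access_token", "refresh_token", "id_token", "client_secret", "password"})
MAX_LEN = 256
LEAKY_PARAM_RE = re.compile(
    r"(?i)\b(access_token|refresh_token|id_token|token|code|secret|key|password|api[_-]?key)=[^&\s\"'<>]+",
)


def redact(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of payload that is safe to put in logs or exceptions."""
    out: Dict[str, Any] = {}
    for key, value in payload.items():
        if key in SENSITIVE_KEYS:
            # Layer 1: known credential-bearing keys are replaced outright.
            out[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Layer 2: scrub inline "<key>=<secret>" patterns (hrefs, traces).
            scrubbed = LEAKY_PARAM_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", value)
            # Layer 3: cap length, reporting the original size.
            if len(scrubbed) > MAX_LEN:
                scrubbed = f"{scrubbed[:MAX_LEN]}... [truncated, {len(value)} chars total]"
            out[key] = scrubbed
        else:
            out[key] = value
    return out
```

One property of the pattern worth knowing: \b requires a word boundary
before the key name, so in "client_secret=abc" the bare "secret"
alternative does not match (the preceding "_" is a word character) and
the value passes through layer 2 unchanged. Layer 1 covers client_secret
only when it is a top-level key of the parsed payload.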

Tests cover mixed-case content-type, URL-decoded values, JSON-branch
JSONDecodeError + UnicodeDecodeError fallbacks, missing content-type
header, empty form body, garbage form bodies, undecodable bodies,
sensitive-key redaction, raw_response truncation, URL-param scrubbing,
4xx echoed-secret redaction, and 4xx oversized HTML truncation. The
two pre-existing 4xx tests are tightened to assert the parsed error
field appears in the OAuthError so a regression that loses the body
would be caught.

Signed-off-by: Jonathan Springer <jps@s390x.com>

---------

Signed-off-by: kimsehwan96 <sktpghks138@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
diff --git a/.secrets.baseline b/.secrets.baseline
@@ -3,7 +3,7 @@
     "files": "^.secrets.baseline|package-lock.json|Cargo.lock|scripts/sign_image.sh|scripts/zap|sonar-project.properties|uv.lock|go.sum|mcpgateway/sri_hashes.json|^.secrets.baseline$",
     "lines": null
   },
-  "generated_at": "2026-04-16T20:31:39Z",
+  "generated_at": "2026-04-17T07:05:17Z",
   "plugins_used": [
     {
       "name": "AWSKeyDetector"
@@ -5192,15 +5192,15 @@
         "hashed_secret": "f2b14f68eb995facb3a1c35287b778d5bd785511",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 105,
+        "line_number": 106,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "819ef87051ee2837fefbb462d846b8d282d3b756",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 108,
+        "line_number": 109,
         "type": "Secret Keyword",
         "verified_result": null
       }
@@ -8388,39 +8388,39 @@
         "hashed_secret": "34e587c8f9ba011db386d719d66ffe3cfaea5447",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 399,
+        "line_number": 372,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "a0f4ea7d91495df92bbac2e2149dfb850fe81396",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 419,
+        "line_number": 391,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "920a25ef686c4f7ca6ad23dd109d3ad653161832",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 458,
+        "line_number": 640,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "a62f2225bf70bfaccbc7f1ef2a397836717377de",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 535,
+        "line_number": 716,
         "type": "Secret Keyword",
         "verified_result": null
       },
       {
         "hashed_secret": "355e7ab792a8403301eb0732bab9d2b3950ac048",
         "is_secret": false,
         "is_verified": false,
-        "line_number": 538,
+        "line_number": 719,
         "type": "Secret Keyword",
         "verified_result": null
       }
diff --git a/mcpgateway/services/oauth_manager.py b/mcpgateway/services/oauth_manager.py
@@ -17,9 +17,10 @@
 from datetime import datetime, timedelta, timezone
 import hashlib
 import logging
+import re
 import secrets
 from typing import Any, Dict, Optional
-from urllib.parse import urlparse
+from urllib.parse import parse_qsl, urlparse
 
 # Third-Party
 import httpx
@@ -236,6 +237,142 @@ async def _prepare_runtime_credentials(credentials: Dict[str, Any], flow_name: s
             logger.warning("Failed to prepare runtime OAuth credentials for %s flow: %s", flow_name, exc)
         return credentials
 
+    # Keys whose values must never be echoed in error messages or logs.
+    _SENSITIVE_TOKEN_KEYS = frozenset({"access_token", "refresh_token", "id_token", "client_secret", "password"})
+
+    # Cap on raw_response excerpts and any other string values surfaced via
+    # OAuthError / logs (defense-in-depth against unbounded provider bodies).
+    _MAX_RAW_RESPONSE_LEN = 256
+
+    # OAuth parameter names per RFC 6749 are token-shaped (alphanumerics plus
+    # a few separators). A parsed key outside this shape means parse_qsl picked
+    # garbage out of an HTML body (e.g. <meta charset="utf-8">).
+    _OAUTH_KEY_RE = re.compile(r"^[A-Za-z0-9_.\-]+$")
+
+    # Scrub URL/form-style ``key=value`` leaks inside arbitrary strings so that
+    # secrets embedded in HTML error pages or stack traces don't survive the
+    # length cap (the value can fit entirely inside the truncation window).
+    _LEAKY_PARAM_RE = re.compile(
+        r"(?i)\b(access_token|refresh_token|id_token|token|code|secret|key|password|api[_-]?key)=[^&\s\"'<>]+",
+    )
+
+    @staticmethod
+    def _safe_response_text(response: httpx.Response) -> str:
+        """Return ``response.text`` or a placeholder if the body is undecodable.
+
+        ``httpx.Response.text`` raises ``UnicodeDecodeError`` (or ``LookupError``
+        for an unknown charset) when the body bytes don't match the declared
+        encoding. The caller wants a string for diagnostics, not a crash.
+
+        Args:
+            response: HTTP response whose body we want as text.
+
+        Returns:
+            Decoded body, or a ``"<undecodable body, N bytes>"`` placeholder.
+        """
+        try:
+            return response.text
+        except (ValueError, LookupError):
+            return f"<undecodable body, {len(response.content)} bytes>"
+
+    @staticmethod
+    def _parse_token_response(response: httpx.Response) -> Dict[str, Any]:
+        """Parse an OAuth token response that may be JSON or form-encoded.
+
+        Per RFC 7231 §3.1.1.1, media type tokens are case-insensitive.
+        Form-encoded values are URL-decoded via ``urllib.parse.parse_qsl``.
+        Failures fall back to ``{"raw_response": <text>}`` so operators see
+        what the provider actually sent, in three cases: a JSON parse error
+        (``ValueError`` covering ``json.JSONDecodeError`` and
+        ``UnicodeDecodeError``), ``parse_qsl`` returning ``{}`` from a
+        non-empty body (e.g. an HTML error page served with a form-encoded
+        content-type), and ``response.text`` failing to decode the body
+        bytes.
+
+        Args:
+            response: HTTP response from the token endpoint.
+
+        Returns:
+            Parsed token payload, or ``{"raw_response": <text>}`` when the
+            body is neither valid JSON nor parseable as form-encoded.
+        """
+        raw_content_type = response.headers.get("content-type", "")
+        content_type = raw_content_type.lower()
+
+        if "application/x-www-form-urlencoded" in content_type:
+            text = OAuthManager._safe_response_text(response)
+            # parse_qsl drops malformed pairs (no "="); we deliberately do not
+            # set keep_blank_values=True so that garbage like an HTML error page
+            # parses to {} and falls through to the raw_response capture below.
+            parsed = dict(parse_qsl(text))
+            # An HTML body that happens to contain "=" (e.g. <meta charset="utf-8">)
+            # parses to a non-empty dict with garbage keys. Reject anything whose
+            # keys aren't OAuth parameter shaped so the leak is bounded by the
+            # raw_response truncation in _redact_token_response.
+            if parsed and not all(OAuthManager._OAUTH_KEY_RE.match(k) for k in parsed):
+                return {"raw_response": text}
+            if not parsed and text:
+                return {"raw_response": text}
+            return parsed
+
+        try:
+            return response.json()
+        except ValueError as exc:
+            # ValueError covers json.JSONDecodeError (malformed JSON) and
+            # UnicodeDecodeError (bad charset). Narrower than bare Exception,
+            # which would swallow httpx.ResponseNotRead, MemoryError, etc.
+            text = OAuthManager._safe_response_text(response)
+            logger.warning(
+                "Failed to parse OAuth token response as JSON: %s (status=%s, content-type=%r, body_bytes=%d)",
+                exc,
+                response.status_code,
+                raw_content_type,
+                len(response.content),
+                exc_info=True,
+            )
+            return {"raw_response": text}
+
+    @staticmethod
+    def _redact_token_response(token_response: Dict[str, Any]) -> Dict[str, Any]:
+        """Return a log/error-safe copy of a token response.
+
+        Three layers of protection so that misbehaving providers, HTML error
+        pages, and verbose stack traces don't leak secrets via OAuthError or
+        log lines:
+
+        1. Replace values for known credential-bearing keys with
+           ``"[REDACTED]"``.
+        2. Scrub URL/form-style ``<key>=<secret>`` patterns inside any string
+           value (HTML hrefs, form actions, stack traces).
+        3. Cap any string value at ``_MAX_RAW_RESPONSE_LEN`` chars with a
+           ``... [truncated, N chars total]`` marker.
+
+        Args:
+            token_response: Parsed token payload (possibly containing tokens
+                or a captured raw body).
+
+        Returns:
+            New dict safe to interpolate into log lines and exception messages.
+        """
+        cap = OAuthManager._MAX_RAW_RESPONSE_LEN
+        redacted: Dict[str, Any] = {}
+        for key, value in token_response.items():
+            if key in OAuthManager._SENSITIVE_TOKEN_KEYS:
+                redacted[key] = "[REDACTED]"
+                continue
+            if isinstance(value, str):
+                # Scrub URL/form-style "<key>=<secret>" patterns first (HTML
+                # bodies often carry tokens in href / form action attributes
+                # that fit entirely inside the truncation window), then cap.
+                scrubbed = OAuthManager._LEAKY_PARAM_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", value)
+                if len(scrubbed) > cap:
+                    redacted[key] = f"{scrubbed[:cap]}... [truncated, {len(value)} chars total]"
+                else:
+                    redacted[key] = scrubbed
+            else:
+                redacted[key] = value
+        return redacted
+
     async def _client_credentials_flow(self, credentials: Dict[str, Any]) -> str:
         """Machine-to-machine authentication using client credentials.
 
@@ -271,28 +408,10 @@ async def _client_credentials_flow(self, credentials: Dict[str, Any]) -> str:
                 response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
                 response.raise_for_status()
 
-                # GitHub returns form-encoded responses, not JSON
-                content_type = response.headers.get("content-type", "")
-                if "application/x-www-form-urlencoded" in content_type:
-                    # Parse form-encoded response
-                    text_response = response.text
-                    token_response = {}
-                    for pair in text_response.split("&"):
-                        if "=" in pair:
-                            key, value = pair.split("=", 1)
-                            token_response[key] = value
-                else:
-                    # Try JSON response
-                    try:
-                        token_response = response.json()
-                    except Exception as e:
-                        logger.warning(f"Failed to parse JSON response: {e}")
-                        # Fallback to text parsing
-                        text_response = response.text
-                        token_response = {"raw_response": text_response}
+                token_response = self._parse_token_response(response)
 
                 if "access_token" not in token_response:
-                    raise OAuthError(f"No access_token in response: {token_response}")
+                    raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
 
                 logger.info("""Successfully obtained access token via client credentials""")
                 return token_response["access_token"]
@@ -357,28 +476,10 @@ async def _password_flow(self, credentials: Dict[str, Any]) -> str:
                 response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
                 response.raise_for_status()
 
-                # Handle both JSON and form-encoded responses
-                content_type = response.headers.get("content-type", "")
-                if "application/x-www-form-urlencoded" in content_type:
-                    # Parse form-encoded response
-                    text_response = response.text
-                    token_response = {}
-                    for pair in text_response.split("&"):
-                        if "=" in pair:
-                            key, value = pair.split("=", 1)
-                            token_response[key] = value
-                else:
-                    # Try JSON response
-                    try:
-                        token_response = response.json()
-                    except Exception as e:
-                        logger.warning(f"Failed to parse JSON response: {e}")
-                        # Fallback to text parsing
-                        text_response = response.text
-                        token_response = {"raw_response": text_response}
+                token_response = self._parse_token_response(response)
 
                 if "access_token" not in token_response:
-                    raise OAuthError(f"No access_token in response: {token_response}")
+                    raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
 
                 logger.info("Successfully obtained access token via password grant")
                 return token_response["access_token"]
@@ -455,28 +556,10 @@ async def exchange_code_for_token(self, credentials: Dict[str, Any], code: str,
                 response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
                 response.raise_for_status()
 
-                # GitHub returns form-encoded responses, not JSON
-                content_type = response.headers.get("content-type", "")
-                if "application/x-www-form-urlencoded" in content_type:
-                    # Parse form-encoded response
-                    text_response = response.text
-                    token_response = {}
-                    for pair in text_response.split("&"):
-                        if "=" in pair:
-                            key, value = pair.split("=", 1)
-                            token_response[key] = value
-                else:
-                    # Try JSON response
-                    try:
-                        token_response = response.json()
-                    except Exception as e:
-                        logger.warning(f"Failed to parse JSON response: {e}")
-                        # Fallback to text parsing
-                        text_response = response.text
-                        token_response = {"raw_response": text_response}
+                token_response = self._parse_token_response(response)
 
                 if "access_token" not in token_response:
-                    raise OAuthError(f"No access_token in response: {token_response}")
+                    raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
 
                 logger.info("""Successfully exchanged authorization code for access token""")
                 return token_response["access_token"]
@@ -1232,28 +1315,10 @@ async def _exchange_code_for_tokens(self, credentials: Dict[str, Any], code: str
                 response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
                 response.raise_for_status()
 
-                # GitHub returns form-encoded responses, not JSON
-                content_type = response.headers.get("content-type", "")
-                if "application/x-www-form-urlencoded" in content_type:
-                    # Parse form-encoded response
-                    text_response = response.text
-                    token_response = {}
-                    for pair in text_response.split("&"):
-                        if "=" in pair:
-                            key, value = pair.split("=", 1)
-                            token_response[key] = value
-                else:
-                    # Try JSON response
-                    try:
-                        token_response = response.json()
-                    except Exception as e:
-                        logger.warning(f"Failed to parse JSON response: {e}")
-                        # Fallback to text parsing
-                        text_response = response.text
-                        token_response = {"raw_response": text_response}
+                token_response = self._parse_token_response(response)
 
                 if "access_token" not in token_response:
-                    raise OAuthError(f"No access_token in response: {token_response}")
+                    raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
 
                 logger.info("""Successfully exchanged authorization code for tokens""")
                 return token_response
@@ -1326,20 +1391,23 @@ async def refresh_token(self, refresh_token: str, credentials: Dict[str, Any]) -
                 client = await self._get_client()
                 response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
                 if response.status_code == 200:
-                    token_response = response.json()
+                    token_response = self._parse_token_response(response)
 
                     # Validate required fields
                     if "access_token" not in token_response:
-                        raise OAuthError("No access_token in refresh response")
+                        raise OAuthError(f"No access_token in refresh response: {self._redact_token_response(token_response)}")
 
                     logger.info("Successfully refreshed OAuth token")
                     return token_response
 
-                error_text = response.text
-                # If we get a 400/401, the refresh token is likely invalid
+                # Bound and redact the body before surfacing it. Some providers echo
+                # request parameters (including refresh_token / client_secret) in error
+                # responses, and HTML error pages can be unbounded — both leak via logs
+                # and OAuthError messages without this scrub.
+                error_payload = self._redact_token_response(self._parse_token_response(response))
                 if response.status_code in [400, 401]:
-                    raise OAuthError(f"Refresh token invalid or expired: {error_text}")
-                logger.warning(f"Token refresh failed with status {response.status_code}: {error_text}")
+                    raise OAuthError(f"Refresh token invalid or expired: {error_payload}")
+                logger.warning("Token refresh failed with status %s: %s", response.status_code, error_payload)
 
             except httpx.HTTPError as e:
                 logger.warning(f"Token refresh attempt {attempt + 1} failed: {str(e)}")
diff --git a/tests/unit/mcpgateway/services/test_oauth_manager.py b/tests/unit/mcpgateway/services/test_oauth_manager.py