Skip to content

Commit d785882

Browse files
fix(oauth): parse form-encoded responses in refresh_token() (#4259)
* fix(oauth): parse form-encoded responses in refresh_token() Token endpoints that respond with application/x-www-form-urlencoded (e.g. GitHub's /login/oauth/access_token) caused response.json() to raise JSONDecodeError, silently failing token refresh and driving gateways offline after access-token expiry. Apply the same content-type branch used in the three other token fetching paths in this module (_client_credentials_flow, _password_flow, and the two authorization code exchanges) so refresh_token() handles both JSON and form-encoded responses. Add unit tests covering: - happy path for form-encoded responses (asserts response.json() is not invoked to prove the form-encoded branch is actually taken) - negative path for unexpected content-type (raw fallback then OAuthError for missing access_token) - regression guard: set content-type on the existing success test Signed-off-by: kimsehwan96 <sktpghks138@gmail.com> * refactor(oauth): consolidate token-response parsing and prevent secret leaks Pull the JSON / form-encoded response parsing duplicated across five token-fetching paths (_client_credentials_flow, _password_flow, both authorization-code exchanges, and refresh_token) into a single _parse_token_response() helper, and add a _redact_token_response() helper that all five "No access_token" OAuthError sites and the refresh_token 4xx path use to bound what reaches logs and exception messages. Parsing improvements (apply to all five paths via the helper): - Treat the content-type header case-insensitively per RFC 7231 §3.1.1.1 so providers sending "Application/X-WWW-Form-Urlencoded" no longer fall through to the JSON branch. - URL-decode form-encoded values via urllib.parse.parse_qsl, so a value like "scope=repo%3Astatus" is delivered to callers as "repo:status". - Narrow the parse-failure except clause from bare Exception to ValueError (covers JSONDecodeError and UnicodeDecodeError) so that unrelated failures such as httpx.ResponseNotRead surface instead of being silently captured as raw_response. - Log parse failures with diagnostic context (status, content-type, body bytes) and exc_info=True to aid operator debugging. - Detect garbage form bodies: parse_qsl runs without keep_blank_values so an HTML page with no "=" parses to {} and falls through to the raw_response capture, and any non-empty parse whose keys aren't OAuth-parameter shaped (e.g. <meta charset=...>) is rejected the same way. - Tolerate undecodable bodies via a new _safe_response_text() helper that returns "<undecodable body, N bytes>" when response.text raises UnicodeDecodeError or LookupError. Secret-leak prevention (everywhere a token-shaped dict reaches an OAuthError or log line): - Redact known credential-bearing keys (access_token, refresh_token, id_token, client_secret, password) to "[REDACTED]". - Scrub URL/form-style "<key>=<value>" patterns inline so that secrets embedded in HTML hrefs, form actions, or stack traces are neutered even when the surrounding string fits inside the truncation window. - Cap any string value at 256 chars with a "... [truncated, N chars total]" marker so HTML error pages and verbose stack traces don't swamp logs. - Route the refresh_token 4xx error path through the same parse + redact pipeline (it previously surfaced raw response.text in both the OAuthError and the failure-status warning). - refresh_token's "No access_token" OAuthError now matches the sibling flows by echoing the (redacted) parsed payload for diagnostics. Tests cover mixed-case content-type, URL-decoded values, JSON-branch JSONDecodeError + UnicodeDecodeError fallbacks, missing content-type header, empty form body, garbage form bodies, undecodable bodies, sensitive-key redaction, raw_response truncation, URL-param scrubbing, 4xx echoed-secret redaction, and 4xx oversized HTML truncation. The two pre-existing 4xx tests are tightened to assert the parsed error field appears in the OAuthError so a regression that loses the body would be caught. Signed-off-by: Jonathan Springer <jps@s390x.com> --------- Signed-off-by: kimsehwan96 <sktpghks138@gmail.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Jonathan Springer <jps@s390x.com>
1 parent 67e63ea commit d785882

File tree

3 files changed

+592
-166
lines changed

3 files changed

+592
-166
lines changed

.secrets.baseline

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
"files": "^.secrets.baseline|package-lock.json|Cargo.lock|scripts/sign_image.sh|scripts/zap|sonar-project.properties|uv.lock|go.sum|mcpgateway/sri_hashes.json|^.secrets.baseline$",
44
"lines": null
55
},
6-
"generated_at": "2026-04-16T20:31:39Z",
6+
"generated_at": "2026-04-17T07:05:17Z",
77
"plugins_used": [
88
{
99
"name": "AWSKeyDetector"
@@ -5192,15 +5192,15 @@
51925192
"hashed_secret": "f2b14f68eb995facb3a1c35287b778d5bd785511",
51935193
"is_secret": false,
51945194
"is_verified": false,
5195-
"line_number": 105,
5195+
"line_number": 106,
51965196
"type": "Secret Keyword",
51975197
"verified_result": null
51985198
},
51995199
{
52005200
"hashed_secret": "819ef87051ee2837fefbb462d846b8d282d3b756",
52015201
"is_secret": false,
52025202
"is_verified": false,
5203-
"line_number": 108,
5203+
"line_number": 109,
52045204
"type": "Secret Keyword",
52055205
"verified_result": null
52065206
}
@@ -8388,39 +8388,39 @@
83888388
"hashed_secret": "34e587c8f9ba011db386d719d66ffe3cfaea5447",
83898389
"is_secret": false,
83908390
"is_verified": false,
8391-
"line_number": 399,
8391+
"line_number": 372,
83928392
"type": "Secret Keyword",
83938393
"verified_result": null
83948394
},
83958395
{
83968396
"hashed_secret": "a0f4ea7d91495df92bbac2e2149dfb850fe81396",
83978397
"is_secret": false,
83988398
"is_verified": false,
8399-
"line_number": 419,
8399+
"line_number": 391,
84008400
"type": "Secret Keyword",
84018401
"verified_result": null
84028402
},
84038403
{
84048404
"hashed_secret": "920a25ef686c4f7ca6ad23dd109d3ad653161832",
84058405
"is_secret": false,
84068406
"is_verified": false,
8407-
"line_number": 458,
8407+
"line_number": 640,
84088408
"type": "Secret Keyword",
84098409
"verified_result": null
84108410
},
84118411
{
84128412
"hashed_secret": "a62f2225bf70bfaccbc7f1ef2a397836717377de",
84138413
"is_secret": false,
84148414
"is_verified": false,
8415-
"line_number": 535,
8415+
"line_number": 716,
84168416
"type": "Secret Keyword",
84178417
"verified_result": null
84188418
},
84198419
{
84208420
"hashed_secret": "355e7ab792a8403301eb0732bab9d2b3950ac048",
84218421
"is_secret": false,
84228422
"is_verified": false,
8423-
"line_number": 538,
8423+
"line_number": 719,
84248424
"type": "Secret Keyword",
84258425
"verified_result": null
84268426
}

mcpgateway/services/oauth_manager.py

Lines changed: 155 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,10 @@
1717
from datetime import datetime, timedelta, timezone
1818
import hashlib
1919
import logging
20+
import re
2021
import secrets
2122
from typing import Any, Dict, Optional
22-
from urllib.parse import urlparse
23+
from urllib.parse import parse_qsl, urlparse
2324

2425
# Third-Party
2526
import httpx
@@ -236,6 +237,142 @@ async def _prepare_runtime_credentials(credentials: Dict[str, Any], flow_name: s
236237
logger.warning("Failed to prepare runtime OAuth credentials for %s flow: %s", flow_name, exc)
237238
return credentials
238239

240+
# Keys whose values must never be echoed in error messages or logs.
241+
_SENSITIVE_TOKEN_KEYS = frozenset({"access_token", "refresh_token", "id_token", "client_secret", "password"})
242+
243+
# Cap on raw_response excerpts and any other string values surfaced via
244+
# OAuthError / logs (defense-in-depth against unbounded provider bodies).
245+
_MAX_RAW_RESPONSE_LEN = 256
246+
247+
# OAuth parameter names per RFC 6749 are token-shaped (alphanumerics plus
248+
# a few separators). A parsed key outside this shape means parse_qsl picked
249+
# garbage out of an HTML body (e.g. <meta charset="utf-8">).
250+
_OAUTH_KEY_RE = re.compile(r"^[A-Za-z0-9_.\-]+$")
251+
252+
# Scrub URL/form-style ``key=value`` leaks inside arbitrary strings so that
253+
# secrets embedded in HTML error pages or stack traces don't survive the
254+
# length cap (the value can fit entirely inside the truncation window).
255+
_LEAKY_PARAM_RE = re.compile(
256+
r"(?i)\b(access_token|refresh_token|id_token|token|code|secret|key|password|api[_-]?key)=[^&\s\"'<>]+",
257+
)
258+
259+
@staticmethod
260+
def _safe_response_text(response: httpx.Response) -> str:
261+
"""Return ``response.text`` or a placeholder if the body is undecodable.
262+
263+
``httpx.Response.text`` raises ``UnicodeDecodeError`` (or ``LookupError``
264+
for an unknown charset) when the body bytes don't match the declared
265+
encoding. The caller wants a string for diagnostics, not a crash.
266+
267+
Args:
268+
response: HTTP response whose body we want as text.
269+
270+
Returns:
271+
Decoded body, or a ``"<undecodable body, N bytes>"`` placeholder.
272+
"""
273+
try:
274+
return response.text
275+
except (ValueError, LookupError):
276+
return f"<undecodable body, {len(response.content)} bytes>"
277+
278+
@staticmethod
279+
def _parse_token_response(response: httpx.Response) -> Dict[str, Any]:
280+
"""Parse an OAuth token response that may be JSON or form-encoded.
281+
282+
Per RFC 7231 §3.1.1.1, media type tokens are case-insensitive.
283+
Form-encoded values are URL-decoded via ``urllib.parse.parse_qsl``.
284+
Failures fall back to ``{"raw_response": <text>}`` so operators see
285+
what the provider actually sent, in three cases: a JSON parse error
286+
(``ValueError`` covering ``json.JSONDecodeError`` and
287+
``UnicodeDecodeError``), ``parse_qsl`` returning ``{}`` from a
288+
non-empty body (e.g. an HTML error page served with a form-encoded
289+
content-type), and ``response.text`` failing to decode the body
290+
bytes.
291+
292+
Args:
293+
response: HTTP response from the token endpoint.
294+
295+
Returns:
296+
Parsed token payload, or ``{"raw_response": <text>}`` when the
297+
body is neither valid JSON nor parseable as form-encoded.
298+
"""
299+
raw_content_type = response.headers.get("content-type", "")
300+
content_type = raw_content_type.lower()
301+
302+
if "application/x-www-form-urlencoded" in content_type:
303+
text = OAuthManager._safe_response_text(response)
304+
# parse_qsl drops malformed pairs (no "="); we deliberately do not
305+
# set keep_blank_values=True so that garbage like an HTML error page
306+
# parses to {} and falls through to the raw_response capture below.
307+
parsed = dict(parse_qsl(text))
308+
# An HTML body that happens to contain "=" (e.g. <meta charset="utf-8">)
309+
# parses to a non-empty dict with garbage keys. Reject anything whose
310+
# keys aren't OAuth parameter shaped so the leak is bounded by the
311+
# raw_response truncation in _redact_token_response.
312+
if parsed and not all(OAuthManager._OAUTH_KEY_RE.match(k) for k in parsed):
313+
return {"raw_response": text}
314+
if not parsed and text:
315+
return {"raw_response": text}
316+
return parsed
317+
318+
try:
319+
return response.json()
320+
except ValueError as exc:
321+
# ValueError covers json.JSONDecodeError (malformed JSON) and
322+
# UnicodeDecodeError (bad charset). Narrower than bare Exception,
323+
# which would swallow httpx.ResponseNotRead, MemoryError, etc.
324+
text = OAuthManager._safe_response_text(response)
325+
logger.warning(
326+
"Failed to parse OAuth token response as JSON: %s (status=%s, content-type=%r, body_bytes=%d)",
327+
exc,
328+
response.status_code,
329+
raw_content_type,
330+
len(response.content),
331+
exc_info=True,
332+
)
333+
return {"raw_response": text}
334+
335+
@staticmethod
336+
def _redact_token_response(token_response: Dict[str, Any]) -> Dict[str, Any]:
337+
"""Return a log/error-safe copy of a token response.
338+
339+
Three layers of protection so that misbehaving providers, HTML error
340+
pages, and verbose stack traces don't leak secrets via OAuthError or
341+
log lines:
342+
343+
1. Replace values for known credential-bearing keys with
344+
``"[REDACTED]"``.
345+
2. Scrub URL/form-style ``<key>=<secret>`` patterns inside any string
346+
value (HTML hrefs, form actions, stack traces).
347+
3. Cap any string value at ``_MAX_RAW_RESPONSE_LEN`` chars with a
348+
``... [truncated, N chars total]`` marker.
349+
350+
Args:
351+
token_response: Parsed token payload (possibly containing tokens
352+
or a captured raw body).
353+
354+
Returns:
355+
New dict safe to interpolate into log lines and exception messages.
356+
"""
357+
cap = OAuthManager._MAX_RAW_RESPONSE_LEN
358+
redacted: Dict[str, Any] = {}
359+
for key, value in token_response.items():
360+
if key in OAuthManager._SENSITIVE_TOKEN_KEYS:
361+
redacted[key] = "[REDACTED]"
362+
continue
363+
if isinstance(value, str):
364+
# Scrub URL/form-style "<key>=<secret>" patterns first (HTML
365+
# bodies often carry tokens in href / form action attributes
366+
# that fit entirely inside the truncation window), then cap.
367+
scrubbed = OAuthManager._LEAKY_PARAM_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", value)
368+
if len(scrubbed) > cap:
369+
redacted[key] = f"{scrubbed[:cap]}... [truncated, {len(value)} chars total]"
370+
else:
371+
redacted[key] = scrubbed
372+
else:
373+
redacted[key] = value
374+
return redacted
375+
239376
async def _client_credentials_flow(self, credentials: Dict[str, Any]) -> str:
240377
"""Machine-to-machine authentication using client credentials.
241378
@@ -271,28 +408,10 @@ async def _client_credentials_flow(self, credentials: Dict[str, Any]) -> str:
271408
response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
272409
response.raise_for_status()
273410

274-
# GitHub returns form-encoded responses, not JSON
275-
content_type = response.headers.get("content-type", "")
276-
if "application/x-www-form-urlencoded" in content_type:
277-
# Parse form-encoded response
278-
text_response = response.text
279-
token_response = {}
280-
for pair in text_response.split("&"):
281-
if "=" in pair:
282-
key, value = pair.split("=", 1)
283-
token_response[key] = value
284-
else:
285-
# Try JSON response
286-
try:
287-
token_response = response.json()
288-
except Exception as e:
289-
logger.warning(f"Failed to parse JSON response: {e}")
290-
# Fallback to text parsing
291-
text_response = response.text
292-
token_response = {"raw_response": text_response}
411+
token_response = self._parse_token_response(response)
293412

294413
if "access_token" not in token_response:
295-
raise OAuthError(f"No access_token in response: {token_response}")
414+
raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
296415

297416
logger.info("""Successfully obtained access token via client credentials""")
298417
return token_response["access_token"]
@@ -357,28 +476,10 @@ async def _password_flow(self, credentials: Dict[str, Any]) -> str:
357476
response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
358477
response.raise_for_status()
359478

360-
# Handle both JSON and form-encoded responses
361-
content_type = response.headers.get("content-type", "")
362-
if "application/x-www-form-urlencoded" in content_type:
363-
# Parse form-encoded response
364-
text_response = response.text
365-
token_response = {}
366-
for pair in text_response.split("&"):
367-
if "=" in pair:
368-
key, value = pair.split("=", 1)
369-
token_response[key] = value
370-
else:
371-
# Try JSON response
372-
try:
373-
token_response = response.json()
374-
except Exception as e:
375-
logger.warning(f"Failed to parse JSON response: {e}")
376-
# Fallback to text parsing
377-
text_response = response.text
378-
token_response = {"raw_response": text_response}
479+
token_response = self._parse_token_response(response)
379480

380481
if "access_token" not in token_response:
381-
raise OAuthError(f"No access_token in response: {token_response}")
482+
raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
382483

383484
logger.info("Successfully obtained access token via password grant")
384485
return token_response["access_token"]
@@ -455,28 +556,10 @@ async def exchange_code_for_token(self, credentials: Dict[str, Any], code: str,
455556
response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
456557
response.raise_for_status()
457558

458-
# GitHub returns form-encoded responses, not JSON
459-
content_type = response.headers.get("content-type", "")
460-
if "application/x-www-form-urlencoded" in content_type:
461-
# Parse form-encoded response
462-
text_response = response.text
463-
token_response = {}
464-
for pair in text_response.split("&"):
465-
if "=" in pair:
466-
key, value = pair.split("=", 1)
467-
token_response[key] = value
468-
else:
469-
# Try JSON response
470-
try:
471-
token_response = response.json()
472-
except Exception as e:
473-
logger.warning(f"Failed to parse JSON response: {e}")
474-
# Fallback to text parsing
475-
text_response = response.text
476-
token_response = {"raw_response": text_response}
559+
token_response = self._parse_token_response(response)
477560

478561
if "access_token" not in token_response:
479-
raise OAuthError(f"No access_token in response: {token_response}")
562+
raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
480563

481564
logger.info("""Successfully exchanged authorization code for access token""")
482565
return token_response["access_token"]
@@ -1232,28 +1315,10 @@ async def _exchange_code_for_tokens(self, credentials: Dict[str, Any], code: str
12321315
response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
12331316
response.raise_for_status()
12341317

1235-
# GitHub returns form-encoded responses, not JSON
1236-
content_type = response.headers.get("content-type", "")
1237-
if "application/x-www-form-urlencoded" in content_type:
1238-
# Parse form-encoded response
1239-
text_response = response.text
1240-
token_response = {}
1241-
for pair in text_response.split("&"):
1242-
if "=" in pair:
1243-
key, value = pair.split("=", 1)
1244-
token_response[key] = value
1245-
else:
1246-
# Try JSON response
1247-
try:
1248-
token_response = response.json()
1249-
except Exception as e:
1250-
logger.warning(f"Failed to parse JSON response: {e}")
1251-
# Fallback to text parsing
1252-
text_response = response.text
1253-
token_response = {"raw_response": text_response}
1318+
token_response = self._parse_token_response(response)
12541319

12551320
if "access_token" not in token_response:
1256-
raise OAuthError(f"No access_token in response: {token_response}")
1321+
raise OAuthError(f"No access_token in response: {self._redact_token_response(token_response)}")
12571322

12581323
logger.info("""Successfully exchanged authorization code for tokens""")
12591324
return token_response
@@ -1326,20 +1391,23 @@ async def refresh_token(self, refresh_token: str, credentials: Dict[str, Any]) -
13261391
client = await self._get_client()
13271392
response = await client.post(token_url, data=token_data, timeout=self.request_timeout)
13281393
if response.status_code == 200:
1329-
token_response = response.json()
1394+
token_response = self._parse_token_response(response)
13301395

13311396
# Validate required fields
13321397
if "access_token" not in token_response:
1333-
raise OAuthError("No access_token in refresh response")
1398+
raise OAuthError(f"No access_token in refresh response: {self._redact_token_response(token_response)}")
13341399

13351400
logger.info("Successfully refreshed OAuth token")
13361401
return token_response
13371402

1338-
error_text = response.text
1339-
# If we get a 400/401, the refresh token is likely invalid
1403+
# Bound and redact the body before surfacing it. Some providers echo
1404+
# request parameters (including refresh_token / client_secret) in error
1405+
# responses, and HTML error pages can be unbounded — both leak via logs
1406+
# and OAuthError messages without this scrub.
1407+
error_payload = self._redact_token_response(self._parse_token_response(response))
13401408
if response.status_code in [400, 401]:
1341-
raise OAuthError(f"Refresh token invalid or expired: {error_text}")
1342-
logger.warning(f"Token refresh failed with status {response.status_code}: {error_text}")
1409+
raise OAuthError(f"Refresh token invalid or expired: {error_payload}")
1410+
logger.warning("Token refresh failed with status %s: %s", response.status_code, error_payload)
13431411

13441412
except httpx.HTTPError as e:
13451413
logger.warning(f"Token refresh attempt {attempt + 1} failed: {str(e)}")

0 commit comments

Comments
 (0)