Skip to content

Commit a1eac0f

Browse files
bjaggclaude
andauthored
Issue #938: Redact DATABASE_URL password before logging at startup (#951)
## Summary `database_setup.py` was logging the full `DATABASE_URL` (with PG password) on every MDR API task startup. Shared CloudWatch retention means anyone with `logs:FilterLogEvents` could recover the dev/demo DB credential. Closes #938. ## Fix Small `_redact_url(url) -> str` helper using `urlparse`/`urlunparse`: ```diff -logger.info(f"DATABASE_URL : {DATABASE_URL}") +logger.info("DATABASE_URL : %s", _redact_url(DATABASE_URL)) ``` Output before: ``` DATABASE_URL : postgresql+asyncpg://postgres:sNa22:}33eS$u:b&X_XdZu!v~4<$r.P2@devmdrdb.dev.aws:5432/devMdrDb ``` Output after: ``` DATABASE_URL : postgresql+asyncpg://postgres:***@devmdrdb.dev.aws:5432/devMdrDb ``` Helper is best-effort: any parsing surprise (malformed URL, missing env var producing a non-integer port) returns a `<unparseable-url>` sentinel rather than raising — a logging path should never take down MDR startup. ## Tests First tests under `test/components/lif/mdr_utils/` (new directory): - `_redact_url` covers typical password redaction, special characters in the password, no-port URLs, no-password URLs (IAM auth), and unparseable input. - A small `conftest.py` seeds dummy `POSTGRESQL_*` env vars so importing `database_setup` doesn't fail at engine-construction time (these pure-Python tests never touch a real DB). `uv run pytest test/components/lif/mdr_utils/` — 5 passed. ## Follow-up (out of scope here) - **Rotate dev + demo DB passwords.** Previous values are sitting in CloudWatch retention. - **Audit other LIF services** that share this logging pattern: `graphql_*`, `translator_*`, `advisor_api`, etc. Same pattern, same fix shape. Both worth filing as separate issues once this lands. ## Risk Very low. Pure-Python helper added, single log line changed from f-string to lazy-format with redaction. No behavior change for the engine itself — `create_async_engine` still gets the raw `DATABASE_URL` (it needs the credential to actually connect). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 303eb63 commit a1eac0f

4 files changed

Lines changed: 95 additions & 1 deletion

File tree

components/lif/mdr_utils/database_setup.py

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
import os
55
import re
66
from typing import AsyncGenerator
7+
from urllib.parse import urlparse, urlunparse
78

89
from fastapi import HTTPException, Request, status
910
from lif.mdr_utils.logger_config import get_logger
@@ -15,8 +16,39 @@
1516
logger = get_logger(__name__)
1617

1718

19+
def _redact_url(url: str) -> str:
20+
"""Mask the password in a SQLAlchemy connection URL for safe logging.
21+
22+
Returns the URL with the password replaced by ``***`` while preserving
23+
scheme, username, host, port, and database. URLs without a password
24+
are returned unchanged. Issue #938: the previous startup log emitted
25+
the full URL including the credential, exposing the dev/demo DB
26+
password to anyone with read access on the shared CloudWatch log
27+
group.
28+
29+
Best-effort: this is only called for logging, so we never want it to
30+
raise and take down MDR startup. Any parsing surprise (urlparse on
31+
a malformed URL, ``parts.port`` raising on a non-integer port — which
32+
happens when an env var was unset and the URL contains the literal
33+
string ``None``) returns a sentinel so the log line still emits
34+
something operator-readable.
35+
"""
36+
try:
37+
parts = urlparse(url)
38+
if not parts.password:
39+
return url
40+
user = parts.username or ""
41+
host = parts.hostname or ""
42+
netloc = f"{user}:***@{host}"
43+
if parts.port:
44+
netloc += f":{parts.port}"
45+
return urlunparse(parts._replace(netloc=netloc))
46+
except ValueError:
47+
return "<unparseable-url>"
48+
49+
1850
DATABASE_URL = f"postgresql+asyncpg://{os.getenv('POSTGRESQL_USER')}:{os.getenv('POSTGRESQL_PASSWORD')}@{os.getenv('POSTGRESQL_HOST')}:{os.getenv('POSTGRESQL_PORT')}/{os.getenv('POSTGRESQL_DB')}"
19-
logger.info(f"DATABASE_URL : {DATABASE_URL}")
51+
logger.info("DATABASE_URL : %s", _redact_url(DATABASE_URL))
2052
# Create an async engine
2153
engine = create_async_engine(DATABASE_URL, echo=True)
2254

test/components/lif/mdr_utils/__init__.py

Whitespace-only changes.
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
"""Seed dummy env vars so importing `lif.mdr_utils.database_setup` doesn't
2+
fail at engine-construction time when running these unit tests.
3+
4+
The tests in this directory exercise pure-Python helpers (e.g.
5+
`_redact_url`); they never actually connect to a database. We just need
6+
the module-level `DATABASE_URL` + `create_async_engine` calls to succeed.
7+
"""
8+
9+
import os
10+
11+
os.environ.setdefault("POSTGRESQL_USER", "test")
12+
os.environ.setdefault("POSTGRESQL_PASSWORD", "test")
13+
os.environ.setdefault("POSTGRESQL_HOST", "localhost")
14+
os.environ.setdefault("POSTGRESQL_PORT", "5432")
15+
os.environ.setdefault("POSTGRESQL_DB", "test")
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
"""Unit tests for `database_setup._redact_url` — issue #938.
2+
3+
The full `database_setup` module imports SQLAlchemy/asyncpg/etc at import
4+
time and tries to construct an engine from env vars, so we import the
5+
helper directly rather than the whole module via the package surface."""
6+
7+
from lif.mdr_utils.database_setup import _redact_url
8+
9+
10+
class TestRedactUrl:
11+
def test_masks_password_in_typical_postgres_url(self):
12+
url = "postgresql+asyncpg://postgres:s3cret!@dbhost.example:5432/mydb"
13+
assert _redact_url(url) == "postgresql+asyncpg://postgres:***@dbhost.example:5432/mydb"
14+
15+
def test_masks_password_with_special_characters(self):
16+
# Real dev password observed in CloudWatch had `:`, `}`, `&`, `<`, `$`
17+
# in it. urlparse handles percent-encoding; here we use a
18+
# representative literal to confirm the redaction doesn't choke.
19+
url = "postgresql+asyncpg://postgres:p:%24sw0rd@host:5432/db"
20+
redacted = _redact_url(url)
21+
assert "p:%24sw0rd" not in redacted
22+
assert "postgres:***@host:5432/db" in redacted
23+
24+
def test_url_without_port_still_redacts(self):
25+
url = "postgresql+asyncpg://postgres:s3cret@host/db"
26+
# No explicit port — netloc is just user:pass@host. Redacted form
27+
# must drop the password but keep everything else.
28+
redacted = _redact_url(url)
29+
assert "s3cret" not in redacted
30+
assert "postgres:***@host" in redacted
31+
assert "/db" in redacted
32+
33+
def test_url_without_password_is_unchanged(self):
34+
# IAM-auth style (no password), or local trust auth — we return
35+
# the original string rather than mangling it.
36+
url = "postgresql+asyncpg://postgres@host:5432/db"
37+
assert _redact_url(url) == url
38+
39+
def test_unparseable_input_does_not_raise(self):
40+
# The function is only ever called for logging; if it raises,
41+
# MDR startup fails. Return a sentinel string instead so the
42+
# log line still emits something operator-readable.
43+
# urlparse is famously tolerant — `urlparse("")` returns an
44+
# empty ParseResult rather than raising — so this test really
45+
# documents the "no exception" guarantee.
46+
assert _redact_url("") == ""
47+
assert _redact_url("not-a-url") == "not-a-url"

0 commit comments

Comments
 (0)