Skip to content

Feature/logging overhaul#226

Open
dcox761 wants to merge 8 commits into
aws-samples:mainfrom
bluecrystalsolutions:feature/logging-overhaul
Open

Feature/logging overhaul#226
dcox761 wants to merge 8 commits into
aws-samples:mainfrom
bluecrystalsolutions:feature/logging-overhaul

Conversation

@dcox761
Copy link
Copy Markdown

@dcox761 dcox761 commented Mar 3, 2026

Issue #225

Description of changes:
1. Centralise TRACE_LEVEL in setting.py

Register the custom TRACE level (5, below DEBUG at 10) once in setting.py and import it everywhere:

# setting.py
TRACE_LEVEL = 5
logging.addLevelName(TRACE_LEVEL, "TRACE")

2. Three-tier logging

  • INFO (default) — startup, errors, CORS warnings, and USAGE summary per request
  • DEBUG (DEBUG=true) — request/response details, model resolution, tool calls, cache points
  • TRACE (TRACE=true) — per-chunk streaming data, raw request/response bodies, request headers

Only the api logger hierarchy is affected — boto3/botocore/urllib3 stay at INFO:

if TRACE or DEBUG:
    logging.getLogger("api").setLevel(TRACE_LEVEL if TRACE else logging.DEBUG)

3. ECS-aware log format

Detect ECS/Fargate via ECS_CONTAINER_METADATA_URI and skip log formatting (CloudWatch adds its own timestamps and metadata). Outside ECS, use human-readable format:

_log_kwargs = {"level": logging.INFO}
if not os.environ.get("ECS_CONTAINER_METADATA_URI"):
    _log_kwargs["format"] = "%(asctime)s [%(levelname)s] %(message)s"
logging.basicConfig(**_log_kwargs)

4. Lazy %s formatting and isEnabledFor guards

All logger.debug() calls use %s lazy formatting instead of f-strings. Expensive debug operations (JSON serialisation, dict formatting) are guarded with isEnabledFor:

# Before (eager f-string evaluation even when debug is off):
logger.debug(f"Resolved {model_id}{result}")

# After (lazy — no string formatting unless DEBUG is enabled):
logger.debug("Resolved %s -> %s", model_id, result)

# Expensive operations guarded:
if logger.isEnabledFor(TRACE_LEVEL):
    logger.log(TRACE_LEVEL, "Request body: %s",
               json.dumps(chat_request.model_dump(), indent=2, default=str))

5. Per-request USAGE logging (INFO level)

A structured one-line summary logged after every request completes:

USAGE | user=admin@example.com | chat=abc-123 | model=claude-sonnet-4 | max_tokens=4096 | in=1523 | out=847 | cache_write=0 | cache_read=1200 | ua=OpenAI/Python

User and chat ID are extracted from configurable HTTP headers (empty defaults — only active when configured):

# Optional: configure headers from your upstream proxy
export USAGE_USER_HEADER=x-openwebui-user-email
export USAGE_CHAT_ID_HEADER=x-openwebui-chat-id

6. Improved validation error handler

The validation exception handler now logs error count at WARNING level and optionally logs the rejected request body at TRACE level:

@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request, exc):
    error_count = len(exc.errors()) if hasattr(exc, 'errors') else 'unknown'
    logger.warning("Request validation failed: %s %s - %s error(s)",
                   request.method, request.url.path, error_count)
    if logger.isEnabledFor(TRACE_LEVEL):
        body = (await request.body()).decode("utf-8", errors="replace")[:2000]
        logger.log(TRACE_LEVEL, "Rejected request body: %s", body)
    return PlainTextResponse(str(exc), status_code=400)

New environment variables

Variable Default Description
TRACE false Enable TRACE-level logging (very verbose)
USAGE_USER_HEADER "" HTTP header for user identity in USAGE lines
USAGE_CHAT_ID_HEADER "" HTTP header for chat/session ID in USAGE lines

Files: src/api/setting.py, src/api/app.py, src/api/routers/chat.py, src/api/models/bedrock.py

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

bcsdcox added 8 commits March 3, 2026 14:20
…trumentation

- Register custom TRACE level (5) for per-chunk streaming logs
- Refactor all 'if DEBUG: logger.info' to logger.debug with isEnabledFor guards
- Replace per-chunk logging with TRACE level (2 sites in bedrock.py)
- Add INFO-level USAGE line per request: user, chat, model, max_tokens,
  tokens in/out/cache, user-agent
- Add configurable USAGE_USER_HEADER / USAGE_CHAT_ID_HEADER env vars
  for extracting request attribution from any upstream proxy
  (empty by default; set to your proxy's header names to enable)
- Add TRACE env var in setting.py
- Enhanced validation error handler with error details and TRACE body logging
Use Python's default basicConfig format (LEVEL:name:message) instead of
a custom format with timestamps.  This avoids:
1. Duplicate timestamps — CloudWatch adds its own
2. Inconsistent format — uvicorn access logs use 'INFO:' prefix while
   our custom format used '[INFO]' brackets
…tion

Three debug log calls that originally had 'if DEBUG:' guards in upstream
lost their protection when converted to logger.debug() in the logging
overhaul. This restores evaluation guards using isEnabledFor(DEBUG) to
avoid unnecessary str() conversion on every request when debug logging
is disabled.

Also converts one string concatenation debug call to f-string for
consistency (model_name is already a string, no str() cost).
Remove verbose per-field error enumeration and JSON body truncation
logic from the validation exception handler. The WARNING log now just
reports method, path, and error count. The TRACE-level body dump is
a simple raw decode with a 2000-char cap.
The profile resolution block was outside the isEnabledFor(DEBUG) guard,
causing unnecessary dictionary lookup and method call on every request
even when debug logging is disabled.
- Remove unused 'import json' from app.py (no longer needed after handler simplification)
- Convert 7 f-string logger.debug() calls to lazy %s formatting in bedrock.py
  to avoid eager string interpolation when DEBUG is disabled
- Collapse if/elif TRACE/DEBUG into single conditional setLevel call
- Restore upstream's log format (asctime + level + message) for local use
- Auto-detect ECS_CONTAINER_METADATA_URI: skip format in Fargate so
  CloudWatch doesn't get duplicate timestamps
TRACE_LEVEL=5 was defined in three files (app.py, bedrock.py, chat.py).
Move the definition and addLevelName() call into setting.py so it is
registered once and imported by all consumers.
Copy link
Copy Markdown
Member

@zxkane zxkane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough logging overhaul! The three-tier logging architecture (INFO/DEBUG/TRACE), ECS-aware formatting, and scoping to the api logger hierarchy are all well-designed improvements.

There are a couple of security concerns that should be addressed before merging:

Critical:

  • TRACE logging dumps full HTTP headers including Authorization: Bearer <API_KEY> — credentials should be redacted from logs even at TRACE level.
  • Silent except Exception: pass in the validation error handler swallows all errors when TRACE is enabled for debugging.

Important:

  • Mutable class-level default request_meta: dict = {} is a classic Python pitfall — currently safe by coincidence but fragile.
  • User identity extracted from untrusted HTTP headers is logged without sanitization (log injection risk) and at always-on INFO level (PII concern).

Suggestions:

  • Some debug lines still use eager string concatenation / f-strings, inconsistent with the PR's stated goal of lazy %s formatting.

See inline comments for details and suggested fixes.

Comment thread src/api/routers/chat.py
Comment on lines +44 to +49
TRACE_LEVEL,
"Request headers: %s",
json.dumps(dict(request.headers), indent=2),
)
logger.log(
TRACE_LEVEL,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: This dumps full HTTP headers including Authorization: Bearer <API_KEY> to the log stream.

This application uses HTTPBearer authentication (see auth.py). When TRACE is enabled, every caller's API key is written to logs in plaintext. In AWS deployments, CloudWatch logs may have broader access than secrets management, and logs may be forwarded to third-party observability platforms (Datadog, Splunk, etc.).

Please redact sensitive headers before logging:

REDACTED_HEADERS = {"authorization", "cookie", "x-api-key"}

if logger.isEnabledFor(TRACE_LEVEL):
    safe_headers = {
        k: ("***REDACTED***" if k.lower() in REDACTED_HEADERS else v)
        for k, v in request.headers.items()
    }
    logger.log(TRACE_LEVEL, "Request headers: %s", json.dumps(safe_headers, indent=2))

Comment thread src/api/app.py
Comment on lines +76 to +80
body = (await request.body()).decode("utf-8", errors="replace")[:2000]
logger.log(TRACE_LEVEL, "Rejected request body: %s", body)
except Exception:
pass

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Silent except Exception: pass swallows all errors.

When an operator has explicitly enabled TRACE for debugging, silently discarding the body-read failure defeats the purpose — it's exactly the kind of diagnostic info they need. At minimum, log the exception:

except Exception as e:
    logger.warning("Failed to read rejected request body for TRACE logging: %s", e)

Also, the WARNING-level message above (line 69-73) only logs the error count now but no longer shows what the errors are. The previous code at least included the first line of the exception string. Consider adding a brief summary:

error_summary = "; ".join(
    f"{e.get('loc', '?')}: {e.get('type', '?')}"
    for e in (exc.errors()[:3] if hasattr(exc, 'errors') else [])
)
logger.warning(
    "Request validation failed: %s %s - %s error(s): %s",
    request.method, request.url.path, error_count, error_summary or "no details",
)

Comment thread src/api/models/bedrock.py


class BedrockModel(BaseChatModel):
request_meta: dict = {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Mutable class-level default dict = {} is shared across all instances.

BedrockModel inherits from BaseChatModel(ABC) — a plain Python class, not Pydantic. This {} is created once at class definition time and shared across all instances. Currently safe by coincidence (because chat.py always reassigns a fresh dict), but fragile — any future code that does model.request_meta["key"] = val (mutation) instead of reassignment would leak data across requests.

Also, _log_usage_summary uses getattr(self, "request_meta", {}) defensively, which suggests uncertainty about whether the attribute exists — further sign of a fragile design.

Fix: use None as default and initialize per-instance:

class BedrockModel(BaseChatModel):
    request_meta: dict | None = None

    def _log_usage_summary(self, ...):
        meta = self.request_meta or {}
        ...

Comment thread src/api/routers/chat.py
Comment on lines +64 to 70
model.request_meta = {
"user_email": request.headers.get(USAGE_USER_HEADER, "-") if USAGE_USER_HEADER else "-",
"chat_id": request.headers.get(USAGE_CHAT_ID_HEADER, "-") if USAGE_CHAT_ID_HEADER else "-",
"max_tokens": effective_max_tokens,
"user_agent": request.headers.get("user-agent", "-"),
}
model.validate(chat_request)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: User identity from untrusted HTTP headers — log injection and PII concerns.

Two issues here:

1. Log injection risk — Header values are logged without any sanitization. A caller can set the header to a value containing newlines to create fake log entries, e.g.:

x-openwebui-user-email: admin@evil.com\n2026-03-06 USAGE | user=legitimate@company.com | ...

Python's %s formatting does not escape newlines. This could be used for impersonation in audit logs or to pollute monitoring/alerting systems.

2. PII at INFO level — When USAGE_USER_HEADER is configured, user email addresses are logged at INFO level (always-on) for every request. This is PII under GDPR/CCPA. In AWS, CloudWatch logs may be retained indefinitely by default. Consider logging USAGE at DEBUG level, or at minimum document clearly that configuring these headers causes PII to be written to logs.

Suggestion: sanitize header values before logging:

import re

def _sanitize_header(value: str, max_len: int = 128) -> str:
    return re.sub(r'[\x00-\x1f\x7f]', '', value)[:max_len]

model.request_meta = {
    "user_email": _sanitize_header(request.headers.get(USAGE_USER_HEADER, "-")) if USAGE_USER_HEADER else "-",
    ...
}

Comment thread src/api/models/bedrock.py
Comment on lines 365 to 377
"""Common logic for invoke bedrock models"""
if DEBUG:
logger.info("Raw request: " + chat_request.model_dump_json())
if logger.isEnabledFor(logging.DEBUG):
logger.debug("Raw request: " + chat_request.model_dump_json())

# Log profile resolution for debugging
if chat_request.model in profile_metadata:
resolved = self._resolve_to_foundation_model(chat_request.model)
profile_type = profile_metadata[chat_request.model].get("profile_type", "UNKNOWN")
logger.info(
logger.debug(
f"Profile resolution: {chat_request.model} ({profile_type}) → {resolved}"
)

# convert OpenAI chat request to Bedrock SDK request
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Several debug lines still use eager string concatenation / f-strings.

The PR description states the goal of converting to lazy %s formatting, but these lines within the isEnabledFor guard still use eager evaluation:

  • Line 366: logger.debug("Raw request: " + chat_request.model_dump_json())
  • Line 372: logger.debug(f"Profile resolution: {chat_request.model} ({profile_type}) → {resolved}")
  • Line 377: logger.debug("Bedrock request: " + json.dumps(str(args)))

While the isEnabledFor guard prevents this when debug is off, converting to %s would be consistent with the PR's stated approach:

logger.debug("Raw request: %s", chat_request.model_dump_json())
logger.debug("Profile resolution: %s (%s) -> %s", chat_request.model, profile_type, resolved)
logger.debug("Bedrock request: %s", json.dumps(str(args)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants