bug: incomplete Envoy logs output due to overly restrictive grep patterns #1083

@Xunzhuo

Description

The vllm-sr logs envoy command prints incomplete output because the grep patterns that filter Envoy logs out of the Docker container output are too restrictive. The following log entries are dropped:

  • Envoy access logs from /var/log/envoy_access.log (tailed by tail_access_logs program)
  • Envoy initialization and startup messages
  • Envoy error messages that don't match the strict timestamp pattern
  • Config generation logs and other diagnostic information

Context

Current Implementation

Python CLI (src/vllm-sr/cli/core.py:227):

grep_pattern = r"\[2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].*\]\[.*\]|spawned: \'envoy\'|success: envoy"

Go Dashboard Backend (dashboard/backend/handlers/logs.go:229):

if (strings.Contains(line, "[20") && strings.Contains(line, "][")) ||
    strings.Contains(line, "spawned: 'envoy'") ||
    strings.Contains(lineLower, "success: envoy") ||
    strings.Contains(lineLower, "envoy entered running") ||
    strings.Contains(line, "Generating Envoy config") {
    filtered = append(filtered, line)
}

Root Cause

The grep pattern \[2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].*\]\[.*\] is too strict: it matches only lines in the exact format [YYYY-MM-DD ...][...]. As a result, the following are filtered out:

  1. Access logs - The tail_access_logs program (supervisord.conf:46-56) tails /var/log/envoy_access.log which has a different format:

    [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" ...
    
  2. Envoy startup logs - Some initialization messages may not follow the strict timestamp format

  3. Error messages - Critical errors that are not emitted in the [YYYY-MM-DD ...][...] format

  4. Config generation output - Lines printed by the Python config generator before Envoy starts
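
The mismatch can be checked with a minimal sketch. It uses Python's re module as a stand-in for grep, which is only an approximation of what the CLI actually runs. The pattern is the one quoted from core.py above. The sample lines are hypothetical: they were written for illustration to follow the formats described in this issue and are not taken from a real container.

```python
import re

# Pattern quoted from src/vllm-sr/cli/core.py in this issue.
CURRENT_PATTERN = re.compile(
    r"\[2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].*\]\[.*\]|spawned: \'envoy\'|success: envoy"
)

# Hypothetical sample lines, made up for illustration only.
SAMPLE_LINES = {
    "app_log": "[2025-01-15 10:30:00.123][1][info][main] starting main dispatch loop",
    "access_log": '[2025-01-15T10:30:05.456Z] "POST /v1/chat/completions HTTP/1.1" 200',
    "config_gen": "Generating Envoy config from /app/config.yaml",
    "plain_error": "error initializing configuration: unable to read file",
}


def matched_by_current_pattern(line: str) -> bool:
    """Return True if the current filter would keep this line."""
    return CURRENT_PATTERN.search(line) is not None


# Map each sample name to whether the current pattern keeps it.
results = {name: matched_by_current_pattern(line) for name, line in SAMPLE_LINES.items()}

for name, kept in results.items():
    print(f"{name}: {'kept' if kept else 'dropped'}")
```

With these samples only the app_log line is kept. The access log line starts with a bracketed timestamp but never contains the "][" sequence that the pattern requires, so it is dropped along with the other two lines.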

Affected Components

  • src/vllm-sr/cli/core.py - show_logs() function (lines 197-251)
  • dashboard/backend/handlers/logs.go - fetchLogsFromDocker() function (lines 186-255)
  • src/vllm-sr/supervisord.conf - Envoy and tail_access_logs programs

Proposed Solution

Option 1: Relax the Grep Pattern (Recommended)

Update the grep pattern to be more inclusive:

Python CLI:

# Match envoy-specific logs more broadly
grep_pattern = r"\[20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]|\[.*\]\[.*\]|envoy|spawned: \'envoy\'|success: envoy|Generating Envoy config"

Go Backend:

case "envoy":
    // Match envoy-specific logs more broadly
    if strings.Contains(line, "[20") ||  // Timestamp pattern
       strings.Contains(lineLower, "envoy") ||  // Any envoy-related log
       strings.Contains(line, "spawned: 'envoy'") ||
       strings.Contains(lineLower, "success: envoy") ||
       strings.Contains(line, "Generating Envoy config") {
        filtered = append(filtered, line)
    }
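
To see what the relaxed pattern would let through, here is a small sketch that applies the Option 1 pattern to a few lines. Python's re module again approximates grep, filter_envoy_lines is a hypothetical helper name, and the sample lines (including the "router" ones) are invented for illustration.

```python
import re

# Relaxed pattern copied from the Option 1 proposal above.
RELAXED_PATTERN = re.compile(
    r"\[20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]|\[.*\]\[.*\]|envoy|"
    r"spawned: \'envoy\'|success: envoy|Generating Envoy config"
)


def filter_envoy_lines(lines):
    """Keep every line that matches the relaxed pattern (hypothetical helper)."""
    return [line for line in lines if RELAXED_PATTERN.search(line)]


# Hypothetical sample lines, made up for illustration only.
SAMPLE_LINES = [
    '[2025-01-15T10:30:05.456Z] "POST /v1/chat/completions HTTP/1.1" 200',
    "Generating Envoy config from /app/config.yaml",
    "[2025-01-15 10:30:06] router: request classified",
    "router started",
]

kept = filter_envoy_lines(SAMPLE_LINES)
for line in kept:
    print(line)
```

The access log and config generation lines are now kept. The third line is kept as well, because the bare timestamp alternative matches any line that begins with a bracketed date. If other processes in the container log in that format, this option may include their output too, which may or may not be acceptable.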

Option 2: Use Supervisor Log Files Directly

Instead of filtering from docker logs, read directly from supervisor log files:

  • /var/log/supervisor/envoy.log
  • /var/log/supervisor/envoy-error.log
  • /var/log/supervisor/tail_access_logs.log

This approach is already partially implemented in fetchLogsFromSupervisor() but needs to be the primary method.
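
A rough sketch of this option in Python, assuming the code runs somewhere the three files are readable (inside the container, or against a mounted volume). read_envoy_logs and its tail parameter are hypothetical names for illustration. This is not the existing fetchLogsFromSupervisor() implementation.

```python
from collections import deque
from pathlib import Path

# Log file paths listed in this issue.
SUPERVISOR_LOG_FILES = (
    "/var/log/supervisor/envoy.log",
    "/var/log/supervisor/envoy-error.log",
    "/var/log/supervisor/tail_access_logs.log",
)


def read_envoy_logs(paths=SUPERVISOR_LOG_FILES, tail=100):
    """Return up to `tail` trailing lines from each log file that exists."""
    collected = []
    for raw_path in paths:
        path = Path(raw_path)
        if not path.is_file():
            continue  # Missing file: skip it instead of failing.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            # A bounded deque keeps only the last `tail` lines in memory.
            last_lines = deque(handle, maxlen=tail)
        collected.extend(line.rstrip("\n") for line in last_lines)
    return collected
```

Because each file belongs to a single supervisor program, no pattern matching is needed. The lines are grouped per file rather than interleaved by time, so a real implementation might want to merge them by timestamp.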

Option 3: Add Envoy-Specific Log Markers

Modify the supervisord configuration to add a prefix to all Envoy-related logs:

[program:envoy]
command=/bin/sh -c "python -m cli.config_generator /app/config.yaml /etc/envoy/envoy.yaml && /usr/local/bin/envoy -c /etc/envoy/envoy.yaml --log-level debug --log-format '[ENVOY][%%Y-%%m-%%d %%T.%%e][%%l] %%v'"

Then filter by [ENVOY] prefix.
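
With such a marker in place, the filter could reduce to a substring check. A minimal sketch, where filter_by_marker is a hypothetical helper and the sample lines are invented to follow the log format shown above:

```python
ENVOY_MARKER = "[ENVOY]"


def filter_by_marker(lines, marker=ENVOY_MARKER):
    """Keep only lines that carry the Envoy marker (hypothetical helper)."""
    return [line for line in lines if marker in line]


# Hypothetical sample lines, made up for illustration only.
SAMPLE_LINES = [
    "[ENVOY][2025-01-15 10:30:00.123][info] starting main dispatch loop",
    "[2025-01-15 10:30:06] router: request classified",
]

kept = filter_by_marker(SAMPLE_LINES)
```

Note that --log-format only affects Envoy's application log. Access log lines take their format from the access log configuration, and the config generator runs before Envoy starts, so both would presumably need their own marker for this filter to keep them.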

Steps to Reproduce

  1. Start vllm-sr: vllm-sr serve
  2. Make some requests to generate access logs
  3. Run: vllm-sr logs envoy
  4. Observe that access logs and some initialization logs are missing

Expected Behavior

The vllm-sr logs envoy command should show:

  • All Envoy application logs (info, debug, error)
  • All Envoy access logs from /var/log/envoy_access.log
  • Config generation logs
  • Supervisor status messages
  • All error messages regardless of format

Actual Behavior

Only lines matching the strict timestamp pattern [YYYY-MM-DD HH:MM:SS.mmm][level] are shown; access logs and other important diagnostic information are missing.

Reference

  • Supervisord config: src/vllm-sr/supervisord.conf
  • Python CLI: src/vllm-sr/cli/core.py:197-251
  • Go backend: dashboard/backend/handlers/logs.go:186-255
  • Envoy log format: src/vllm-sr/cli/templates/envoy.template.yaml:67-72
