
Improve logging by splitting into multiple streams for easier cross-section analysis #1623

Open
@achimnol

Description

To avoid an excessive volume of debug logs, Backend.AI Manager and Agent currently provide [debug].log-xxxx boolean options that enable or disable specific types of log messages, as shown in the following configuration excerpt:

[debug]
# Enable or disable the debug-level logging.
enabled = false
# If set to true, containers are not actually deleted after they terminate or are terminated,
# so that developers can inspect the container logs.
# This is useful for debugging errors that cause containers to terminate immediately after kernel
# launches, due to bugs in initialization steps such as jail.
skip-container-deletion = false
# Enable or disable the asyncio debug mode.
asyncio = false
# Use the custom task factory to get more detailed asyncio task information; this may have performance penalties
enhanced-aiomonitor-task-info = false
# Enable the debug mode of the kernel-runner.
kernel-runner = false
# Include debug-level logs for internal events.
log-events = false
# Include debug-level logs for detailed kernel creation configs and their resource spec.
log-kernel-config = false
# Include debug-level logs for allocation maps.
log-alloc-map = false
# Include debug-level logs for statistics.
log-stats = false
# Include debug-level logs for heartbeats
log-heartbeats = false
# Set the interval of agent heartbeats in seconds.
heartbeat-interval = 20.0
# Include debug-level logs for docker event stream.
log-docker-events = false
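As a simplified illustration (not Backend.AI's actual code), each log-xxxx flag effectively acts as a gate in front of DEBUG-level log calls. The sketch assumes a hypothetical debug_config dict mirroring the [debug] table above and a hypothetical on_docker_event() handler; it shows that with the default setting, an event leaves no trace in any log.

```python
import logging

# Hypothetical, simplified stand-in for the parsed [debug] section of the config.
debug_config = {"enabled": False, "log-docker-events": False}

# Hypothetical logger name used only for this illustration.
log = logging.getLogger("agent")


def on_docker_event(event: dict) -> bool:
    """Log a container engine event only if its debug "section" is enabled.

    Returns True if a log call was issued, so the gating is easy to observe.
    """
    if debug_config["log-docker-events"]:
        log.debug("docker event: %r", event)
        return True
    # With the flag off (the default), nothing is recorded for this event,
    # so the information is unavailable for a later postmortem.
    return False
```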

However, this approach makes postmortem analysis at customer sites harder, because many of these log "sections" are turned off by default.

Let's split them out into multiple log streams with higher log levels. For example, most container engine events are currently logged at the DEBUG level and only when log-docker-events = true; let's promote them to the INFO level and write them to a separate "agent-container-events.log" output stream.

This will make cross-section, postmortem analysis easier when combined with additional log viewer tools such as #1138.
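A minimal sketch of the proposed split using Python's standard logging module, assuming a dedicated child logger whose name (ai.backend.agent.container_events) and helper functions are hypothetical and not part of Backend.AI today. Container events are recorded at INFO level into their own file regardless of the [debug] flags, while DEBUG records on that logger are still filtered out.

```python
import logging

# Hypothetical logger name; Backend.AI's real logger hierarchy may differ.
CONTAINER_EVENTS_LOGGER = "ai.backend.agent.container_events"


def setup_container_event_stream(path: str = "agent-container-events.log") -> logging.Logger:
    """Route container engine events to their own INFO-level log file."""
    logger = logging.getLogger(CONTAINER_EVENTS_LOGGER)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    # Keep these records out of the root (main agent) log so each stream stays focused.
    logger.propagate = False
    return logger


def log_container_event(logger: logging.Logger, container_id: str, status: str) -> None:
    # Emitted at INFO, so it is recorded even when all [debug] options are off.
    logger.info("container %s: %s", container_id, status)
```

Setting propagate = False is a design choice: it keeps the main log at its current volume. Leaving it True would also duplicate the events into the main log, which may be preferable if the log viewer cannot merge streams.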


Metadata

Assignees

No one assigned

    Labels

    comp:agent - Related to Agent component
    comp:manager - Related to Manager component
    comp:storage-proxy - Related to Storage proxy component
    comp:webserver - Related to Web Server component
    effort:normal - Need to understand a few modules / some extent of contextual or historical information.
    impact:invisible - This change is invisible to users (internal changes).
    urgency:4 - As soon as feasible, implementation is essential.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
