Logging #31

kamilsi · 2023-12-06T18:50:11Z

kamilsi
Dec 6, 2023
Maintainer

Currently, our package relies on individual functions to report errors and warnings. While this approach works, I believe we can improve our efficiency and clarity by implementing an integrated logging system.

Benefits of Integrated Logging:

Centralized Error Reporting: Instead of scattered warnings and errors, a logging system would provide a unified view of issues across the entire data processing pipeline. A comprehensive log would make it easier to trace back through the processing steps and pinpoint where and why an error occurred.
Severity Levels: With logging, we can categorize messages by severity (INFO, WARNING, ERROR), allowing for more nuanced monitoring and response. By default, we could log only messages at WARNING level or above. Differentiating between INFO, WARNING, and ERROR allows for targeted troubleshooting and reduces the time spent sifting through non-critical messages.
Granular handling of messages: many packages support hierarchical (stacked) logging, which enables sophisticated setups where, for example:
1. Critical issues (WARNING, ERROR, and FATAL messages) will be output to the console immediately. This ensures that any significant problem is promptly brought to the attention of the user or developer, facilitating quick responses and resolution.
2. All log messages, regardless of severity, will be recorded in a log file. This includes INFO, DEBUG, and TRACE level messages, offering a detailed and chronological record of all events and operations.
3. Selected key performance metrics will be sent to AWS CloudWatch. This integration enables tracking performance over time, which helps identify trends, bottlenecks, and potential areas for optimization. To be clear, I only mean that this kind of telemetry is technically possible (for us or for the end user).
  - This also illustrates that it is easy to incorporate the logs from our package if other programmers decide to use it to build something bigger (e.g. a Shiny app or a batch-processing pipeline).
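
To make the stacked setup above concrete, here is a minimal sketch using Python's standard `logging` module, chosen only because its handler model is widely known; it is not a proposal for our package's implementation, and the packages mentioned in this post would need their own equivalent. The logger name `mypkg` and the helper `build_logger` are hypothetical, and in-memory streams stand in for the real console and log file so the example is self-contained.

```python
import io
import logging


def build_logger(name, console_stream, file_stream):
    """Attach two handlers with different thresholds to one logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # the logger passes everything; handlers filter
    logger.propagate = False        # keep this demo isolated from the root logger
    logger.handlers.clear()

    # "Console" handler: only WARNING and above.
    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.WARNING)

    # "File" handler: everything. A real setup would use logging.FileHandler.
    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(logging.DEBUG)

    layout = logging.Formatter("%(levelname)s | %(name)s | %(message)s")
    for handler in (console, logfile):
        handler.setFormatter(layout)
        logger.addHandler(handler)
    return logger


console_out = io.StringIO()  # stand-in for the console
file_out = io.StringIO()     # stand-in for the log file
log = build_logger("mypkg", console_out, file_out)

log.debug("Reading input table")
log.info("Starting to process")
log.warning("Missing data in columns X, Y")

# A child logger (e.g. one per module) inherits the handlers of "mypkg",
# which is also how a larger application could pick up the package's logs.
logging.getLogger("mypkg.demographics").error("Could not derive SUBJID")

print("console:", console_out.getvalue(), sep="\n")
print("file:", file_out.getvalue(), sep="\n")
```

Only the warning and the error reach the console stream, while the file stream receives all four messages. A third handler for a metrics service would be attached in the same way.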

Example of a Log Entry with Contextual Information:

2023-12-06 10:15:32 | INFO | DEMOGRAPHICS | SUBJID | compute_ID | Starting to process
2023-12-06 10:15:34 | ERROR | DEMOGRAPHICS | SUBJID | split_string | Error encountered - Missing data in columns X, Y
2023-12-06 10:15:35 | INFO | DEMOGRAPHICS | SUBJID | compute_ID | Finished processing

Each entry contains the timestamp, log level, domain, derived variable, the function that wrote the entry, and finally the message. This not only makes it easy to pinpoint problems, but also makes it easy to track the package's performance (e.g. which function is the bottleneck and takes the most time).
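
As a rough illustration of both points, the sketch below emits an entry in this layout and then parses the three sample lines from above to find the error and the elapsed time of `compute_ID`. It again uses Python's standard `logging` module purely for illustration; `parse_line`, `elapsed_seconds`, and the logger name `layout_demo` are hypothetical names, and a fixed pipe-separated layout is an assumption based on the sample.

```python
import io
import logging
from datetime import datetime

LAYOUT = ("%(asctime)s | %(levelname)s | %(domain)s | %(variable)s"
          " | %(funcName)s | %(message)s")
DATEFMT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Split one log line in the layout above into named fields."""
    stamp, level, domain, variable, function, message = line.split(" | ", 5)
    return {"time": datetime.strptime(stamp, DATEFMT), "level": level,
            "domain": domain, "variable": variable,
            "function": function, "message": message}


def elapsed_seconds(lines, function):
    """Seconds between the first and last entry written by `function`."""
    times = [r["time"] for r in map(parse_line, lines)
             if r["function"] == function]
    return (times[-1] - times[0]).total_seconds()


# Emitting: domain and variable are supplied through `extra`;
# the calling function's name is filled in by the logging module.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(LAYOUT, datefmt=DATEFMT))
logger = logging.getLogger("layout_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers.clear()
logger.addHandler(handler)


def compute_ID():
    logger.info("Starting to process",
                extra={"domain": "DEMOGRAPHICS", "variable": "SUBJID"})


compute_ID()
emitted = parse_line(buffer.getvalue().strip())

# Parsing: the three sample lines from this post.
sample = [
    "2023-12-06 10:15:32 | INFO | DEMOGRAPHICS | SUBJID | compute_ID | Starting to process",
    "2023-12-06 10:15:34 | ERROR | DEMOGRAPHICS | SUBJID | split_string | Error encountered - Missing data in columns X, Y",
    "2023-12-06 10:15:35 | INFO | DEMOGRAPHICS | SUBJID | compute_ID | Finished processing",
]
errors = [r for r in map(parse_line, sample) if r["level"] == "ERROR"]
compute_id_seconds = elapsed_seconds(sample, "compute_ID")

print(errors[0]["function"], "-", errors[0]["message"])
print("compute_ID took", compute_id_seconds, "seconds")
```

Because every field sits in a fixed position, the same log can serve both debugging (filter by level or function) and coarse timing, without any extra instrumentation.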

I believe this change can significantly enhance our package's robustness and user-friendliness. However, this is a collaborative project, and your insights and suggestions are invaluable. I would like to open up the discussion for:

Your thoughts on integrating a logging system
Potential logging packages and tools that you recommend (e.g., logger, futile.logger)
Ideas for customizing the logging to best suit our needs

Looking forward to your valuable input and ideas!
