Commit 8fb8c56

chore: store progress by adding logging and configuration modules.
1 parent c85994a commit 8fb8c56

16 files changed

Lines changed: 1047 additions & 263 deletions

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -21,3 +21,8 @@ __pycache__/
 /.pytest_cache/
 /.cache/
 /data/
+
+
+#Other
+demo/*
+sandbox/*

project_route.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Rationale

Our architecture consists of a number of Python packages working together. These packages require common technical functionality, such as logging or configuration handling, for which well-established solutions exist, e.g. the logging module of the Python standard library. In the past, we relied on our own implementations instead of these established solutions. In a previous attempt to share those implementations across packages, we created the pypath-common package. Pypath-common involved minimal changes: it was mostly a reorganization of existing code and aimed for a working, but neither optimal nor future-proof, solution as soon as possible. To address this shortcoming, we should migrate to standard solutions. To control these standard solutions in a way tailored to our software ecosystem, we should create a minimal layer on top of them. Below we specify what we expect from this new component.

# Specification

## Config handling

- Discovery and merging of configs in priority order (highest first): working directory, user, package built-in
- Format: YAML
- Choose an established solution, e.g. Hydra (https://hydra.cc/docs/intro/) or OmegaConf (https://omegaconf.readthedocs.io/en/latest/)
- Propagate config parameters to lower-level packages

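OmegaConf's `OmegaConf.merge` performs a recursive merge in which configs passed later override those passed earlier. The dependency-free sketch below only illustrates that priority rule; the `deep_merge` helper and the example keys are hypothetical and not part of this commit.

```python
from typing import Any


def deep_merge(*configs: dict[str, Any]) -> dict[str, Any]:
    """Merge configs left to right; later (higher-priority) values win.

    Nested mappings are merged key by key; any other value is replaced outright.
    """
    merged: dict[str, Any] = {}
    for cfg in configs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
    return merged


# Passed lowest priority first: package built-in, user, working directory.
package_default = {'logging': {'level': 'INFO', 'backup_count': 5}}
user_config = {'logging': {'level': 'DEBUG'}}
cwd_config = {'logging': {'backup_count': 10}}

print(deep_merge(package_default, user_config, cwd_config))
# {'logging': {'level': 'DEBUG', 'backup_count': 10}}
```

Keys absent from a higher-priority file keep their lower-priority value, so a user or project config only needs to list the settings it changes.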
## Logging

- Registry of “our packages”
- Control of dispatching messages to our log file
- Formatter
- Log traceback

### Core logging requirements
- Plug-and-play logging and session management for Python packages.
- Configuration via a YAML file (the user provides a config file; most settings need no code changes).
- Automatic logger and session setup with a single initialize(config_path) call.
- Console and file logging enabled by default.
- Log file rotation: when a log file exceeds 10 MB (configurable), a new file is started.
- Timestamped log files: log file names include the date.
- The log directory is created automatically if it does not exist.
- Logger exclusion: the user can list loggers (e.g., pandas, matplotlib) in the config to suppress them or set them to a higher log level.
- Configurable log format, log level, log directory, app name, maximum file size, and backup count via YAML.
- Session management: centralized access to config and logger.
- Demo folder: example usage and configuration.
- Tests folder: for unit tests (structure in place).
- Use DictConfig to pass the logging configuration.
- Create one logger for each major component of our apps and software ecosystem.
- Also generate JSON logs.
- Include the time zone in timestamps.
- Use a queue handler so that log calls are non-blocking; records are then handled asynchronously by a background listener.

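Several of these requirements map fairly directly onto the standard library's `logging.config.dictConfig` schema. The sketch below is one possible mapping, not what this commit implements (the commit builds its handlers by hand in `setup_logging`); `build_logging_config` is a hypothetical helper whose defaults mirror those used in `setup_logging` (10 MB, 5 backups, dated file name). A merged OmegaConf `DictConfig` could be turned into a plain dict for such a helper with `OmegaConf.to_container(cfg, resolve=True)`.

```python
import logging
import logging.config
import tempfile
from datetime import date
from pathlib import Path


def build_logging_config(
    log_dir: str,
    app_name: str = 'saezlab_core',
    level: str = 'INFO',
    max_megabytes: int = 10,
    backup_count: int = 5,
    exclude_loggers: tuple = ('matplotlib', 'pandas'),
) -> dict:
    """Translate YAML-level settings into a logging.config.dictConfig dict."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)  # create the log dir if missing
    log_file = Path(log_dir) / f'{app_name}_{date.today():%Y-%m-%d}.log'  # dated name
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'plain': {'format': '[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s'},
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'formatter': 'plain',
                'stream': 'ext://sys.stdout',
            },
            'file': {
                'class': 'logging.handlers.RotatingFileHandler',
                'formatter': 'plain',
                'filename': str(log_file),
                'maxBytes': max_megabytes * 1024 * 1024,  # rotate past N MB
                'backupCount': backup_count,
                'encoding': 'utf-8',
            },
        },
        # Listed third-party loggers are raised to WARNING; they still propagate.
        'loggers': {name: {'level': 'WARNING'} for name in exclude_loggers},
        'root': {'level': level, 'handlers': ['console', 'file']},
    }


with tempfile.TemporaryDirectory() as tmp:
    cfg = build_logging_config(tmp)
    logging.config.dictConfig(cfg)
    logging.getLogger('saezlab_core.demo').info('hello')
    logging.getLogger('matplotlib').info('suppressed')  # below WARNING: dropped
    # Detach and close the handlers before the temporary directory is removed.
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()
    log_text = Path(cfg['handlers']['file']['filename']).read_text(encoding='utf-8')
```

Raising excluded loggers to WARNING keeps their warnings and errors visible; fully silencing them would instead need a higher level or `propagate: False`.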
### Optional/future requirements
- Per-application log files: ability to create a dedicated log file for each application (config-ready, not yet implemented).
- Extensible for more session/config features in the future.

## Session

- Has one logger and one config.
- Keeps them together and provides access from anywhere in the code.

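A session that holds one config and one logger and is reachable from anywhere is commonly written as a lazily created singleton. The sketch below is hypothetical: `Session`, `initialize` and `get` are illustrative names (the requirements mention an initialize(config_path) call, but no Session code is part of this commit), and the config is a plain dict rather than an OmegaConf DictConfig.

```python
import logging
import threading
from typing import Any, Optional


class Session:
    """Hypothetical process-wide holder of one config and one logger."""

    _instance: Optional['Session'] = None
    _lock = threading.Lock()

    def __init__(self, config: dict[str, Any], logger: logging.Logger) -> None:
        self.config = config
        self.logger = logger

    @classmethod
    def initialize(cls, config: dict[str, Any], name: str = 'saezlab_core') -> 'Session':
        """Create the session on first call; later calls return the same instance."""
        with cls._lock:  # guard against two threads initializing at once
            if cls._instance is None:
                cls._instance = cls(config, logging.getLogger(name))
            return cls._instance

    @classmethod
    def get(cls) -> 'Session':
        """Return the session from anywhere in the code."""
        if cls._instance is None:
            raise RuntimeError('Session.initialize() has not been called')
        return cls._instance


first = Session.initialize({'app_name': 'demo'})
second = Session.initialize({'app_name': 'ignored'})  # config is not replaced
print(first is second, Session.get().config['app_name'])  # True demo
```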
---

## Implementation Steps / Checklist

1. **Config handling: YAML loader and merging**
   - Use OmegaConf or Hydra for config loading.
   - Support merging configs from the working directory, user, and package defaults.
   - Expose the config as a DictConfig object.

2. **Logger setup: rotation, format, exclusion**
   - Implement logger setup with rotation, timestamped files, and both console and file handlers.
   - Allow exclusion or level control of third-party loggers via config.
   - Make all parameters (format, level, dir, app name, size, backup count) configurable.

3. **Session management: central access**
   - Implement a Session class to hold config and logger.
   - Provide global access through a singleton.

4. **Demo and example config**
   - Provide a demo script and YAML config showing all features, including logger exclusion and rotation.

5. **Component loggers and JSON logs**
   - Support one logger per major component.
   - Add optional JSON log output, configurable via YAML.

6. **Time zone in timestamps**
   - Ensure log timestamps include time zone info, configurable via YAML.

7. **Async logging with queue handler**
   - Implement non-blocking logging using a queue handler for file and/or console logs.

8. **Per-application log files (future)**
   - Design config and code to allow separate log files per app, but implement later.

9. **Unit tests for all modules**
   - Add tests for config loading, logger setup, and session management in the tests/ folder.

---

pyproject.toml

Lines changed: 9 additions & 1 deletion
@@ -29,6 +29,10 @@ classifiers = [
 ]
 dependencies = [
     "numpy>=2.2.6",
+    "omegaconf>=2.3.0",
+    "pandas>=2.3.3",
+    "python-json-logger>=4.0.0",
+    "pyyaml>=6.0.3",
     "toml"
 ]
 description = "This is session handler, configuration and logging handler for Saezlab packages and applications."
@@ -49,7 +53,8 @@ dev = [
     "distlib",
     "pre-commit",
     "bump2version",
-    "twine"
+    "twine",
+    "ipykernel"
 ]
 docs = [
     "mkdocs-material>=9.6.14",
@@ -59,6 +64,9 @@ docs = [
 security = [
     "bandit"
 ]
+semantic = [
+    "rdflib>=6.0.0"
+]
 tests = [
     "pytest>=6.0",
     "pytest-cov",

saezlab_core/config.py

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
import os

from omegaconf import OmegaConf, DictConfig

__all__ = [
    'ConfigLoader',
]


class ConfigLoader:
    """Loader for YAML configuration files with merging and priority logic.

    This class provides a static method to load and merge configuration files from
    package defaults, user directory, working directory, and an explicit path, returning
    a single merged DictConfig object.
    """

    @staticmethod
    def load(
        config_path: str,
        search_paths: list[str] | None = None,
        default_config: str | None = None,
    ) -> DictConfig:
        """Loads and merges YAML configs in priority order.

        1. Package default (if provided)
        2. User config in home dir (if exists)
        3. Config in working dir (if exists)
        4. Explicit config_path (highest priority)

        Returns:
            DictConfig: The merged configuration object.
        """
        configs = []
        # 1. Package default
        if default_config and os.path.exists(default_config):
            configs.append(OmegaConf.load(default_config))
        # 2. User config
        user_config = os.path.expanduser('~/.saezlab_core.yaml')
        if os.path.exists(user_config):
            configs.append(OmegaConf.load(user_config))
        # 3. Working dir config
        cwd_config = os.path.join(os.getcwd(), 'saezlab_core.yaml')
        if os.path.exists(cwd_config):
            configs.append(OmegaConf.load(cwd_config))
        # 4. Explicit config_path
        if config_path and os.path.exists(config_path):
            configs.append(OmegaConf.load(config_path))
        # Merge all configs (later overrides earlier)
        if configs:
            merged = OmegaConf.merge(*configs)
        else:
            merged = OmegaConf.create({})
        return merged
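`ConfigLoader.load` accepts a `search_paths` argument but, as committed, never reads it. The sketch below shows one hypothetical way to fold extra directories into the same priority order; `candidate_config_files` is an illustrative name, and placing the extra directories between the user config and the working-directory config is an assumption. The returned files could then be loaded with `OmegaConf.load` and merged with `OmegaConf.merge(*configs)`, as the loader already does.

```python
import tempfile
from pathlib import Path
from typing import Optional


def candidate_config_files(
    config_path: Optional[str] = None,
    search_paths: Optional[list[str]] = None,
    default_config: Optional[str] = None,
    filename: str = 'saezlab_core.yaml',
) -> list[Path]:
    """List existing config files from lowest to highest priority."""
    candidates: list[Path] = []
    if default_config:
        candidates.append(Path(default_config))  # 1. package default
    candidates.append(Path.home() / f'.{filename}')  # 2. ~/.saezlab_core.yaml
    # Assumed placement: extra search paths between the user and working-dir configs.
    candidates.extend(Path(p) / filename for p in (search_paths or []))
    candidates.append(Path.cwd() / filename)  # 3. working directory
    if config_path:
        candidates.append(Path(config_path))  # 4. explicit path, highest priority
    return [p for p in candidates if p.is_file()]  # silently skip missing files


with tempfile.TemporaryDirectory() as tmp:
    extra = Path(tmp) / 'saezlab_core.yaml'
    extra.write_text('logging:\n  level: DEBUG\n')
    found = candidate_config_files(search_paths=[tmp])
    print(extra in found)  # True
```

Like `load`, this sketch skips a missing explicit `config_path` without warning; raising `FileNotFoundError` in that one case may be preferable, since a typo in a user-supplied path would otherwise go unnoticed.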

saezlab_core/logger.py

Lines changed: 114 additions & 63 deletions
@@ -1,89 +1,140 @@
+import os
 import sys
-from typing import Optional
 import logging
-from datetime import datetime
-from logging.handlers import RotatingFileHandler
+from logging.handlers import QueueHandler, QueueListener, RotatingFileHandler
+
+from pythonjsonlogger import jsonlogger
 
 __all__ = [
-    'DATE_FORMAT',
-    'LOG_FORMAT',
     'get_logger',
-    'get_timestamped_log_path',
     'setup_logging',
+    'stop_async_listener',
 ]
 
-# Default format as per the requirements, adding levelname and module name for context.
-LOG_FORMAT = '[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s'
-DATE_FORMAT = '%Y-%m-%d %H:%M:%S'
+_listener = None  # Global reference for QueueListener
 
 
-def setup_logging(
-    level: int = logging.INFO,
-    log_file: Optional[str] = None,
-    max_bytes: int = 10 * 1024 * 1024,  # 10 MB
-    backup_count: int = 5,
-):
-    """Configures the root logger for the application with log rotation.
+def setup_logging(config: dict) -> None:
+    """Set up logging using a DictConfig (OmegaConf) or dict.
 
-    This sets up handlers for console and optional file logging.
-    It uses a standardized format for all log messages and rotates
-    log files when they reach a specified size.
+    Supports rotation, timestamped files, console+file handlers, logger exclusion, and optional JSON logs.
 
-    :param level: The minimum logging level to capture (e.g., logging.INFO).
-    :param log_file: Optional path to a file for log output.
-    :param max_bytes: The maximum size in bytes for a log file before it is rotated.
-    :param backup_count: The number of backup log files to keep.
+    Args:
+        config (dict): Logging configuration dictionary or DictConfig.
     """
-    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
-
+    # Support both DictConfig and dict
+    cfg = config if isinstance(config, dict) else dict(config)
+    log_dir = cfg.get('log_dir', './log')
+    app_name = cfg.get('app_name', 'saezlab_core')
+    log_level = getattr(logging, cfg.get('level', 'INFO').upper(), logging.INFO)
+    log_format = cfg.get(
+        'format', '[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s'
+    )
+    # Support max_megabytes (preferred) or fallback to max_bytes for backward compatibility
+    if 'max_megabytes' in cfg:
+        max_bytes = int(cfg.get('max_megabytes', 10)) * 1024 * 1024
+    else:
+        max_bytes = cfg.get('max_bytes', 10 * 1024 * 1024)
+    backup_count = cfg.get('backup_count', 5)
+    timestamp = cfg.get('timestamp')
+    use_json = cfg.get('json_logs', False)
+    timezone = cfg.get('timezone', 'UTC')
+    import queue
+    from datetime import datetime
+
+    try:
+        from zoneinfo import ZoneInfo
+
+        tzinfo = ZoneInfo(timezone)
+    except ImportError:
+        # For Python <3.9, fallback to UTC
+        import pytz
+
+        tzinfo = pytz.timezone(timezone) if timezone != 'UTC' else None
+    if not timestamp:
+        timestamp = datetime.now(tz=tzinfo).strftime('%Y-%m-%d')
+    async_logging = cfg.get('async_logging', False)
+    if not os.path.exists(log_dir):
+        os.makedirs(log_dir, exist_ok=True)
+    log_file = os.path.join(log_dir, f'{app_name}_{timestamp}.log')
+
+    # Custom formatter to inject timezone-aware asctime
+    class TZFormatter(logging.Formatter):
+        def __init__(
+            self,
+            fmt: str | None = None,
+            datefmt: str | None = None,
+            tz: object = None,
+        ) -> None:
+            super().__init__(fmt=fmt, datefmt=datefmt)
+            self.tz = tz
+
+        def formatTime(
+            self, record: logging.LogRecord, datefmt: str | None = None
+        ) -> str:
+            dt = datetime.fromtimestamp(record.created, tz=self.tz)
+            if datefmt:
+                return dt.strftime(datefmt)
+            return dt.isoformat()
+
+    if use_json:
+        json_format = '%(asctime)s %(levelname)s %(name)s %(message)s'
+        formatter = jsonlogger.JsonFormatter(json_format)
+        formatter.formatTime = (
+            lambda record, datefmt=None: TZFormatter().formatTime(
+                record, datefmt
+            )
+        )
+    else:
+        formatter = TZFormatter(log_format, tz=tzinfo)
     handlers = []
-
-    # Console handler
     console_handler = logging.StreamHandler(sys.stdout)
     console_handler.setFormatter(formatter)
-    handlers.append(console_handler)
-
-    # Rotating file handler (if a path is provided)
-    if log_file:
-        # Use RotatingFileHandler for automatic log rotation.
-        file_handler = RotatingFileHandler(
-            log_file, maxBytes=max_bytes, backupCount=backup_count
-        )
-        file_handler.setFormatter(formatter)
-        handlers.append(file_handler)
-
-    # The `force=True` argument removes any existing handlers
-    # on the root logger, ensuring our configuration is the only one.
-    logging.basicConfig(level=level, handlers=handlers, force=True)
-
-    # Set up a hook for unhandled exceptions to be logged automatically.
-    # This addresses the "Log traceback" requirement.
-    def handle_exception(exc_type, exc_value, exc_traceback):
-        if issubclass(exc_type, KeyboardInterrupt):
-            sys.__excepthook__(exc_type, exc_value, exc_traceback)
-            return
-        logging.getLogger().critical(
-            'Unhandled exception', exc_info=(exc_type, exc_value, exc_traceback)
+    file_handler = RotatingFileHandler(
+        log_file, maxBytes=max_bytes, backupCount=backup_count
+    )
+    file_handler.setFormatter(formatter)
+
+    global _listener
+    if async_logging:
+        log_queue = queue.Queue(-1)
+        queue_handler = QueueHandler(log_queue)
+        handlers = [queue_handler]
+        _listener = QueueListener(log_queue, console_handler, file_handler)
+        _listener.start()
+    else:
+        handlers = [console_handler, file_handler]
+
+    logging.basicConfig(level=log_level, handlers=handlers, force=True)
+
+    # Exclude or set log level for specified loggers
+    exclude_loggers = cfg.get('exclude_loggers', [])
+    for logger_name in exclude_loggers:
+        logger = logging.getLogger(logger_name)
+        logger.setLevel(logging.WARNING)
+        logger.propagate = (
+            False  # Prevents messages from being passed to the root logger
         )
 
-    sys.excepthook = handle_exception
 
+def stop_async_listener() -> None:
+    """Stop the async QueueListener if running (flushes all logs)."""
+    global _listener
+    if _listener is not None:
+        _listener.stop()
+        _listener = None
 
-def get_timestamped_log_path(log_dir: str, app_name: str) -> str:
-    """Generates a timestamped log file path.
 
-    :param log_dir: The directory where the log file should be stored.
-    :param app_name: The base name for the log file.
-    :return: A string representing the full path to the log file.
-    """
-    timestamp = datetime.now().strftime('%Y-%m-%d')
-    return f'{log_dir}/{app_name}_{timestamp}.log'
+def get_logger(name: str) -> logging.Logger:
+    """Get a logger for a given component/module.
 
+    Usage:
+        log = get_logger(__name__) or get_logger('my_component').
 
-def get_logger(name: str) -> logging.Logger:
-    """Returns a logger instance for the given name.
+    Args:
+        name (str): The logger name (usually __name__ or a component name).
 
-    This is the primary function that developers will use to get a
-    pre-configured logger within their modules.
+    Returns:
+        logging.Logger: The logger instance.
     """
     return logging.getLogger(name)
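As written, the JSON branch of `setup_logging` assigns a `formatTime` that builds `TZFormatter()` with no `tz` argument, so JSON timestamps come out as naive local time rather than in the configured `timezone`. The stdlib-only sketch below shows one way to keep the zone in JSON output; it deliberately avoids python-json-logger, and `TZJsonFormatter` is a hypothetical name, not part of the package.

```python
import json
import logging
from datetime import datetime, timezone, tzinfo


class TZJsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a timezone-aware timestamp."""

    def __init__(self, tz: tzinfo = timezone.utc) -> None:
        super().__init__()
        self.tz = tz  # e.g. timezone.utc or zoneinfo.ZoneInfo('Europe/Berlin')

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # isoformat() of an aware datetime includes the UTC offset.
            'time': datetime.fromtimestamp(record.created, tz=self.tz).isoformat(),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        if record.exc_info:
            # Keep tracebacks inside the JSON line instead of as loose text.
            payload['exc_info'] = self.formatException(record.exc_info)
        return json.dumps(payload)


record = logging.LogRecord(
    name='saezlab_core.demo', level=logging.INFO, pathname='demo.py',
    lineno=1, msg='loaded %d configs', args=(3,), exc_info=None,
)
record.created = 0.0  # fixed epoch timestamp for a reproducible example
print(TZJsonFormatter().format(record))
# {"time": "1970-01-01T00:00:00+00:00", "level": "INFO", "logger": "saezlab_core.demo", "message": "loaded 3 configs"}
```

Passing the same `tzinfo` object to both the plain and the JSON formatter would keep console, file and JSON timestamps consistent.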
