observability: implement structured logger and EVENTS registry #106
Merged
Add geno_lewm/observability.py implementing the JSONL structured-log contract from docs/spec/05-observability.md and RFC-0013:

- EVENTS: immutable registry of EventSpec(name, severity, summary) rows covering every event named in the canonical v0.1 table.
- LogRecord: dataclass carrying the spec-required fields (ts, severity, event, run_id, component, data) plus the standardized optional fields (step, epoch, phase, duration_ms, trace_id, span_id, error_code). to_dict() emits the stable wire shape; the optional fields are omitted when None.
- get_logger(component, run_id, log_dir, level, pretty): cached factory returning a thread-safe GenoLeWMLogger bound to a shared per-run sink. Defaults respect GENO_LEWM_LOG_DIR / GENO_LEWM_LOG_LEVEL / GENO_LEWM_LOG_FORMAT and fall back to ~/.geno-lewm/logs.
- JSONL sink: line-buffered, append-only, mkdir -p the run dir. Concurrent components writing under the same run_id share one ordered stream guarded by a lock.
- Pretty stderr formatter: auto-enabled when stderr is a TTY (or GENO_LEWM_LOG_FORMAT=pretty); jsonl otherwise.
- Trace context: contextvar pair plus set_trace_context() block. trace_id / span_id are attached to records iff they are set.
- logged_run(): context manager that flushes the sink on any exception (records survive a crash) and, for GenoLeWMError subclasses, emits a final ``error`` record carrying the typed error_code before re-raising.

Tests in tests/unit/test_observability.py cover record shape, the canonical event coverage, the ISO-8601-ms timestamp format, severity threshold filtering, standardized-field promotion, trace context attach/absent, error_code propagation, factory caching, default log-dir resolution, run-id uniqueness, crash survival (both typed and untyped exceptions), set_level validation (raises InputError), thread-safety smoke (4 threads × 50 writes), and JSON-isolation of the data field.

Closes #23
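The record shape and the crash-survival behaviour described above can be illustrated with a minimal sketch. This is not the code from geno_lewm/observability.py: the field types, the `utc_now_ms` helper, the `"error"` severity string, the `E_EXAMPLE` code, the stand-in `GenoLeWMError` class, and the simplified `logged_run(sink, run_id, component)` signature are all assumptions made for illustration.

```python
import json
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, Optional, TextIO

# Standardized optional fields; dropped from the wire shape when None.
OPTIONAL_FIELDS = ("step", "epoch", "phase", "duration_ms",
                   "trace_id", "span_id", "error_code")


def utc_now_ms() -> str:
    """Hypothetical helper: ISO-8601 UTC timestamp, millisecond precision, Z suffix."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S") + f".{now.microsecond // 1000:03d}Z"


@dataclass
class LogRecord:
    # Required fields named in the description (types are assumed).
    ts: str
    severity: str
    event: str
    run_id: str
    component: str
    data: Dict[str, Any] = field(default_factory=dict)
    # Optional fields named in the description (types are assumed).
    step: Optional[int] = None
    epoch: Optional[int] = None
    phase: Optional[str] = None
    duration_ms: Optional[float] = None
    trace_id: Optional[str] = None
    span_id: Optional[str] = None
    error_code: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Emit required fields always, optional fields only when set."""
        out: Dict[str, Any] = {
            "ts": self.ts, "severity": self.severity, "event": self.event,
            "run_id": self.run_id, "component": self.component,
            "data": dict(self.data),
        }
        for name in OPTIONAL_FIELDS:
            value = getattr(self, name)
            if value is not None:
                out[name] = value
        return out


class GenoLeWMError(Exception):
    """Stand-in for the project's typed error base class."""
    error_code = "E_EXAMPLE"  # hypothetical code, for illustration only


@contextmanager
def logged_run(sink: TextIO, run_id: str, component: str) -> Iterator[None]:
    """Flush the sink on any exception; typed errors get a final 'error' record."""
    try:
        yield
    except BaseException as exc:
        if isinstance(exc, GenoLeWMError):
            record = LogRecord(ts=utc_now_ms(), severity="error", event="error",
                               run_id=run_id, component=component,
                               data={"message": str(exc)},
                               error_code=exc.error_code)
            sink.write(json.dumps(record.to_dict()) + "\n")
        sink.flush()  # records written so far survive the crash
        raise
```

In this sketch any writable text stream can serve as the sink, for example a file opened in append mode; the real module resolves a per-run JSONL file and guards writes with a lock, which is left out here.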
This was referenced May 20, 2026
Closed
Problem
RFC-0013 / `docs/spec/05-observability.md` define a JSONL structured-log contract that every other subsystem references (training metrics, eval book-ends, cache hit/miss, attestation verify, error events). Nothing downstream can take a hard dependency on event names until the `EVENTS` registry and the per-run JSONL sink land.

Solution
`geno_lewm/observability.py`

- `EVENTS`: tuple of `EventSpec(name, severity, summary)` covering all 22 canonical v0.1 events. Renaming an event name is a MAJOR change. Duplicate detection runs at import.
- `LogRecord`: dataclass carrying every spec-required field; standardized optional fields (`step`, `epoch`, `phase`, `duration_ms`, `trace_id`, `span_id`, `error_code`) are emitted only when non-None.
- `get_logger(component, run_id?, log_dir?, level?, pretty?)`: cached factory; identical args return the same instance, so independent subsystems share a single ordered stream per run.
- JSONL sink: writes to `${GENO_LEWM_LOG_DIR}/{run_id}.jsonl`, falling back to `~/.geno-lewm/logs/`. Safe under concurrent threads.
- Pretty stderr formatter: auto-enabled when stderr is a TTY (or `GENO_LEWM_LOG_FORMAT=pretty`); jsonl otherwise.
- Trace context: contextvar pair plus `set_trace_context()` block; IDs attach only when set.
- `logged_run()`: context manager that flushes the sink on any exception. For `GenoLeWMError` subclasses, emits a final `event="error"` record carrying the typed `error_code` before re-raising (INV-OBS-6).

`tests/unit/test_observability.py` (20 cases): record shape; canonical event coverage; ISO-8601-ms `Z` timestamp; severity threshold; standardized-field promotion; trace context present/absent; `error_code` only when supplied; factory caching; env-var log-dir resolution; unique default run_id; JSONL path; book-end events; crash survival (typed + untyped exceptions); `set_level` rejection raises `InputError`; thread-safety smoke (4 threads × 50 writes); JSON isolation of `data`.

Validation
The error linter from #22 immediately caught two `RuntimeError`/`ValueError` raises in this module's first draft. Both were replaced with `InvariantViolation`/`InputError`; the public surface intentionally follows the typed-error contract (see RFC-0012).

Caveats / out of scope (deferred to follow-ups)
- […] `GenoLeWMLogger._emit` between record construction and JSON serialization.
- `registered_event_name` AST linter (observability: AST linter `registered_event_name` and `registered_metric_name` #27): at runtime unknown event names are still emitted (no crash); the linter is what closes INV-OBS-1.
- […] `GenoLeWMLogger._log` (level threshold today) and will extend when training paths use it.

Closes #23
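The deferred `registered_event_name` check mentioned in the caveats could take roughly the following shape. This is a sketch only, not the linter tracked in #27: it assumes events are emitted through calls of the form `logger.<method>("event_name", ...)`, and both the `EMIT_METHODS` set and the `find_unregistered_events` function are hypothetical names introduced here.

```python
import ast
from typing import Iterable, List, Tuple

# Hypothetical: method names assumed to emit events. The real logger
# surface is not shown in this PR description.
EMIT_METHODS = frozenset({"debug", "info", "warning", "error"})


def find_unregistered_events(source: str,
                             registered: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line, event_name) for literal event names missing from the registry."""
    known = frozenset(registered)
    problems: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls such as `log.info(...)` are inspected.
        if not (isinstance(func, ast.Attribute) and func.attr in EMIT_METHODS):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Only string literals can be checked statically; dynamic names are skipped.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            if first.value not in known:
                problems.append((node.lineno, first.value))
    return sorted(problems)
```

A static pass like this complements the runtime behaviour described above: the logger keeps emitting unknown names without crashing, while the linter reports them before merge. Event names built dynamically are invisible to it, so it is at best a partial guard.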