Structured pipeline diagnostics with PipelineM, phase timing, and --metrics JSONL#1093
Conversation
3b069cf to
04b30cc
Compare
tautschnig
left a comment
There was a problem hiding this comment.
verboseWarnings : Bool := false is a hardcoded constant — there's no way for users to enable verbose warnings without editing source code. This should either be:
- A CLI flag (e.g.,
--verbose-pyspec-warnings) passed through topythonAndSpecToLaurel, or - Controlled by the existing
--verboseflag that's already available in the pipeline
The existing quiet parameter already suppresses the count line. Adding a verbose parameter (or reusing the one from the caller) would be cleaner than a dead constant.
59340c0 to
50adc34
Compare
|
@keyboardDrummer-bot Please resolve git conflicts. |
tautschnig
left a comment
There was a problem hiding this comment.
Main concerns
- Real regression in
pyResolveOverloadsCommand(see inline onStrataMain.lean:943). The oldEIO String-returningreadDispatchOverloadssurfaced dispatch-file read failures viaexitFailure. The newBaseIO (OverloadTable × PipelineState)version emits a fatal.pySpecReadErrorinto the state and returns an empty table — but the command tuple-destructures(overloads, _), throwing the state away. A broken dispatch file now silently produces an empty overload table and a zero-module stdout. shouldAbortis a load-bearing invariant that the type system doesn't enforce.buildPySpecLaurelandreadDispatchOverloadsreturn(α × PipelineState)withαconstructed viaInhabited.defaulton fatal paths (return defaultinresolveAndBuildLaurelPrelude). Callers that forget to checkshouldAbortsilently proceed with empty/zero results —pyResolveOverloadsCommandis one case today. It would be worth either (a) returning a sum (PipelineRun α := .ok α | .aborted (msgs : Array PipelineMessage)), or (b) wrappingBaseIO (α × PipelineState)in a helperrunPipelineOrThrow : … → EIO String αfor callers that want legacy semantics, and documenting loudly in the structure docstring that aPySpecLaurelResultreturned alongsideshouldAbort = trueis meaningless. See inline.- Duplicate warnings for invalid module names.
resolveModuleEntryemits.invalidModuleNameon parse failure and returnsnone;resolveAndBuildLaurelPreludethen emits.missingDispatchModule/.missingExplicitPySpecfor everynone. So an invalid name in--dispatchyields two messages (one internalWarning + one configurationError) where one is enough. See inline. LT/Ordinstances disagree onPhaseandMessageKind. Both types deriveOrd(lex over all fields) and hand-rollLT(lex over only some fields). OnMessageKindthe hand-rolledLTignores theimpactfield entirely, socompare a b = .eqcan coexist witha ≠ b. Theqsortusage intoSummaryJsonuses the hand-rolledLT, which is fine, but the two orderings should agree. Suggest dropping the derivedOrd(or dropping the hand-rolledLTand usingltOfOrd).MessageImpact.isFatalhas no proof/test coverage and the(impact, isFatal)contract lives in three places. The mapping impact→fatal is encoded in: theisFatalfunction, the impact field of eachMessageKindconstant, and readers ofisFatalinemitMessage. Adding a new impact level is a three-file change with no compiler help. At minimum, please add a#guard-level lock-in (see inline) and/or a small theorem bank (also inline).
Testing / proof suggestions
emitMessage/addMessagesbehavioral locks (small state-threading theorems — seePipelineMessages.leaninline).- Snapshot / roundtrip for
PipelineMessage.toSummaryJsonso the external JSON schema can't drift silently (inline). - Regression test for the
pyResolveOverloadsCommanddispatch-file-read failure path (bullet #1). - Regression test that a bad
--dispatch <modname>produces one message and an exit with a singleconfigurationError, not two overlapping warnings (bullet #3). - Small theorems on
PhaseandMessageKindorderings (∀ a b, a < b ↔ compare a b = .ltto lock in LT/Ord agreement).
Refactoring notes
mergeOverloads: the oldOption-basedparamName <|> n.paramNamesemantics are now gone (the field isString), which is an improvement; the new code does emit.overloadParamNameDisagreementwhen two registrations conflict, which is great.PipelineM:StateT PipelineState BaseIOmeans anyBaseIOeffect can be mixed in. In practice onlyreadDDMescapes into it. Consider narrowing (e.g.ExceptT Abort (StateT … BaseIO)) so the abort control flow is visible in types andreturn defaultisn't needed.- Consider consolidating
MessageImpactandPipelineError(inPySpecPipeline.lean): today thePipelineError.userCodebranch exits with code 1 (hard fatal) whileMessageImpact.userCodeIssueis non-fatal. The semantic gap is defensible but undocumented; a comment explaining why the two vocabularies exist would help.
1e3265b to
420ba51
Compare
tautschnig
left a comment
There was a problem hiding this comment.
Substantial pipeline refactor. The PR delivers what the description promises: PipelineContext + PipelineM with proper diagnostic accumulation, withPhase / withRepeatedPhase replacing ad-hoc profileStep, streaming JSONL metrics via --metrics, fatal-abort via EIO Unit, and cleaner module resolution. lake build passes. The structural-output-survives-timeout property is genuinely useful.
1. The "impact drives behavior" claim in the PR description isn't enforced by code. The PR body's table says:
| Impact | Fatal? |
| configurationError | yes |
| internalError | yes |
| internalWarning | no |
| knownLimitation | no |
| userCodeIssue | no |
But whether a message aborts depends entirely on whether the caller uses emitFatalMessage vs emitMessage (see Messages.lean:461-481). Nothing in PipelineM enforces the mapping:
emitFatalMessage .overloadResolveWarning msg …would throw despite the.internalWarningimpact.emitMessage .internalError msg …would accumulate a fatal-impact message without aborting.
Today's in-tree callers are consistent with the table, but the invariant is tribal knowledge. Two low-cost options:
- Derive behavior from impact — make
emitMessagecheckif kind.impact.isFatal then throw ()and dropemitFatalMessageentirely. This makes the table the contract. - Retain both but guard —
emitFatalMessagepanic!s on non-fatal impact;emitMessagepanic!s on fatal impact. Keeps the current call-site readability; enforces the invariant.
Related: MessageImpact.isError (Messages.lean:73) returns true for .userCodeIssue but the PR body's table marks it non-fatal. isError here means "display as 'error' tag", not "abort the pipeline". Calling the predicate isError for a display concern and leaving the abort concern nameless is confusing — consider displayAsError + isFatal as two explicit predicates.
2. PipelineM = ReaderT PipelineContext (EIO Unit) throws () on abort. The unit-typed error channel means any caller doing try … catch e => … gets zero information from e — they have to go read ctx.messagesRef to reconstruct what happened. Two suggestions:
- Throw a typed error (
PipelineAbortwith a reason or message ref) socatchis self-documenting. - Or document the
()-throwing convention explicitly in thePipelineMdocstring, including the required "inspectmessagesRefon abort" recovery pattern. Currently the docstring says "throws()to abort" but not how a caller should react.
3. metricsHandle is never explicitly closed. StrataMain.lean:640 opens the file, threads it through PipelineContext, but nothing calls h.close / withFile / equivalent. Lean's GC will finalize the handle, but:
- If the
outcomerecord emission atStrataMain.lean:651-657fails partway through, the handle is leaked. - For long-running pipelines with the process still live, the outcome record may be buffered (even with per-record flush — the OS page cache is a separate layer).
- No guarantee the OS flushes before a
SIGKILL.
The PR body claims "all records written up to that point remain on disk" — per-record h.flush at Messages.lean:304 does flush, which is fflush() equivalent; on most systems that reaches the OS buffer but not the disk (that's fsync). For timeout resilience, fsync on each record is expensive but correct. Worth a docstring note clarifying the guarantee, and wrapping the handle in IO.FS.withFile-style resource management. Inline on PyAnalyzeLaurel.lean:41.
4. Thread safety: PipelineContext uses plain IO.Refs. If the pipeline ever parallelizes (e.g., parallel per-obligation SMT), messagesRef, timingRef, repeatedPhasesRef, currentPhaseRef will race. Not a problem today, but the design choice should be stated in the docstring ("single-threaded; adopt IO.Mutex before parallelizing"). Inline on Messages.lean:255.
5. withPhase's finally may emit multiple [end] lines if the action itself called withPhase but threw before its own finally ran. The finally in withPhase (Messages.lean:374-382) saves and restores the parent phase; but if a nested withPhase is interrupted by a throw before its own finally, the state is in a partial save-restore pattern. The outer finally restores to the outermost parent, which should be correct — but worth a test case that throws mid-nested-phase to confirm [end] lines match [start] lines in the profile output.
6. Test coverage is thin for the new infrastructure. The existing tests (AnalyzeLaurelTest.lean, ToLaurelTest.lean) exercise the happy path and a few error paths through the new PipelineM. Not exercised:
- Streaming JSONL output when the process is interrupted (can't be tested easily, but a fixture that verifies the JSONL is well-formed on normal exit would catch structural regressions).
- Multiple fatal errors accumulating before abort (the PR body notes "multiple error-impact messages may accumulate before or across aborts" but no test pins this down).
withRepeatedPhaseaggregation (count + total) — the test suite calls through to the verifier but doesn't assert on the aggregated output.- Error path in
pyResolveOverloads(T1 says a negative test was added; I seeToLaurelTest.leangrew by 250 lines — worth confirming the happy-and-error-path coverage is balanced).
Proof-coverage suggestions (each small):
/-- `emitFatalMessage` always throws. -/
theorem emitFatalMessage_throws (kind : MessageKind) (msg : String) (file : System.FilePath) :
(emitFatalMessage kind msg file).run ctx = .error () ∧
(∃ msgs', ctx.messagesRef.get = pure msgs' ∧ msgs'.size > 0)
/-- Metric file accumulates one record per `emitMetric` call. -/
theorem emitMetric_appends_line (ctx : PipelineContext) (j : Lean.Json) :
ctx.metricsHandle.isSome →
(emitMetric ctx j).run = append j.compress ++ "\n" to file
/-- `withPhase` restores current phase regardless of whether action threw. -/
theorem withPhase_restores_phase (ctx : PipelineContext) (name : String) (action : PipelineM α) :
let before ← ctx.currentPhaseRef.get
let _ ← try (withPhase name action).run ctx catch _ => pure ()
ctx.currentPhaseRef.get = beforeThe third is the one I'd most want — it's the exception-safety invariant of withPhase, and without it, a throw inside the action can leave the phase hierarchy desynchronized, producing unmatched [start]/[end] in profile output.
Refactor callouts:
MessageKindblock (Messages.lean:102-215) has 30+ constants, each{ category := "…", impact := .X }. My earlier macro suggestion (T8) was declined and I'll drop it — but worth a#guardat the bottom of the block checking the category string matches the constant name for each (prevents{ category := "unsupportdUnion", … }-style typos).Messages.lean:258(private mk ::) + severalprivate fieldRef :fields — good encapsulation, but thecreateconstructor now has 9 fields. Consider.mkDefaultwith the common shape and individual setters for the rare overrides.withPhase/withRepeatedPhase/emitMessage/emitFatalMessage— four entry points with similar shapes. APipelineActiontypeclass or a singlerunPipelinecombinator that takes the phase/message/abort intent explicitly would consolidate.
aqjune-aws
left a comment
There was a problem hiding this comment.
Approving since the nested [profile] feature works now and lake exe strata verify --profile Examples/HeapReasoning.core.st prints a reasonable output
MikaelMayer
left a comment
There was a problem hiding this comment.
🤖🔍 All previous comments appear addressed — reviewer sign-off still needed.
MikaelMayer
left a comment
There was a problem hiding this comment.
🤖🔍 All previous comments appear addressed — reviewer sign-off still needed.
MikaelMayer
left a comment
There was a problem hiding this comment.
Just one comment and I'm good !
shigoel
left a comment
There was a problem hiding this comment.
Minor comment on mergeOverloads semantics.
# Conflicts: # Strata/Languages/Python/PySpecPipeline.lean
MikaelMayer
left a comment
There was a problem hiding this comment.
🤖🔍 All previous comments appear addressed — reviewer sign-off still needed.
--verboseLaurel/Core program dumps: restored inb53969b2f- Unused
MessageKindplaceholder constants: removed in39bd5c0a8(used ones moved toLanguages/Python/Specs/MessageKind.lean) - Latent index bug (
timingRef): already resolved in earlier commit --skip-verificationbehavior anduserCodeIssuefatality: design observations, unchanged and not blockers
Motivation
The pipeline previously had no unified infrastructure for diagnostics or
timing. Warnings were handled ad-hoc (some printed to stderr, some silently
dropped), phase timing was manual and inconsistent, and the only structured
output (
--warning-summary) wrote a single JSON blob at the end — uselessif the process was killed by a timeout. There was also no distinction between
fatal and non-fatal errors, so callers couldn't tell whether a result was
trustworthy.
This PR introduces
PipelineContext/PipelineMas a shared backbone thatgives every pipeline phase consistent behavior: categorized diagnostics,
nested phase timing, fatal-error abort, and streaming metrics output that
survives timeouts.
Summary
PipelineContextandPipelineMfor structured diagnosticaccumulation and phase timing across the entire pipeline
profileStep/startPhasewithwithPhasecombinator thathandles nesting, timing, and profile output via
try/finallywithRepeatedPhasefor iteration-level timing (preprocess, smtEncode,solver) that accumulates count+duration and reports as
[profile]lines--warning-summary(single JSON blob) with--metrics <file>(streaming JSONL, flushed per-record for timeout resilience)
--verboseprogram dumps: prints intermediate Laurel and Core programswith
---- BEGIN/END ----delimiters for programmatic parsingunified
resolveModulesfor dispatch, auto-resolved, and explicit modulespyResolveOverloadsto print errors from pipeline context on failureMessages.leaninto data types (Messages.lean) and runtimeinfrastructure (
Context.lean)MessageKindconstants toLanguages/Python/Specs/MessageKind.leanPipelineContextandPipelineMPipelineContextis a structure carrying immutable config (outputMode,profilePipeline) plus mutable state as individualIO.Reffields:messagesRef— accumulatedPipelineMessagediagnosticstoolErrorsRef/userCodeErrorsRef— bucketed by impact for fast accesscurrentPhaseRef— current position in the phase hierarchy (managed viapush/pop)repeatedDepthRef— nesting depth ofwithRepeatedPhasescopes (when > 0,withPhaseaggregates silently)phaseStateRef— per-phase scoped state bundling repeated-phase aggregationand message counts (saved/restored atomically on phase entry/exit)
metricsHandle— optional file handle for streaming JSONL metricsBecause all state is in
IO.Refs, any monad withBaseIOaccess can usepipeline capabilities by passing a
PipelineContextvalue directly —withPhaseis polymorphic over the monad ([MonadLiftT BaseIO m]).PipelineMis defined as:The
EIO Unitbase ensures that fatal errors (configuration errors,internal errors, user code issues) are impossible to silently ignore — they
throw
(), aborting the pipeline. Non-fatal diagnostics (warnings, knownlimitations) are added to
messagesRefand execution continues. Callersthat want to attempt recovery (e.g., to gather more diagnostics) can catch
the exception, but should rethrow to avoid proceeding with invalid state.
The top-level runner catches the exception and still returns all accumulated
messages — so the caller always gets complete diagnostics regardless of how
the pipeline terminated.
Phase tracking state machine
The phase system operates in two modes controlled by
repeatedDepthRef:withPhaseprints[start]/[end]inprofile mode and emits JSONL timing metrics.
withPhasesilently aggregates timinginto
phaseStateRef.repeatedPhases— no print, no individual metrics.withRepeatedPhaseincrementsrepeatedDepthRefon entry and decrementson exit.
withPhasenever changes the depth — it inherits the mode andbehaves accordingly.
Key invariants:
currentPhaseRefalways reflects the innermost active scope's full pathphaseStateRefis scoped: saved on entry, restored on exit (no leakage)phaseStateRef.repeatedPhasesonly--metrics <file>(replaces--warning-summary)Writes streaming JSONL (one JSON object per line, flushed immediately after
each record). If the process is killed mid-pipeline, all records should remain.
Record types:
timing— phase start/end in ms, with optional count for aggregated phasesdiagnostic— pipeline messages with phase, file, category, impact, locationoutcome— result/exit_code/total_ms, written last (absence = killed)Example output:
{"type":"timing","phase":"pythonAndSpecToLaurel","start_ms":0,"end_ms":97} {"type":"diagnostic","phase":"pythonAndSpecToLaurel.resolveAndBuildPrelude","file":"specs/builtins.ion","category":"unsupportedUnion","impact":"knownLimitation","message":"unsupported overload form for 'dict.update'"} {"type":"timing","phase":"verification.vcDischarge.solver","start_ms":0,"end_ms":1102,"count":12} {"type":"outcome","result":"verified","exit_code":0,"total_ms":1830}--verboseprogram dumpsWhen
outputMode == .verbose, the pipeline prints intermediate programs tostdout with delimiters for programmatic parsing:
Module resolution refactoring
resolveModuleEntrynow takes a parsedModuleName(not a string)resolveModuleshandles dispatch, auto-resolved, and explicit modulesinvalidModuleNamechanged frominternalWarningtoconfigurationError(fatal)missingDispatchModule+missingExplicitPySpecintomissingPySpecModuleerrors fired for the same input
pyResolveOverloadserror handlingWhen
readDispatchOverloadsencounters a fatal error (e.g., nonexistent file),the command now reads
pctx.messagesRefand prints all accumulated messagesto stderr before exiting with a non-zero code — instead of silently yielding
empty results.
Profile output format
--profileemits structured[start]/[end]/[profile]lines to stdout:Design: impact drives behavior
Each
MessageKindhas aMessageImpactthat determines pipeline behavior:configurationErrorinternalErroruserCodeIssueinternalWarningknownLimitationNote:
emitMessageAndAbortis the function that aborts — callers choosewhether to abort. A caller may use
emitMessage(continues) and checkisFatalat phase boundaries for custom abort strategies.Not addressed:
[statistics]outputThe
[statistics]lines (transform counters likeEvaluator.simulatedStmts,FilterProcedures.erasedProcedures) are orthogonal to phase timing. Theycurrently bypass
PipelineContext— printed via bareIO.printlninCore.verifyandStrataMainLib.lean— so they don't respect phase nestingor indentation. Integrating statistics into the phase system is future work.
Key files changed
Strata/Pipeline/Messages.lean— pure data types:Phase,MessageImpact,MessageKind,PipelineMessageStrata/Pipeline/Context.lean— runtime:PipelineContext,PipelineM,withPhase,withRepeatedPhase,emitMetric,OutputModeStrata/Languages/Python/Specs/MessageKind.lean— Python-specificMessageKindconstantsStrata/Pipeline/PyAnalyzeLaurel.lean— top-level pipeline orchestration,--verboseprogram dumps,metricsHandlethreadingStrata/Languages/Core/Verifier.lean— replaced manual timing withwithRepeatedPhaseStrata/Languages/Laurel/LaurelCompilationPipeline.lean— replacedprofileStepwithwithPhaseStrata/Languages/Python/PySpecPipeline.lean— fatal abort via throw, module resolution refactor, diagnostic metric emissionStrataMainLib.lean—--metricsflag, outcome record emission, error printing inpyResolveOverloadsStrataTest/Pipeline/PhaseTimingTest.lean— phase timing integration tests