Releases: sipyourdrink-ltd/bernstein
v2.7.0
Released 2026-05-24.
This release focuses on making Bernstein's automation easier to verify: stricter release gates, a complete Sonar cleanup, deterministic skill authoring tools, and an opt-in maintainer-share telemetry path that stays off by default.
Highlights
- Skills are closer to end-to-end. SKILL.md manifests now carry a versioned schema, and the CLI has deterministic skills init, skills test, skills diff, and skills bench commands. Strict linting can block installs, and sandbox/sanitizer checks protect install-time execution.
- Skill routing and local outcome reporting are more useful. Bernstein can build reproducible local helpfulness reports from the activation log, and deterministic routing tools make skill selection easier to inspect without model calls.
- Opt-in telemetry sharing is wired through a real maintainer-share sink. It is still off by default, requires explicit consent plus BERNSTEIN_TELEMETRY_SHARE_ENDPOINT, uses the same redacted event schema, and signs shared receipts for offline verification.
- Release and CI gates are harder to get wrong. The publish workflow now runs real release tests, checks protocol compatibility, asserts GitHub Release assets, reconciles PyPI/GitHub drift, and ties main-branch eligibility to an explicit SHA marker.
- Sonar is green. Coverage is reported from the CI shard set instead of one partial artifact, the tracker is down to zero open findings, and remaining hotspots were reviewed through a dedicated workflow.
- Several fail-closed paths were tightened. PR review uses trusted action code, issue decomposition has a narrower write boundary, plugin zip extraction is hardened, release attestation is enforced, and lineage/audit checks bind more verification to bytes on disk.
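The sharing gate described above needs two independent conditions before anything leaves the machine. A minimal sketch of that check follows; the function name and the way consent is passed in are hypothetical, and only the environment variable name comes from these notes. Redaction and receipt signing are out of scope.

```python
import os
from typing import Mapping, Optional

# The variable name comes from the release notes; everything else is illustrative.
SHARE_ENDPOINT_VAR = "BERNSTEIN_TELEMETRY_SHARE_ENDPOINT"


def share_endpoint(consented: bool, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the maintainer-share endpoint, or None when sharing must stay off."""
    env = os.environ if env is None else env
    endpoint = env.get(SHARE_ENDPOINT_VAR, "").strip()
    # Both conditions are required: consent alone or an endpoint alone is not enough.
    if not consented or not endpoint:
        return None
    return endpoint
```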
Install
pipx install --upgrade bernstein
Python packages and GitHub Release assets are published for 2.7.0.
The npm wrapper is a convenience distribution path and may lag the Python release while registry permissions are repaired.
Full changelog: v2.6.0...v2.7.0
v2.6.0
Released 2026-05-22.
A large release. Highlights: bidirectional chat drivers with verifiable approvals, per-step replay with a hash-chained journal, operator-registered recurring goals, a signed supervisor surface, a skill catalog with signed manifests, image-attachment provenance, and a sharded CI test suite behind a native merge queue.
Chat and operator surfaces
- Slack bidirectional driver: drive a session, approve or reject a tool call, and watch streamed output from Slack. Every approval is recorded as a signed entry in the audit chain (covering approver, message timestamp, decision, and tool-call hash), approval scope is pinned to the worker's git worktree, and outbound messages carry an Ed25519 signature so a recipient can verify the workspace identity. Optional bernstein[slack] extra. (#1794)
- Discord bidirectional driver: the same attested-approval model as Slack, plus a per-channel scheduling fence so tasks emitted from one channel cannot land on workers bound to another. Optional bernstein[discord] extra. (#1795)
- Image attachment passthrough: bernstein run "<prompt>" --attach ./shot.png carries an image to a vision-capable adapter (Claude, Gemini). The image's SHA-256 is recorded in the audit chain at decision time and anchored as a lineage parent of any artefact produced that turn; spawning with --attach on a non-multimodal adapter fails before any process launches. (#1797)
Orchestration
- Per-step session replay: each agent step is recorded in a hash-chained journal where step_hash = H(prev_hash, input_hash, model, prompt, tool_call, tool_result). bernstein replay <agent-id> walks the chain, bernstein session fork <id> --from-step N branches a sibling worktree from a chosen step, and replay divergence surfaces as a precise hash mismatch rather than a flaky result. Exported receipts verify offline against the install public key. (#1799)
- Operator-registered recurring goals: bernstein schedule add --cron "<expr>" --goal "<text>" registers a recurring goal that fires inside a single installation, with no host cron or external scheduler required. Each fire is a deterministic projection of (schedule_id, fire_time, last_state) onto a canonical task graph and is recorded in the audit chain, so bernstein schedule audit can prove a nightly sequence ran exactly as expected. (#1798)
- Operator supervisor surface: bernstein supervisor status aggregates the existing stall, watchdog, and respawn-budget detectors into one view. A detected stall produces a signed escalation receipt (last audit entries, identity tokens, structured reason, and a deterministic recommended action) that any verifier can check offline. (#1800)
- Worktree GC reaps are now anchored to the audit chain: each reap appends a worktree.reap event capturing the pre-deletion git HEAD and a clean/dirty flag, and the reap is fail-closed (a worktree is not deleted if the reap cannot be recorded). (#1833)
- Deterministic replay is now hermetic: a cache miss in replay mode raises a typed error and aborts instead of silently calling the live model, the replay key folds in provider, temperature, and max-tokens, and a coverage line reports hits, misses, and strict violations. A non-strict fall-through stays available behind an explicit, logged opt-in. (#1832)
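The journal rule step_hash = H(prev_hash, input_hash, model, prompt, tool_call, tool_result) can be sketched in Python as follows. This assumes SHA-256 for H, a canonical JSON encoding of the fields, and an all-zero genesis hash; the Step type, function names, and encoding are illustrative, not Bernstein's on-disk format, and receipt signing is omitted.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Sequence

GENESIS = "0" * 64  # assumed starting value for prev_hash


@dataclass(frozen=True)
class Step:
    input_hash: str
    model: str
    prompt: str
    tool_call: str
    tool_result: str


def step_hash(prev_hash: str, step: Step) -> str:
    # Canonical JSON keeps field boundaries unambiguous before hashing.
    fields = [prev_hash, step.input_hash, step.model, step.prompt, step.tool_call, step.tool_result]
    payload = json.dumps(fields, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(steps: Sequence[Step]) -> List[str]:
    hashes: List[str] = []
    prev = GENESIS
    for step in steps:
        prev = step_hash(prev, step)
        hashes.append(prev)
    return hashes


def first_divergence(recorded: Sequence[str], replayed: Sequence[Step]) -> Optional[int]:
    """Index of the first replayed step whose hash differs from the journal, or None."""
    prev = GENESIS
    for i, (expected, step) in enumerate(zip(recorded, replayed)):
        prev = step_hash(prev, step)
        if prev != expected:
            return i
    if len(recorded) != len(replayed):
        return min(len(recorded), len(replayed))  # one side stopped early
    return None
```

Because each hash folds in the previous one, the first changed step is also the first mismatch, which is what lets replay report a precise divergence point instead of a vague failure.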
Skills
- Skill catalog with signed manifest installs: bernstein skills catalog browse|search|install|upgrade|info|status. Each install appends a signed audit-chain entry referencing the manifest URL and content digest, refuses unverified manifests by default, and keeps a lockfile that stays consistent across parallel worktrees; a CI lineage gate rejects any lockfile that references an unknown manifest digest. (#1796)
- Skill lifecycle CLI foundation: install, sync, lock, lint, watch, and a local activation log with an env-var opt-out. (#1734)
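A CI lineage gate of the kind described for the skill lockfile reduces to a set-membership check over content digests. The lockfile shape ({skill_name: {"digest": ...}}) and the "sha256:" prefix are assumptions for illustration; signature verification of the manifests themselves is not modelled.

```python
import hashlib
from typing import Dict, List, Set


def content_digest(manifest_bytes: bytes) -> str:
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def unknown_lock_entries(lockfile: Dict[str, dict], known_digests: Set[str]) -> List[str]:
    """Skill names whose pinned digest does not match any known manifest digest."""
    return sorted(name for name, entry in lockfile.items() if entry.get("digest") not in known_digests)


def lineage_gate(lockfile: Dict[str, dict], known_digests: Set[str]) -> bool:
    # CI passes only when every locked skill points at a known manifest.
    return not unknown_lock_entries(lockfile, known_digests)
```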
Evaluation and observability
- GlitchTip event ingester: a scraper turns open self-hosted error-tracker issues into one regression eval case each, deduped on the issue id, with administrative wiring-probe issues filtered out. The nightly real-run canary feeds this loop. (#1820)
- CI-failure post-mortem ingestion: a scraper walks merged PRs that needed fix-up commits and synthesizes regression eval cases, so the eval suite tracks the failure modes that surface first in CI. (#1793)
- Nightly real-run canary: a scheduled job runs real end-to-end flows (worker spawn, git worktree lifecycle, audit-chain append plus verify, signed lineage receipt) against a deterministic stub adapter, with no API key or network, and routes any failure to the telemetry sink. (#1822)
- Multi-adapter pentest fan-out: bernstein eval pentest --adapters a,b,c runs one scenario across adapters and aggregates consensus on (canonical_vuln_type, normalized_path). Single-adapter behaviour stays byte-identical. (#1754)
- Consolidated SonarQube findings tracker auto-rendered from the live Sonar API. (#1781)
- Terminal orchestration failures route to the configured error sink. (#1762)
- Per-row source-adapter provenance for the memory subsystem: SQLiteMemoryStore carries an optional source_adapter column, supported by add_many and query(read_only_from_adapters=[...]), via an additive NULL-backfill migration. (#1759)
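One way to read "consensus on (canonical_vuln_type, normalized_path)" from the pentest fan-out above is a vote count per key across adapters. In the sketch below, the normalisation rules (single leading slash, "." and ".." collapsed, no trailing slash) and the quorum parameter are assumptions; the vulnerability type is taken to be canonical already.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Finding = Tuple[str, str]  # (canonical_vuln_type, path)


def normalize_path(path: str) -> str:
    # Assumed rules: single leading slash, "." and ".." collapsed, no trailing slash.
    return posixpath.normpath("/" + path.strip().lstrip("/"))


def consensus(findings_by_adapter: Dict[str, Iterable[Finding]], quorum: int) -> List[Finding]:
    votes: Dict[Finding, Set[str]] = defaultdict(set)
    for adapter, findings in findings_by_adapter.items():
        for vuln_type, path in findings:
            # A set per key means one adapter reporting the same finding twice counts once.
            votes[(vuln_type, normalize_path(path))].add(adapter)
    return sorted(key for key, adapters in votes.items() if len(adapters) >= quorum)
```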
CI and quality infrastructure
- Merge-gate stack: pre-merge autosync regenerates mirror docs and formatting on PR branches, a main-red guard blocks merges while main is red, a nightly drift sweep opens a PR on accumulated drift, and the suite now responds to the merge_group: trigger so a native merge queue tests each PR against the combined branch. Operator runbook at docs/operations/merge-queue.md. (#1756)
- The unit-test job is sharded across parallel runners (scripts/run_tests.py --shard i/N), so the per-file isolated suite scales as the test count grows. (#1845)
- Coverage ratchet: a monotonic gate keeps total coverage from regressing and nudges the per-PR diff-coverage floor up over time. (#1829)
- Static-analysis sweeper for Sonar findings, widened to MAJOR severity, turns open findings into backlog tickets keyed on the Sonar issue id. (#1763 / #1819)
- Unit tests are now hermetic: an autouse guard blocks real outbound network connections in tests/unit/ (loopback allowed), so a network-dependent unit test fails deterministically instead of flaking in CI. (#1856)
- Opt-in telemetry foundation: consent CLI, a schema guard, and an off-by-default proof. (#1736)
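A deterministic split for --shard i/N can be sketched by hashing each test file path. The 1-based index and the CRC32 assignment are assumptions for illustration; scripts/run_tests.py may partition differently.

```python
import zlib
from typing import Iterable, List, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse "i/N" with an assumed 1-based i."""
    index, total = (int(part) for part in spec.split("/"))
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid shard spec: {spec!r}")
    return index, total


def files_for_shard(test_files: Iterable[str], spec: str) -> List[str]:
    index, total = parse_shard(spec)
    # CRC32 is stable across processes and machines, so every runner agrees on the split.
    return sorted(f for f in test_files if zlib.crc32(f.encode("utf-8")) % total == index - 1)
```

Python's built-in hash() is salted per process for strings, which is why a stable hash such as CRC32 is used here: every runner computes the same assignment independently.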
Reliability and correctness fixes
- Deterministic replay no longer calls the live model on a cache miss (see Orchestration above). (#1832)
- HSM lineage kind fails fast at config-load when no real adapter is on the classpath; opt in to the stub via BERNSTEIN_ALLOW_HSM_STUB=1. (#1753)
- MCP OAuth discovery metadata corrected; the Tier-3 cordon now catches deletions and renames. (#1755)
- Sensitive paths are no longer logged in clear text on the always-allow tamper path; the forensic record keeps the full value with pre-hashed digests. (#1814)
- Resolved SonarQube findings: S8413 router double-mount, S125 commented-code, S3516 invariant-return, S5754 broad-except, plus a refurb sweep across the new subsystems. (#1786 / #1787 / #1788 / #1807 / #1813 / #1814 / #1815 / #1817 / #1818)
- Post-CI dispatcher declares per-job permissions explicitly; its child-secret expectations were synced with the GlitchTip forward. (#1746 / #1801)
- The tampered-signature catalog test corrupts the leading signature byte so verification fails deterministically (a trailing-byte flip could be a no-op and let the test reach the network). (#1843)
- Workflow loader rejects interactive: true at config-load instead of crashing mid-run (closes #1110). (#1760)
- Adapter dual-binary discovery handles the antigravity / gemini migration. (#1748 and follow-ups)
Internals and docs hygiene
- Dropped the unused bernstein.benchmark.head_to_head module from the wheel. (#1767)
- Reorganized the docs tree; internal working notes consolidated under docs/_internal/. (#1768)
- Front-page content, page metadata, and stale README translations refreshed. (#1769)
- Repository-wide formatting pass; AGENTS.md mirror set regenerated. (#1771 / #1776 / #1777)
v2.5.1
What this patch is
Five fix commits had landed on main without triggering a patch publish, because the post-CI dispatcher was silently failing at startup. A required secret was being referenced in a reusable-workflow call before the receiving side had a chance to declare it optional, and GitHub Actions resolves that block at call-start. A single fix commit unblocks the rest.
Fixes included
- fix(security) (#1705) - 12 CodeQL / Semgrep findings resolved, 23 dismissed with technical justification.
- fix(observability) (#1713) - Sonar scan workflow no longer skips on cancelled upstream CI; direct push trigger added.
- fix(privacy) (#1718) - residual operator hostnames scrubbed from docs and from a PR-comment template.
- fix(routes) (#1726) - three sync subprocess calls in async routes converted; eight bare excepts narrowed.
- fix(api) (#1727) - intermittent 500 on POST /tasks was a real bug, not a Schemathesis flake. Validates at the request boundary now.
Dispatcher
Restored in #1730. After this, the next CI-green push computes the conventional-commits delta automatically.
Try it
pipx install --upgrade bernstein
Source: https://github.com/sipyourdrink-ltd/bernstein (Apache-2.0).
v2.5.0 - Interoperability surfaces, host portability, deterministic replay
A note on the voice
v2.4.0 was about observability surfaces, running the four backends through one umbrella so a code-scanning regression or a coverage drop surfaces in the same table as a GlitchTip spike. v2.5.0 is the next question over: now that the orchestrator can see itself, can the hosts an operator already runs see it too? And can it stop quietly phoning home to my private infrastructure when it does?
Interop, finally
The piece that kept blocking me on multi-host runs was the lack of a real handshake. Claude Desktop is one process, Claude Code is another, both can spawn agents, neither knew what the other had already decided. I shipped A2A capability cards (#1698): one process mints a signed manifest of what it can do, the other consumes it, verifies the signature against a trusted-issuer set, and refuses to delegate when the advertised policies do not meet the operator's required policies. The lineage chain rides through the same envelope so the audit trail does not break at the organisation boundary.
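The delegation rule for capability cards (trusted issuer plus policy coverage) can be sketched as below. The card fields and function name are hypothetical, and signature verification is reduced to a precomputed boolean so the policy logic stays visible; a real check would verify the manifest signature first.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Tuple


@dataclass(frozen=True)
class CapabilityCard:
    issuer: str
    capabilities: FrozenSet[str]
    policies: FrozenSet[str]
    signature_valid: bool  # stand-in for a real signature check over the manifest


def may_delegate(card: CapabilityCard, trusted_issuers: AbstractSet[str],
                 required_policies: AbstractSet[str]) -> Tuple[bool, str]:
    """Refuse delegation unless the card is verified, trusted, and policy-complete."""
    if not card.signature_valid:
        return False, "signature did not verify"
    if card.issuer not in trusted_issuers:
        return False, f"issuer {card.issuer!r} is not trusted"
    missing = set(required_policies) - card.policies
    if missing:
        return False, "missing required policies: " + ", ".join(sorted(missing))
    return True, "ok"
```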
The MCP client got the matching upgrade (#1692). Upstream servers will return malformed responses, hang mid-stream, demand re-auth, lie about their capability manifest. The client now treats every upstream as untrusted: capability-card validation before a tool call, retry-with-continuation on dropped streams, in-flight cancellation that preserves partial output, per-server cost metering, schema-violation containment that marks a misbehaving server degraded for the rest of the task. None of this is exotic; it is the brittle-real-world posture that the larger MCP ecosystem will end up needing.
The MCP server side got prompt-catalogue plus OAuth-2 PKCE discovery metadata (#1696, #1709), so auto-discovering hosts that expect a real RFC 8414 / RFC 9728 surface stop skipping us.
bernstein desktop-register
bernstein desktop-register --host <name> (#1697, then #1708 added five more hosts) writes the host-specific config entry for Claude Desktop, Claude Code, Cursor, Continue, Cline, Zed, and Aider. One command. The orchestrator is a guest in the host's settings file; we ship the plugin, the host renders it. bernstein doctor --substrate reports which hosts have us registered, which do not, and which have a stale registration.
The honest disclaimer: if a host changes its plugin spec, the per-host adapter breaks. Each adapter is small enough that a host-spec change is a one-day fix, not a re-architecture.
I removed my private infrastructure from the shipped package
This one was a real silent bug, not a feature. The shipped wheel had errors.bernstein.run baked in as the GlitchTip DSN default, and telemetry.bernstein.run baked in as the telemetry endpoint default. Both backends soft-fail when their env vars are unset, so the package never actually reached out without consent. But the hostnames were sitting there as defaults, which is the kind of thing that turns into a real leak the day someone wires a config they did not read.
#1694 strips those defaults. tests/unit/observability/test_no_hardcoded_infra.py asserts zero operator-private host, IP, or DSN matches in src/ and will fail the build if a future change reintroduces one. The telemetry side-channel is now portable across hosts behind one Sentry-compatible BERNSTEIN_TELEMETRY_DSN (#1691), so each operator runs against their own backend, not mine.
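A guard in the spirit of test_no_hardcoded_infra.py can be a plain line scan over src/. The deny-list patterns below are illustrative assumptions (the two retired hostnames named above plus a Sentry-style DSN with an embedded key), not the actual test's patterns.

```python
import re
from pathlib import Path
from typing import List

# Illustrative deny-list; the real test's patterns are not published in these notes.
FORBIDDEN = [
    re.compile(r"\berrors\.bernstein\.run\b"),
    re.compile(r"\btelemetry\.bernstein\.run\b"),
    re.compile(r"https?://[0-9a-f]{32}@"),  # Sentry-style DSN with an embedded public key
]


def scan_text(name: str, text: str) -> List[str]:
    """Return "name:lineno" for every line matching a forbidden pattern."""
    return [
        f"{name}:{lineno}"
        for lineno, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in FORBIDDEN)
    ]


def find_hardcoded_infra(root: Path) -> List[str]:
    hits: List[str] = []
    for path in sorted(root.rglob("*.py")):
        hits.extend(scan_text(str(path), path.read_text(encoding="utf-8")))
    return hits
```

A test would then assert find_hardcoded_infra(Path("src")) == [], so any reintroduced default fails the build with the offending file and line.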
Deterministic replay
Three small things compounded. Session ids are bound deterministically (#1684) so a replayed run reproduces its own event stream without colliding with a sibling. The supervisor enforces a bounded respawn budget and parks an agent when the budget is exhausted (#1683), instead of looping respawns indefinitely. On-disk state has a versioned migrations module (#1689) so an older .sdd/ upgrades predictably. Plus the cosmetic-but-real win: runs surface a memorable deterministic name (#1682) in user-facing output, so the operator can refer to "the brisk-sparrow run" instead of memorising a UUID.
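The bounded respawn budget can be sketched as a small counter that parks the agent once the limit is reached, rather than letting a supervisor loop respawn forever. The class name and the limit semantics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RespawnBudget:
    limit: int  # maximum respawns allowed for one agent (assumed semantics)
    used: int = 0
    parked: bool = False

    def try_respawn(self) -> bool:
        """Consume one respawn if any remain; otherwise park the agent and refuse."""
        if self.parked:
            return False
        if self.used >= self.limit:
            self.parked = True
            return False
        self.used += 1
        return True
```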
The API stops returning 500 on a fuzzer-found bug
The TaskCreate.scope and complexity fields were typed as plain str with only a length cap. An empty or out-of-range value passed pydantic and then raised ValueError deep in the task store when the enum was constructed, surfacing as an unhandled 500 on POST /tasks and POST /tasks/batch. Schemathesis kept flagging it intermittently and everyone kept rerunning it as a flake. It was not a flake. #1700 validates at the request boundary and returns 422.
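Validating at the request boundary means rejecting an out-of-range value before it ever reaches the task store. A stdlib-only sketch follows; the enum members are hypothetical placeholders because the real scope and complexity values are not listed here. In a pydantic model, typing the fields as the Enum itself has the same effect, and frameworks such as FastAPI report request-validation failures as 422.

```python
from enum import Enum
from typing import List, Mapping


class Scope(str, Enum):  # hypothetical members
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class Complexity(str, Enum):  # hypothetical members
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def task_create_errors(body: Mapping[str, object]) -> List[str]:
    """Collect field errors; a route would answer 422 when this list is non-empty."""
    errors = []
    for field_name, enum_cls in (("scope", Scope), ("complexity", Complexity)):
        value = body.get(field_name)
        try:
            enum_cls(value)  # raises ValueError for empty, missing, or unknown values
        except ValueError:
            allowed = ", ".join(member.value for member in enum_cls)
            errors.append(f"{field_name}: {value!r} is not one of [{allowed}]")
    return errors
```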
What I am not claiming
The two new transports are functional but not load-tested at adversarial scale. The OAuth-2 PKCE discovery surface ships metadata only; full token issuance and OIDC federation are deferred to a follow-up. The substrate adapters cover seven hosts; Codex and Gemini CLI are stubbed by design until their respective plugin specs stabilise. The A2A integration honours the protocol as specified at the time of pickup and will need maintenance as the spec evolves.
Try it
pipx install --upgrade bernstein
bernstein interop a2a card --output card.json
bernstein desktop-register --host cursor
Full per-PR notes in docs/release-notes/v2.5.0.md. Source: https://github.com/sipyourdrink-ltd/bernstein (Apache-2.0). 22 commits since v2.4.0.
v2.4.0
v2.4.0 - Observability surfaces, single-writer run state, declarative planning gates
Release date: 2026-05-20
Commits since v2.3.1: 33
Highlights
- Unified bernstein doctor observe umbrella rolls the four observability backends (Sonar, GlitchTip, Dependency-Track, GitHub Code Scanning) into one aggregated table with delta-since-last-check, plus a per-PR sticky summary comment and a daily trends snapshot. Each backend soft-fails to SKIPPED when its env vars are unset, so a fresh checkout stays green.
- Single-writer RunActor owns canonical per-session state behind one async event queue with a bounded replay buffer that emits an explicit Gap{up_to_seq} marker on eviction, making reconnect-after-eviction observable instead of silently lossy.
- Spec-quality gate refuses to advance a feature spec until a deterministic, library-only rule set passes; failures route through a bounded auto-fix loop and surface unresolved items to the operator rather than dispatching an implementer against a weak spec.
- Declarative task DAG: tasks gain parallel_safe and story_id fields, the backlog parser learns [T<id>] [P] [USn] markdown checkboxes, topological_iter_with_parallel yields ready batches honouring cycle detection, and bernstein plan dag / bernstein tasks dag render the DAG with parallel batches highlighted. This replaces the file-overlap heuristic for tasks that declare the flag while preserving the legacy heuristic as a fall-back.
- Three-layer skill customization (BASE / TEAM / USER) under XDG paths with a per-field deterministic merge spec: scalars override, tables deep-merge, keyed arrays replace by name, unkeyed arrays append; missing layers fall through cleanly.
- Empirical-confidence ledger backs the model recommender: an append-only SQLite store of per-decision outcomes feeds a sample-size-gated query that prefers measured outcomes over the capability-tier heuristic and over the bandit arm, refusing to return a value below a documented threshold (default 5).
- Approval responses are now bound to a 16-byte server-minted single-use nonce; mismatches surface as 409 NONCE_MISMATCH and evicted replays as 410 NONCE_EXPIRED, foreclosing stale-button replay on superseded prompts.
- Canonical stream-signal vocabulary (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED) parseable from any wrapped CLI stdout, so non-stream-json adapters surface lifecycle events through the same channel as native stream-json adapters.
- CI hardening across the board: the Sonar scan consumes the existing coverage artifact via workflow_run (and workflow_dispatch bootstraps a coverage-bearing first scan), the review-bot-ack gate no longer cancels its own required check, the Schemathesis smoke timeout is widened to stop flaky cancellations, and the runtime Docker images are pinned back to python:3.13-slim.
- Four refurb auto-fix waves (wave 4 plus clusters B / D / E) land about 320 mechanical idiom rewrites across src/, taking FURB142 to zero and substantially reducing the FURB184 / FURB138 / FURB124 / FURB182 / FURB101 / FURB109 / FURB108 / FURB126 backlog.
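The per-field merge spec for BASE / TEAM / USER layers can be sketched as a recursive function. Treating an array as "keyed" when every element is a table with a name field is an assumption, as are the function names; XDG path discovery is not modelled.

```python
from typing import Any, Optional


def merge_layer(base: Any, override: Any, key: str = "name") -> Any:
    """Merge one layer onto another using the per-field rules from the release notes."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)  # tables deep-merge
        for k, v in override.items():
            merged[k] = merge_layer(base[k], v, key) if k in base else v
        return merged
    if isinstance(base, list) and isinstance(override, list):
        if all(isinstance(item, dict) and key in item for item in base + override):
            by_name = {item[key]: item for item in base}  # keyed arrays replace by name
            for item in override:
                by_name[item[key]] = item
            return list(by_name.values())
        return base + override  # unkeyed arrays append
    return override  # scalars (and mismatched types) override


def resolve(*layers: Optional[dict]) -> dict:
    """Apply BASE, TEAM, USER in order; a missing (None) layer falls through."""
    result: dict = {}
    for layer in layers:
        if layer is not None:
            result = merge_layer(result, layer)
    return result
```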
What ships
Observability surfaces
- Unified bernstein doctor observe (#1650). Umbrella command that runs each per-backend probe (Sonar, GlitchTip, Dependency-Track, GitHub Code Scanning) in order and renders one aggregated Rich table with metric, value, delta-since-last-check, threshold, and status columns. Supports --json (machine-readable) and --watch (re-runs every 60 seconds). Each backend soft-fails to SKIPPED when its env vars are unset so the umbrella keeps running on a fresh checkout. Per-backend deltas are computed against a small snapshot cache at .sdd/observability/<backend>.json (suppressible via --no-persist). The dt, code-scanning, and observe Click commands are registered directly in bernstein.cli.main so the wiring survives independent refactors of advanced_cmd.py. A per-PR pr-observability-summary.yml workflow posts a sticky Markdown comment rendered from the observe JSON, and a daily docs-observability-snapshot.yml cron (06:00 UTC) writes docs/observability/snapshots/<date>.json and re-renders docs/observability/trends.md via a dependency-free unicode sparkline. Probe crash messages store only the exception type in persisted snapshots so tokens or URLs cannot leak. Docs at docs/observability/unified-doctor.md. Tests at tests/unit/cli/doctor/test_observe.py cover probe soft-fails, delta math, Click wiring, JSON shape, persistence toggle, and exit-code mapping.
- bernstein doctor sonar (#1648). New subcommand pulling project measures from a configured SonarQube server: coverage, code smells by severity, bugs, vulnerabilities, security hotspots, and cognitive-complexity hotspots. Rich-table or --json output. Soft-fails (exit 0) when SONAR_HOST_URL / SONAR_TOKEN are unset and prints a one-line hint pointing at docs/observability/sonar.md. Advisory baseline at $XDG_DATA_HOME/bernstein/sonar-baseline.json lets the parent bernstein doctor group nudge when open smells exceed the threshold or vulnerabilities regress. 28 hermetic tests via httpx.MockTransport.
- bernstein doctor glitchtip (#1646). New subcommand pulling last-24h issue counts by severity, a 7-day trend, and the top unresolved issues from the configured GlitchTip server. Rich-table or --json output. Soft-fails when BERNSTEIN_GLITCHTIP_TOKEN is unset. Optional baseline cache at ~/.local/share/bernstein/glitchtip-baseline.json powers a nudge under bernstein doctor --suggest-docs when the GlitchTip API reports new unresolved issues since the last check. 25 unit tests cover the fetcher, baseline persistence, nudge logic, Click wiring, and soft-fail behaviour.
- Sticky PR Sonar comment (#1648). New .github/workflows/sonar-pr-comment.yml posts a sticky advisory PR comment with project-level Sonar measures. Soft signal only, never blocks merge.
- Daily GlitchTip alert sweep (#1646). New .github/workflows/glitchtip-insights.yml (06:30 UTC + workflow_dispatch) mirrors fatal-level GlitchTip issues into sticky GitHub issues labelled glitchtip-alert. The mirror auto-closes when the GlitchTip side resolves. The workflow now validates HTTP status on the resolved-issues fetch and runs gh issue subprocesses with check=True so reconciliation failures fail the run instead of being swallowed.
Security
- Approval-nonce binding (#1642). Mints a 16-byte server-generated nonce per pending approval. The reply must echo the exact value or the gate refuses to resolve, foreclosing stale-button replay on superseded prompts and any path where the agent process could forge its own approval response.
  - core/approval/models: nonce field on PendingApproval (hex on the wire); to_dict(include_nonce=False) for adapter-facing serialisations; new ApprovalNonceMismatch / ApprovalNonceExpired errors.
  - core/approval/queue: resolve() validates the supplied nonce in constant time. Server-internal callers (TTL evict, wait_for timeout) keep the back-compat no-nonce path so they cannot deadlock.
  - core/routes/approvals: HTTP reply now requires a nonce. Mismatches surface 409 NONCE_MISMATCH. Replays against an evicted approval surface 410 NONCE_EXPIRED. The live-fragment HTML threads the nonce through the button handlers.
  - cli/commands/approval_cmd: approve-tool / reject-tool read the on-disk record and thread the nonce back through resolve().
  - A missing nonce body field defaults to an empty string at the schema layer so it flows through the handler and surfaces as 409 NONCE_MISMATCH via the existing _coerce_nonce guard, instead of being rejected at the Pydantic layer with 422.
  - Closes #1619.
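The nonce flow can be sketched with the standard library: secrets for a 16-byte token, hmac.compare_digest for the constant-time check, and single-use removal on success. Class and exception names here are hypothetical stand-ins for the ones listed above, and the 409/410 codes are attached as plain attributes rather than HTTP responses.

```python
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


class NonceMismatch(Exception):
    status = 409  # surfaced as NONCE_MISMATCH


class NonceExpired(Exception):
    status = 410  # surfaced as NONCE_EXPIRED


@dataclass
class PendingApproval:
    approval_id: str
    # 16 random bytes, hex-encoded for the wire (32 characters).
    nonce: str = field(default_factory=lambda: secrets.token_hex(16))


class ApprovalQueue:
    def __init__(self) -> None:
        self._pending: Dict[str, PendingApproval] = {}

    def add(self, approval_id: str) -> PendingApproval:
        approval = PendingApproval(approval_id)
        self._pending[approval_id] = approval
        return approval

    def evict(self, approval_id: str) -> None:
        self._pending.pop(approval_id, None)

    def resolve(self, approval_id: str, nonce: Optional[str], decision: str) -> str:
        approval = self._pending.get(approval_id)
        if approval is None:
            raise NonceExpired(approval_id)  # evicted or already used
        # Constant-time comparison; a missing nonce is treated as "" and never matches.
        if not hmac.compare_digest(approval.nonce.encode(), (nonce or "").encode()):
            raise NonceMismatch(approval_id)
        del self._pending[approval_id]  # single use
        return decision
```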
Reliability and runtime
- Single-writer RunActor (#1641). Introduces a per-session actor that owns canonical run state. Mutations flow as typed events through one async queue. A pure apply_event reducer applies them with monotonic seq numbers. ReplayBuffer is a bounded ring (default 1024) that emits an explicit Gap{up_to_seq} marker when a subscriber asks for an evicted range, so a reconnect-after-eviction is observable instead of silently corrupt. The approval gate gains an opt-in session_id kwarg that mirrors approval events into a registered RunActor via run_actor_registry. The file-driven decision contract is unchanged; the actor feed runs alongside. Migrating the remaining writers (worker subprocess, watchdog, lifecycle hooks, hooks_receiver) is a follow-up. Refs #1630.
- Canonical stream-signal protocol (#1638). New core/protocols/stream_signals.py defines a small text-line vocabulary (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED), a parser, a producer-side format helper, and conformance helpers. CLIAdapter grows an optional stream_signal_parser hook; the default delegates to the canonical parser, and adapters override it to map a native protocol onto the canonical vocabulary. ConformanceReport surfaces missing terminal signals as a soft warning so adapters without canonical signals stay visible without failing. Tests cover parse, format round-trip, malformed-input resilience, concurrent multi-adapter parsing, terminal-signal check, default vs. override hook behaviour, plan, and question round-trip. Docs at docs/adapters/stream_signals.md describe the vocabulary with shell and Python wrapper examples. Resolves #1632.
- Declarative task DAG (#1655). Adds a declarative task DAG layer so the planner sets per-task parallel safety at task-generation time instead of having the scheduler infer it from file overlap. The Task schema gains parallel_safe (default False) and story_id (Optional[str]) with round-trip support in Task.from_dict. The backlog parser recognises the [T<id>] [P] [USn] markdown checkbox format and the matching YAML frontmatter keys. New `core/orchestration/task_dag....
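The ReplayBuffer behaviour described for RunActor (a bounded ring that reports a Gap instead of silently skipping evicted events) can be sketched as follows. Interpreting Gap.up_to_seq as the highest evicted sequence number is an assumption, as is the since() method name.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Union


@dataclass(frozen=True)
class Event:
    seq: int
    payload: Any


@dataclass(frozen=True)
class Gap:
    up_to_seq: int  # assumed meaning: highest seq that was evicted and cannot be replayed


class ReplayBuffer:
    def __init__(self, capacity: int = 1024) -> None:
        self._events: Deque[Event] = deque(maxlen=capacity)  # oldest entries fall off
        self._next_seq = 1

    def append(self, payload: Any) -> int:
        seq = self._next_seq
        self._events.append(Event(seq, payload))
        self._next_seq += 1
        return seq

    def since(self, after_seq: int) -> List[Union[Gap, Event]]:
        """Events with seq > after_seq, prefixed by a Gap if part of that range was evicted."""
        out: List[Union[Gap, Event]] = []
        oldest = self._events[0].seq if self._events else self._next_seq
        if after_seq + 1 < oldest:
            out.append(Gap(up_to_seq=oldest - 1))
        out.extend(event for event in self._events if event.seq > after_seq)
        return out
```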
v2.3.1
v2.3.1 - Maintenance
4 commits since v2.3.0. No new features, no breaking changes. Patch release covering correctness fixes, CI hardening, and follow-up on review-bot findings deferred during the v2.3.0 cycle.
Highlights
- Restore numeric and key coercions that the FURB123 auto-fix pass removed, plus 19 mechanical fixes from the 2026-05-19 review-bot catch-up.
- Soft-fail the cross-repo landing-mirror dispatch on PAT scope errors so the docs-drift pipeline keeps moving when the fine-grained PAT lacks actions:write on the landing repo.
- Harden the CLI against malformed GLITCHTIP_DSN and snapshot sidecars that fail schema validation.
- Map UrlSchemeError to the documented typed errors in MCP transports and the lineage-alert sink.
New features
None.
Fixes
- Refurb correctness fixup (#1615). Restored int() / float() / str() coercions and added explicit isinstance guards that the FURB refactor pass dropped. Eight bot-ack findings on the same PR addressed.
- Landing-dispatch PAT scope (#1617). Capture the HTTP status from the cross-repo workflow_dispatch call and emit an operator-actionable warning annotation when the dispatch endpoint rejects the request. The job now exits 0 in all non-success cases instead of failing the trigger-landing-mirror job and blocking the docs-drift pipeline on main.
- GLITCHTIP_DSN crash on import (#1618). Wrap sentry_sdk.init in a best-effort try/except so a malformed DSN cannot crash the CLI on import.
- Snapshot-sidecar schema errors (#1618). Treat schema-invalid snapshot sidecars as unreadable metadata (return None and warn) instead of raising KeyError / TypeError / ValueError through SnapshotStore.get / list. Reject negative --days for bernstein git gc before constructing the store and before computing the cutoff.
- MCP transport typed-error surface (#1618). Wrap UrlSchemeError from ensure_http_url as TransportError in SseTransport.connect and StreamableHttpTransport.connect. Catch UrlSchemeError in sink_from_config and fall back to NullAlertSink, preserving the "orchestrator never raises here" contract.
- GitHub Projects adapter robustness (#1618). Catch OSError around GitHub App private-key reads and raise TrackerUnavailable so the typed error surface stays intact. In _item_to_ticket, skip items whose content.__typename is not Issue / PullRequest / DraftIssue rather than emitting tickets with empty title / body / content-ids. Added a regression test for the HTTP 403 abuse-detection -> RateLimited mapping.
- Bundle command input validation (#1618). Validate sign inputs as a pair and read the private key before assembling the bundle, so invalid CLI input never mutates on-disk state.
- Docs cleanups (#1618). Clarify the literal closing-fence string in the failure-taxonomy consumer-contract step. Prune scope now reads as two distinct scopes in session-memory docs (episodic = session, semantic = task). Add a text language tag to the post-CI-dispatcher sequence-diagram fence so markdownlint MD040 passes. Replace classic PAT scope strings with the fine-grained PAT permission model in the github-projects tracker doc.
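The snapshot-sidecar and --days fixes share a pattern: validate early and degrade to a typed outcome instead of leaking a raw KeyError / TypeError / ValueError. A minimal sketch, assuming a hypothetical two-field JSON sidecar schema; the function names are illustrative, not SnapshotStore's API.

```python
import json
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger(__name__)

# Hypothetical sidecar schema: required key -> expected type.
REQUIRED_FIELDS = {"id": str, "created_at": str}


def parse_sidecar(raw: str) -> Optional[dict]:
    """Return sidecar metadata, or None (with a warning) if it fails the schema."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise TypeError("sidecar is not a JSON object")
        for key, expected in REQUIRED_FIELDS.items():
            if not isinstance(data[key], expected):  # KeyError if the key is missing
                raise TypeError(f"{key!r} has type {type(data[key]).__name__}")
    except (KeyError, TypeError, ValueError) as exc:  # JSONDecodeError is a ValueError
        log.warning("treating snapshot sidecar as unreadable: %s", exc)
        return None
    return data


def read_sidecar(path: Path) -> Optional[dict]:
    try:
        raw = path.read_text(encoding="utf-8")
    except OSError as exc:
        log.warning("cannot read snapshot sidecar %s: %s", path, exc)
        return None
    return parse_sidecar(raw)


def validate_days(days: int) -> int:
    # Reject bad input before any store is constructed or a cutoff is computed.
    if days < 0:
        raise ValueError(f"--days must be >= 0, got {days}")
    return days
```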
Internal
- Refurb auto-fix wave 3 (#1615). Mechanical code-quality cleanup of remaining refurb findings across src/ and tests/. Rules auto-fixed: FURB123 (redundant cast removal, 147 sites via custom line-based rewriter), FURB138 (assign-empty-list + append loop -> list comprehension, 57 sites via libcst rewriter restricted to single-statement bodies with ast.parse round-trip per file), FURB113 (repeated append -> list.extend, 5 leftovers via ruff check --select FURB113 --preview --unsafe-fixes --fix). Counts: FURB123 148 -> 0, FURB138 106 -> 49, FURB113 31 -> 26. Reformatted by ruff format (3 files). Wave 3 of the bulk auto-fix work started in #1558 and continued in #1582. Skipped this round: FURB184 (2440 sites, still needs whole-function liveness analysis), FURB138 leftovers (49 sites with multi-statement bodies, guarded branches, or appends with continue/break), FURB108 (24 sites), FURB173 (2 sites).
- Sonar-scan timeout and uv caching (#1616). Sonar scans were dying at the 20-minute job timeout during uv sync --group dev + pytest --cov on a 127-commit history. Raise the job-level budget to 60 minutes with per-step timeouts that fail fast on individual stages (sync 15m, coverage 30m, scan 10m). Pin the new astral-sh/setup-uv action to v8.1.0 with cache enabled so subsequent runs reuse the dev environment.
- GlitchTip setup doc (#1616). Added docs/operations/glitchtip-setup.md covering DSN provisioning, env-var export, and end-to-end event verification on a single page.
- SBOM scope (#1618). Generate the SBOM from an isolated venv where only the project and its resolved dependencies are installed, so the output reflects bernstein's dependency graph rather than the runner base image.
- Review-bot triage continuity (#1618). PR #1584 inventoried 48 review-bot findings and landed 4 mechanical fixes. This release picks up the remaining 44: 19 applied here, 14 already resolved on source PR branches before merge (CodeRabbit confirmed "Addressed in commits ..."), and 11 deferred for design judgement (config-schema changes, frozen-dataclass migrations, semantic-prune scope, stack indexing rework, worktree-aware git-dir resolution). Deferred items are tracked in docs/review-bot/deferred-2026-05-19.md so they are not lost.
- Test typing and assertions (#1618). Annotated the sarif_module fixture return type as Generator[ModuleType, None, None] in tests/unit/scripts/test_sarif_drop_suppressed.py. Added an explicit assertion message when EXPECTED_CHILD_SECRETS is missing an entry for a child in tests/unit/test_post_ci_dispatcher_yaml.py. Parse the GraphQL request body as JSON and assert against variables rather than byte-substring against raw request.content in the github-projects adapter tests.
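For readers unfamiliar with the refurb rule, FURB138 targets the build-a-list-with-append pattern. The functions below are illustrative, not code from the repository: the first two show the rewrite applied in this wave, and the third shows the kind of loop left for manual review because its break changes control flow.

```python
from typing import Iterable, List


def squares_before(values: Iterable[int]) -> List[int]:
    # Pattern FURB138 flags: assign an empty list, then append in a loop.
    result = []
    for v in values:
        result.append(v * v)
    return result


def squares_after(values: Iterable[int]) -> List[int]:
    # The rewrite applied when the loop body is a single append statement.
    return [v * v for v in values]


def evens_until_negative(values: Iterable[int]) -> List[int]:
    # Left alone: the break stops iteration early, so a plain comprehension
    # would not be equivalent.
    result = []
    for v in values:
        if v < 0:
            break
        if v % 2 == 0:
            result.append(v)
    return result
```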
Deprecations
None.
Upgrade notes
- Drop-in upgrade from v2.3.0. No config-schema changes, no API changes, no audit-chain changes.
- Operators relying on the cross-repo trigger-landing-mirror job should verify the fine-grained PAT has actions:write on the landing repo. If the scope is missing, the job will now emit a warning annotation instead of failing the workflow.
- Operators using bernstein git gc --days <N> should note that a negative <N> is now rejected up front rather than mishandled inside SnapshotStore.
- Operators using the GitHub App private-key path for the GitHub Projects tracker adapter will now see TrackerUnavailable on filesystem errors instead of a raw OSError.
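For callers, the last note means a single exception type now covers key-loading failures. The sketch below shows the usual Python pattern for this: wrap OSError and chain it with raise ... from. TrackerUnavailable here is a local stand-in class, and load_app_private_key and key_error_cause are hypothetical functions, not the adapter's actual code.

```python
from pathlib import Path


class TrackerUnavailable(Exception):
    """Local stand-in for the adapter's TrackerUnavailable (assumption)."""


def load_app_private_key(path: Path) -> str:
    """Hypothetical loader: read a PEM key, mapping filesystem errors to TrackerUnavailable."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        # Chain the original error so the root cause stays visible in tracebacks.
        raise TrackerUnavailable(f"cannot read GitHub App private key at {path}") from exc


def key_error_cause(path: Path) -> str:
    """Return the underlying error's class name if loading fails, else an empty string."""
    try:
        load_app_private_key(path)
    except TrackerUnavailable as err:
        # Callers catch one type; the original OSError survives as __cause__.
        return type(err.__cause__).__name__
    return ""
```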
Acknowledgements
Thanks to the operators and reviewers who triaged the 2026-05-19 review-bot batch, and to the CodeRabbit and Sourcery automation that surfaced the underlying findings.
v2.3.0
127 commits since v2.2.0. The headline is the tracker-adapter family landing: 10 backlog-tracker adapters now ship under a single TrackerContract, plus webhook ingestion and a plugin hookspec for third-party tracker plugins. The orchestration loop also gained an issue-to-PR pipeline, a retry-with-continuation path for success-without-commit runs, and a multi-agent handoff message bus that piggybacks on tracker comments. The supporting workstreams (review-bot acknowledgement gate, signed lineage audit log, secrets broker, telemetry-grounded autofix, Playwright self-testing sandbox) close several long-standing reliability and security gaps.
Highlights
- Tracker-adapter family. 10 adapters land, all conforming to the single TrackerContract (Jira Cloud + DC, GitLab Issues, Linear, Plane, Asana, ServiceNow, ClickUp, GitHub Projects v2, plus webhook ingestion). Closes the gap operators have hit when integrating non-GitHub backlogs.
- Tracker plugin hookspec + registry + CLI. Third-party tracker integrations now plug in via the same pluggy spec the orchestrator uses internally (#1599).
- Issue -> plan-comment -> PR pipeline. New orchestration mode that walks a tracker issue through plan synthesis, plan-comment posting for human review, and PR creation in one path (#1600).
- Tracker comments as a multi-agent handoff bus. Worker agents now coordinate over tracker comments so a session can resume across CLI restarts and across operator machines (#1606).
- Review-bot acknowledgement gate. CodeRabbit and Sourcery findings classified as must-address now block merge until they are addressed in a fixup commit or acknowledged in the PR body with a structured marker. Nightly sweeper + reusable shepherd workflow template ship in the same PR (#1583).
- Lineage v2 - signed audit log of tracker state moves. Each tracker-side state transition is captured as a signed lineage entry, so operators can audit the full chain when a ticket loses or gains the wrong label (#1602).
- Playwright-based sandbox for UI/web agent runs. A new self-testing layer drives a Playwright context against the dev server, captures screenshots / console / network errors, and hands the structured result back to an LLM judge for verdict (#1603).
New features
| Area | Change |
|---|---|
| trackers | 10 adapters land under TrackerContract (Asana, ClickUp, GitHub Projects v2, GitLab Issues, Jira Cloud, Jira DC, Linear, Plane, ServiceNow, plus webhook ingestion) (#1560, #1570-#1577, #1601) |
| plugins | Tracker plugin hookspec + registry + bernstein trackers CLI (#1599) |
| orchestration | Issue -> plan-comment -> PR pipeline (#1600), tracker comments as handoff bus (#1606), multi-tracker federation layer (#1561), retry-with-continuation on success-without-commit (#1596) |
| security | Secrets broker for short-lived per-task tokens (#1605) |
| reliability | Progress-watch liveness probe via session-log growth (#1597) |
| sandbox | Playwright self-testing for UI/web agent runs (#1603) |
| lineage | Signed audit log of tracker state moves (#1602), content-addressed trace store + viewer (#1564), per-ticket transcript bundle (#1562) |
| devops | Scheduled upstream-signal sweep with operator rollup (#1594) |
| fleet | Directory-based instance registry for multi-instance hosts (#1592) |
| eval | YAML eval harness (#1565) |
| autofix | Telemetry-grounded autofix MVP (#1566) |
| memory | Long-running session memory (#1559) |
| observability | Run-failure classification with structured tracker writeback (#1569) |
| git | Stacked branches + per-snapshot undo (#1563) |
| quality | Review-bot acknowledgement gate + nightly sweeper + reusable shepherd template (#1583) |
| cost | Hard per-ticket cost cap with clean termination and tracker writeback (#1578) |
Fixes
- fix(adapters): refresh the aider contract for the upstream --yes -> --yes-always rename; the contract checker now distinguishes a broken upstream --help from real drift; the CI workflow treats the new runtime-failure exit code as a warning rather than a hard fail (#1595).
- fix(security): dispatch audit events outside the broker lock; index tokens by value (#1607). Split the scorecard job so the SARIF upload completes (#1613). Mask credentials in logger calls (#1519). Replace subprocess shell=True with list-form args (#1513). Close urllib / SHA1 / Trivy alerts (#1518).
- fix(orchestration): tracker_pipeline review follow-ups (#1609); commit-completion module review-bot follow-ups (#1608).
- fix(sandbox): Playwright runner review follow-ups, including asyncio.CancelledError propagation through broad except handlers and rejection of unsafe task_id values (#1610).
- fix(tui): restore startup banner regression + add coverage (#1568).
- fix(ci): lock aider adapter-integration job to Python 3.13 (#1586); honour SARIF suppressions before Code Scanning upload (#1520); emit CI gate for paths-ignored-only PRs (#1521); restore minimum-required write permissions broken by security hardening (#1481).
- fix(review): apply deferred review-bot findings batch (#1584).
- fix(quality): bulk refurb auto-fix wave 1 across src/ (#1558).
- fix(test): repair main-red after refurb auto-fix removed str() in _run_git (#1591).
- fix(docs): sync agents-md module map for the devops sub-package (#1612).
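The #1607 fix follows a common concurrency rule: collect side effects while holding a lock, then run them after releasing it. That way a slow or re-entrant audit sink cannot stall or deadlock the broker. The sketch below is minimal and assumes a hypothetical TokenBroker with a pluggable audit callback. None of these names come from bernstein's secrets broker.

```python
import secrets
import threading
from typing import Callable, Optional


class TokenBroker:
    """Hypothetical broker for short-lived per-task tokens, indexed by token value."""

    def __init__(self, audit: Callable[[dict], None]) -> None:
        self._lock = threading.Lock()
        self._by_value: dict[str, str] = {}  # token value -> task id
        self._audit = audit

    def issue(self, task_id: str) -> str:
        with self._lock:
            token = secrets.token_urlsafe(32)
            self._by_value[token] = task_id
            event = {"action": "issue", "task_id": task_id}
        # Lock released: a slow or re-entrant audit sink can no longer stall the broker.
        self._audit(event)
        return token

    def owner_of(self, token: str) -> Optional[str]:
        # Indexing by value makes the lookup a single dict access.
        with self._lock:
            return self._by_value.get(token)


def demo() -> list[dict]:
    events: list[dict] = []
    broker: TokenBroker

    def audit(event: dict) -> None:
        # Calls back into the broker. With a non-reentrant Lock this would
        # deadlock if issue() still held the lock while dispatching.
        events.append({**event, "reentrant_lookup": broker.owner_of("unknown")})

    broker = TokenBroker(audit)
    token = broker.issue("task-1")
    assert broker.owner_of(token) == "task-1"
    return events
```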
Internal / quality
- Bulk refurb auto-fix wave 2. FURB113 (repeated append -> list.extend, 259 sites), FURB107 (try/except: pass -> contextlib.suppress, 267 sites), FURB173 (dict spread -> | merge, 178 sites), FURB108 (chained == -> in {...}), landed via libcst rewriter + ruff autofix (#1582).
- Bulk refurb auto-fix wave 1. Initial refurb sweep across src/ (#1558).
- CI dependency churn. actions/checkout v4 -> v6 (#1598), actions/upload-artifact v4 -> v7 (#1611), Python pin to <=3.13 until adapter 3.14 compatibility is confirmed (#1590), aider adapter-integration job locked to Python 3.13 (#1586).
- Adapter contract check. Truncated upstream --help output is no longer reported as N missing flags; it now surfaces on a dedicated runtime_failure field that the workflow treats as a warning rather than drift (part of #1595).
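For readers unfamiliar with the refurb rule codes, the wave 2 rewrites look roughly like the pairs below. They are illustrative examples written for these notes, not code taken from the bernstein tree, and each pair behaves identically for the inputs shown.

```python
import contextlib
import os


def before(path: str, base: dict, extra: dict, items: list, status: str):
    out: list = []
    out.append(items[0])                       # FURB113: repeated append
    out.append(items[1])
    try:                                       # FURB107: try/except: pass
        os.remove(path)
    except FileNotFoundError:
        pass
    merged = {**base, **extra}                 # FURB173: dict spread
    done = status == "ok" or status == "skip"  # FURB108: chained ==
    return out, merged, done


def after(path: str, base: dict, extra: dict, items: list, status: str):
    out: list = []
    out.extend((items[0], items[1]))
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
    merged = base | extra                      # dict union needs Python 3.9+
    done = status in {"ok", "skip"}
    return out, merged, done
```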
Upgrade notes
- No manual operator action required. pip install --upgrade bernstein (or uv pip install --upgrade bernstein) brings v2.3.0 in.
- Operators integrating with non-GitHub backlogs can now register their tracker via the new plugin hookspec (bernstein trackers --help for the CLI surface).
- The new review-bot acknowledgement gate runs on every PR. Must-address findings need either a fixup commit (bot-ack: <id> in the commit message) or a PR-body marker (<!-- bot-ack: <id> reason=... -->).
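The two acknowledgement forms in the last note can be checked mechanically. The sketch below extracts acknowledged finding IDs from commit messages and a PR body. It assumes that IDs are runs of word characters and hyphens and that the PR-body marker must carry a non-empty reason. That ID grammar and the function names are assumptions, since the notes only show the bot-ack: <id> and <!-- bot-ack: <id> reason=... --> shapes.

```python
import re

# Commit-message form: "bot-ack: <id>" anywhere in the message (assumed ID grammar).
COMMIT_ACK = re.compile(r"bot-ack:\s*([\w-]+)")
# PR-body form: "<!-- bot-ack: <id> reason=... -->" with a non-empty reason.
BODY_ACK = re.compile(r"<!--\s*bot-ack:\s*([\w-]+)\s+reason=(\S.*?)\s*-->", re.DOTALL)


def acknowledged_ids(commit_messages: list[str], pr_body: str) -> set[str]:
    ids: set[str] = set()
    for message in commit_messages:
        ids.update(COMMIT_ACK.findall(message))
    ids.update(match.group(1) for match in BODY_ACK.finditer(pr_body))
    return ids


def unaddressed(must_address: set[str], commit_messages: list[str], pr_body: str) -> set[str]:
    """Findings that would still block merge under this sketch."""
    return must_address - acknowledged_ids(commit_messages, pr_body)
```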
v2.2.0
v2.1.0 closed the loop on routing observability. v2.2.0 is about the CI immune system: auto-heal grew teeth, the bot-PR class got eliminated, and five cross-discipline interventions (Toyota Lean, epidemiology, alarm fatigue, SPC) stopped recurring failure modes that had been costing real wall time. Three feature workstreams that slipped from v2.1 also landed.
Self-healing CI grew teeth
Auto-heal v2 shipped in v2.1 (#1393, 26 parameters, classifier + heal-branch + admin-merge) and produced zero successful heals in the first three weeks. Every main-red event still required a human-dispatched hotfix. Three things were wrong:
- #1452 typos-cli 404. The fetch URL was stale; the workflow failed before classification. Added a 404-cordon so the daemon now opens a self-issue and stops rather than masking errors.
- #1452 agents-md drift class was missing from the classifier. Lint drift from bernstein agents-md sync not running on doc-only commits looked like a new failure class to the heuristic. Added it.
- #1452 composition order: ruff was running before agents-md sync, so the sync's whitespace tweaks looked like lint regressions. Reordered.
Plus the trigger leak (#1460): auto-heal pushed its fix branch but the heal-branch CI never started, because push events from GITHUB_TOKEN don't fire downstream workflows by default. Auto-heal now dispatches the heal-branch CI explicitly.
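GitHub documents this behaviour: events triggered with the repository's GITHUB_TOKEN do not create new workflow runs, except for workflow_dispatch and repository_dispatch. The sketch below shows an explicit dispatch call against the REST endpoint POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches. The owner, repo, workflow file name, branch and token are placeholders. The function only builds the request, so it can be inspected without network access.

```python
import json
import urllib.request


def build_workflow_dispatch(owner: str, repo: str, workflow: str, ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# Placeholder values; send with urllib.request.urlopen(req) once a real token is configured.
req = build_workflow_dispatch("example-org", "example-repo", "ci.yml", "heal/main-red-1234", "<token>")
```

The target workflow must declare an on: workflow_dispatch trigger for the call to be accepted.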
Bot-PR class eliminated
#1449 moved contract-drift autofix from "open a PR with the regenerated lockfile" to "inline-push the regenerated lockfile to the PR head." That was the dominant bot-PR-class source. The recursive lint drift cycle that ate a Saturday afternoon is gone.
Cross-discipline CI hygiene wave
Five interventions, each borrowed from a discipline that already solved an analogous problem:
| PR | Discipline | Intervention |
|---|---|---|
| #1454 | alarm fatigue (anesthesiology) | Weekly aggregated digest issue. Replaces N auto-release-skipped notifications with one rolling summary. |
| #1455 | epidemiology (R0) | Hotfix R-counter. Detects when a hotfix begets another hotfix. Two-in-a-row blocks further auto-merge until human triage. |
| #1456 | Toyota Lean (Andon cord) | Trunk health SLO + Andon gate. Holds merges on red trunk. Blocks the bug spread that auto-merge would otherwise inflict. |
| #1457 | bisect on red | Auto-triage main-red to culprit PR. Halves the median MTTR for main-red events. |
| #1467 | SPC (control charts, META F) | Idempotency self-check in regen_contract_drift. Second run of the same regen must be a no-op; if not, the regen is non-deterministic and the workflow halts. |
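The R-counter rule in the #1455 row reduces to counting consecutive hotfix merges on trunk. The sketch below is minimal and assumes each merge is already labelled as hotfix or not; how Bernstein classifies merges is not described here. It uses the two-in-a-row threshold from the table.

```python
from collections.abc import Iterable


def trailing_hotfix_run(merges: Iterable[bool]) -> int:
    """Length of the run of hotfix merges at the end of the history (oldest first)."""
    run = 0
    for is_hotfix in merges:
        run = run + 1 if is_hotfix else 0
    return run


def auto_merge_allowed(merges: Iterable[bool], threshold: int = 2) -> bool:
    """Block auto-merge once `threshold` hotfixes land back to back."""
    return trailing_hotfix_run(merges) < threshold
```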
Seven edge-case hardenings
The first three followed from the wave above. The next four are independent:
- #1458 contract-drift fork-PR fallback shape. The inline-push path needs write to the PR head; on fork PRs that's denied. Now falls back to a comment with the regenerated patch.
- #1459 R-counter benign-drift allow-list + classifier (EDGE-4). Auto-formatting churn on docs files is not a hotfix-class event. Distinct path.
- #1463 advisory PR push-lock for parallel-agent waves (EDGE-6). Six-agent waves were racing on the same PR's branch. Soft lock prevents the lost-write that bricked one PR last cycle.
- #1464 GH API rate-limit guard for long-running agent loops (EDGE-7). Token-bucket plus 429 backoff. Replaces the "wait two minutes and retry" pattern that triggered the secondary rate limit anyway.
- #1465 trunk-Andon override escapes (EDGE-5). Two override paths (force-merge label, commit-message token) for the case where the Andon-detected breakage is the fix.
- #1455 hotfix R-counter (also above) — paired with the Andon gate so the override loop has bounded depth.
- #1450 hygiene for five noise-prone workflows (auto-release filter, scheduled cleanup, telegram dedupe, release-please if-cond guard, delete-master removal).
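The #1464 guard pairs a token bucket with backoff on HTTP 429. The sketch below covers both halves. The bucket takes an injectable clock so it can be tested deterministically. Capacity, refill rate and backoff constants are arbitrary example values, not the ones bernstein uses.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


def backoff_seconds(attempt: int, retry_after: Optional[float] = None, base: float = 1.0, cap: float = 60.0) -> float:
    """Honour a server-provided Retry-After on 429; otherwise back off exponentially."""
    if retry_after is not None:
        return retry_after
    return min(cap, base * 2 ** attempt)
```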
Branch-scoped CI concurrency
#1470 scopes the CI concurrency group by branch so rapid-merge bursts drain the queue instead of cancelling each other's downstream signals. Plus #1472 hotfix repair for three follow-on root causes (QR dep skip on macOS, GUI URL test path, release-please conditional). Plus #1473 and #1474 clearing actionlint annotation-cap noise via level=error and -shellcheck= flag — the cap was eating real signal under a wall of style nags.
macOS runner saturation fix
The macOS hosted-runner queue wait was 20-70 minutes during burst-merge waves. Issue #1468 categorised the failure mode. #1475 split macOS off the per-PR default matrix into two new gated jobs (test-macos, adapter-integration-macos) that fire on push-to-main, on macos_sensitive path changes, or on a macos-needed label. Added .github/workflows/ci-macos-nightly.yml for the full matrix daily at 06:00 UTC. The CI gate accepts legitimate macOS skips.
Caught a real bug a week later: #1476. The test_reaps_stale_heartbeat test was patching one binding of _is_process_alive, but _refresh_heartbeat_from_signals had a separate binding defined locally in bernstein.core.agents.agent_lifecycle. The unpatched call fell through to a real os.kill(999, 0). On Linux and Windows that raised; on macos-latest PID 999 was owned by a system daemon, so the call succeeded, the heartbeat got refreshed, and the test failed. Test-only fix; the production reap path was correct.
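The #1476 failure mode is general to unittest.mock: a patch replaces one name binding, and code that looks the function up through a different binding keeps calling the original. The minimal reproduction below uses two throwaway module objects. The module and function names are stand-ins, not bernstein's.

```python
import types
from unittest import mock


def _real_is_process_alive(pid: int) -> bool:
    # Stand-in for an os.kill(pid, 0) probe; pretend the PID belongs to a live daemon.
    return True


# "helpers" owns the function; "lifecycle" holds its own reference to it,
# the way `from helpers import is_process_alive` (or a local redefinition) would.
helpers = types.ModuleType("helpers")
helpers.is_process_alive = _real_is_process_alive
lifecycle = types.ModuleType("lifecycle")
lifecycle.is_process_alive = helpers.is_process_alive


def refresh_heartbeat(pid: int) -> bool:
    # Resolves the name through `lifecycle`, not `helpers`.
    return lifecycle.is_process_alive(pid)


with mock.patch.object(helpers, "is_process_alive", return_value=False):
    wrong_target = refresh_heartbeat(999)  # still True: the patch missed this binding

with mock.patch.object(lifecycle, "is_process_alive", return_value=False):
    right_target = refresh_heartbeat(999)  # False: patched where it is looked up
```

This is the "patch where it is looked up" rule from the unittest.mock documentation.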
AI-BOM export (#1438)
bernstein bom emit and bernstein bom verify. Three encoders behind one dispatcher: Bernstein-native JSON, CycloneDX 1.5 with the AI/ML extension shape, and SPDX 2.3 with AI-specific annotations. Pure projection from existing lineage / cost / adapter state -- no recomputed hashes, no I/O during generate_bom. Determinism enforced by Hypothesis property tests across all three formats. Tamper detection via sha256 chain. Closes #1371.
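Tamper detection via a sha256 chain usually works by hashing each record together with the previous record's digest, so editing any record invalidates every digest after it. The sketch below illustrates that idea. It assumes JSON-serialisable records and canonical sorted-key serialisation. The record shape and the genesis value are hypothetical, not bernstein's BOM format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting digest


def link_digest(prev_digest: str, record: dict) -> str:
    # Canonical JSON keeps the digest stable across key orderings.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + canonical).encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[str]:
    digests, prev = [], GENESIS
    for record in records:
        prev = link_digest(prev, record)
        digests.append(prev)
    return digests


def first_tampered(records: list[dict], digests: list[str]) -> int:
    """Index of the first record whose digest no longer matches, or -1 if intact."""
    prev = GENESIS
    for i, (record, expected) in enumerate(zip(records, digests)):
        prev = link_digest(prev, record)
        if prev != expected:
            return i
    # A length mismatch means records were appended or dropped.
    return -1 if len(records) == len(digests) else min(len(records), len(digests))
```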
Diary + synthesis (#1432)
Two-tier knowledge layer over closed task transcripts. Diary writes one structured entry per closed task (tried/worked/failed/rationale/tags) with redaction of OpenAI keys, GitHub tokens, AWS access keys, PEM banners, and high-entropy hex. Synthesizer clusters diaries by tag-overlap Jaccard (stdlib only, no embeddings in v1) and drafts a markdown report. HITL-gated: reports default to approved: false. 142 tests including 20 Hypothesis property tests. Closes #1369.
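Tag-overlap Jaccard clustering needs nothing beyond the standard library. The sketch below uses single-pass greedy clustering: each diary joins the first cluster whose seed tags are similar enough. The greedy strategy, the 0.5 threshold and the use of the first member's tags as the seed are all assumptions, since the notes do not say how the synthesizer groups entries.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_tags(entries: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single pass: put each entry in the first cluster whose seed is similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []  # (seed tags, member ids)
    for entry_id, tags in entries.items():
        for seed, members in clusters:
            if jaccard(seed, tags) >= threshold:
                members.append(entry_id)
                break
        else:
            clusters.append((tags, [entry_id]))
    return [members for _, members in clusters]
```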
Consensus relay (#1435)
HMAC-chained per-cycle handoff so an operator restarting a long evolution cycle can pull the prior cycle's decisions/blockers/open-questions/next-action into context without rediscovery. Atomic-write store at .sdd/runtime/consensus/<cycle>.json. bernstein consensus list|show|export|next|verify. 73 unit + 12 integration tests. Closes #1368.
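This paragraph names two ingredients: an HMAC chain across cycles and an atomic-write store. The sketch below combines them under its own assumptions. Each record's MAC covers the previous cycle's MAC plus the canonical JSON payload, and the key is supplied by the caller. Atomicity comes from writing a temporary file in the same directory and renaming it with os.replace. The record fields are hypothetical; only the <cycle>.json naming mirrors the notes.

```python
import hashlib
import hmac
import json
import os
import tempfile
from pathlib import Path


def sign(key: bytes, prev_mac: str, payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev_mac + canonical).encode("utf-8"), hashlib.sha256).hexdigest()


def write_cycle(store: Path, cycle: str, payload: dict, prev_mac: str, key: bytes) -> str:
    """Atomically write <store>/<cycle>.json and return the new MAC."""
    mac = sign(key, prev_mac, payload)
    record = {"payload": payload, "prev_mac": prev_mac, "mac": mac}
    store.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory, then rename over the target: readers see
    # either the old file or the complete new one, never a partial write.
    fd, tmp = tempfile.mkstemp(dir=store, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, store / f"{cycle}.json")
    except BaseException:
        os.unlink(tmp)
        raise
    return mac


def verify_cycle(path: Path, key: bytes, expected_prev_mac: str) -> bool:
    record = json.loads(path.read_text(encoding="utf-8"))
    # Constant-time comparisons for both the chain link and the recomputed MAC.
    if not hmac.compare_digest(record["prev_mac"], expected_prev_mac):
        return False
    return hmac.compare_digest(record["mac"], sign(key, record["prev_mac"], record["payload"]))
```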
PWA + tunnel + QR onboarding (#1442)
Operator GUI is now an installable PWA: web app manifest, service worker with stale-while-revalidate for /api/projects and /api/cost, programmatic maskable icons mounted under both / and /ui/. iOS Safari and Android Chrome install cleanly. bernstein gui serve --tunnel publishes through the existing tunnel driver registry (cloudflared / ngrok / bore / tailscale, auto-select), issues a URL-safe bearer token + 6-word diceware passphrase persisted at ~/.bernstein/dashboard.passphrase (0600), and prints an ASCII QR. bernstein gui qr [--rotate] reprints or rotates. 106 unit + 22 integration tests. Closes #1218.
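The credential step can be sketched with the standard secrets module. secrets.token_urlsafe produces the bearer token and secrets.choice picks diceware-style words. os.open with mode 0o600 creates the passphrase file readable by its owner only. The eight-word list is a placeholder (a real diceware list has 7,776 words), and the hyphen-joined file format is an assumption.

```python
import os
import secrets
from pathlib import Path

# Placeholder word list; a real diceware list has 7,776 entries.
WORDS = ["anchor", "basil", "cobalt", "delta", "ember", "fjord", "gable", "harbor"]


def new_credentials(words: int = 6) -> tuple[str, str]:
    token = secrets.token_urlsafe(32)  # URL-safe bearer token
    passphrase = "-".join(secrets.choice(WORDS) for _ in range(words))
    return token, passphrase


def persist_passphrase(path: Path, passphrase: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Passing 0o600 to os.open sets permissions at creation time, so the file is
    # never briefly world-readable; os.chmod also tightens a pre-existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(passphrase + "\n")
    os.chmod(path, 0o600)
```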
Upgrade
pip install -U bernstein==2.2.0 or uv tool upgrade bernstein. No config migration. Existing diaries / consensus stores / BOMs are read-compatible.
v2.1.0
I shipped v2.0.0 with a plan-routing bug that silently collapsed per-step cli: and model: pins onto the role default. v2.0.1 fixed that. v2.1.0 answers a different question: once routing works, can the system explain itself when a decision goes sideways. Most of this release is observability, calibration, and a CI loop that fixes its own breaks before a human notices.
Lineage v2 and simulate
Lineage v1 stored task ancestry in one flat table; forked sessions and detached children made the queries expensive. Lineage v2 (#1377) is two-layer, the production recorder writes both layers, and the CI gate (#1396) accepts that output.
bernstein simulate (#1378) is a digital-twin runner. Feed it a plan plus a route and it executes the orchestration without the adapter network. Rehearse an expensive plan before paying for it.
Self-healing CI
The pipeline now repairs main when a merge breaks something the autofixer can handle. #1389 added safe and heuristic autofix classes. #1393 grew that into the 26-parameter auto-heal v2. Pattern: red main, classify, fix PR, watch checks, admin-merge.
ProgramBench (#1407), a scenario generator (#1357), and a citation verifier (#1408) move three chores to eval.
Cost, criterion profile, decision log, calibration
bernstein simulate only matters if you can also see what the live orchestrator decided and what it expected to spend.
- Per-task criterion profile (#1363) plus TOPSIS multi-criteria ranking (#1361). A "latency-sensitive" task routes differently from a "thorough" one.
- Structured decision log (#1360) covering every routing, retry, and gate verdict with its inputs.
- Calibration log plus Brier score (#1359). The forecast log got teeth.
- Criterion-aware retry budget (#1355), per-quota-envelope attribution (#1413), calibrated p50/p90 cost preflight band (#1335).
- The preflight cost estimator picks the most expensive role rather than the first one declared (#1395). The old behaviour underestimated by 40 to 60 percent on multi-role plans.
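TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution, #1361) is a standard multi-criteria ranking method. It vector-normalises each criterion column and applies the weights. It then finds the ideal best and worst value per criterion, and scores each option by its relative closeness to the ideal. The compact sketch below follows those steps. The example options, criteria and weights are invented for illustration and say nothing about how bernstein weights latency or cost.

```python
import math


def topsis(matrix: list[list[float]], weights: list[float], benefit: list[bool]) -> list[float]:
    """Closeness score in [0, 1] per row; higher is better.

    matrix[i][j] is option i's value on criterion j; benefit[j] is True when
    larger is better (e.g. quality) and False when smaller is better (e.g. cost).
    """
    n_criteria = len(weights)
    # 1. Vector-normalise each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    # 2. Ideal best and worst per criterion, respecting each criterion's direction.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 3. Relative closeness to the ideal solution.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores
```

A "latency-sensitive" profile would, in this framing, put more weight on a latency column with benefit set to False.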
Security hardening
- Invisible Unicode Tag codepoints stripped from injected skills before any prompt sees them (#1417).
- Promptware cross-agent C2 strings detected in tool output (#1421).
- MCP tool-call inputs JSON-Schema validated, deny-by-default (#1411).
- Per-tool allowlist, fail-closed policy, read-only profile (#1326).
- Constant-time HMAC compare (#1399), session_id log-injection sanitisation (#1341), Qwen and IaC adapters forward secrets via env not argv (#1390, #1392).
A security-pentest eval scenario (#1419) exercises these protections.
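The Unicode Tags block runs from U+E0000 to U+E007F. Its characters render as nothing in most interfaces but can still be read by a model, which makes them a known channel for smuggling hidden instructions. The minimal stripping sketch below uses only the standard library. The function names are hypothetical, and this is not the #1417 implementation.

```python
import re

# Unicode "Tags" block: U+E0000..U+E007F (invisible in most renderers).
_TAG_CHARS = re.compile("[\U000E0000-\U000E007F]")


def strip_unicode_tags(text: str) -> str:
    """Remove Unicode Tag code points before text reaches a prompt."""
    return _TAG_CHARS.sub("", text)


def smuggle(ascii_text: str) -> str:
    # Map printable ASCII onto the Tags block (U+E0000 + code point), the usual hiding trick.
    return "".join(chr(0xE0000 + ord(ch)) for ch in ascii_text)
```

One trade-off: subdivision-flag emoji (for example the England flag) are built from tag characters, so this filter breaks them. That is usually acceptable for prompt input.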
Adapters and GitLab parity
bernstein adapters check returns a conformance and capability report (#1385). bernstein compare runs a side-by-side adapter A/B (#1337). GitLab integration reaches parity with the GitHub app (#1379).
What didn't ship
- npm wrapper: NPM_TOKEN scope is wrong after the org transfer. PyPI, Homebrew, GHCR ship.
- bernstein-scheduled-maintenance.yml stays disabled while auto-heal v2 bakes.
- AI-BOM export (#1371), task-transcript diary (#1369), cross-cycle consensus relay (#1368), installable PWA (#1218): tracked, did not make the cut.
- A11y audit, theme toggle, mobile responsive web UI: still open from v2.0.0, see #1262.
Upgrade
pip install -U bernstein==2.1.0 or uv tool upgrade bernstein. No config migration. TaskCountsResponse grew two integer fields (abandoned, blocked_by_abandon) defaulting to 0; clients keep working.
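Adding fields with defaults is what keeps older clients and older payloads compatible. The sketch below uses a plain dataclass. Only abandoned and blocked_by_abandon come from the notes; queued and completed are hypothetical, and the real response model may be built with a different library.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskCountsResponse:
    queued: int = 0              # hypothetical pre-existing fields
    completed: int = 0
    abandoned: int = 0           # new in v2.1.0, defaults to 0
    blocked_by_abandon: int = 0  # new in v2.1.0, defaults to 0

    @classmethod
    def from_payload(cls, payload: dict) -> "TaskCountsResponse":
        # Ignore unknown keys and fall back to defaults for missing ones.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```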
v2.0.1
A patch on top of v2.0.0, which is the actual web-UI release; v2.0.1 is the first cut that survived CI and made it to PyPI.
Why 2.0.1 and not 2.0.0 on PyPI
Three contract tests broke during the v2 UI integration: a route-parity check noticed the new /ui mounts; a CLI-callback signature drift caught the freshly-added idle flag; the README-coverage test wanted the new gui command in its allow-list. None of those were product bugs — they were guardrails firing exactly as intended on a big merge. While they were red, the auto-release pipeline correctly refused to ship.
By the time main came back green a few hours later, the version had already moved to 2.0.1. v2.0.0 is now a historical marker tag — pip / pipx / uv all install 2.0.1 by default. All the v2.0.0 functionality is in this release; the v2.0.0 release notes describe what shipped (screenshots and all).
What got fixed along the way
- The /ui mount + /gui-meta route now satisfy both directions of the API-versioning parity check (#1272, #1279).
- The Tasks-page drawer stopped re-popping after Esc / X / click-outside (#1269).
CI hardening (the bigger half of this release)
Right after the v2 cut shipped, an investigation surfaced a quieter problem: auto-release had been silently skipping for hours because the v2 commit's CI was cancelled by concurrency-cancel-in-progress, and the alert filter only listened for failure, not cancelled. We took that as a signal that the safety net had a hole and spent the rest of the day patching it.
A summary of what landed:
- Aggregator gate-job (#1276). One required status check that fails on any non-success result (including cancelled, timed_out, action_required) instead of relying on individual matrix-job names that drift.
- Silent-skip alerts (#1274, #1307). Telegram fires on anything that isn't success; auto-release opens (or updates, deduplicated by commit SHA) a tracking issue when it has to skip; a daily reconciliation cron compares pyproject.toml against PyPI.
- Concurrency split (#1277). PRs still cancel previous runs on new pushes, but pushes to main queue instead of cancelling, so the release pipeline always sees a real conclusion.
- Conditional allowed-skips (#1287). A skipped result only passes the gate when an upstream planner job said the skip was intentional. Concurrency-cancellations or unexpected skipped flips still fail.
- Contract-drift auto-fix bot (#1278). When one of the three contract tests above flags drift on a PR, a bot proposes the fix (DOCUMENTED_COMMANDS / _INFRASTRUCTURE_PATHS / cli forward-arg) as a sibling PR.
- Adapter contract drift detection (#1293). Same idea but for the external CLIs the orchestrator drives. 15 adapters (claude, gemini, codex, aider, opencode, aichat, crush, amp, continue_dev, plandex, goose, q_dev, gptme, forge, qwen) get their --help capability-asserted three times a day; when upstream renames a flag we depend on, CI goes red within hours.
- Supply-chain coverage (#1284). OSSF Scorecard, SBOM on every release, actions/dependency-review on PRs, trufflesecurity/trufflehog secret-scan, Dependabot extended to the github-actions ecosystem.
- Workflow security pass (#1296, #1299, #1300, #1308). 163 zizmor findings resolved across unpinned-uses, artipacked (persist-credentials: false on read-only checkouts), template-injection, bot-conditions, dangerous-triggers, ref-version-mismatch, cache-poisoning, excessive-permissions, dependabot-cooldown. The three jobs that legitimately push back to git keep their credentials with an annotated rationale.
- step-security/harden-runner audit mode (#1285) on every workflow job: egress visibility before flipping to block.
- pre-commit.ci + nightly fanout (#1275). Auto-fix lint / format on PR, and the nightly compliance / regression workflows (nightly-deep-tests, eval-nightly, SOC2 evidence, pentest) now route their failures through Telegram instead of dying silently.
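The aggregator gate and conditional allowed-skips (#1276, #1287) amount to a small decision function over job results. The sketch below assumes job results use GitHub's result strings (success, failure, cancelled, skipped, ...). It also assumes the planner reports the set of jobs it skipped on purpose. The function and argument names are hypothetical.

```python
def gate_passes(results: dict[str, str], planned_skips: set[str]) -> tuple[bool, list[str]]:
    """Return (passed, offending jobs).

    A job passes only on "success", or on "skipped" when the planner declared that
    skip intentional. Everything else (failure, cancelled, timed_out,
    action_required, unexpected skipped, ...) fails the gate.
    """
    offenders = [
        job
        for job, result in sorted(results.items())
        if not (result == "success" or (result == "skipped" and job in planned_skips))
    ]
    return (not offenders, offenders)
```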
Distribution
The release made it to:
- PyPI: pip install bernstein ➜ 2.0.1
- npm: bernstein-orchestrator@2.0.1
- GitHub Container Registry: ghcr.io/sipyourdrink-ltd/bernstein:2.0.1 (the publish workflow had been deleted in April; #1298 restored it)
Homebrew tap and the COPR / RPM channel both had pre-existing breakage that this release exposed; #1297 documents the Homebrew fix (needs a one-time PAT) and #1309 retires the COPR channel — pipx install bernstein works natively on Fedora 41/42 anyway.
Upgrade
pip install --upgrade bernstein
bernstein gui serve
v1.10.x configs and plans run unchanged. The CLI / TUI surface is the same.