Releases: sipyourdrink-ltd/bernstein

v2.7.0

24 May 15:53
32500ae

Released 2026-05-24.

This release focuses on making Bernstein's automation easier to verify: stricter release gates, a complete Sonar cleanup, deterministic skill authoring tools, and an opt-in maintainer-share telemetry path that stays off by default.

Highlights

  • Skills are closer to end-to-end. SKILL.md manifests now carry a versioned schema, and the CLI has deterministic skills init, skills test, skills diff, and skills bench commands. Strict linting can block installs, and sandbox/sanitizer checks protect install-time execution.
  • Skill routing and local outcome reporting are more useful. Bernstein can build reproducible local helpfulness reports from the activation log, and deterministic routing tools make skill selection easier to inspect without model calls.
  • Opt-in telemetry sharing is wired through a real maintainer-share sink. It is still off by default, requires explicit consent plus BERNSTEIN_TELEMETRY_SHARE_ENDPOINT, uses the same redacted event schema, and signs shared receipts for offline verification.
  • Release and CI gates are harder to get wrong. The publish workflow now runs real release tests, checks protocol compatibility, asserts GitHub Release assets, reconciles PyPI/GitHub drift, and ties main-branch eligibility to an explicit SHA marker.
  • Sonar is green. Coverage is reported from the CI shard set instead of one partial artifact, the tracker is down to zero open findings, and remaining hotspots were reviewed through a dedicated workflow.
  • Several fail-closed paths were tightened. PR review uses trusted action code, issue decomposition has a narrower write boundary, plugin zip extraction is hardened, release attestation is enforced, and lineage/audit checks bind more verification to bytes on disk.

Install

pipx install --upgrade bernstein

Python packages and GitHub Release assets are published for 2.7.0.

The npm wrapper is a convenience distribution path and may lag the Python release while registry permissions are repaired.

Full changelog: v2.6.0...v2.7.0

v2.6.0

22 May 01:30
75aa869

Released 2026-05-22.

A large release. Highlights: bidirectional chat drivers with verifiable approvals, per-step replay with a hash-chained journal, operator-registered recurring goals, a signed supervisor surface, a skill catalog with signed manifests, image-attachment provenance, and a sharded CI test suite behind a native merge queue.

Chat and operator surfaces

  • Slack bidirectional driver: drive a session, approve or reject a tool call, and watch streamed output from Slack. Every approval is recorded as a signed entry in the audit chain (covering approver, message timestamp, decision, and tool-call hash), approval scope is pinned to the worker's git worktree, and outbound messages carry an Ed25519 signature so a recipient can verify the workspace identity. Optional bernstein[slack] extra. (#1794)
  • Discord bidirectional driver: the same attested-approval model as Slack, plus a per-channel scheduling fence so tasks emitted from one channel cannot land on workers bound to another. Optional bernstein[discord] extra. (#1795)
  • Image attachment passthrough: bernstein run "<prompt>" --attach ./shot.png carries an image to a vision-capable adapter (Claude, Gemini). The image's SHA-256 is recorded in the audit chain at decision time and anchored as a lineage parent of any artefact produced that turn; spawning with --attach on a non-multimodal adapter fails before any process launches. (#1797)

Orchestration

  • Per-step session replay: each agent step is recorded in a hash-chained journal where step_hash = H(prev_hash, input_hash, model, prompt, tool_call, tool_result). bernstein replay <agent-id> walks the chain, bernstein session fork <id> --from-step N branches a sibling worktree from a chosen step, and replay divergence surfaces as a precise hash mismatch rather than a flaky result. Exported receipts verify offline against the install public key. (#1799)
  • Operator-registered recurring goals: bernstein schedule add --cron "<expr>" --goal "<text>" registers a recurring goal that fires inside a single installation, with no host cron or external scheduler required. Each fire is a deterministic projection of (schedule_id, fire_time, last_state) onto a canonical task graph and is recorded in the audit chain, so bernstein schedule audit can prove a nightly sequence ran exactly as expected. (#1798)
  • Operator supervisor surface: bernstein supervisor status aggregates the existing stall, watchdog, and respawn-budget detectors into one view. A detected stall produces a signed escalation receipt (last audit entries, identity tokens, structured reason, and a deterministic recommended action) that any verifier can check offline. (#1800)
  • Worktree GC reaps are now anchored to the audit chain: each reap appends a worktree.reap event capturing the pre-deletion git HEAD and a clean/dirty flag, and the reap is fail-closed (a worktree is not deleted if the reap cannot be recorded). (#1833)
  • Deterministic replay is now hermetic: a cache miss in replay mode raises a typed error and aborts instead of silently calling the live model, the replay key folds in provider, temperature, and max-tokens, and a coverage line reports hits, misses, and strict violations. A non-strict fall-through stays available behind an explicit, logged opt-in. (#1832)
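The hash-chained journal described above can be sketched in a few lines. This is a minimal illustration, not Bernstein's implementation: it assumes H is SHA-256 over a JSON-encoded array of the six fields, and the Step dataclass, the GENESIS placeholder, and the helper names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first step


@dataclass(frozen=True)
class Step:
    input_hash: str
    model: str
    prompt: str
    tool_call: str
    tool_result: str


def step_hash(prev_hash: str, step: Step) -> str:
    """H(prev_hash, input_hash, model, prompt, tool_call, tool_result), with H = SHA-256."""
    # A JSON array keeps field boundaries unambiguous ("ab"+"c" differs from "a"+"bc").
    fields = [prev_hash, step.input_hash, step.model, step.prompt,
              step.tool_call, step.tool_result]
    payload = json.dumps(fields, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def chain(steps: List[Step]) -> List[str]:
    """Fold the steps into a list of step hashes, each bound to its predecessor."""
    hashes, prev = [], GENESIS
    for step in steps:
        prev = step_hash(prev, step)
        hashes.append(prev)
    return hashes


def first_divergence(recorded: List[str], replayed: List[str]) -> Optional[int]:
    """Index of the first mismatching step hash, or None if the chains agree."""
    for index, (a, b) in enumerate(zip(recorded, replayed)):
        if a != b:
            return index
    if len(recorded) != len(replayed):
        return min(len(recorded), len(replayed))
    return None
```

Because every hash folds in its predecessor, a changed tool result at step N changes every hash from N onward, and first_divergence reports N as the exact point where the replay departed from the record.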

Skills

  • Skill catalog with signed manifest installs: bernstein skills catalog browse|search|install|upgrade|info|status. Each install appends a signed audit-chain entry referencing the manifest URL and content digest, refuses unverified manifests by default, keeps a lockfile that stays consistent across parallel worktrees, and a CI lineage gate rejects a lockfile referencing an unknown manifest digest. (#1796)
  • Skill lifecycle CLI foundation: install, sync, lock, lint, watch, and a local activation log with an env-var opt-out. (#1734)

Evaluation and observability

  • GlitchTip event ingester: a scraper turns open self-hosted error-tracker issues into one regression eval case each, deduped on the issue id, with administrative wiring-probe issues filtered out. The nightly real-run canary feeds this loop. (#1820)
  • CI-failure post-mortem ingestion: a scraper walks merged PRs that needed fix-up commits and synthesizes regression eval cases, so the eval suite tracks the failure modes that surface first in CI. (#1793)
  • Nightly real-run canary: a scheduled job runs real end-to-end flows (worker spawn, git worktree lifecycle, audit-chain append plus verify, signed lineage receipt) against a deterministic stub adapter, with no API key or network, and routes any failure to the telemetry sink. (#1822)
  • Multi-adapter pentest fan-out: bernstein eval pentest --adapters a,b,c runs one scenario across adapters and aggregates consensus on (canonical_vuln_type, normalized_path). Single-adapter behaviour stays byte-identical. (#1754)
  • Consolidated SonarQube findings tracker auto-rendered from the live Sonar API. (#1781)
  • Terminal orchestration failures route to the configured error sink. (#1762)
  • Per-row source-adapter provenance for the memory subsystem: SQLiteMemoryStore carries an optional source_adapter column with add_many and query(read_only_from_adapters=[...]), via an additive NULL-backfill migration. (#1759)
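An additive NULL-backfill migration of the kind mentioned for the memory subsystem might look like the sketch below. Only the source_adapter column and the read_only_from_adapters parameter come from the notes above; the memory table, its other columns, and both function bodies are assumptions made for illustration.

```python
import sqlite3


def migrate_add_source_adapter(conn: sqlite3.Connection) -> None:
    """Additive migration: add a nullable column, so existing rows read back as NULL."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(memory)")}
    if "source_adapter" not in columns:  # idempotent, safe to run twice
        conn.execute("ALTER TABLE memory ADD COLUMN source_adapter TEXT")
        conn.commit()


def query(conn: sqlite3.Connection, read_only_from_adapters=None):
    """Return (content, source_adapter) rows, optionally restricted by adapter."""
    if read_only_from_adapters is None:
        sql = "SELECT content, source_adapter FROM memory ORDER BY id"
        return conn.execute(sql).fetchall()
    if not read_only_from_adapters:
        return []  # an empty allow-list matches nothing
    marks = ",".join("?" for _ in read_only_from_adapters)
    sql = (
        "SELECT content, source_adapter FROM memory "
        f"WHERE source_adapter IN ({marks}) ORDER BY id"
    )
    return conn.execute(sql, list(read_only_from_adapters)).fetchall()


# Hypothetical pre-migration schema with one legacy row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
conn.execute("INSERT INTO memory (content) VALUES ('legacy row')")

migrate_add_source_adapter(conn)
conn.executemany(
    "INSERT INTO memory (content, source_adapter) VALUES (?, ?)",
    [("note from claude", "claude"), ("note from gemini", "gemini")],
)
```

In this sketch a legacy row keeps a NULL source_adapter and therefore never matches an adapter filter, which is one plausible reading of "read only from these adapters".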

CI and quality infrastructure

  • Merge-gate stack: pre-merge autosync regenerates mirror docs and formatting on PR branches, a main-red guard blocks merges while main is red, a nightly drift sweep opens a PR on accumulated drift, and the suite now responds to merge_group: so a native merge queue tests each PR against the combined branch. Operator runbook at docs/operations/merge-queue.md. (#1756)
  • The unit-test job is sharded across parallel runners (scripts/run_tests.py --shard i/N), so the per-file isolated suite scales as the test count grows. (#1845)
  • Coverage ratchet: a monotonic gate keeps total coverage from regressing and nudges the per-PR diff-coverage floor up over time. (#1829)
  • Static-analysis sweeper for Sonar findings, widened to MAJOR severity, turns open findings into backlog tickets keyed on the Sonar issue id. (#1763 / #1819)
  • Unit tests are now hermetic: an autouse guard blocks real outbound network connections in tests/unit/ (loopback allowed), so a network-dependent unit test fails deterministically instead of flaking in CI. (#1856)
  • Opt-in telemetry foundation: consent CLI, a schema guard, and an off-by-default proof. (#1736)

Reliability and correctness fixes

  • Deterministic replay no longer calls the live model on a cache miss (see Orchestration above). (#1832)
  • HSM lineage kind fails fast at config-load when no real adapter is on the classpath; opt-in to the stub via BERNSTEIN_ALLOW_HSM_STUB=1. (#1753)
  • MCP OAuth discovery metadata corrected; the Tier-3 cordon now catches deletions and renames. (#1755)
  • Sensitive paths are no longer logged in clear text on the always-allow tamper path; the forensic record keeps the full value with pre-hashed digests. (#1814)
  • Resolved SonarQube findings: S8413 router double-mount, S125 commented-code, S3516 invariant-return, S5754 broad-except, plus a refurb sweep across the new subsystems. (#1786 / #1787 / #1788 / #1807 / #1813 / #1814 / #1815 / #1817 / #1818)
  • Post-CI dispatcher declares per-job permissions explicitly; its child-secret expectations were synced with the GlitchTip forward. (#1746 / #1801)
  • The tampered-signature catalog test corrupts the leading signature byte so verification fails deterministically (a trailing-byte flip could be a no-op and let the test reach the network). (#1843)
  • Workflow loader rejects interactive: true at config-load instead of crashing mid-run (closes #1110). (#1760)
  • Adapter dual-binary discovery handles the antigravity / gemini migration. (#1748 and follow-ups)
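The strict replay behaviour from #1832 (a cache miss aborts with a typed error, and the key folds in provider, temperature, and max-tokens) can be illustrated as follows. Every name here (ReplayCacheMiss, replay_key, ReplayCache, allow_fallthrough) is hypothetical, and the SHA-256-over-JSON key derivation is an assumption.

```python
import hashlib
import json
import logging

log = logging.getLogger("replay")


class ReplayCacheMiss(LookupError):
    """Typed error raised in strict replay mode instead of calling the live model."""


def replay_key(provider, model, prompt, temperature, max_tokens) -> str:
    # Sampling parameters are part of the key, so a changed temperature is a miss.
    fields = [provider, model, prompt, temperature, max_tokens]
    return hashlib.sha256(json.dumps(fields).encode("utf-8")).hexdigest()


class ReplayCache:
    def __init__(self, recorded, live_call=None, allow_fallthrough=False):
        self.recorded = dict(recorded)
        self.live_call = live_call
        self.allow_fallthrough = allow_fallthrough  # explicit, non-default opt-in
        self.hits = 0
        self.misses = 0

    def complete(self, provider, model, prompt, temperature, max_tokens):
        key = replay_key(provider, model, prompt, temperature, max_tokens)
        if key in self.recorded:
            self.hits += 1
            return self.recorded[key]
        self.misses += 1
        if self.allow_fallthrough and self.live_call is not None:
            # Non-strict mode: the fall-through is logged, never silent.
            log.warning("replay miss %s: falling through to live model", key[:12])
            return self.live_call(provider, model, prompt, temperature, max_tokens)
        raise ReplayCacheMiss(f"no recorded response for key {key[:12]}")

    def coverage_line(self) -> str:
        return f"replay coverage: hits={self.hits} misses={self.misses}"
```

The design point is that the default path has no way to reach the live model: a caller has to pass both a live_call and allow_fallthrough=True to get the non-strict behaviour.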

Internals and docs hygiene

  • Dropped the unused bernstein.benchmark.head_to_head module from the wheel. (#1767)
  • Reorganized the docs tree; internal working notes consolidated under docs/_internal/. (#1768)
  • Front-page content, page metadata, and stale README translations refreshed. (#1769)
  • Repository-wide formatting pass; AGENTS.md mirror set regenerated. (#1771 / #1776 / #1777)

v2.5.1

20 May 13:13
69610b6

What this patch is

Five fix commits had landed on main without triggering a patch publish, because the post-CI dispatcher was silently failing at startup. A required secret was being referenced in a reusable-workflow call before the receiving side had a chance to declare it optional, and GitHub Actions resolves that block at call-start. The fix is a single commit, and it unblocks the rest.

Fixes included

  • fix(security) (#1705) - 12 CodeQL / Semgrep findings resolved, 23 dismissed with technical justification.
  • fix(observability) (#1713) - Sonar scan workflow no longer skips on cancelled upstream CI; direct push trigger added.
  • fix(privacy) (#1718) - residual operator hostnames scrubbed from docs and from a PR-comment template.
  • fix(routes) (#1726) - three sync subprocess calls in async routes converted; eight bare-excepts narrowed.
  • fix(api) (#1727) - intermittent 500 on POST /tasks was a real bug, not a Schemathesis flake. The endpoint now validates at the request boundary.

Dispatcher

Restored in #1730. After this, the next CI-green push computes the conventional-commits delta automatically.

Try it

pipx install --upgrade bernstein

Source: https://github.com/sipyourdrink-ltd/bernstein (Apache-2.0).

v2.5.0 - Interoperability surfaces, host portability, deterministic replay

20 May 10:31
28fd2a0

A note on the voice

v2.4.0 was about observability surfaces: running the four backends through one umbrella so a code-scanning regression or a coverage drop surfaces in the same table as a GlitchTip spike. v2.5.0 asks the next question: now that the orchestrator can see itself, can the hosts an operator already runs see it too? And can it stop quietly phoning home to my private infrastructure when it does?

Interop, finally

The piece that kept blocking me on multi-host runs was the lack of a real handshake. Claude Desktop is one process, Claude Code is another, both can spawn agents, neither knew what the other had already decided. I shipped A2A capability cards (#1698): one process mints a signed manifest of what it can do, the other consumes it, verifies the signature against a trusted-issuer set, and refuses to delegate when the advertised policies do not meet the operator's required policies. The lineage chain rides through the same envelope so the audit trail does not break at the organisation boundary.

The MCP client got the matching upgrade (#1692). Upstream servers will return malformed responses, hang mid-stream, demand re-auth, lie about their capability manifest. The client now treats every upstream as untrusted: capability-card validation before a tool call, retry-with-continuation on dropped streams, in-flight cancellation that preserves partial output, per-server cost metering, schema-violation containment that marks a misbehaving server degraded for the rest of the task. None of this is exotic; it is the brittle-real-world posture that the larger MCP ecosystem will end up needing.

The MCP server side got prompt-catalogue plus OAuth-2 PKCE discovery metadata (#1696, #1709), so auto-discovering hosts that expect a real RFC 8414 / RFC 9728 surface stop skipping us.

bernstein desktop-register

bernstein desktop-register --host <name> (#1697, then #1708 added five more hosts) writes the host-specific config entry for Claude Desktop, Claude Code, Cursor, Continue, Cline, Zed, and Aider. One command. The orchestrator is a guest in the host's settings file; we ship the plugin, the host renders it. bernstein doctor --substrate reports which hosts have us registered, which do not, and which have a stale registration.

The honest disclaimer: if a host changes its plugin spec, the per-host adapter breaks. Each adapter is small enough that a host-spec change is a one-day fix, not a re-architecture.

I removed my private infrastructure from the shipped package

This one was a real silent bug, not a feature. The shipped wheel had errors.bernstein.run baked in as the GlitchTip DSN default, and telemetry.bernstein.run baked in as the telemetry endpoint default. Both backends soft-fail when their env vars are unset, so the package never actually reached out without consent. But the hostnames were sitting there as defaults, which is the kind of thing that turns into a real leak the day someone wires a config they did not read.

#1694 strips those defaults. tests/unit/observability/test_no_hardcoded_infra.py asserts zero operator-private host, IP, or DSN matches in src/ and will fail the build if a future change reintroduces one. The telemetry side-channel is now portable across hosts behind one Sentry-compatible BERNSTEIN_TELEMETRY_DSN (#1691), so each operator runs against their own backend, not mine.
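A guard test of this shape is easy to sketch. The version below is not the contents of test_no_hardcoded_infra.py: the deny-list holds only the two hostnames named above, and find_hardcoded_infra is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical deny-list built from the two hostnames named in these notes.
FORBIDDEN = [
    re.compile(r"errors\.bernstein\.run"),
    re.compile(r"telemetry\.bernstein\.run"),
]


def find_hardcoded_infra(root: Path) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, pattern) for every forbidden match."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in FORBIDDEN:
                if pattern.search(line):
                    hits.append((str(path.relative_to(root)), lineno, pattern.pattern))
    return hits
```

A unit test would then assert that find_hardcoded_infra(Path("src")) returns an empty list, so any reintroduced default fails the build with the offending file and line in the assertion output.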

Deterministic replay

Three small things compounded. Session ids are bound deterministically (#1684) so a replayed run reproduces its own event stream without colliding with a sibling. The supervisor enforces a bounded respawn budget and parks an agent when the budget is exhausted (#1683), instead of looping respawns indefinitely. On-disk state has a versioned migrations module (#1689) so an older .sdd/ upgrades predictably. Plus the cosmetic-but-real win: runs surface a memorable deterministic name (#1682) in user-facing output, so the operator can refer to "the brisk-sparrow run" instead of memorising a UUID.
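One way to get a memorable deterministic name is to index word lists with bytes of a hash of the session id. The sketch below assumes that approach; the word lists, the SHA-256 choice, and run_name itself are hypothetical, and #1682 may well derive names differently.

```python
import hashlib

# Hypothetical word lists; a real list would be much longer to reduce collisions.
ADJECTIVES = ["brisk", "calm", "eager", "quiet"]
NOUNS = ["sparrow", "otter", "maple", "comet"]


def run_name(session_id: str) -> str:
    """Map a session id to a stable adjective-noun pair."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    noun = NOUNS[digest[1] % len(NOUNS)]
    return f"{adjective}-{noun}"
```

The same session id always yields the same name, so a replayed run keeps its label, while the name carries no information beyond what the id already determines.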

The API stops returning 500 on a fuzzer-found bug

The TaskCreate.scope and complexity fields were typed as plain str with only a length cap. An empty or out-of-range value passed pydantic and then raised ValueError deep in the task store when the enum was constructed, surfacing as an unhandled 500 on POST /tasks and POST /tasks/batch. Schemathesis kept flagging it intermittently and everyone kept rerunning it as a flake. It was not a flake. #1700 validates at the request boundary and returns 422.
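The shape of the fix, validating enum-like fields at the request boundary rather than deep in the store, can be shown without any web framework. This is a standard-library sketch, not the pydantic model from #1700: the Scope and Complexity members and the 201 success status are assumptions, and create_task is a hypothetical handler.

```python
from enum import Enum
from typing import Tuple


class Scope(Enum):  # hypothetical members
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class Complexity(Enum):  # hypothetical members
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def create_task(payload: dict) -> Tuple[int, dict]:
    """Validate enum-like fields up front and return (status, body)."""
    parsed, errors = {}, {}
    for field, enum_cls in (("scope", Scope), ("complexity", Complexity)):
        try:
            parsed[field] = enum_cls(payload.get(field))
        except ValueError:
            allowed = [member.value for member in enum_cls]
            errors[field] = f"must be one of {allowed}"
    if errors:
        # A client error, reported before the task store is ever touched.
        return 422, {"detail": errors}
    return 201, {field: member.value for field, member in parsed.items()}
```

An empty or out-of-range value now produces a 422 that names the offending field, instead of a ValueError surfacing later as an unhandled 500.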

What I am not claiming

The two new transports are functional but not load-tested at adversarial scale. The OAuth-2 PKCE discovery surface ships metadata only; full token issuance and OIDC federation are deferred to a follow-up. The substrate adapters cover seven hosts; Codex and Gemini CLI are stubbed by design until their respective plugin specs stabilise. The A2A integration honours the protocol as specified at the time of pickup and will need maintenance as the spec evolves.

Try it

pipx install --upgrade bernstein
bernstein interop a2a card --output card.json
bernstein desktop-register --host cursor

Full per-PR notes in docs/release-notes/v2.5.0.md. Source: https://github.com/sipyourdrink-ltd/bernstein (Apache-2.0). 22 commits since v2.4.0.

v2.4.0

20 May 07:27
5200c94

v2.4.0 - Observability surfaces, single-writer run state, declarative planning gates

Release date: 2026-05-20
Commits since v2.3.1: 33

Highlights

  • Unified bernstein doctor observe umbrella rolls the four observability backends (Sonar, GlitchTip, Dependency-Track, GitHub Code Scanning) into one aggregated table with delta-since-last-check, plus a per-PR sticky summary comment and a daily trends snapshot. Each backend soft-fails to SKIPPED when its env vars are unset, so a fresh checkout stays green.
  • Single-writer RunActor owns canonical per-session state behind one async event queue with a bounded replay buffer that emits an explicit Gap{up_to_seq} marker on eviction, making reconnect-after-eviction observable instead of silently lossy.
  • Spec-quality gate refuses to advance a feature spec until a deterministic, library-only rule set passes; failures route through a bounded auto-fix loop and surface unresolved items to the operator rather than dispatching an implementer against a weak spec.
  • Declarative task DAG: tasks gain parallel_safe and story_id fields, the backlog parser learns [T<id>] [P] [USn] markdown checkboxes, topological_iter_with_parallel yields ready batches honouring cycle detection, and bernstein plan dag / bernstein tasks dag render the DAG with parallel batches highlighted; replaces the file-overlap heuristic for tasks that declare the flag while preserving the legacy heuristic as a fall-back.
  • Three-layer skill customization (BASE / TEAM / USER) under XDG paths with a per-field deterministic merge spec: scalars override, tables deep-merge, keyed arrays replace by name, unkeyed arrays append; missing layers fall through cleanly.
  • Empirical-confidence ledger backs the model recommender: an append-only SQLite store of per-decision outcomes feeds a sample-size-gated query that prefers measured outcomes over the capability-tier heuristic and over the bandit arm, refusing to return a value below a documented threshold (default 5).
  • Approval responses are now bound to a 16-byte server-minted single-use nonce; mismatches surface as 409 NONCE_MISMATCH and evicted replays as 410 NONCE_EXPIRED, foreclosing stale-button replay on superseded prompts.
  • Canonical stream-signal vocabulary (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED) parseable from any wrapped CLI stdout so non-stream-json adapters surface lifecycle events through the same channel as native stream-json adapters.
  • CI hardening across the board: the Sonar scan consumes the existing coverage artifact via workflow_run (and workflow_dispatch bootstraps a coverage-bearing first scan), the review-bot-ack gate no longer cancels its own required check, the Schemathesis smoke timeout is widened to stop flaky cancellations, and the runtime Docker images are pinned back to python:3.13-slim.
  • Four refurb auto-fix waves (wave 4 plus clusters B / D / E) land about 320 mechanical idiom rewrites across src/, taking FURB142 to zero and substantially reducing the FURB184 / FURB138 / FURB124 / FURB182 / FURB101 / FURB109 / FURB108 / FURB126 backlog.
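The per-field merge spec for the BASE / TEAM / USER skill layers (scalars override, tables deep-merge, keyed arrays replace by name, unkeyed arrays append) can be read as the following sketch. It is one interpretation, not the shipped code: it assumes a keyed array is a non-empty list of tables that each carry a "name" key, and merge_value and merge_layers are hypothetical names.

```python
def _is_keyed(items) -> bool:
    """A keyed array: a non-empty list of tables that each carry a 'name'."""
    return bool(items) and all(isinstance(i, dict) and "name" in i for i in items)


def merge_value(base, override):
    # Tables deep-merge key by key.
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge_value(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        # Keyed arrays replace entries by name; new names are added at the end.
        if _is_keyed(base) and _is_keyed(override):
            by_name = {item["name"]: item for item in base}
            for item in override:
                by_name[item["name"]] = item
            return list(by_name.values())
        # Unkeyed arrays append.
        return base + override
    # Scalars (and mismatched types) override.
    return override


def merge_layers(*layers):
    """Merge layers in order (e.g. BASE, TEAM, USER); None marks a missing layer."""
    result = {}
    for layer in layers:
        if layer is None:
            continue  # a missing layer falls through cleanly
        result = merge_value(result, layer)
    return result
```

For example, merge_layers(base, team, None) leaves the USER layer out entirely, and a TEAM entry named "grep" in a tools array replaces the BASE entry of the same name while other BASE entries survive.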

What ships

Observability surfaces

  • Unified bernstein doctor observe (#1650). Umbrella command that runs each per-backend probe (Sonar, GlitchTip, Dependency-Track, GitHub Code Scanning) in order and renders one aggregated Rich table with metric, value, delta-since-last-check, threshold, and status columns. Supports --json (machine-readable) and --watch (re-runs every 60 seconds). Each backend soft-fails to SKIPPED when its env vars are unset so the umbrella keeps running on a fresh checkout. Per-backend deltas are computed against a small snapshot cache at .sdd/observability/<backend>.json (suppressible via --no-persist). The dt, code-scanning, and observe Click commands are registered directly in bernstein.cli.main so the wiring survives independent refactors of advanced_cmd.py. A per-PR pr-observability-summary.yml workflow posts a sticky Markdown comment rendered from the observe JSON, and a daily docs-observability-snapshot.yml cron (06:00 UTC) writes docs/observability/snapshots/<date>.json and re-renders docs/observability/trends.md via a dependency-free unicode sparkline. Probe crash messages store only the exception type in persisted snapshots so tokens or URLs cannot leak. Docs at docs/observability/unified-doctor.md. Tests at tests/unit/cli/doctor/test_observe.py cover probe soft-fails, delta math, Click wiring, JSON shape, persistence toggle, and exit-code mapping.
  • bernstein doctor sonar (#1648). New subcommand pulling project measures from a configured SonarQube server: coverage, code smells by severity, bugs, vulnerabilities, security hotspots, and cognitive-complexity hotspots. Rich-table or --json output. Soft-fails (exit 0) when SONAR_HOST_URL / SONAR_TOKEN are unset and prints a one-line hint at docs/observability/sonar.md. Advisory baseline at $XDG_DATA_HOME/bernstein/sonar-baseline.json lets the parent bernstein doctor group nudge when open smells exceed the threshold or vulnerabilities regress. 28 hermetic tests via httpx.MockTransport.
  • bernstein doctor glitchtip (#1646). New subcommand pulling last-24h issue counts by severity, a 7-day trend, and the top unresolved issues from the configured GlitchTip server. Rich-table or --json output. Soft-fails when BERNSTEIN_GLITCHTIP_TOKEN is unset. Optional baseline cache at ~/.local/share/bernstein/glitchtip-baseline.json powers a nudge under bernstein doctor --suggest-docs when the GlitchTip API reports new unresolved issues since the last check. 25 unit tests cover the fetcher, baseline persistence, nudge logic, Click wiring, and soft-fail behaviour.
  • Sticky PR Sonar comment (#1648). New .github/workflows/sonar-pr-comment.yml posts a sticky advisory PR comment with project-level Sonar measures. Soft signal only, never blocks merge.
  • Daily GlitchTip alert sweep (#1646). New .github/workflows/glitchtip-insights.yml (06:30 UTC + workflow_dispatch) mirrors fatal-level GlitchTip issues into sticky GitHub issues labelled glitchtip-alert. The mirror auto-closes when the GlitchTip side resolves. Workflow now validates HTTP status on the resolved-issues fetch and runs gh issue subprocesses with check=True so reconciliation failures fail the run instead of being swallowed.

Security

  • Approval-nonce binding (#1642). Mints a 16-byte server-generated nonce per pending approval. The reply must echo the exact value or the gate refuses to resolve, foreclosing stale-button replay on superseded prompts and any path where the agent process could forge its own approval response.
    • core/approval/models: nonce field on PendingApproval (hex on the wire); to_dict(include_nonce=False) for adapter-facing serialisations; new ApprovalNonceMismatch / ApprovalNonceExpired errors.
    • core/approval/queue: resolve() validates the supplied nonce in constant time. Server-internal callers (TTL evict, wait_for timeout) keep the back-compat no-nonce path so they cannot deadlock.
    • core/routes/approvals: HTTP reply now requires a nonce. Mismatches surface 409 NONCE_MISMATCH. Replays against an evicted approval surface 410 NONCE_EXPIRED. The live-fragment HTML threads the nonce through the button handlers.
    • cli/commands/approval_cmd: approve-tool / reject-tool read the on-disk record and thread the nonce back through resolve().
    • A missing nonce body field defaults to an empty string at the schema layer so it flows through the handler and surfaces as 409 NONCE_MISMATCH via the existing _coerce_nonce guard, instead of being rejected at the Pydantic layer with 422.
    • Closes #1619.
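The nonce binding above reduces to three checks: the approval still exists, the echoed nonce matches in constant time, and a resolved approval cannot be resolved again. The sketch below illustrates that flow under assumptions: ApprovalNonceMismatch and ApprovalNonceExpired are the error names from these notes, while this ApprovalQueue class, its open / evict methods, reply_status, and the 200 success status are hypothetical stand-ins.

```python
import hmac
import secrets
from typing import Dict, Tuple


class ApprovalNonceMismatch(Exception):
    pass


class ApprovalNonceExpired(Exception):
    pass


class ApprovalQueue:
    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # approval id -> nonce (hex)

    def open(self, approval_id: str) -> str:
        nonce = secrets.token_hex(16)  # 16 random bytes, hex on the wire
        self._pending[approval_id] = nonce
        return nonce

    def evict(self, approval_id: str) -> None:
        self._pending.pop(approval_id, None)

    def resolve(self, approval_id: str, nonce: str) -> None:
        expected = self._pending.get(approval_id)
        if expected is None:
            raise ApprovalNonceExpired(approval_id)
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected.encode("utf-8"), nonce.encode("utf-8")):
            raise ApprovalNonceMismatch(approval_id)
        del self._pending[approval_id]  # single use


def reply_status(queue: ApprovalQueue, approval_id: str, nonce: str = "") -> Tuple[int, str]:
    """Map resolve() outcomes onto the HTTP statuses named above."""
    try:
        queue.resolve(approval_id, nonce)
    except ApprovalNonceMismatch:
        return 409, "NONCE_MISMATCH"
    except ApprovalNonceExpired:
        return 410, "NONCE_EXPIRED"
    return 200, "OK"
```

Defaulting a missing nonce to the empty string mirrors the schema-layer behaviour described above: the request reaches the handler and fails as a 409 mismatch rather than as a 422.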

Reliability and runtime

  • Single-writer RunActor (#1641). Introduces a per-session actor that owns canonical run state. Mutations flow as typed events through one async queue. A pure apply_event reducer applies them with monotonic seq numbers. ReplayBuffer is a bounded ring (default 1024) that emits an explicit Gap{up_to_seq} marker when a subscriber asks for an evicted range, so a reconnect-after-eviction is observable instead of silently corrupt. The approval gate gains an opt-in session_id kwarg that mirrors approval events into a registered RunActor via run_actor_registry. The file-driven decision contract is unchanged; the actor feed runs alongside. Migrating the remaining writers (worker subprocess, watchdog, lifecycle hooks, hooks_receiver) is a follow-up. Refs #1630.
  • Canonical stream-signal protocol (#1638). New core/protocols/stream_signals.py defines a small text-line vocabulary (COMPLETED, FAILED, QUESTION, PLAN_DRAFT, PLAN_READY, BLOCKED), a parser, a producer-side format helper, and conformance helpers. CLIAdapter grows an optional stream_signal_parser hook; the default delegates to the canonical parser, adapters override to map a native protocol onto the canonical vocabulary. ConformanceReport surfaces missing terminal signals as a soft warning so adapters without canonical signals stay visible without failing. Tests cover parse, format round-trip, malformed-input resilience, concurrent multi-adapter parsing, terminal-signal check, default vs. override hook behaviour, plan, and question round-trip. Docs at docs/adapters/stream_signals.md describe the vocabulary with shell and Python wrapper examples. Resolves #1632.
  • Declarative task DAG (#1655). Adds a declarative task DAG layer so the planner sets per-task parallel safety at task-generation time instead of having the scheduler infer it from file overlap. The Task schema gains parallel_safe (default False) and story_id (Optional[str]) with round-trip support in Task.from_dict. The backlog parser recognises the [T<id>] [P] [USn] markdown checkbox format and the matching YAML frontmatter keys. New `core/orchestration/task_dag....

v2.3.1

19 May 21:13
6a4ea57

v2.3.1 - Maintenance

4 commits since v2.3.0. No new features, no breaking changes. Patch release covering correctness fixes, CI hardening, and follow-up on review-bot findings deferred during the v2.3.0 cycle.

Highlights

  • Restore numeric and key coercions that the FURB123 auto-fix pass removed, plus 19 mechanical fixes from the 2026-05-19 review-bot catch-up.
  • Soft-fail the cross-repo landing-mirror dispatch on PAT scope errors so the docs-drift pipeline keeps moving when the fine-grained PAT lacks actions:write on the landing repo.
  • Harden the CLI against malformed GLITCHTIP_DSN and snapshot sidecars that fail schema validation.
  • Map UrlSchemeError to the documented typed errors in MCP transports and the lineage-alert sink.

New features

None.

Fixes

  • Refurb correctness fixup (#1615). Restored int() / float() / str() coercions and added explicit isinstance guards that the FURB refactor pass dropped. Eight bot-ack findings on the same PR addressed.
  • Landing-dispatch PAT scope (#1617). Capture the HTTP status from the cross-repo workflow_dispatch call and emit an operator-actionable warning annotation when the dispatch endpoint rejects the request. The job now exits 0 in all non-success cases instead of failing the trigger-landing-mirror job and blocking the docs-drift pipeline on main.
  • GLITCHTIP_DSN crash on import (#1618). Wrap sentry_sdk.init in a best-effort try/except so a malformed DSN cannot crash the CLI on import.
  • Snapshot-sidecar schema errors (#1618). Treat schema-invalid snapshot sidecars as unreadable metadata (return None and warn) instead of raising KeyError / TypeError / ValueError through SnapshotStore.get / list. Reject negative --days for bernstein git gc before constructing the store and before computing the cutoff.
  • MCP transport typed-error surface (#1618). Wrap UrlSchemeError from ensure_http_url as TransportError in SseTransport.connect and StreamableHttpTransport.connect. Catch UrlSchemeError in sink_from_config and fall back to NullAlertSink, preserving the "orchestrator never raises here" contract.
  • GitHub Projects adapter robustness (#1618). Catch OSError around GitHub App private-key reads and raise TrackerUnavailable so the typed error surface stays intact. In _item_to_ticket, skip items whose content.__typename is not Issue / PullRequest / DraftIssue rather than emitting tickets with empty title / body / content-ids. Added regression test for the HTTP 403 abuse-detection -> RateLimited mapping.
  • Bundle command input validation (#1618). Validate sign inputs as a pair and read the private key before assembling the bundle, so invalid CLI input never mutates on-disk state.
  • Docs cleanups (#1618). Clarify the literal closing-fence string in the failure-taxonomy consumer-contract step. Prune scope reads as two distinct scopes in session-memory docs (episodic = session, semantic = task). Add text language tag to the post-CI-dispatcher sequence-diagram fence so markdownlint MD040 passes. Replace classic PAT scope strings with the fine-grained PAT permission model in the github-projects tracker doc.

Internal

  • Refurb auto-fix wave 3 (#1615). Mechanical code-quality cleanup of remaining refurb findings across src/ and tests/. Rules auto-fixed: FURB123 (redundant cast removal, 147 sites via custom line-based rewriter), FURB138 (assign-empty-list + append loop -> list comprehension, 57 sites via libcst rewriter restricted to single-statement bodies with ast.parse round-trip per file), FURB113 (repeated append -> list.extend, 5 leftovers via ruff check --select FURB113 --preview --unsafe-fixes --fix). Counts: FURB123 148 -> 0, FURB138 106 -> 49, FURB113 31 -> 26. Reformatted by ruff format (3 files). Wave 3 of the bulk auto-fix work started in #1558 and continued in #1582. Skipped this round: FURB184 (2440 sites, still needs whole-function liveness analysis), FURB138 leftovers (49 sites with multi-statement bodies, guarded branches, or appends with continue / break), FURB108 (24 sites), FURB173 (2 sites).
  • Sonar-scan timeout and uv caching (#1616). Sonar scans were dying at the 20-minute job timeout during uv sync --group dev + pytest --cov on a 127-commit history. Raise the job-level budget to 60 minutes with per-step timeouts that fail fast on individual stages (sync 15m, coverage 30m, scan 10m). Pin the new astral-sh/setup-uv action to v8.1.0 with cache enabled so subsequent runs reuse the dev environment.
  • GlitchTip setup doc (#1616). Added docs/operations/glitchtip-setup.md covering DSN provisioning, env-var export, and end-to-end event verification on a single page.
  • SBOM scope (#1618). Generate the SBOM from an isolated venv where only the project and its resolved dependencies are installed, so the output reflects bernstein's dependency graph rather than the runner base image.
  • Review-bot triage continuity (#1618). PR #1584 inventoried 48 review-bot findings and landed 4 mechanical fixes. This release picks up the remaining 44: 19 applied here, 14 already resolved on source PR branches before merge (CodeRabbit confirmed "Addressed in commits ..."), and 11 deferred for design judgement (config-schema changes, frozen-dataclass migrations, semantic-prune scope, stack indexing rework, worktree-aware git-dir resolution). Deferred items are tracked in docs/review-bot/deferred-2026-05-19.md so they are not lost.
  • Test typing and assertions (#1618). Annotated the sarif_module fixture return type as Generator[ModuleType, None, None] in tests/unit/scripts/test_sarif_drop_suppressed.py. Added an explicit assertion message for the case where EXPECTED_CHILD_SECRETS is missing an entry for a child in tests/unit/test_post_ci_dispatcher_yaml.py. The github-projects adapter tests now parse the GraphQL request body as JSON and assert against variables instead of matching byte substrings in raw request.content.
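
The GraphQL assertion change in the last bullet can be illustrated with a small sketch. The request body, query text, and variable names are invented for illustration and are not taken from the github-projects adapter tests; the point is only that a byte-substring check depends on serializer key order while a parsed comparison does not.

```python
import json

# Hypothetical raw request body, standing in for request.content in a test.
raw_body = json.dumps(
    {
        "query": "mutation($projectId: ID!, $title: String!) { addItem { id } }",
        "variables": {"title": "Fix flaky test", "projectId": "PVT_123"},
    }
).encode("utf-8")

# Brittle: the substring only matches if the serializer emits keys in this order.
brittle_ok = b'"projectId": "PVT_123", "title"' in raw_body

# Robust: parse the body and compare structured values, independent of key order.
payload = json.loads(raw_body)
variables = payload["variables"]
robust_ok = variables == {"projectId": "PVT_123", "title": "Fix flaky test"}
```

Here the substring check misses a correct request because the keys were serialized in a different order, while the parsed comparison still matches.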

Deprecations

None.

Upgrade notes

  • Drop-in upgrade from v2.3.0. No config-schema changes, no API changes, no audit-chain changes.
  • Operators relying on the cross-repo trigger-landing-mirror job should verify the fine-grained PAT has actions:write on the landing repo. If the scope is missing, the job will now emit a warning annotation instead of failing the workflow.
  • Operators using bernstein git gc --days <N> should note that negative <N> is now rejected up front rather than mishandled inside SnapshotStore.
  • Operators using the GitHub App private-key path for the GitHub Projects tracker adapter will now see TrackerUnavailable on filesystem errors instead of a raw OSError.
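
The last note describes translating filesystem errors into a domain error. A minimal sketch of that pattern follows; the TrackerUnavailable class here is a stand-in defined locally, and load_private_key is a hypothetical helper name, so the real adapter code may be structured differently.

```python
from pathlib import Path


class TrackerUnavailable(Exception):
    """Local stand-in for the adapter's error type; the real class may differ."""


def load_private_key(path: Path) -> bytes:
    """Read a private key, translating filesystem errors into TrackerUnavailable."""
    try:
        return path.read_bytes()
    except OSError as exc:
        # Chain the original error so the root cause stays visible in tracebacks.
        raise TrackerUnavailable(f"cannot read private key at {path}: {exc}") from exc
```

Callers can then handle a single exception type for "the tracker cannot be reached or configured" while the underlying OSError remains available as __cause__.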

Acknowledgements

Thanks to the operators and reviewers who triaged the 2026-05-19 review-bot batch and to the CodeRabbit / Sourcery automation that surfaced the underlying findings.

v2.3.0

19 May 19:53
1ecd36a

127 commits since v2.2.0. The headline is the tracker-adapter family landing: 10 backlog-tracker adapters now ship under a single TrackerContract, plus webhook ingestion and a plugin hookspec for third-party tracker plugins. The orchestration loop also gained an issue-to-PR pipeline, a retry-with-continuation path for success-without-commit runs, and a multi-agent handoff message bus that piggybacks on tracker comments. The supporting workstreams (review-bot acknowledgement gate, signed lineage audit log, secrets broker, telemetry-grounded autofix, Playwright self-testing sandbox) close several long-standing reliability and security gaps.

Highlights

  • Tracker-adapter family. 10 adapters land, all conforming to the single TrackerContract (Jira Cloud + DC, GitLab Issues, Linear, Plane, Asana, ServiceNow, ClickUp, GitHub Projects v2, plus webhook ingestion). Closes the gap operators have hit when integrating non-GitHub backlogs.
  • Tracker plugin hookspec + registry + CLI. Third-party tracker integrations now plug in via the same pluggy spec the orchestrator uses internally (#1599).
  • Issue -> plan-comment -> PR pipeline. New orchestration mode that walks a tracker issue through plan synthesis, plan-comment posting for human review, and PR creation in one path (#1600).
  • Tracker comments as a multi-agent handoff bus. Worker agents now coordinate over tracker comments so a session can resume across CLI restarts and across operator machines (#1606).
  • Review-bot acknowledgement gate. CodeRabbit and Sourcery findings classified as must-address now block merge until they are addressed in a fixup commit or acknowledged in the PR body with a structured marker. Nightly sweeper + reusable shepherd workflow template ship in the same PR (#1583).
  • Lineage v2 - signed audit log of tracker state moves. Each tracker-side state transition is captured as a signed lineage entry, so operators can audit the full chain when a ticket loses or gains the wrong label (#1602).
  • Playwright-based sandbox for UI/web agent runs. A new self-testing layer drives a Playwright context against the dev server, captures screenshots / console / network errors, and hands the structured result back to an LLM judge for verdict (#1603).

New features

Area | Change
trackers | 10 adapters land under TrackerContract (Asana, ClickUp, GitHub Projects v2, GitLab Issues, Jira Cloud, Jira DC, Linear, Plane, ServiceNow, plus webhook ingestion) (#1560, #1570-#1577, #1601)
plugins | Tracker plugin hookspec + registry + bernstein trackers CLI (#1599)
orchestration | Issue -> plan-comment -> PR pipeline (#1600), tracker comments as handoff bus (#1606), multi-tracker federation layer (#1561), retry-with-continuation on success-without-commit (#1596)
security | Secrets broker for short-lived per-task tokens (#1605)
reliability | Progress-watch liveness probe via session-log growth (#1597)
sandbox | Playwright self-testing for UI/web agent runs (#1603)
lineage | Signed audit log of tracker state moves (#1602), content-addressed trace store + viewer (#1564), per-ticket transcript bundle (#1562)
devops | Scheduled upstream-signal sweep with operator rollup (#1594)
fleet | Directory-based instance registry for multi-instance hosts (#1592)
eval | YAML eval harness (#1565)
autofix | Telemetry-grounded autofix MVP (#1566)
memory | Long-running session memory (#1559)
observability | Run-failure classification with structured tracker writeback (#1569)
git | Stacked branches + per-snapshot undo (#1563)
quality | Review-bot acknowledgement gate + nightly sweeper + reusable shepherd template (#1583)
cost | Hard per-ticket cost cap with clean termination and tracker writeback (#1578)

Fixes

  • fix(adapters): refresh aider contract for the upstream --yes -> --yes-always rename; contract checker now distinguishes a broken upstream --help from real drift; CI workflow treats the new runtime-failure exit code as a warning rather than a hard fail (#1595).
  • fix(security): dispatch audit events outside the broker lock; index tokens by value (#1607). Split scorecard job so SARIF upload completes (#1613). Mask credentials in logger calls (#1519). Replace subprocess shell=True with list-form args (#1513). Close urllib / SHA1 / Trivy alerts (#1518).
  • fix(orchestration): tracker_pipeline review follow-ups (#1609); commit-completion module review-bot follow-ups (#1608).
  • fix(sandbox): Playwright runner review follow-ups, including asyncio.CancelledError propagation through broad except handlers and unsafe-task_id rejection (#1610).
  • fix(tui): restore startup banner regression + add coverage (#1568).
  • fix(ci): lock aider adapter-integration job to Python 3.13 (#1586); honour SARIF suppressions before Code Scanning upload (#1520); emit CI gate for paths-ignored-only PRs (#1521); restore minimum-required write permissions broken by security hardening (#1481).
  • fix(review): apply deferred review-bot findings batch (#1584).
  • fix(quality): bulk refurb auto-fix wave 1 across src/ (#1558).
  • fix(test): repair main-red after refurb auto-fix removed str() in _run_git (#1591).
  • fix(docs): sync agents-md module map for the devops sub-package (#1612).

Internal / quality

  • Bulk refurb auto-fix wave 2. FURB113 (repeated append -> list.extend, 259 sites), FURB107 (try/except: pass -> contextlib.suppress, 267 sites), FURB173 (dict spread -> | merge, 178 sites), FURB108 (chained == -> in {...}) - landed via libcst rewriter + ruff autofix (#1582).
  • Bulk refurb auto-fix wave 1. Initial refurb sweep across src/ (#1558).
  • CI dependency churn. actions/checkout v4 -> v6 (#1598), actions/upload-artifact v4 -> v7 (#1611), python pin to <=3.13 until adapter 3.14 compat is confirmed (#1590), aider adapter-integration job locked to Python 3.13 (#1586).
  • Adapter contract check. Truncated upstream --help output is no longer reported as N missing flags; surfaces on a dedicated runtime_failure field that the workflow treats as a warning rather than drift (part of #1595).
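
For readers unfamiliar with the refurb rules named in the wave 2 entry, the sketch below shows each rewrite on a toy function. The function and its arguments are invented for illustration; only the rule numbers and rewrite directions come from the notes above.

```python
import contextlib


def before(items, cfg, overrides, status):
    out = []
    out.append(items[0])
    out.append(items[1])  # FURB113: repeated append
    try:
        out.remove("missing")
    except ValueError:
        pass  # FURB107: try/except/pass
    merged = {**cfg, **overrides}  # FURB173: dict spread
    done = status == "ok" or status == "skipped"  # FURB108: chained ==
    return out, merged, done


def after(items, cfg, overrides, status):
    out = []
    out.extend((items[0], items[1]))  # list.extend
    with contextlib.suppress(ValueError):  # contextlib.suppress
        out.remove("missing")
    merged = cfg | overrides  # | merge (Python 3.9+)
    done = status in {"ok", "skipped"}  # membership test
    return out, merged, done
```

Both functions return the same result for the same inputs, which is the property a mechanical rewrite has to preserve.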

Upgrade notes

  • No manual operator action required. pip install --upgrade bernstein (or uv pip install --upgrade bernstein) brings v2.3.0 in.
  • Operators integrating with non-GitHub backlogs can now register their tracker via the new plugin hookspec (bernstein trackers --help for the CLI surface).
  • The new review-bot acknowledgement gate runs on every PR. Must-address findings need either a fixup commit (bot-ack: <id> in the commit message) or a PR-body marker (<!-- bot-ack: <id> reason=... -->).
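
A minimal sketch of how the PR-body marker from the last note could be parsed. The exact grammar the gate accepts is not specified here, so the regular expression below assumes a whitespace-free id followed by free-text after reason=; find_acks and unacknowledged are hypothetical helper names.

```python
import re

# Assumed grammar: <!-- bot-ack: ID reason=FREE TEXT -->
BOT_ACK_RE = re.compile(
    r"<!--\s*bot-ack:\s*(?P<id>\S+)\s+reason=(?P<reason>.*?)\s*-->",
    re.DOTALL,
)


def find_acks(pr_body: str) -> dict[str, str]:
    """Map each acknowledged finding id to its stated reason."""
    return {m.group("id"): m.group("reason") for m in BOT_ACK_RE.finditer(pr_body)}


def unacknowledged(must_address: set[str], pr_body: str) -> set[str]:
    """Return the must-address ids that have no marker in the PR body."""
    return must_address - set(find_acks(pr_body))
```

A gate built this way would block the merge while unacknowledged(...) is non-empty; fixup-commit acknowledgements would need a separate check against commit messages.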

v2.2.0

18 May 12:54
0aa3e2a

v2.1.0 closed the loop on routing observability. v2.2.0 is about the CI immune system: auto-heal grew teeth, the bot-PR class got eliminated, and five cross-discipline interventions (Toyota Lean, epidemiology, alarm fatigue, SPC) stopped recurring failure modes that had been costing real wall time. Three feature workstreams that slipped from v2.1 also landed.

Self-healing CI grew teeth

Auto-heal v2 shipped in v2.1 (#1393, 26 parameters, classifier + heal-branch + admin-merge) and produced zero successful heals in the first three weeks. Every main-red event still required a human-dispatched hotfix. Three things were wrong:

  • #1452 typos-cli 404. The fetch URL was stale; the workflow failed before classification. Added a 404-cordon so the daemon now opens a self-issue and stops rather than masking errors.
  • #1452 agents-md drift class was missing from the classifier. Lint drift from bernstein agents-md sync not running on doc-only commits looked like a new failure class to the heuristic. Added it.
  • #1452 composition order: ruff was running before agents-md sync, so the sync's whitespace tweaks looked like lint regressions. Reordered.

Plus the trigger leak (#1460): auto-heal pushed its fix branch but the heal-branch CI never started, because pushes made with GITHUB_TOKEN do not trigger new workflow runs. The workflow now dispatches the heal-branch CI explicitly.

Bot-PR class eliminated

#1449 moved contract-drift autofix from "open a PR with the regenerated lockfile" to "inline-push the regenerated lockfile to the PR head." That was the dominant bot-PR-class source. The recursive lint drift cycle that ate a Saturday afternoon is gone.

Cross-discipline CI hygiene wave

Five interventions, each borrowed from a discipline that already solved an analogous problem:

PR | Discipline | Intervention
#1454 | alarm fatigue (anesthesiology) | Weekly aggregated digest issue. Replaces N auto-release-skipped notifications with one rolling summary.
#1455 | epidemiology (R0) | Hotfix R-counter. Detects when a hotfix begets another hotfix. Two-in-a-row blocks further auto-merge until human triage.
#1456 | Toyota Lean (Andon cord) | Trunk health SLO + Andon gate. Holds merges on red trunk. Blocks the bug spread that auto-merge would otherwise inflict.
#1457 | bisect on red | Auto-triage main-red to culprit PR. Halves the median MTTR for main-red events.
#1467 | SPC (control charts, META F) | Idempotency self-check in regen_contract_drift. Second run of the same regen must be a no-op; if not, the regen is non-deterministic and the workflow halts.
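
The idempotency self-check in the last row can be sketched as follows. This is an in-memory stand-in, assuming the regen step can be modeled as a function from text to text; the real regen_contract_drift operates on repository files and its interface is not shown in these notes.

```python
from typing import Callable


def is_idempotent(regen: Callable[[str], str], source: str) -> bool:
    """Return True when applying regen to its own output changes nothing."""
    first = regen(source)
    second = regen(first)
    return first == second


def sort_lines(text: str) -> str:
    """Deterministic example: sorting already-sorted lines is a no-op."""
    return "\n".join(sorted(text.splitlines()))


def append_marker(text: str) -> str:
    """Non-idempotent example: every run adds another marker line."""
    return text + "\n# regenerated"
```

A workflow using this check would halt when is_idempotent returns False, since a second run that still produces a diff means the regen can never converge.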

Seven edge-case hardenings

The first three followed from the wave above. The next four are independent:

  • #1458 contract-drift fork-PR fallback shape. The inline-push path needs write to the PR head; on fork PRs that's denied. Now falls back to a comment with the regenerated patch.
  • #1459 R-counter benign-drift allow-list + classifier (EDGE-4). Auto-formatting churn on docs files is not a hotfix-class event. Distinct path.
  • #1463 advisory PR push-lock for parallel-agent waves (EDGE-6). Six-agent waves were racing on the same PR's branch. Soft lock prevents the lost-write that bricked one PR last cycle.
  • #1464 GH API rate-limit guard for long-running agent loops (EDGE-7). Token-bucket plus 429 backoff. Replaces the "wait two minutes and retry" pattern that triggered the secondary rate limit anyway.
  • #1465 trunk-Andon override escapes (EDGE-5). Two override paths (force-merge label, commit-message token) for the case where the Andon-detected breakage is the fix.
  • #1455 hotfix R-counter (also above) — paired with the Andon gate so the override loop has bounded depth.
  • #1450 hygiene for five noise-prone workflows (auto-release filter, scheduled cleanup, telegram dedupe, release-please if-cond guard, delete-master removal).

Branch-scoped CI concurrency

#1470 scopes the CI concurrency group by branch so rapid-merge bursts drain the queue instead of cancelling each other's downstream signals. Plus #1472 hotfix repair for three follow-on root causes (QR dep skip on macOS, GUI URL test path, release-please conditional). Plus #1473 and #1474 clearing actionlint annotation-cap noise via level=error and -shellcheck= flag — the cap was eating real signal under a wall of style nags.

macOS runner saturation fix

The macOS hosted-runner queue depth was 20-70 minutes during burst-merge waves. Issue #1468 categorised the failure mode. #1475 split macOS off the per-PR default matrix into two new gated jobs (test-macos, adapter-integration-macos) that fire on push-to-main, on macos_sensitive path changes, or on a macos-needed label. Added .github/workflows/ci-macos-nightly.yml for the full matrix daily at 06:00 UTC. CI-gate accepts legitimate macOS skips.

Caught a real bug a week later: #1476. The test_reaps_stale_heartbeat test was patching one binding of _is_process_alive but _refresh_heartbeat_from_signals had a separate binding defined locally in bernstein.core.agents.agent_lifecycle. The unpatched call fell through to a real os.kill(999, 0). On Linux and Windows that raised; on macos-latest PID 999 was owned by a system daemon, so the call succeeded, the heartbeat got refreshed, and the test failed. Test-only fix; production reap path was correct.
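
The general mechanism behind this class of bug can be reproduced in a few lines. The sketch below builds two throwaway in-memory modules with hypothetical names (demo_procutil, demo_lifecycle); it is not the Bernstein code, only an illustration of why a patch must target the name where it is looked up.

```python
import sys
import types
from unittest import mock

# Module that defines the helper (hypothetical name).
procutil = types.ModuleType("demo_procutil")
exec("def is_process_alive(pid):\n    return 'real'\n", procutil.__dict__)
sys.modules["demo_procutil"] = procutil

# Module that imports the helper into its own namespace, creating a second binding.
lifecycle = types.ModuleType("demo_lifecycle")
exec(
    "from demo_procutil import is_process_alive\n"
    "def refresh(pid):\n"
    "    return is_process_alive(pid)\n",
    lifecycle.__dict__,
)
sys.modules["demo_lifecycle"] = lifecycle

# Patching the defining module leaves demo_lifecycle's own binding untouched.
with mock.patch("demo_procutil.is_process_alive", return_value="fake"):
    via_wrong_patch = lifecycle.refresh(999)

# Patching the name in the module that performs the lookup takes effect.
with mock.patch("demo_lifecycle.is_process_alive", return_value="fake"):
    via_right_patch = lifecycle.refresh(999)
```

With the first patch the real function still runs, which is how a test can end up making a real system call whose result depends on the host.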

AI-BOM export (#1438)

bernstein bom emit and bernstein bom verify. Three encoders behind one dispatcher: Bernstein-native JSON, CycloneDX 1.5 with the AI/ML extension shape, and SPDX 2.3 with AI-specific annotations. Pure projection from existing lineage / cost / adapter state -- no recomputed hashes, no I/O during generate_bom. Determinism enforced by Hypothesis property tests across all three formats. Tamper detection via sha256 chain. Closes #1371.
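
A sha256 chain for tamper detection can be sketched as follows. The field names (entry, prev, sha256), the all-zero genesis value, and the canonical JSON encoding are assumptions made for this example; the actual BOM layout is not described in these notes.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev: str, entry: dict) -> str:
    # Canonical encoding (sorted keys, fixed separators) keeps the hash deterministic.
    body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def chain_entries(entries: list[dict]) -> list[dict]:
    """Attach to each entry a digest covering its content and the previous digest."""
    prev = GENESIS
    chained = []
    for entry in entries:
        digest = _digest(prev, entry)
        chained.append({"entry": entry, "prev": prev, "sha256": digest})
        prev = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link; any edited entry breaks its own and all later digests."""
    prev = GENESIS
    for link in chained:
        expected = _digest(prev, link["entry"])
        if link["prev"] != prev or link["sha256"] != expected:
            return False
        prev = expected
    return True
```

Because each digest includes the previous one, changing a single entry invalidates the rest of the chain unless every later digest is recomputed as well.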

Diary + synthesis (#1432)

Two-tier knowledge layer over closed task transcripts. Diary writes one structured entry per closed task (tried/worked/failed/rationale/tags) with redaction of OpenAI keys, GitHub tokens, AWS access keys, PEM banners, and high-entropy hex. Synthesizer clusters diaries by tag-overlap Jaccard (stdlib only, no embeddings in v1) and drafts a markdown report. HITL-gated: reports default to approved: false. 142 tests including 20 Hypothesis property tests. Closes #1369.
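
Tag-overlap clustering with Jaccard similarity needs only the standard library. The sketch below uses a greedy single-pass strategy and a 0.5 threshold, both of which are assumptions for illustration; the synthesizer's actual clustering strategy and threshold are not stated here.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_tags(diaries: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed tags are similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []
    for diary_id, tags in diaries.items():
        for seed_tags, members in clusters:
            if jaccard(tags, seed_tags) >= threshold:
                members.append(diary_id)
                break
        else:
            # No existing cluster matched, so this diary seeds a new one.
            clusters.append((tags, [diary_id]))
    return [members for _, members in clusters]
```

Comparing against the seed only keeps the pass linear in the number of clusters, at the cost of results that depend on input order.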

Consensus relay (#1435)

HMAC-chained per-cycle handoff so an operator restarting a long evolution cycle can pull the prior cycle's decisions/blockers/open-questions/next-action into context without rediscovery. Atomic-write store at .sdd/runtime/consensus/<cycle>.json. bernstein consensus list|show|export|next|verify. 73 unit + 12 integration tests. Closes #1368.

PWA + tunnel + QR onboarding (#1442)

Operator GUI is now an installable PWA: web app manifest, service worker with stale-while-revalidate for /api/projects and /api/cost, programmatic maskable icons mounted under both / and /ui/. iOS Safari and Android Chrome install cleanly. bernstein gui serve --tunnel publishes through the existing tunnel driver registry (cloudflared / ngrok / bore / tailscale, auto-select), issues a URL-safe bearer token + 6-word diceware passphrase persisted at ~/.bernstein/dashboard.passphrase (0600), and prints an ASCII QR. bernstein gui qr [--rotate] reprints or rotates. 106 unit + 22 integration tests. Closes #1218.
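
Issuing a URL-safe token and a word-based passphrase, then persisting it with owner-only permissions, can be sketched with the standard library. The eight-word list, the hyphen separator, and the helper names are illustrative assumptions; a real diceware list has thousands of words, and the permission check applies to POSIX systems.

```python
import os
import secrets
from pathlib import Path

# Tiny illustrative word list; not a real diceware list.
WORDS = ["amber", "birch", "cedar", "delta", "ember", "fjord", "glade", "heron"]


def new_credentials(n_words: int = 6) -> tuple[str, str]:
    """Return a URL-safe bearer token and an n-word passphrase."""
    token = secrets.token_urlsafe(32)
    passphrase = "-".join(secrets.choice(WORDS) for _ in range(n_words))
    return token, passphrase


def persist_private(path: Path, text: str) -> None:
    """Create the file with mode 0600 before any content is written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
```

Passing the mode to os.open avoids the window in which a file created with default permissions would be readable by other users before a later chmod.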

Upgrade

pip install -U bernstein==2.2.0 or uv tool upgrade bernstein. No config migration. Existing diaries / consensus stores / BOMs are read-compatible.

v2.1.0

18 May 00:26
17949cf

I shipped v2.0.0 with a plan-routing bug that silently collapsed per-step cli: and model: pins onto the role default. v2.0.1 fixed that. v2.1.0 answers a different question: once routing works, can the system explain itself when a decision goes sideways? Most of this release is observability, calibration, and a CI loop that fixes its own breaks before a human notices.

Lineage v2 and simulate

Lineage v1 stored task ancestry in one flat table; forked sessions and detached children made the queries expensive. Lineage v2 (#1377) is two-layer, the production recorder writes both layers, and the CI gate (#1396) accepts that output.

bernstein simulate (#1378) is a digital-twin runner. Feed it a plan plus a route and it executes the orchestration without the adapter network. Rehearse an expensive plan before paying for it.

Self-healing CI

The pipeline now repairs main when a merge breaks something the autofixer can handle. #1389 added safe and heuristic autofix classes. #1393 grew that into the 26-parameter auto-heal v2. Pattern: red main, classify, fix PR, watch checks, admin-merge.

ProgramBench (#1407), a scenario generator (#1357), and a citation verifier (#1408) move three chores to eval.

Cost, criterion profile, decision log, calibration

bernstein simulate only matters if you can also see what the live orchestrator decided and what it expected to spend.

  • Per-task criterion profile (#1363) plus TOPSIS multi-criteria ranking (#1361). A "latency-sensitive" task routes differently from a "thorough" one.
  • Structured decision log (#1360) covering every routing, retry, and gate verdict with its inputs.
  • Calibration log plus Brier score (#1359). The forecast log got teeth.
  • Criterion-aware retry budget (#1355), per-quota-envelope attribution (#1413), calibrated p50/p90 cost preflight band (#1335).
  • The preflight cost estimator picks the most expensive role rather than the first one declared (#1395). The old behaviour underestimated by 40 to 60 percent on multi-role plans.
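
For reference, the Brier score mentioned in the calibration bullet is the mean squared difference between forecast probabilities and observed 0/1 outcomes. A minimal sketch follows; the function name and input shapes are assumptions, not the calibration log's actual interface.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

A score of 0.0 means every forecast was certain and correct, and always predicting 0.5 scores 0.25, so lower is better.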

Security hardening

  • Invisible Unicode Tag codepoints stripped from injected skills before any prompt sees them (#1417).
  • Promptware cross-agent C2 strings detected in tool output (#1421).
  • MCP tool-call inputs JSON-Schema validated, deny-by-default (#1411).
  • Per-tool allowlist, fail-closed policy, read-only profile (#1326).
  • Constant-time HMAC compare (#1399), session_id log-injection sanitisation (#1341), Qwen and IaC adapters forward secrets via env not argv (#1390, #1392).
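
Two of the items above map onto short standard-library idioms. The sketch below strips codepoints from the Unicode Tags block (U+E0000 to U+E007F) and compares signatures with hmac.compare_digest; the function names are hypothetical and the real sanitizer may cover more ranges.

```python
import hmac

TAG_BLOCK = range(0xE0000, 0xE0080)  # Unicode "Tags" block


def strip_tag_codepoints(text: str) -> str:
    """Drop invisible Tag characters that can smuggle hidden instructions."""
    return "".join(ch for ch in text if ord(ch) not in TAG_BLOCK)


def signatures_match(expected: str, provided: str) -> bool:
    """Compare without leaking, through timing, where the first mismatch occurs."""
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

Tag characters mirror ASCII at an offset of 0xE0000, which is why text hidden this way survives copy and paste while staying invisible in most renderers.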

A security-pentest eval scenario (#1419) exercises these defenses.

Adapters and GitLab parity

bernstein adapters check returns a conformance and capability report (#1385). bernstein compare runs a side-by-side adapter A/B (#1337). GitLab integration reaches parity with the GitHub app (#1379).

What didn't ship

  • npm wrapper: NPM_TOKEN scope is wrong after the org transfer. PyPI, Homebrew, GHCR ship.
  • bernstein-scheduled-maintenance.yml stays disabled while auto-heal v2 bakes.
  • AI-BOM export (#1371), task-transcript diary (#1369), cross-cycle consensus relay (#1368), installable PWA (#1218): tracked, did not make the cut.
  • A11y audit, theme toggle, mobile responsive web UI: still open from v2.0.0, see #1262.

Upgrade

pip install -U bernstein==2.1.0 or uv tool upgrade bernstein. No config migration. TaskCountsResponse grew two integer fields (abandoned, blocked_by_abandon) defaulting to 0; clients keep working.

v2.0.1

17 May 09:58
9723e3a

A patch on top of v2.0.0. v2.0.0 is the actual web-UI release; v2.0.1 is the first cut that survived CI and made it to PyPI.

Why 2.0.1 and not 2.0.0 on PyPI

Three contract tests broke during the v2 UI integration: a route-parity check noticed the new /ui mounts; a CLI-callback signature drift caught the freshly-added idle flag; the README-coverage test wanted the new gui command in its allow-list. None of those were product bugs — they were guardrails firing exactly as intended on a big merge. While they were red, the auto-release pipeline correctly refused to ship.

By the time main came back green a few hours later, the version had already moved to 2.0.1. v2.0.0 is now a historical marker tag — pip / pipx / uv all install 2.0.1 by default. All the v2.0.0 functionality is in this release; the v2.0.0 release notes describe what shipped (screenshots and all).

What got fixed along the way

  • The /ui mount + /gui-meta route now satisfy both directions of the API-versioning parity check (#1272, #1279).
  • The Tasks-page drawer stopped re-popping after Esc / X / click-outside (#1269).

CI hardening (the bigger half of this release)

Right after the v2 cut shipped, an investigation surfaced a quieter problem: auto-release had been silently skipping for hours because the v2 commit's CI was cancelled by concurrency-cancel-in-progress, and the alert filter only listened for failure, not cancelled. We took that as a signal that the safety net had a hole and spent the rest of the day patching it.

A summary of what landed:

  • Aggregator gate-job (#1276). One required status check that fails on any non-success result — including cancelled, timed_out, action_required — instead of relying on individual matrix-job names that drift.
  • Silent-skip alerts (#1274, #1307). Telegram fires on anything that isn't success; auto-release opens (or updates, deduplicated by commit SHA) a tracking issue when it has to skip; a daily reconciliation cron compares pyproject.toml against PyPI.
  • Concurrency split (#1277). PRs still cancel previous runs on new pushes — but pushes to main queue instead of cancelling, so the release pipeline always sees a real conclusion.
  • Conditional allowed-skips (#1287). A skipped result only passes the gate when an upstream planner job said the skip was intentional. Concurrency-cancellations or unexpected skipped flips still fail.
  • Contract-drift auto-fix bot (#1278). When one of the three contract tests above flags drift on a PR, a bot proposes the fix (DOCUMENTED_COMMANDS / _INFRASTRUCTURE_PATHS / cli forward-arg) as a sibling PR.
  • Adapter contract drift detection (#1293). Same idea but for the external CLIs the orchestrator drives. 15 adapters (claude, gemini, codex, aider, opencode, aichat, crush, amp, continue_dev, plandex, goose, q_dev, gptme, forge, qwen) get their --help capability-asserted three times a day; when upstream renames a flag we depend on, CI goes red within hours.
  • Supply-chain coverage (#1284). OSSF Scorecard, SBOM on every release, actions/dependency-review on PRs, trufflesecurity/trufflehog secret-scan, Dependabot extended to the github-actions ecosystem.
  • Workflow security pass (#1296, #1299, #1300, #1308). 163 zizmor findings resolved across unpinned-uses, artipacked (persist-credentials: false on read-only checkouts), template-injection, bot-conditions, dangerous-triggers, ref-version-mismatch, cache-poisoning, excessive-permissions, dependabot-cooldown. The three jobs that legitimately push back to git keep their credentials with an annotated rationale.
  • step-security/harden-runner audit mode (#1285) on every workflow job — egress visibility before flipping to block.
  • pre-commit.ci + nightly fanout (#1275). Auto-fix lint / format on PR, and the nightly compliance / regression workflows (nightly-deep-tests, eval-nightly, SOC2 evidence, pentest) now route their failures through Telegram instead of dying silently.
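
The aggregator gate and the conditional allowed-skips rule combine into one small decision function. The sketch below models that logic in Python under the assumption that job results arrive as a name-to-result mapping; the real gate lives in workflow configuration, and gate_passes is a hypothetical name.

```python
def gate_passes(results: dict[str, str], intentional_skips: set[str]) -> bool:
    """Pass only if every job succeeded or was skipped with planner approval."""
    for job, result in results.items():
        if result == "success":
            continue
        if result == "skipped" and job in intentional_skips:
            continue
        # failure, cancelled, timed_out, action_required, or an unapproved skip
        return False
    return True
```

Treating everything except an explicit success or an approved skip as a failure is what closes the hole where a cancelled run looked like "nothing failed".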

Distribution

The release made it to:

  • PyPI: pip install bernstein (2.0.1)
  • npm: bernstein-orchestrator@2.0.1
  • GitHub Container Registry: ghcr.io/sipyourdrink-ltd/bernstein:2.0.1 (the publish workflow had been deleted in April; #1298 restored it)

Homebrew tap and the COPR / RPM channel both had pre-existing breakage that this release exposed; #1297 documents the Homebrew fix (needs a one-time PAT) and #1309 retires the COPR channel — pipx install bernstein works natively on Fedora 41/42 anyway.

Upgrade

pip install --upgrade bernstein
bernstein gui serve

v1.10.x configs and plans run unchanged. The CLI / TUI surface is the same.

Tracking

  • Web UI contributor playbook: #1262
  • CI hardening rollout: #1273
  • Adapter contract drift: #1291