Skip to content

Latest commit

 

History

History
359 lines (259 loc) · 169 KB

File metadata and controls

359 lines (259 loc) · 169 KB

Release Notes

This file is the canonical release log for tracked Orchestrarium monorepo changes that matter at publication time.

Update RELEASE_NOTES.md in the same change whenever tracked modifications affect installed behavior, governance, routing, role contracts, install surface, developer or operator workflow, or other release-relevant user-facing expectations.

Release-note entries should explain improvements, not just name them. For each release-relevant entry, state what changed, why it matters, which user or operator workflow is affected, and any important preserved behavior, boundary, or exemption when that context helps the reader understand the impact.

Keep the log in reverse-chronological ## YYYY-MM-DD sections. Add new explanatory bullets under the current date heading, or create today's date heading if it is missing. Do not keep a long-lived ## Unreleased bucket or silently rewrite older dated sections in place.

Do not add entries for purely local-only hygiene edits such as formatting, link fixes, report-only churn, scratch cleanup, archive moves, or non-semantic wording cleanup.

2026-05-30

Added

  • Added a new-session orientation guide so maintainers do not have to rediscover the source/runtime boundary every session. docs/new-session-guide.md is now the first-read guide for Orchestrarium maintenance sessions, linked from README.md, docs/README.md, AGENTS.md, and CLAUDE.md. It states the source-first rule directly: tracked Orchestrarium source is canon, while ~/.codex/, ~/.claude/, project .agents/, and project .claude/ are installed runtime outputs to sync only after the owning source file is fixed. It also maps the main source owners, runtime surfaces, common maintenance patterns, validation commands, and "do not" cases such as patching Claude-specific Agent mechanics into Codex by analogy. Why it matters: recent incidents came from starting in stale installed surfaces or having to re-explain that this monorepo is the primary source; the guide makes that orientation durable for new sessions.

  • Conversational terminology discipline added to the shared governance spine. The existing Documentation terminology discipline rule required a ## Terms and Abbreviations section only in human-facing DOCS; the main conversation's live CHAT replies still emitted unexplained jargon, acronyms, and domain/role/provider/tool terms, which the user reported as unreadable ("в чате mainconv постоянно сыплет ими и ничего не поймешь что он говорит"). The rule in shared/AGENTS.shared.md now also binds term-dense conversational replies: expand jargon inline or list the terms. It was added as a compact clause to the spine's existing rule rather than as a new card or a cap bump — per the spine validator's own documented policy (scripts/validate-agents-spine.py), a genuinely new always-on rule is the trigger to keep the operational card compact, not to raise the size ceiling; the spine went 39,815 → 39,885 chars, still 15 under the 39,900 cap. Why it matters: governance previously made published docs readable to outside readers but left the live working conversation dense with insider shorthand, so the human steering the work could not follow it.

Changed

  • Codified "read broad, change narrow" in the shared governance spine. The minimization disciplines (Change-surface minimization, smallest safe reversible change) govern what a change touches, not what it reads — but that boundary was implicit and was being misapplied as license to read code narrowly (a few lines at a time) to conserve context, which under-reads and yields wrong changes. The Readability and local reasoning rule in shared/AGENTS.shared.md now states it explicitly: "Before changing, read what the change needs, not a narrow slice; minimization bounds the change, not the read." It was added as a compact clause with no cap bump (per the spine validator's own policy a genuinely new always-on rule keeps its operational card compact rather than raising the size ceiling); the adjacent secondary checks line was tightened to recover headroom, leaving the spine at 39,887 / 39,900. Why it matters: correctness depends on reading enough surrounding context — the whole function, its call sites and dependents — so reading too little to save tokens is a false economy, while the minimization rules correctly bound only the change surface, not the reading.

Fixed

  • Made "stop closeout, work" a hard continuation trigger instead of an acknowledgement trap. The existing continuity rules already said a passed slice is not goal completion and that дальше / go / продолжай continue the next incomplete item, but they did not explicitly cover user corrections such as "завязывай с closeout, работай". Agents could therefore answer with a bare acknowledgement like "OK, closeout removed" and still stop. Shared governance now treats stop closeout, завязывай с closeout, работай, and equivalent continue-working corrections as instructions to immediately take the next concrete action inside the active task unless genuinely blocked. The rule is mirrored into the lead operating surfaces, agents-mode continuity reference, shared subagent operating model, and new-session guide. Why it matters: the user should not have to spend every new session re-teaching Codex that a closeout correction means continue working, not produce another meta-summary.

  • Made context compaction a first-class continuity trigger instead of a license to summarize and stop. The shared spine already said side requests do not replace the primary task and that partial batches are not completion, but it did not name the context-compaction/resume-from-summary transition explicitly. In practice that allowed an agent to wake up after a compacted summary, trust stale sliced evidence, emit a final-style status, or stop instead of restoring the active task. The shared AGENTS.shared.md primary-task rule now says context compaction may interrupt but not replace the task; on resume from a summary, the agent must restore the primary task, next unchecked step, and open evidence gates before acting. The same rule is mirrored into the lead operating surfaces and the agents-mode continuity references for Codex, Claude, Gemini, and Qwen. Why it matters: a compacted summary is a recovery artifact, not a closeout signal; required gates such as full-window crop validation must survive the context transition.

  • Removed the Qt-only bias from $windows-gui-manual-testing and made crop validation an early hard gate across the visual-video workflow. The Codex UI metadata exposed the common skill as "Qt Windows GUI Manual Testing" and its default prompt asked for "a Qt desktop UI", while the skill description and shared common-skill index also led with Qt wording. That made the skill look toolkit-specific even though its workflow already covered native Windows, WebView/native-child, and other desktop surfaces. The Codex agents/openai.yaml now names it "Windows GUI Manual Testing", the provider skill descriptions and overview text name Windows desktop GUI broadly, and the shared index/README stop describing it as Qt-first. Separately, the crop rule that previously lived deep inside $windows-gui-manual-testing step 3 is now a top-level hard crop gate in both $windows-gui-manual-testing and $analyzing-video-bugs: before judging cropped screenshots, videos, or extracted frames as full-window evidence, the agent must verify the crop against an uncropped full-frame/full-desktop image and confirm the full target window is present, including the top-right close button for normal overlapped windows; clipped crops must be discarded and re-extracted. The live ~/.codex/skills/ copies were updated with the same wording so current Codex sessions see the corrected metadata before the next reinstall. Why it matters: the reported MarkMello failures were not caused by missing visual methodology — the methodology existed — but by trigger/UI wording and buried crop validation that let agents take a "convenient" crop before proving it was valid.

  • Closed the UNC machine-local-path leak gap and fixed the U+2026 ellipsis-placeholder false positive in the shared allowlist owner. Two follow-ups against check-machine-local-path.py (both packs, kept byte-identical) — the single allowlist owner consumed by the machine-local-path PreToolUse hook directly and by the publication leak-scanner via in-process import of find_machine_paths, so each change applies to both surfaces at once. (1) UNC host/share homes now block. The owner previously detected only drive-letter and POSIX/MSYS homes, so a concrete UNC home such as \\host\Users\<name> or \\server\share\...\Users\<name> written into a tracked file slipped past both the hook warning and the scanner block — the gap the prior scanner entry recorded as a tracked follow-up. A new left-anchored _PATTERNS entry now flags genuine UNC user homes (host or host+share, including deep shares) while a deliberate left-anchor plus host-label requirement keeps it from self-matching a drive-prefixed escaped Windows path literal such as the JSON "C:\\Users\\test" (a doubled \\Users\\ with no host label before it is not a UNC path); the \\?\ and \\.\ device namespaces are excluded by a negative lookahead because their embedded drive form is already caught by the drive-letter pattern. Accepted residual under-match (left out to avoid device-namespace / URL false positives): \\?\UNC\host\share\... and forward-slash //host/Users/.... (2) U+2026 ellipsis placeholder no longer false-positives. A documentation path segment written with the Unicode horizontal ellipsis U+2026 instead of an ASCII ... run — e.g. C:\Users\... in backslash, forward-slash, or bare form, or a mixed dot+ellipsis segment — was treated as a concrete account name and flagged; the placeholder filter now accepts the U+2026 character (and any dot/ellipsis mix) as an ellipsis placeholder, matching the existing ASCII ... behavior. The no-Python fallback regex in check-publication-safety.sh is intentionally NOT extended for UNC (it is a rare degraded path that already over-blocks). Covered by new gate-safe regression rows in both tests/test_machine_local_path_hook.py and tests/test_publication_safety_scanner.py (UNC and ellipsis fixtures are assembled from runtime fragments so the test sources never self-trip the scanner). Why it matters: the hook and the publication gate now catch a real machine-local-path leak class they previously missed (UNC homes), while stopping a false alarm on a legitimate documentation placeholder, so the gate stays both stricter where it should block and quieter where it should pass.

  • Publication leak-scanner now allowlists placeholder paths — and in doing so closed three real machine-local-path leak gaps it had. check-publication-safety.sh (both packs) blocked git push whenever a staged file contained C:\Users\<name> / C:\Users\... — placeholder documentation (the machine-local-path hook's own examples in INSTALL.md / RELEASE_NOTES.md / AGENTS.codex.md) that is safe and already public — because its [A-Za-z]:\\Users\\ pattern matched the full content of staged files with no allowlist. It recurred on every commit touching those docs and forced a separate security-reviewer clearance each time. The scanner now routes path-pattern candidate lines through the existing check-machine-local-path.py allowlist (find_machine_paths, resolved relative to each scanner's own location — single source of truth, no duplicated token list), so placeholders pass while concrete machine-local paths still block; non-path patterns (secrets, tokens, transcript markers) stay an unconditional block path; and if Python is unavailable it falls back to a refined regex plus a degraded-mode stderr notice, never fail-open (fail-safe = over-block). A security-engineer designed the change (approach B) and a security-reviewer gated it. In the process it closed three leak gaps the old scanner silently had: (1) forward-slash and leading-slash home forms (C:/Users/, /home/, /Users/, /c/Users/) were never matched on Windows because the bundled MSYS bash rewrites leading-/ arguments before git grep sees them — the in-process Python matcher is immune; (2) workstation dev roots (D:\dev\, even the repository's own root) were not detected at all; (3) lowercase c:\users\ slipped past the case-sensitive pattern. Covered by a new gate-safe regression suite tests/test_publication_safety_scanner.py (leak strings are assembled at runtime so the test file does not self-trip the scanner); the pre-existing tests/test_machine_local_path_hook.py was given the same gate-safe treatment because the stronger scanner would otherwise flag its legitimate test data. UNC paths (\\host\Users\...) were a known remaining gap in the shared allowlist owner at the time of this change; that follow-up is now closed by the UNC + U+2026 entry above. Why it matters: the gate stops crying wolf on safe placeholder docs (a gate that fails on every commit gets bypassed) while actually catching more real machine-local-path leaks than before.

2026-05-29

Changed

  • Backported the $mathtype-book-page common skill's new "Full-Text Coverage Gate" from active Codex downstream use into the canonical pack source. The user iterates on this skill in their live ~/.codex/skills/mathtype-book-page/ install; this change pulls those edits back so other adopters inherit them on next install instead of rediscovering the gap. What changed: SKILL.md gained a ## Full-Text Coverage Gate section (verify the translated DOCX covers all source text — no silently dropped paragraphs, sections, or list items — before claiming PASS) and its description trigger list gained "full source-text coverage"; references/defective_chunk_repair.md gained a ### Coverage Lane subsection; the Codex agents/openai.yaml short_description became "Full-coverage MathType…" and its default_prompt now names full source-text coverage as a verification step. Scope: applied to src.codex/skills/mathtype-book-page/ (all three files, exact backport of the edited global) and src.claude/skills/mathtype-book-page/ (SKILL.md + references/; Claude has no agents/openai.yaml), keeping the Claude and Codex copies byte-identical per the common-skill dual-shipping contract. Gemini and Qwen copies are intentionally deferred — they sit on an older base (also missing the earlier Typography Normalization / Heading Case sections) and are scheduled for a separate reconciliation pass consistent with their secondary-provider priority; this is a tracked follow-up, not an oversight. This matters because full-text coverage is a load-bearing correctness gate for book-page translation: without it a chunk can pass formula, typography, and figure checks while silently omitting source paragraphs, and the defect is invisible to every downstream check that does not compare against the source PDF's full text.

  • Backported active-use methodology into three more common skills, after a pre-install divergence gate caught the repo lagging the live installs. Before re-installing the packs globally, a per-skill comparison of the repo source against the live ~/.codex/skills/ install found the repo was behind on ~59 lines of methodology the user had iterated downstream but never backported. Merged into both packs ($analyzing-video-bugs and $mathtype-book-page stay byte-identical across packs; $windows-gui-manual-testing differs only in the Claude/Codex description provider word): $windows-gui-manual-testing gained the full Windows capture recipe (top-level HWND + GetWindowRect + virtual-screen metrics + DwmGetWindowAttribute extended-frame bounds, UIA preflight with the close-button crop anchor, a verification-frame-before-repro discipline, multi-monitor / high-DPI handling, and the $procId-not-$pid PowerShell gotcha) while keeping the repo's Qt-aware description and intro; $analyzing-video-bugs gained 4fps-coarse-then-60fps-dense extraction, multi-monitor target-window cropping, contact-sheet building, and the Windows -pattern_type glob caveat, plus a fixed dangling § Capture video cross-reference (now points at the real step); $mathtype-book-page gained connector-prose-between-display-formulas and inline-condition punctuation / OCR-residue guidance. The two visual skills ($windows-gui-manual-testing, $analyzing-video-bugs) join $bug-hunting as one cross-referenced family ($bug-hunting Rule 4 ⇄ the two visual skills' Related sections and shared decision tree); $mathtype-book-page stays a separate category and is not linked into that family. Why it matters: skills are a two-way channel — they get iterated both in active use and in the repo — so a blind repo→global install silently downgrades any un-backported downstream edits. The pre-install gate makes that divergence visible per skill and direction, and this backport reconciles the repo with both global installs rather than clobbering them.

  • Hardened the external-review invocation contract in src.codex/skills/consultant/SKILL.md. It now forbids --permission-mode plan for a non-interactive Claude review (plan mode makes the reviewer present a plan/handoff instead of emitting the verdict; under claude -p the captured stdout is then only the handoff turn, so the actual review never lands in the file) and requires a single-turn self-contained verdict between BEGIN_REVIEW/END_REVIEW plus a post-run check that the captured file actually contains the verdict. This matters because a real review dispatched with --permission-mode plan silently returned no usable verdict (the captured file held only "review complete — waiting for your direction") and exit-0 was nearly accepted as PASS; the rule closes the recurring review-output-contract failure (see the 2026-05-28 kosyak). Affected workflow: any operator dispatching a Claude reviewer via CLI.

  • Regenerated the stale subagent-operating-model.md normalized fingerprint in the Claude validator (src.claude/agents/scripts/validate-skill-pack.sh, 1a8291b9…f666dc58…). The shared doc was updated in an earlier feature commit but the Claude validator's expected hash was not regenerated in that commit (the Codex validator already carried the correct hash), so validate-skill-pack.sh had been failing that one check. Both validators now pass; no doc content changed — only the stale expected hash.

  • Shrank the shared governance spine below the Claude Code 40,000-char context warning without losing any enforceable rule. shared/AGENTS.shared.md (installed verbatim as AGENTS.md, so always loaded by the main conversation) was 61,905 chars and tripped Claude Code's per-file performance warning; it is now ~39,815. Method: each rule's elaboration (rationale, worked examples, recovery procedures, the full glossary, common-skill layout, delegation prose, the full verification-discipline prose) moved to verbatim on-demand extracts under the new shared/references/spine/ directory, while the compact operational form of every rule — triggers, fail-closed conditions, banned-phrase lists, gate names, probe commands, status labels — stays in the always-loaded spine and is self-sufficient for compliance on its own. A new scripts/validate-agents-spine.py — now run on every pytest tests/ via tests/test_agents_spine_validator.py, not only as a manual CLI — enforces the size cap (39,900, a guard band below the warning) plus a 63-token "lose-nothing" protection manifest. Honest scope: the manifest is a token-presence + size check that fails if a pinned enforcement token leaves the spine, a reference pointer dies, a discipline card is orphaned, or the cap is exceeded; it is NOT a semantic-binding check and would not by itself catch a rule reworded from MUST to optional while its pinned token stays present. The existing pack validators independently require key formula/terminology/verification phrases to remain in the installed AGENTS.md itself. Verified by a three-angle review loop (extracts confirmed superset copies — the spine compacts the prose, so they are not byte-identical; both pack validators 344/344 and 408/0; full suite green) which caught and forced restoration of ~10 enforceable specifics the first cut had over-compressed (e.g. the git reset --hard "and nothing depends on them" recovery guard, the split-brain UI-observability clause, the AskUserQuestion verification mechanism). Why it matters: the always-loaded governance no longer degrades main-conversation performance, and new governance DETAIL now lands in extracts rather than re-growing the spine. Boundaries: shared/references/spine/ extracts are English-only depth for maintainers and are not copied to install targets — the installed spine is self-sufficient, and the spine carries a single note pointing maintainers to the repo-source extracts rather than per-rule pointers that would be dead at install sites.

  • Backported the $bug-hunting common skill's two-sided "Rule 0" from the user's live global Claude install into the canonical pack source (both src.claude/skills/bug-hunting/ and src.codex/skills/bug-hunting/, kept byte-identical per the dual-shipping contract). Rule 0 now binds at BOTH ends of every fix: prove the CAUSE with a runtime log before editing (unchanged), AND prove the FIX is perfectly correct with a clean runtime log after editing — re-run with diagnostics still in, confirm the symptom is gone and no adjacent behavior regressed; "it should work now" / "this is the right fix" is never confirmation, and a fix asserted correct without a runtime log proving it is a Rule 0 violation as severe as patching on an unverified cause. Why it matters: the original Rule 0 only guarded the cause side, so a correctly-diagnosed bug could still ship an unverified fix and be called "done"; this closes the after-the-fix gap.

  • Fixed the bugfix-discipline PreToolUse hook's long-session false positives (check-bugfix-discipline.py + hook_common.py, both packs, kept byte-identical). Root cause (verified against a real transcript): the hook matched bug-trigger phrases against the most recent role=user transcript entry, but in Claude Code tool_result blocks and harness injections (<system-reminder>, <task-notification>) are also recorded under role=user — so in a tool-heavy turn the "last user message" was a tool_result or notification dense with trigger words (fix, Error:, broken, regression), firing the guard on legitimate edits (it false-fired ~15× in one session). Fix: a new last_genuine_user_message() walks the transcript in reverse, skips entries that carry no human-typed text (pure tool_result / pure injection) and strips <system-reminder>/<task-notification>/<local-command-stdout> spans, so triggers match only what the human actually typed. This also fixes a latent false-NEGATIVE (a real bug report buried behind many tool_result entries was never seen). Separately fixed the override escape that never worked: the deny message tells the model to place [skip-bugfix-discipline] in its own reply, but the hook only scanned the user message for the marker — it now scans the assistant's current-turn text too. The marker (and the discipline signals) are matched against assistant-authored text ONLY, not tool output: 8+ tracked repo files literally contain the [skip-bugfix-discipline] string, so a broad scan would have let the guard disable itself whenever a session read one of those files in a bug-report turn. Also added real Codex transcript-shape support: Codex rollouts nest the message under payload with input_text blocks ({"type":"response_item","payload":{"type":"message","role":"user",...}}), which the shared helpers did not recognize — so the Codex copy of the hook had been inert on real transcripts; is_user_message/extract_text/extract_user_typed_text now handle the payload shape (and correctly exclude Codex function_call_output tool I/O). Covered by tests/test_bugfix_discipline_hook.py (17 cases incl. tool_result/system-reminder/task-notification pollution, the buried-bug-report false negative, the override-in-assistant-reply vs override-in-tool-output distinction, a discipline-signal-in-tool-output non-bypass, and the Codex payload shape). The improved hook is restored to live configs on the next pack install (it had been temporarily removed during implementation because the old version false-fired).

Added

  • --tool-matcher parameter on scripts/install-hypothesis-hook.py. The structural-hook installer can now register a hook with a custom PreToolUse matcher (default unchanged: Edit|Write|NotebookEdit|apply_patch), so a hook can fire on a different tool set without disturbing the existing entries. Idempotency still keys on the per-hook --script-marker, so multiple hooks coexist as independent entries. This matters because it is the enabling mechanism for hooks that must watch tools beyond the code-mutating set.

  • machine-local-path PreToolUse audit hook (auto-installed, warn-only). Warns when a machine-local absolute path — a concrete user home (C:\Users\<name>) or workstation dev root (D:\dev\...) — is written into a non-.scratch/ tracked file (the machine-local-path-provenance smell), while allowing placeholders (<you>, %USERPROFILE%, ${CLAUDE_PROJECT_DIR}, $HOME, and common example tokens). It matches the edit's own tool_input (not session context, so it does not false-positive on unrelated discussion), emits a UTF-8 stderr warning, never blocks (AUDIT mode — promotion to a blocking deny is a separate reviewed step once the false-positive rate is measured), and fails open. Per the source-hygiene rule the hook lives in the new typed agents/hooks/ (Claude) / skills/lead/hooks/ (Codex) directory; hook_common.py stays in the grandfathered scripts/ dir and is imported from there. Covered by the existing --no-hypothesis-hook opt-out. A companion no-trash-in-repo audit hook ships alongside it (also auto-installed, warn-only, fail-open, in the same typed hooks/ dir, registered with a Bash-inclusive matcher so it sees the git worktree add command). It is the stray-artifact guard: it warns when a Bash command confidently runs git worktree add — the unrequested-worktree side effect — and stays silent on git worktree list/remove/prune, git add (not git worktree add), git inside a quoted string, and non-git commands; the parser is shell-aware (shlex tokenization, command-position tracking across &&/;/|/(, env-assignment-prefix and git-global-option skipping) and fails open on any tokenizer error. Two earlier same-day iterations were discarded before this one: a first version flagged personal-workflow dir NAMES (dev/, notes/, …) and false-fired on ordinary structure, and a second narrowed the list to kosyaks/mistake-log — but name-matching was the wrong axis entirely, because those are the user's personal-process vocabulary, not names the agent (the actor a PreToolUse hook guards) ever creates, so the hook never fired on real agent behavior. The redesign (recommended by a Codex gpt-5.5 xhigh design consultation) keys on the OPERATION instead, which is what the user actually reported (the agent creating worktrees it was not asked for). Deferred: the Claude Agent tool's isolation: "worktree" form (needs a captured PreToolUse envelope to confirm the field shape). Dropped as too FP-prone for a warn-only signal: outside-repo writes (a static allow-list floods on legitimate installs/temp/global-config/memory writes) and arbitrary in-repo trash (no reliable non-name signal — that stays governance). The filename and install-marker check-no-trash-in-repo are retained for install continuity (a rename to check-stray-artifact is a tracked follow-up). Covered by tests/test_no_trash_hook.py (19 cases: every git worktree add form incl. cd-chained / -C-prefixed / env-prefixed / subshell / absolute-git, plus the silent cases and fail-open). Why it matters: a warn-only AUDIT signal only earns its keep if it almost never false-fires AND actually fires on the real problem — the name-based versions could never fire on the reported behavior (they keyed on the wrong actor's vocabulary), while operation-based worktree detection catches the concrete unrequested side-effect with a shell-aware parser that under-warns rather than false-fires on exotic constructs.

2026-05-17

Added

  • Added two governance rules to shared/AGENTS.shared.md ### Scope and ownership discipline: Directory-level entity separation (organize source directories around one primary entity type + ownership/lifecycle contract, classify by owner/lifecycle + I/O contract before placement, build output / caches / __pycache__ / .scratch/ exempt; grandfathered exception clause for documented co-located dirs) and Trash hygiene and archival (delete or archive obsolete files within the admitted change surface and subject to Worktree safety; archival path with breadcrumb when deletion loses recoverable history). Codex $architecture-reviewer designed the directory rule wording across a two-stage delegation chain: first critique of main conv's draft (numeric ≥3 entity types threshold flagged as gameable, "surface convenience" wording flagged as undefined, examples list flagged as too repo-specific for shared governance), then counter-proposal with classification test (owner/lifecycle + I/O contract signal pair) replacing the numeric threshold. The trash hygiene clause came from user-stated principle (verbatim: "не оставлять мусор, хотя бы складывать его в архив"). Plan went through two review rounds (bkjese79w → REVISE on missing shared/references/ update + Claude-overlay-reachability + Worktree-safety guard; beztgful6 → REVISE-1 on shared/references/README.md scope clarification because the new source-hygiene doc intentionally names pack/source paths which the existing README scope excluded; v3 applied each finding's required fix and landed text-only). New shared design narrative + grandfathered-exception text now lives at shared/references/repository-source-hygiene.md + RU mirror shared/references/ru/repository-source-hygiene.md. Both repo-overlay docs (AGENTS.md Codex-maintainer overlay + CLAUDE.md Claude-maintainer overlay) carry one-line pointers at the shared ref doc so future maintainers from either side see the rule + exceptions without text duplication. Three deliberately co-located directories (src.claude/agents/scripts/, src.codex/skills/lead/scripts/, repo-root scripts/) are grandfathered exceptions for user-copy / install-script convenience: install scripts hardcode current source-tree paths and destination paths, hooks.json / settings.json point at current command paths, user docs reference current paths, and forcing a refactor on every existing operator install does not justify the per-entity-type filesystem cleanup gain since the existing naming convention (check-* / invoke-* / validate-* / install-* / etc.) already provides per-entity-type discoverability. The exception is grandfathered, not extensible — future new entity types (new hook, new wrapper, new validator) MUST follow the shared rule (typed subdirectory or canonical home), not extend the legacy co-located dirs. Both rules land text-only: no repo refactor, no file moves, no installer or hooks.json changes, no Codex re-trust, no live install propagation. This matters because the repo previously had no governance rule for directory-level entity organization — the closest existing rules (Ownership / extension-seam hygiene, Change-surface minimization, Interface and encapsulation hygiene) all govern code-level / commit-level / role-level ownership but did not address how files cluster in filesystem directories; without a rule, future maintainers had no principled basis for "does this new script belong in scripts/ or its own scripts/<category>/?" and naturally defaulted to junk-drawer accretion.

Changed

  • Rewired the hook from git push trigger to pre-fix Edit/Write/apply_patch trigger; renamed check-hypothesis-disclosurecheck-bugfix-discipline. The old hook was conceptually misplaced — it fired on git push to check whether the HEAD commit message contained VERIFIED: / ASSUMPTION (UNVERIFIED) markers. As the user pointed out repeatedly during the rewire conversation, git has nothing to do with hypothesis verification: hypothesis discipline lives in thinking before editing, which is upstream of any publication mechanism. By the time git push fires, any unverified-hypothesis edit has already happened and committed — the harm is done, the hook just adds late ceremony around a commit-message format check that's trivially bypassable (write a fake VERIFIED: line or use a whitelisted commit type prefix). The previous design was theatre dressed up in "third structural enforcement layer" language. The new hook catches the actual failure mode: the model is about to invoke Edit/Write/NotebookEdit (or Codex's apply_patch) in response to a user message that contains a bug-report or change-request signal (fix, change, broken, не работает, исправь, пофикси, поменяй, traceback, Error:, etc.), but did NOT first invoke /agents-bugfix or otherwise capture diagnostic data. The hook reads the PreToolUse envelope's transcript_path, parses the recent JSONL transcript, and: if last user message contains no bug-trigger phrase → exit 0; if it contains the per-turn override marker [skip-bugfix-discipline] → exit 0; if the current turn shows discipline signals (/agents-bugfix invocation, agents-bugfix skill load, text containing diagnostic/hypothesis/reproducing/VERIFIED:) → exit 0; otherwise emit a structured permissionDecision: "deny" payload that tells the model exactly how to comply (invoke skill, capture diagnostic data, or use override marker). The trigger phrase list is deliberately broad — it includes routine change-request verbs (fix, change, исправь, пофикси, поменяй) because those are the most common user requests on which the model jumps to first hypothesis without diagnostics; false positives on legitimate non-bug change requests (e.g. "fix this typo" — really a docs edit) are handled by the per-turn [skip-bugfix-discipline] override marker. Friction is by design. Files changed: new src.{codex,claude}/.../check-bugfix-discipline.{py,sh,ps1} (python brain + thin Bash/PowerShell wrappers); deleted old check-hypothesis-disclosure.{sh,ps1} from both packs; updated scripts/install-hypothesis-hook.py matcher from Bash (with if: "Bash(git push *)" filter) to Edit|Write|NotebookEdit|apply_patch regex, SCRIPT_MARKER renamed; scripts/install-{claude,codex}.{sh,ps1} paths updated to new script name; Codex+Windows entry switched from broken bash 'C:\...' (which silently resolved to WSL bash → "No such file or directory" → exit 1 on every Bash call) to native powershell.exe -NoProfile -ExecutionPolicy Bypass -File '<abs>\check-bugfix-discipline.ps1'; Bootstrap blocks in src.claude/CLAUDE.md and src.codex/AGENTS.codex.md rewritten for the new trigger model; INSTALL.md structural-enforcement section updated; both shared/references/evidence-based-answer-pipeline.md and its Russian mirror rewritten — the three-layer pre-fix/pre-commit/pre-push audit chain became a two-layer pre-fix/pre-commit one (with explicit note that pre-push enforcement was the wrong design). Tests updated for the new entry shape (18 pass + 1 Windows-symlink skip). The --no-hypothesis-hook flag and ORCHESTRARIUM_NO_HYPOTHESIS_HOOK env var names retain the legacy prefix for back-compat with operators who scripted them. Operator action required after install: Codex marks the modified hook as untrusted (new script content hash + new command shape both trigger Codex's "modified since last trusted" gate), so users must re-trust the entry via the interactive codex TUI hook browser once before the new hook fires; Claude Code has no analogous trust gate and fires immediately. This matters because the previous hook was actively harmful on Codex+Windows installs with WSL on PATH (every Bash tool call failed with "hook exited with code 1"), and the conceptual mistake of tying hypothesis discipline to git push meant the discipline never actually caught the failure mode it was supposed to (model jumping to Edit without thinking) — now it does.

Fixed

  • Fail-open hardening of the check-hypothesis-disclosure.sh hook script in both packs after a real-world bug surfaced from a Codex CLI session: the user reported PreToolUse hook (failed) — error: hook exited with code 1 firing on every Bash invocation, blocking the whole session. Empirical reproduction this session confirmed the root cause — the canonical script's python3 -c 'import json,sys;d=json.load(sys.stdin);print(...)' raises JSONDecodeError when stdin is empty or non-JSON; combined with set -euo pipefail at the script top, the pipefail propagation triggers exit 1 before the script reaches its self-filter that would have allowed non-git push Bash calls to pass through. Claude Code's hook entry carries an if: "Bash(git push *)" permission-rule filter that limits firing to actual push attempts, so the bug never surfaced for Claude users — but Codex CLI's hook matcher is regex-on-tool-name only (no analogous if: filter), so the script fires on every Bash tool call in Codex, and any one with bad/empty stdin crashes the entire session's tool dispatch. The fix adds two defensive layers in src.codex/skills/lead/scripts/check-hypothesis-disclosure.sh and src.claude/agents/scripts/check-hypothesis-disclosure.sh (mirrored): (1) the inline python3 -c call now wraps json.loads(sys.stdin.read()) in try/except Exception so any parse failure sets d = {} instead of raising — python always exits 0; (2) shell-level 2>/dev/null swallows any python stderr and || command_str="" belt-and-suspenders on the command-substitution falls back to empty string if python itself somehow errors (missing binary, PATH issue, OOM). When command_str ends up empty, the existing git push regex check below fails to match and the script exits 0 (the pass-through branch). The hook only ever blocks when a behaviour-changing commit lacks VERIFIED:/ASSUMPTION (UNVERIFIED) markers; any bad input gracefully degrades to no-op. The .ps1 siblings of both scripts were audited and already handle bad stdin correctly ([string]::IsNullOrWhiteSpace($stdinText) early exit + try/catch on ConvertFrom-Json); no .ps1 change needed. Empirical verification this session: 5-case test (empty stdin via echo "", completely empty via < /dev/null, malformed JSON, valid non-git-push envelope, valid git-push envelope) — all exit 0 on the patched script, where the unpatched script exited 1 on the first three. The capture-investigation that was supposed to record Codex's actual stdin envelope was blocked by Codex's trust-refusal of the temporarily-installed logging wrapper (Codex hashes file content for trust, not just command text); the fail-open fix is the right answer regardless of envelope shape because it gracefully handles every malformed/missing-input case. Operator action required after install: Codex CLI marks the modified hook script as untrusted (per Codex's "Modified since last trusted - review required" mechanism), so users must re-trust the entry via the Codex interactive TUI hook browser once before the fixed script fires; Claude Code's hook does not have an analogous trust gate.

  • Repaired the invoke-codex-prompt.ps1 and invoke-claude-prompt.ps1 prompt-orchestration wrappers for default Windows hosts. The wrappers are the canonical external-CLI prompt-orchestration surface for the Claude pack on Windows, but were silently broken on the most common install topology (default powershell.exe host launching the wrapper + npm/nvm4w-installed Codex or Claude CLI resolving to a .ps1 shim) — a recent dispatch attempt this session that bypassed the wrappers and used inline & codex instead surfaced the regression. Four compounding bugs identified empirically through sequential smoke testing and fixed in one coordinated change: (1) System.Diagnostics.ProcessStartInfo.ArgumentList (used to build the child argv) is a .NET Core 2.1+ / PowerShell 7+ API; on Windows PowerShell 5.1 (the default powershell.exe interpreter) the property is null, and .Add(...) raises InvokeMethodOnNull. (2) [System.Diagnostics.Process]::Start with UseShellExecute = $false can only launch native .exe binaries; when Get-Command codex/claude resolves to a .ps1 shim (the npm/nvm4w default), Start throws "not a valid application for this OS platform". (3) With the wrapper's top-of-file $ErrorActionPreference = 'Stop' inherited into the native call, any byte the child process writes to stderr — including normal progress messages like Codex's "Reading prompt from stdin..." — becomes a fatal NativeCommandError that kills the wrapper before $LASTEXITCODE can be captured, leaving .out/.err files at 0 bytes. (4) PowerShell 5.1's 1> / 2> redirection writes UTF-16 LE with BOM via Out-File's default Unicode encoding; PowerShell 7+ writes UTF-8 — cross-version inconsistency that produces UTF-16 captured artifacts on the default Windows install and breaks any downstream consumer that expects UTF-8 bytes. The coordinated fix in both wrappers replaces ProcessStartInfo/[Process]::Start with PowerShell's native call operator & $cmd @args (which handles .exe/.cmd/.ps1 shim resolution on both PS 5.1 + PS 7+, mirroring the pattern already proven in invoke-claude-api.ps1 line 221), sets $OutputEncoding + both Console encodings to UTF-8 no-BOM around the native call to fix the pipeline encoding, relaxes $ErrorActionPreference to Continue around the native call only (restored in finally) so child stderr does not crash the script, and adds a post-process step that re-encodes the captured .out/.err files via Get-Content -Raw + [System.IO.File]::WriteAllText(..., [UTF8Encoding]::new($false)) so the artifacts are UTF-8 no-BOM regardless of which PowerShell host launched the wrapper. The stdin-write path was also switched from Set-Content -Encoding UTF8 (which adds a BOM on PS 5.1) to explicit [System.IO.File]::WriteAllText(..., [UTF8Encoding]::new($false)) so a prompt body provided via piped stdin doesn't get a leading BOM that some CLIs interpret as content. End-to-end smoke verified on Windows: powershell -File invoke-codex-prompt.ps1 against a real codex.ps1 shim returns exit 0 with the codex response in .out as plain UTF-8. The .sh siblings of both wrappers were not affected and required no change — they invoke through the POSIX shell which handles shim resolution and stream redirection at the shell layer. This matters because the wrapper IS the canonical contract for file-based external-CLI prompt delivery on the Claude pack, and a silent regression to non-functional means any operator dispatching an external review or worker through the documented pattern on Windows would have been forced into inline workarounds (as happened this session for the 6th hypothesis-disclosure-doc review) that lose the file-based prompt persistence and capture guarantees the wrapper exists to provide.

  • Second-pass hardening of the invoke-{codex,claude}-prompt.ps1 wrappers addressing every finding from the full Codex security review + full Codex architecture review of the first-pass wrapper fix (commit ff821d3 above). Four distinct defect classes fixed in one coordinated patch to both .ps1 wrappers, each with empirical reproduction recorded in .scratch/security-review-20260517/: (F1, MEDIUM) TopicSlug was concatenated into captured filenames in $outputDir without validation; a caller-supplied ..\..\tracked-leak resolved outside .scratch/codex-prompts and legit:hidden exposed NTFS Alternate Data Stream syntax — fixed by rejecting .., path separators (/, \), drive/ADS separator (:), Windows-invalid filename chars, NUL, and overlong (>64 char) slugs at the boundary. (F2, MEDIUM) --PromptFile validated Test-Path -PathType Leaf (returns True for symlinks pointing at regular files) and then Copy-Item -LiteralPath followed the symlink to the target — the reviewer pointed a symlink at a SECRET_PROBE file and watched the wrapper persist and forward its contents to the external CLI. Fixed by checking Get-Item -Force for the ReparsePoint attribute and refusing the input with a clear FAIL message naming the path. (F3, MEDIUM) prompt directory inherited parent ACLs (empirically Authenticated Users Modify + Users ReadAndExecute on .scratch), and timestamp-only filenames were predictable enough for a same-machine racer to anticipate captured paths and either pre-create symlinks at those paths or race-modify the captured content between native-call exit and the post-process re-encode loop — fixed by adding 8 hex chars of GUID entropy to the slug (-{8-hex} suffix appended after the timestamp) and by setting a restrictive ACL on the prompt directory when the wrapper just created it (disable inheritance, grant only the current user; skip ACL hardening if the directory already existed so we don't tighten a path the caller may share, and warn-but-proceed on non-Windows / sandboxed filesystems where the ACL operation legitimately fails). (A1, BLOCKING architecture finding) the first-pass fix used Get-Content -Raw -LiteralPath $promptPath to reread the prompt body before piping to the CLI; on Windows PowerShell 5.1 this reads UTF-8 no-BOM files via the system ANSI codepage and corrupts non-ASCII text — empirically the arch reviewer sent Привет (codepoints 1055,1088,1080,1074,1077,1090) and the child saw garbled (1056,1119,1057,1026,1056,1105,...). Additionally the prior $ErrorActionPreference = 'Continue' scope covered Get-Content (wrapper code, which should stay strict) not only the native call — fixed by reading the prompt body via [System.IO.File]::ReadAllText($promptPath, [System.Text.UTF8Encoding]::new($false)) before relaxing the error preference, and piping the resulting variable into the native call. End-to-end smoke verified on Windows: powershell -File invoke-codex-prompt.ps1 with a Cyrillic prompt (Ответь...ПРИВЕТ) returns exit 0 with .out containing exactly ПРИВЕТ in plain UTF-8, no garbling. F1 also verified by 3 inline reject-tests (path traversal, ADS syntax, overlong slug — all exited 1 with the expected FAIL message). This matters because every operator who dispatches an external review or worker through the documented wrapper pattern on Windows now gets the same filesystem-boundary defences that the API-side wrapper has had for months, and non-ASCII prompt content reaches the CLI unmangled regardless of which PowerShell host launched the wrapper.

  • Validator cap bumps to accommodate the just-landed common-skill description and Bootstrap "two trigger moments" extension — three caps in src.codex/skills/lead/scripts/validate-skill-pack.sh bumped with documented rationale per the existing "visible decision rather than silent budget growth" convention. (1) CODEX_SKILL_DESCRIPTION_MAX_CHARS bumped 180 -> 440 for the $mathtype-book-page long-form 10-trigger description adopted from active downstream use (416 chars actual; the prior 180-char cap was a pre-common-skills UX guideline); future common skills should still aim for ≤180 chars where the trigger surface is narrow, and this cap is the wide-trigger allowance rather than a license for verbose role-style descriptions. (2) CODEX_SKILL_DESCRIPTION_TOTAL_MAX_CHARS bumped 3500 -> 3700 to fit the same long-form description in the total budget (3595 chars actual; headroom ~105 chars for one future common skill at modest length). (3) Codex AGENTS.md pack section line budget bumped 360 -> 380 to accommodate the Bootstrap restructuring from one trigger moment to two (366 lines actual; +14 ceiling preserves headroom for the next governance addition). Each bump comment names the predecessor cap, the version, the driving change, and the expected headroom for next addition so future bisects can reconstruct the policy intent without spelunking commit messages. Pack validator now passes 406 / 0 / 0 against the merged Codex AGENTS.md and skill metadata.

Added

  • Added the passive-polling Stop hook beside the existing bugfix-discipline PreToolUse hook. The new check-passive-polling-stop hook closes the user-reported failure mode where the assistant ends a turn with жду ответа бота (3-5 мин) or equivalent passive async-wait language without checking current time, bot/review/CI/job status, task output, or captured logs. The implementation mirrors the existing hook family shape instead of building a monolith: shared envelope/transcript helpers now live in hook_common.py, while check-bugfix-discipline.py and check-passive-polling-stop.py remain separate rule brains because their hook contracts differ (permissionDecision: deny for PreToolUse, top-level decision: block for Stop). The Stop hook reads last_assistant_message from the Stop envelope, exits on stop_hook_active=true, uses two-tier phrase detection (strong phrases like жду ответа бота, waiting for review, CI pending, 3-5 мин; weak phrases like waiting for or жду only when paired with an async noun or time estimate), exempts user handoffs such as waiting for your response / жду твоего подтверждения, and supports the per-stop [acknowledge-passive-stop] override. To prevent no-op bypasses, it accepts only relevant current-turn probes (date, Get-Date, gh pr view, gh run list, gh api, curl, process/task output, output/log/task reads) and rejects irrelevant calls such as Bash: true. Installers now write both hook entries into Claude settings.json and Codex hooks.json; the shared installer gained --hook-event and --script-marker so PreToolUse and Stop entries are idempotent and independently removable. The Codex AGENTS.md validator cap was bumped 380 -> 420 with an in-script rationale because the installed platform section now documents two hook events, two override markers, independent removal commands, and the Stop JSON shape. Codex users must trust both entries in the interactive TUI after install or upgrade; Claude fires them immediately. This matters because the earlier text rule and memory feedback did not stop the passive-wait behavior, and the structural guard now fires exactly at turn termination, where the failure occurs.

  • Added gpt-5.3-codex-spark as a 4th allowed value for externalCodexProfile in shared/agents-mode.schema.json (was default | gpt-5.5-fast | gpt-5.5-xhigh, now adds the spark variant). The shipped default stays gpt-5.5-xhigh and no preset is rewired to spark — this addition is purely the allowed-value list growing so operators can opt in by setting externalCodexProfile: gpt-5.3-codex-spark in their ~/.codex/.agents-mode.yaml or ~/.claude/.agents-mode.yaml. Spark is a supplement, not a replacement for the gpt-5.5-* profiles — the user-stated framing was explicitly "это не замена gpt 5.5, дополнение для экономии и скорости", parallel to "sonnet не замена opus, а для скорости и экономии". Use case: very routine non-thinking mechanical tasks where mainconv (or an external-worker invocation) wants to save the main-tier token budget — spark has separate rate limits from the gpt-5.5 family, so calling it doesn't burn the main-tier quota. Mandatory constraint (user emphasized as load-bearing): spark output requires verification by a heavier tier before any commit/push, because spark trades depth for speed; treat its output as "implemented, not yet verified" by default. The shared/agents-mode.defaults.yaml comment was updated to mention the new value and the supplement framing; both the English (shared/references/subagent-operating-model.md) and Russian (shared/references/ru/subagent-operating-model.md) operating-model references gained the same wording in their externalCodexProfile enumeration paragraphs. The validator at src.codex/skills/lead/scripts/validate-skill-pack.sh was not bumped (the existing checks pass against the additional enum value). This matters because mainconv routing decisions now have an explicit allowed-value path for "this is purely mechanical and maximally clear" tasks — previously the only knob below gpt-5.5-xhigh was gpt-5.5-fast (same family, different reasoning effort), which still burns the main gpt-5.5 rate budget.

  • Merged 73 net lines of workflow improvements into the $mathtype-book-page common skill from active downstream use, propagating refinements back to the canonical pack source so all adopters benefit on next install. Both pack copies (src.codex/skills/mathtype-book-page/SKILL.md and src.claude/skills/mathtype-book-page/SKILL.md) updated identically per the common-skill dual-shipping contract. Additions: an expanded skill description with 10 specific trigger contexts (improves Skill-tool description matching); a new "Typography Normalization Against Exemplar (LOAD-BEARING)" section documenting the 5-step procedure for reconciling chunk-level word/styles.xml against an accepted exemplar when an OMML→DOCX pipeline emits non-book defaults (Arial 10.5pt body instead of TNR 11pt, oversized headings, missing pandoc-specific styles); a new "Heading Case Normalization (Russian sentence case)" section documenting the 6-step procedure for converting ALL CAPS Russian headings to production-book sentence case with multi-run text-run merge/redistribute and <w:rPr> preservation; 8 new defect-pattern table rows under "Common defect patterns" (body font/size drift vs exemplar, body page label banners, ALL CAPS section headings, multi-number formula cell concatenation, multi-row OLE display untouched after multi-number split, stale manifest status, skipping no-Word cleanup sweep before promote, render gate skipped after no-Word edits); 5 new closing checklist items 20-24 (exemplar styles.xml audit, heading-case audit, body-flow page label audit, manifest gate-state honesty audit, per-chunk skill review record audit); and a new "Defect Class Index" section enumerating defect classes A-M (12 classes at initial merge; later extended to 13 with Class N — Code listing provenance pointer — as the operational form of the new Results-table provenance discipline) with surface description and repair lane for each. Per the "No mechanical application" rule in CLAUDE.md, four downstream-project-specific references (empirical citation date, manifest status enum, internal doc path, RF-engineering acronym example list) were generalized for canonical pack consumption so other projects can adopt the procedures without inheriting one project's specific schema. This matters because the skill was actively used in a translation project and accumulated empirical procedure improvements that other projects adopting the skill should inherit — leaving the workflow improvements only in a downstream user's local ~/.codex/skills/ directory would mean every subsequent adopter would have to rediscover the same defect patterns and re-derive the same procedures.

  • Extended the $mathtype-book-page common skill with a new "Listing Provenance Pointer (Class N)" section that operationalizes the just-landed Results-table provenance discipline for the "program listing" surface, plus a matching Class N row in the Defect Class Index and a new checklist item 25. Program listings printed inside a translated technical-book chunk are OCR-cleaned samples from the source book; the verified, compilation-tested source typically lives separately in the project's verified-source root. A chunk that prints or narratively references a program without a provenance pointer is now classified as a Class N defect — a future reader cannot reproduce the run or distinguish the printed code from a cross-validated translation. The new section documents the 3-line provenance block (Verified source: <path> (<compile invocation>) + Input artifact: <path> + Output: <path> (<one-line numeric summary>)), the cross-language cross-validation form (when a chapter has e.g. an original Turbo Pascal and a Fortran restoration, both source paths + compile invocations + cross-validation row count + max absolute difference belong in the block), the find heuristic (grep word/document.xml for monospace run properties + narrative program identifiers + named convention markers Листинг N / Program N / Listing N), and the skip rule (numerical data tables or trivial illustrative fragments without verified-code counterparts). Project-specific references in the upstream Itoh2 version of the section (chapter-to-path coverage map for one RF-engineering book, specific compile invocations, internal manifest paths) were generalized for canonical-pack consumption per the "No mechanical application" rule. Both pack copies updated identically per the common-skill dual-shipping invariant.

  • Updated the canonical evidence-pipeline reference at shared/references/evidence-based-answer-pipeline.md (plus its Russian mirror at shared/references/ru/evidence-based-answer-pipeline.md) to reflect the new Pre-fix diagnostic gate and Results-table provenance discipline rules added in commit 5634d61, addressing the architecture reviewer's mandatory finding that governance/protocol/gate/routing semantic changes MUST update shared/references/ per AGENTS.md:18. The "Relevance to our governance" section now lists Pre-fix diagnostic gate, Hypothesis disclosure discipline, Evidence-citation discipline, Results-table provenance discipline, and Visual artifact verification discipline alongside the prior Ambiguity resolution / Evidence-based completion / Failure transparency / Treat-external-content-as-untrusted entries. A new closing paragraph names the three structural enforcement points for code-bearing work — pre-fix (Bootstrap steps 1-3 before first Edit/Write in bug-report context), pre-commit (all 5 Bootstrap steps before commit), and pre-push (auto-installed check-hypothesis-disclosure hook on git push) — and explains the audit chain: wasted edit cycles caught at pre-fix, wasted commits caught at pre-commit, wasted shared-state writes caught at pre-push. This matters because the shared reference doc is the canonical methodology source-of-truth for the evidence-citation discipline that both packs and any external consumer would read to understand "how does Orchestrarium operationalize verify-before-claim"; without this update, that doc would have silently lagged the rule catalogue.

  • Added two new globally-binding governance rules to shared/AGENTS.shared.md and extended the per-pack Bootstrap blocks in src.claude/CLAUDE.md and src.codex/AGENTS.codex.md to fire at an additional pre-fix trigger moment in addition to the existing pre-commit trigger. New rule 1: Results-table provenance discipline — every table of computed results in repository documentation, reports, scientific docs, generated output, or release artifacts must cite, near the table, the provenance triad of (a) the formula/model/named procedure that produced the numbers, (b) the code/script/notebook path that ran the computation (with file:line or section anchor when the computation is one of many in the file), and (c) the input artifacts the computation consumed (file paths, dataset versions, parameter values, fixed seeds, environment markers). Fires only for computed values (any numeric output of script/formula/simulation/fit/regression/model/calibration/benchmark), not for source-data tables copied from external datasets which only need source citation. The discipline closes the failure mode where numbers exist in documentation but the computation provenance is lost, so reviewers cannot independently audit, reproduce, or distinguish the values from fabrication. For one-shot computations not preserved as a script, name the procedure inline and either commit a minimal reproduction script or label the cells ASSUMPTION (UNVERIFIED — one-shot computation not preserved). New rule 2: Pre-fix diagnostic gate — when a session encounters a bug report, runtime failure, error trace, regression, "не работает" claim, or any user message naming a defect in behavior, steps 1-3 of the existing Ambiguity resolution discipline + Hypothesis disclosure discipline (capture concrete observable data → form a hypothesis chain → verify each link or label ASSUMPTION (UNVERIFIED)) must complete before the first Edit / Write / NotebookEdit / file-write / apply_patch tool call. The trigger is the start-of-fix-attempt moment, not the commit moment — the existing Bootstrap fires at commit time and the auto-installed hook fires at push time, but neither catches mainconv between "user reported a bug" and "main conv started editing files on a guess about what the bug is". This closes the recurring failure pattern (recorded in user memory as feedback_pre_fix_discipline and surfaced this session as "main conv keeps fixing without confirming hypotheses outside /agents-bugfix flow") where the discipline existed in the rule catalog but wasn't catching the session reliably at the moment that actually matters. Bootstrap structural update: both per-pack Bootstrap blocks now have a "two trigger moments" preamble explicitly naming (a) pre-fix (steps 1-3 before any code-mutating tool call in a bug-report context, regardless of whether /agents-bugfix was invoked) and (b) pre-commit (all 5 steps), and the violation-triggers list grew a new "Pre-fix triggers" sub-block listing specific shortcut patterns ("I see the bug, let me edit X" without symptom citation, "the fix is to add Y" without verified hypothesis, starting an Edit/Write in a bug-report context with no diagnostic data captured). The combined effect is that the same hypothesis-disclosure discipline is now auditable at every level of action irreversibility: pre-fix (in-session guard against wasted edit cycles), pre-commit (Bootstrap, guards against wasted commits), and pre-push (the auto-installed hook, guards against wasted shared-state writes). This matters because the discipline already existed structurally — what was missing was the structural enforcement at the moment when an unverified hypothesis most often produces wasted work, which is the gap between "user described a bug" and "main conv started editing files in response".

2026-05-16

Changed

  • Reframed the externalCodexProfile: gpt-5.5-fast documentation across both packs, both example provider trees, the shared docs (docs/agents-mode-reference.md, docs/external-worker-design.md), and the public-facing README.md and INSTALL.md to make explicit that "fast" is a model-tier choice (the underlying Codex model variant), not an effort downgrade — model_reasoning_effort stays xhigh on this profile. Earlier wording such as "explicitly requests a Codex fast profile" or "speed override" was ambiguous and could be read as implying that gpt-5.5-fast swapped reasoning effort below xhigh; the corrected wording now states "selects the fast Codex model tier (model variant only — reasoning_effort still stays xhigh, this is not an effort downgrade)" at every documented occurrence. This matters because operators who picked gpt-5.5-fast for the max-speed preset or via explicit override were getting two independent properties they could not previously distinguish from the docs alone — a faster model variant and the same xhigh reasoning effort as the best-effort profile — and the old wording made the trade-off look worse than it actually is, biasing operators toward default or gpt-5.5-xhigh when gpt-5.5-fast would have been the better choice for their lane.

Added

  • Refined the Codex+Windows skip implementation per the fourth architecture re-review's two REVISE findings. (1) Windows skip is now BEFORE Python resolution in scripts/install-codex.sh — previously the case statement was inside the else branch of if [ ! -f "$hook_installer" ], so a Windows host without Python would hard-fail with "FAIL: python or python3 is required" instead of cleanly skipping. The case statement moved up to the outer block so the MINGW/MSYS/CYGWIN branch is evaluated first; the POSIX branch keeps the Python check. (2) Skip messages and docs are now path-aware — the previous WARN messages always pointed at ~/.codex/skills/... and ~/.codex/hooks.json (global paths), incorrect for --target project-local installs which place skills under <project>/.agents/skills/ and hooks.json under <project>/.codex/hooks.json. Bash installer now interpolates $TARGET and $AGENTS_ROOT into the message and switches phrasing based on $MODE; PowerShell installer uses $TargetRoot and $AgentsRoot similarly. src.codex/AGENTS.codex.md documents both global and project-local paths explicitly. New regression test test_codex_windows_skip_does_not_load_target_file proves the helper exits BEFORE any filesystem I/O when given Codex+Windows + a deliberately bogus target path that would fail any read or write attempt. Tests now 17 pass + 1 skip.

  • Reverted the previous "skip Codex hook auto-install on Windows" defensive design and now auto-install the Codex hook entry on all platforms, including Windows. The earlier skip was an over-correction: empirical testing showed that the real blocker is not Windows-specific shell ambiguity but Codex's universal hook trust requirement (Codex marks every newly-installed or modified hook as "untrusted" until the user explicitly trusts it via the interactive codex TUI, on both POSIX and Windows). The installer cannot satisfy this requirement programmatically because Codex does not currently expose a trust API — only managed-hook bundles deployed via requirements.toml with managed_dir/windows_managed_dir bypass the trust prompt, and that is out of scope for a pack-level installer. With the trust step explicitly documented as the user's responsibility, the installer's job becomes "write the hook entry and the script"; the user runs codex once and trusts the entry via the on-screen hook browser. Codex+Windows now uses the same bash <quoted_sh_path> shell form as Codex+POSIX (Codex hook schema does not support args exec form or shell field per https://developers.openai.com/codex/hooks, so shell form is the only option). The Windows form assumes Codex's hook interpreter can locate bash on PATH — typical when Git Bash is installed alongside Codex, but if a user's runtime uses a different shell they can edit the persisted hooks.json entry. The marker-based merge is for entry identification, not edit preservation: the installer's idempotent re-install keys on the check-hypothesis-disclosure substring marker only to find the pack's entry across upgrades and to prevent duplicate entries from accumulating; on every reinstall it then normalizes the matched entry back to the canonical bash <script.sh> form, so a hand-edited command shape will not survive the next install. To keep a manual override, users must pass --no-hypothesis-hook (PowerShell -NoHypothesisHook) on every future install or set ORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1 in the environment so the installer skips the hook step entirely. The opt-out env var does not block --remove, so legacy cleanup remains possible while a standing opt-out is in effect. End-to-end smoke test on Windows confirmed powershell -File scripts/install-codex.ps1 -Global now writes ~/.codex/hooks.json with the bash shell-form entry. Tests grew to 17 (16 pass + 1 Windows-symlink skip); test_codex_windows_is_skipped_with_warn replaced by test_codex_windows_writes_entry_with_bash_form and test_codex_windows_creates_parent_directory to lock in the new behavior. src.codex/AGENTS.codex.md and INSTALL.md document the manual trust step prominently so users are not surprised by the silent-skip-until-trusted behavior of Codex hooks. Empirical investigation evidence: Codex 0.130.0 binary strings include "New hook - review required", "Modified since last trusted - review required", "Trust to view hooks; to trust; to toggle", "Managed hooks are always on", confirming the trust mechanism applies to all platforms; a probe hook configured at ~/.codex/hooks.json with full bash form was empirically verified to load but not fire from codex exec because of the untrusted state.

  • Closed the Codex-Windows residual finding from the third architecture re-review by skipping the Codex hook auto-install on Windows hosts entirely with an explicit WARN. The third re-review correctly identified that the Codex Windows path was still hypothesis-bearing: Codex hooks docs (https://developers.openai.com/codex/hooks) do not document Windows default shell selection, do not support args exec form, and do not support a shell field. The previous attempt to emit powershell -NoProfile -ExecutionPolicy Bypass -File <shlex.quote(path)> was unverifiable on multiple axes (POSIX shlex.quote is not PowerShell-equivalent quoting; Git Bash POSIX path namespace /c/Users/... is not invocable by powershell -File). Applied the no-kostyl rule to my own work: rather than silently emitting an unverified command string, the helper now refuses the Codex+Windows lane with a WARN message explaining the limitation. The Claude pack's auto-install on Windows is unaffected and uses the verified native PowerShell exec form. Implementation: install-hypothesis-hook.py exits early with SKIP+WARN when --platform codex --host-os windows (unless --remove, which still works so users can clean up legacy entries); both install-codex.{sh,ps1} skip the helper call on Windows hosts with the same explanatory message. The hook script itself is still installed to ~/.codex/skills/lead/scripts/ so a user who has verified their Codex Windows shell behavior can manually configure ~/.codex/hooks.json. New regression tests: test_codex_windows_is_skipped_with_warn (verifies no file write + WARN on stderr) and test_codex_windows_remove_still_works (legacy entry cleanup remains possible). src.codex/AGENTS.codex.md documents the Windows limitation explicitly. Tests grew to 17 (16 pass + 1 Windows-symlink skip by design). End-to-end smoke test on Windows confirmed bash scripts/install-codex.sh --global now prints the SKIP message and does NOT modify ~/.codex/hooks.json.

  • Closed the remaining architecture-review finding on commit cd09cdb's Windows portability by adopting Claude Code's documented exec form for the hypothesis-disclosure hook entry — native cross-platform invocation with no Git Bash dependency on Windows. scripts/install-hypothesis-hook.py now accepts a --host-os {posix,windows} flag and emits per-platform: Claude+POSIX uses exec form command: "bash", args: ["<sh_path>"]; Claude+Windows uses native PowerShell exec form command: "powershell", args: ["-NoProfile", "-ExecutionPolicy", "Bypass", "-File", "<ps1_path>"]; Codex stays on shell form (Codex hook schema per https://developers.openai.com/codex/hooks does not support args or shell selection) but now also emits the powershell command string on Windows hosts. Exec form is documented as "preferred for portable cross-platform hooks" by Claude per hooks reference and eliminates shell-quoting concerns entirely (each args element passes through as a literal argument with no shell interpretation). All four installers (scripts/install-{claude,codex}.{sh,ps1}) now detect host OS: Bash variants check uname -s for MINGW*/MSYS*/CYGWIN* to enable the Windows path; PowerShell variants always pass --host-os windows and reference the .ps1 script. The marker-search logic in the merge script was extended to recognize the marker substring check-hypothesis-disclosure in both command (legacy/Codex shell form) and args[k] (Claude exec form), so legacy entries from earlier installs are automatically collapsed into the new exec form on next reinstall without leaving stale duplicates. Tests grew from 12 → 16 (15 pass + 1 Windows-symlink skip): added test_install_claude_windows_powershell_exec_form, test_install_codex_posix_shell_form, test_install_codex_windows_powershell_shell_form, and split the command-injection test into separate Claude-exec-form and Codex-shell-form variants. End-to-end smoke test on Windows under Git Bash confirmed the real ~/.claude/settings.json now contains the PowerShell exec form. The architecture re-review of cd09cdb ("Leaving native shell:powershell/args for later is not acceptable for this gate") is now fully closed; the security re-review was already PASS.

  • Hardened scripts/install-hypothesis-hook.py and the installer scripts against findings from Codex code review and full Codex security review of commit 79aa5eb. Seven distinct findings addressed in one cohesive follow-up commit: (1) HIGH — command injection via unquoted --script-path: empirically demonstrated by the security reviewer with --script-path '/tmp/safe path; echo PWNED' producing "command": "bash /tmp/safe path; echo PWNED". Now uses shlex.quote() on the script path before embedding into the JSON command field, defending against paths containing spaces, semicolons, $(...), quotes, or backslashes. (2) HIGH — silent fail when Python is missing: installers now fail closed with FAIL: python or python3 is required... instead of silently skipping the hook install (leaving the user thinking the hook is active). PowerShell variants check $LASTEXITCODE after the merge-script call and propagate non-zero exits. (3) MEDIUM — duplicate marker entries not cleaned: find_our_entry_indices() now returns ALL marker matches; install() collapses duplicates to a single entry, --remove deletes every match. (4) MEDIUM — opt-out env var blocked --remove: ORCHESTRARIUM_NO_HYPOTHESIS_HOOK check now bypasses --remove, so a standing opt-out can still uninstall an existing hook entry. (5) MEDIUM — no regression tests for the merge logic: added tests/test_install_hypothesis_hook.py with 12 tests covering empty-install, idempotent reinstall, user-key preservation, duplicate handling, opt-out semantics (block install / allow remove), command-injection defense, malformed-but-valid JSON ({"hooks": {"PreToolUse": [{"hooks": 5}]}}), symlink-target refusal, and invalid-JSON fail-closed. 11/12 pass on Windows (symlink test requires elevated privileges and is skipped by design on Windows). (6) LOW — symlink/TOCTOU clobber risk: writer now refuses to write through a symlinked target (Path.is_symlink() check) and uses atomic write via tempfile.mkstemp() + os.replace() in the same directory. (7) LOW — TypeError on malformed entry["hooks"]: per-entry hooks field is now validated as a list before iteration; non-list values are skipped gracefully instead of crashing the merge helper. Architecture review's structural recommendation (use shell: "powershell" + args for Windows-native invocation, per the documented Claude Code hooks reference) is acknowledged as a future improvement; the current bash form works on macOS, Linux, and Windows under Git Bash, and the shlex-quoting closes the actual injection vector.

  • Auto-installed the hypothesis-disclosure PreToolUse hook in all four installer scripts (scripts/install-{claude,codex}.{sh,ps1}), reversing the previous "opt-in only" design that the user correctly identified as kostyl (avoidance of the settings.json merge problem rather than a real fix). New helper scripts/install-hypothesis-hook.py performs idempotent JSON-merge: reads existing settings.json (Claude) or hooks.json (Codex), finds our entry by check-hypothesis-disclosure substring marker in the command field, inserts or updates in place, and preserves all other user keys and other hooks. Five smoke tests verified the merge logic: install into empty file produces correct structure; re-install is no-op; user keys (env, attribution, permissions, enabledPlugins, other hook lanes like Stop) are preserved on insert; --remove flag cleans only our entry; ORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1 env var blocks the install entirely. Installers added a --no-hypothesis-hook CLI flag (-NoHypothesisHook on PowerShell) for explicit opt-out at install time; the env var is the runtime opt-out for users who want the installer to skip the step on every invocation. The hook is installed in both --global mode (~/.claude/settings.json, ~/.codex/hooks.json) and --target project mode (<project>/.claude/settings.json, <project>/.codex/hooks.json). End-to-end test against a real settings.json with pre-existing user keys (model, theme, permissions with custom Bash allow rules, enabledPlugins block) confirmed user state is preserved byte-for-byte except for the appended hook entry. Gemini and Qwen example packs do not get the hook because their runtimes don't expose comparable hook surfaces; the Bootstrap text rule remains binding on all platforms regardless. CLAUDE.md, AGENTS.codex.md, and INSTALL.md docs updated from "opt-in" framing to "auto-installed by default, opt out with --no-hypothesis-hook or ORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1". This matters because the previous "opt-in" framing was a documentation kostyl that hid the design avoidance — the installer's job is to make the pack work out of the box, not ship a script in a passive directory and ask the user to wire it up themselves.

  • Extended the Hypothesis disclosure discipline with a "Fix means correct logic, not workaround (no kostyl)" clause that defines what counts as a real fix versus what counts as a workaround that hides the symptom. A "fix" must name the root cause (a specific function, contract, invariant, or boundary that produced the wrong behavior) and correct it; a workaround silences a visible failure mode without changing the underlying logic. Explicit examples of kostyl enumerated in the rule: catch-and-swallow error handling that suppresses a diagnostic signal, defensive checks added without understanding why the original input was malformed, type assertions that silence the type contract, fallback values that mask missing-data bugs, renaming a misleading identifier without correcting the logic it labels, adding a retry loop around a non-idempotent operation, hardcoding a value to dodge a configuration-resolution bug, try/except: pass around recurring failures, and log filtering to hide real errors. Two operational tests apply before commit: (1) root-cause naming — the commit message must name the root cause in concrete terms, not just describe the suppressed symptom; (2) symptom-only suppression detection — if the fix removes a visible failure mode without changing the underlying logic and there is no separate verified hypothesis that the suppression itself is the correct behavior (legitimate boundary fallback, intentional graceful degradation per a documented contract), it is a kostyl. Kostyl is permitted only as an explicitly labeled WORKAROUND commit that (a) names the root cause separately, (b) states scope and lifetime, and (c) discloses that the underlying defect is unfixed. The Bootstrap blocks in src.claude/CLAUDE.md and src.codex/AGENTS.codex.md carry a new step 4.5 ("no-kostyl check") that pairs with step 4's scope proportionality check, so the operational checklist now reads: diagnostic data → hypothesis inventory → verify-each-or-label → scope proportionality → no-kostyl check → recovery readiness. Codex AGENTS.md pack section line budget in src.codex/skills/lead/scripts/validate-skill-pack.sh was bumped 340 → 360 to accommodate the new step (~2 lines actual, ~18 lines headroom). This matters because the existing Bug-fix scope and Failure transparency rules covered some of this implicitly, but there was no explicit anti-kostyl clause that named the failure pattern; the user's complaint that "main conv keeps making fixes by hypothesis without diagnostics" and that "fix needs to mean implementation of correct logic, not workaround/hiding" required the rule to make the test operational and the violation pattern named.

  • Extended shared cross-pack global agents-mode.yaml symmetry to all four pack installers and all read-order documentation surfaces. Gemini and Qwen installers (scripts/install-gemini.{sh,ps1}, scripts/install-qwen.{sh,ps1}) now also create/normalize ~/.agents-mode.yaml during --global install, matching the Claude and Codex behavior. 4-way idempotency verified empirically: SHA-256 of ~/.agents-mode.yaml remained identical across sequential --global installs from all four packs (Claude → Codex → Gemini → Qwen). 13 documentation surfaces in src.gemini/skills/ and src.qwen/skills/ (consultant, external-brigade, external-reviewer, external-worker, init-project, lead/external-dispatch, second-opinion in each pack) updated to the same per-key read-order wording as the production-line packs. A repo-wide grep "fall back to global" now returns zero remaining instances, confirming consistent governance across all four packs. This matters because the previous commit (0f749db) was production-line-only; example-pack documentation drift would have meant a user who reads src.gemini/skills/consultant/SKILL.md would see whole-file fallback while everyone else sees per-key composition — exactly the kind of governance contradiction the rule "do not let one surface silently redefine thin upstream artifacts" exists to prevent.

  • Added a shared cross-pack global agents-mode.yaml layer at ~/.agents-mode.yaml (alongside ~/.claude.json), serving as the single source of truth shared between Claude Code and Codex CLI. The new layer fits into the config-resolution stack with per-key precedence: project-local .agents-mode.yaml > pack-local global ~/.claude/.agents-mode.yaml or ~/.codex/.agents-mode.yaml > shared cross-pack global ~/.agents-mode.yaml > built-in defaults. Each key resolves to the highest layer that defines it; layers compose rather than replacing each other wholesale. All four installers (scripts/install-claude.{sh,ps1} and scripts/install-codex.{sh,ps1}) now call sync_agents_mode_file for the shared global target during --global install only; project-target installs are unaffected. Behavior is normalize-not-overwrite: the file is created on first install if absent, and re-normalized to current canonical format on subsequent installs while preserving user-customized values. Cross-pack idempotency was verified empirically — Claude install creates the file, Codex install on the same file produces SHA-256-identical content (byte-for-byte). Read-order documentation updated in src.claude/CLAUDE.md, src.codex/AGENTS.codex.md, both external-dispatch.md contracts, src.codex/skills/lead/subagent-contracts.md, INSTALL.md, and docs/agents-mode-reference.md. This matters because pack-local globals (~/.claude/.agents-mode.yaml and ~/.codex/.agents-mode.yaml) could previously drift between each other (each installer wrote independently), and there was no place to express cross-pack-shared operator preferences without duplicating them. The shared layer becomes the single point of truth for common settings; pack-local globals stay as platform-specific overrides where needed.

  • Corrected two findings in commit 58b9273's structural-enforcement design. (A) The Claude hook config example used a relative path (bash .claude/agents/scripts/check-hypothesis-disclosure.sh) which, per the official Claude Code hooks docs, is unreliable because the hook runs with the session's current working directory, not the directory of settings.json. Recommended forms now use $HOME/... for global hooks and ${CLAUDE_PROJECT_DIR}/... for project-local hooks. (B) The previous note in src.codex/AGENTS.codex.md claimed "Codex CLI does not currently expose an analogous hook surface" — this was a hypothesis-driven kosyak. Direct WebFetch of https://developers.openai.com/codex/hooks verified that Codex CLI exposes a PreToolUse hook surface (stable feature hooks, default-on) with the same JSON envelope (stdin) + structured deny (hookSpecificOutput.permissionDecision = "deny" or exit code 2 + stderr) protocol as Claude Code. The Codex pack now ships the same hook script at src.codex/skills/lead/scripts/check-hypothesis-disclosure.{sh,ps1} (smoke-tested against Codex envelope shape — same JSON shape so the existing script works), and src.codex/AGENTS.codex.md carries the corrected hook config example using ~/.codex/hooks.json (Codex matcher is regex on tool name only, so the script self-filters by parsing tool_input.command). INSTALL.md was updated to cover both packs symmetrically.

  • Added an opt-in structural enforcement hook for the Hypothesis disclosure discipline rule on Claude Code clients. New scripts src.claude/agents/scripts/check-hypothesis-disclosure.sh and .ps1 machine-check the HEAD commit message before git push: behavior-changing commit types (feat/fix/refactor) must carry a VERIFIED: or ASSUMPTION (UNVERIFIED) marker in the body, while whitelisted types (docs/chore/style/merge/ci/build/perf/test/revert) pass through unchecked. The hook is wired as a Claude Code PreToolUse hook with the if: "Bash(git push *)" permission-rule filter so it only fires on actual push attempts. It denies with a structured JSON payload (hookSpecificOutput.permissionDecision = "deny" + actionable reason) that surfaces back to Claude and the user. The pack ships the script but does not modify .claude/settings.json; opt-in instructions are documented in src.claude/CLAUDE.md (section "Optional structural enforcement") and INSTALL.md. Cross-platform parity: both the Bash and PowerShell scripts were smoke-tested against the four positive/negative cases (HEAD with disclosure → allow; non-push command → allow; behavior-changing commit without disclosure → deny with reason; whitelisted-type commit → allow; behavior-changing commit with disclosure → allow). Codex CLI does not currently expose an analogous shell-tool hook surface; Codex-side structural enforcement is a follow-up when the runtime exposes a hookable layer. The text rule and Bootstrap block remain binding on both platforms regardless of hook installation — the hook is a structural backstop for sessions where the text alone is insufficient, which is exactly the residual risk the QA review of e36fb67 flagged.

  • Introduced the Hypothesis disclosure discipline as a globally-binding governance rule in shared/AGENTS.shared.md (under "Verification and decision discipline"), plus an operational Bootstrap block at the top of src.claude/CLAUDE.md and src.codex/AGENTS.codex.md. The rule requires every fix or implementation commit to (1) decompose the proposed change into distinct interpretive claims, (2) verify each claim via the four Evidence-citation discipline categories — or via direct user confirmation for user-intent claims — or explicitly label it ASSUMPTION (UNVERIFIED) in the commit body with the verification step that would resolve it (silent unverified claims are violations), (3) keep scope proportionate to what the verified hypothesis actually requires (wider scope is allowed only when the hypothesis itself names it), (4) prefer git reset --hard HEAD~N over git revert when a hypothesis is later invalidated on a local-only commit, and (5) treat named shortcut phrases (most likely means, presumably, I believe it refers to, this should map to, extrapolating from, based on training data, while I'm here let me also) as forbidden when used as the load-bearing reasoning for a commit, though they remain acceptable in open exploration. The Bootstrap block is the operational 5-step checklist (diagnostic data → hypothesis inventory → verify-each-or-label → scope proportionality → recovery readiness) plus a violation-trigger list naming the banned phrases as the activation signal. The Codex pack section line budget in src.codex/skills/lead/scripts/validate-skill-pack.sh was bumped 300 → 340 to accommodate the Bootstrap (~32 lines). This matters because the existing Ambiguity resolution discipline and Evidence-citation discipline state the principle ("do not guess; verify") and provide the evidence categories, but they did not require disclosure of the hypothesis chain itself — a commit could pass evidence-citation while still resting on an unverified interpretive leap hidden inside a confident-sounding claim. The new rule makes the hypothesis chain an explicit artifact reviewers can audit.

  • Added externalCodexProfile: gpt-5.5-xhigh as the shipped best-effort default and as the consultant-lane mandatory profile, restoring symmetry with the long-shipped externalClaudeProfile: opus-max default. Schema (shared/agents-mode.schema.json) now lists gpt-5.5-xhigh alongside default and gpt-5.5-fast, and the shipped scalar default flips from default to gpt-5.5-xhigh. Per-preset assignments in shared/agents-mode.presets.json now correspond to each preset's stated intent: best-effort presets (default, correctness-first, power-mode) ship gpt-5.5-xhigh; balanced presets (absolute-balance, external-aggressive) keep default (which inherits externalModelMode); the maximum-speed preset (max-speed) ships gpt-5.5-fast, which selects the fast Codex model tier (model variant only — reasoning_effort still stays xhigh even on this profile, so it is not an effort downgrade of gpt-5.5-xhigh, just a faster underlying model). The invoke-codex-prompt.sh / .ps1 prompt-orchestration wrappers now include -c model_reasoning_effort=xhigh in their default Codex flag set, so a default wrapper invocation honors the best-effort contract automatically. Consultant role files (src.claude/agents/consultant.md and src.codex/skills/consultant/SKILL.md) make the rule unconditional: consultant Codex calls always use gpt-5.5-xhigh regardless of operator-set profile or externalModelMode, mirroring the consultant Claude rule that always uses --model opus --effort max. External-worker, external-reviewer, and lead/dispatch contracts on both pack sides now document the third allowed value and its semantic. This matters because the previous shipped default (externalCodexProfile: default + externalModelMode: runtime-default) silently downgraded consultant Codex memos to the runtime default (high), and re-dispatch on failure repeated the downgrade — the user reported the consultant landing on high instead of xhigh and the fix restores symmetric best-effort across the Codex and Claude consultant paths, with the wrapper default and the consultant-lane unconditional rule providing two independent enforcement layers.

  • Added a ## Coexistence with the superpowers plugin section to src.claude/CLAUDE.md clarifying how Orchestrator templates and superpowers skills compose. superpowers skills shape main conv's process discipline (HOW to think and work — brainstorming, systematic-debugging, verification-before-completion, writing-plans, TDD, subagent-driven-development), Orchestrator templates shape delegation routing (WHO does what and through which gate — quick-fix, research, review, full-delivery, etc.). Standard composition: process skill first if applicable (e.g. brainstorming before full-delivery; systematic-debugging before quick-fix), then Orchestrator template for delegation, then subagents may invoke common-skills via Skill tool inside their own context. The section includes a per-skill overlap table (brainstorming ↔ research/full-delivery, systematic-debugging ↔ quick-fix+bug-hunting, writing-plans ↔ planner, verification-before-completion ↔ qa-engineer, TDD ↔ implementer roles, subagent-driven-development ↔ team templates, requesting/receiving-code-review ↔ reviewer roles) and a quick decision rubric for "do I invoke a superpowers skill or pick a template?". This matters because operators running both packs together previously had no documented integration contract, and the resulting role-routing ambiguity in main conv was the user's stated concern: "mainconv claude unclear when it takes which roles, especially when using superpowers".

  • Closed a recurring lazy-verification failure mode where a subagent would infer "provider unavailable" from the absence of a repo-side wrapper script and silently substitute its own opinion for the requested external one. The fix covers the three-level failure pattern the consultant's own post-mortem surfaced: (1) lazy verification — never ran command -v codex despite the 5-second cost; (2) pattern-match bias — extrapolated one missing wrapper artifact to "the whole subsystem is broken"; (3) curiosity failure — observed dispatch rounds return Resolved provider: none and accepted that as proof of unavailability rather than as evidence the routing layer itself had a defect to investigate. Three coordinated changes: (a) new Active-availability probe discipline in shared/AGENTS.shared.md requiring availability claims to be backed by a direct probe (command -v, which, Get-Command, Test-Path, curl -I) executed in the current session, with indirect inferences explicitly forbidden and anomalous orchestration output (Resolved provider: none with a profile that lists a primary) explicitly framed as a routing-investigation signal rather than evidence of unavailability; (b) Bootstrap — first action block prepended to both src.claude/agents/consultant.md and src.codex/skills/consultant/SKILL.md ordering the consultant role to verify provider availability as a real shell call, write the prompt to a temporary file, shell out via the prompt-orchestration wrapper, and treat any response drafted without an actual tool call as a discipline violation; (c) new file-based prompt-orchestration wrappers src.claude/agents/scripts/invoke-codex-prompt.sh/.ps1 and invoke-claude-prompt.sh/.ps1 that encapsulate the shared External CLI prompt delivery rule in one invocation — they probe provider availability, persist the prompt body to .scratch/<provider>-prompts/<topic>-<timestamp>.md, run the CLI with prompt redirected from that file, capture stdout/stderr to sibling files, and propagate the exit code. The wrappers carry no SECRET.md, no env injection, and no auth-mode switching; the existing invoke-claude-api.* wrapper remains the separate secret-backed transport used only when reserveResolver: claude-wrapper resolves through it. This matters because the failure pattern recurs across the kosyak corpus and a generic "verify before claiming" rule was not enough; the discipline now has a specific named trigger for availability claims, a structural enforcement order in the consultant role itself, and an automation surface that removes the per-invocation file/redirect chain construction that subagents tend to skip under pressure.

  • Added slash-command auto-invocation contract so main conversation can apply the right entry-point command's flow when the user describes the intent without typing the slash command. Each of the six entry-point commands (/agents-bugfix, /agents-implement, /agents-design, /agents-research, /agents-review, /agents-refactor) now carries a ## When to auto-invoke block at the top of its file listing trigger phrases, intent patterns, and explicit do-not-auto-invoke exceptions; src.claude/CLAUDE.md adds a ## Slash command auto-invocation dispatch index that maps intent signals to command files, with resolution rules for ambiguous matches and an explicit fallback to the template decision tree when no auto-invoke trigger fires. Auto-invocation requires main conv to announce the routing decision in its first response so the user can override if the auto-routing was wrong. This matters because the slash commands previously had no documented signal for when to invoke them outside the user typing the command explicitly, and the user's stated concern was that main conv could not auto-route ("важно, чтобы mainconv мог сам их вызывать, если я забуду"). The fix lives in the existing command surface — no new auto-trigger skill (which would compete with superpowers' own routing) and no settings.json hook (which would land outside the pack).

  • Expanded shared/AGENTS.shared.md with eight new globally-binding rules and five extensions to existing rules, distilled from a 20-incident post-mortem corpus and a personal rules collection. New rules cover wire-shape verification before cross-component integration (most common failure pattern in the corpus, three same-day incidents), operational-contract scope (status codes, log severity, alert paging are part of the contract), destructive-default polarity (Apply=true not DryRun=true), guard-precondition discipline (defensive checks must verify their own inputs, not just arithmetic on present values), all-return-paths discipline for cross-cut fixes (slow path AND cache-hit fast path AND inner re-check), polling-anchor discipline (anchor on system's authoritative latest state, not self-derived filter), race-window assertion discipline (engineer windows deterministically large, not nondeterministically tiny), end-to-end channel verification for durability and observability fixes (trace producer → receiver, not just producer-side green), and user-repetition / visual-reference fidelity (repetition = high-confidence signal of misread intent; visual = ground truth). Existing rules extended: Ambiguity resolution discipline now lists four evidence categories (file:line, installed-dependency surface, official documentation, smoke test) plus banned correctness-drivers (should work, probably, etc.); Evidence-based completion now includes background-task notification ≠ artifact verification and authorization-fidelity (do not substitute weaker verification for the authorized surface); Markdown formula rendering format now requires traceability of each input quantity, classification of each formula (analytical exact / common generalized / approximation / fit / numerical policy), and preservation of formula semantics during readability-only edits; External CLI prompt delivery now includes long-run checkpoint discipline (Claude opus/max ≥30 min is a checkpoint not a failure, use 45-60 min timeouts or background processes with periodic polling); core delegation completion-reconciliation now includes roadmap-slice ≠ final answer and user continuation signals (дальше, go, продолжай).

  • Added role-specific governance bullets pointing at the cross-platform/cross-language port discipline (Go regex vs PS1 case sensitivity, POSIX reparenting vs Windows parent-alive, CSV parser strictness) in $platform-engineer, $toolchain-engineer, and $backend-engineer across both Claude and Codex source trees. Added the subagent thread-limit discipline (practical ≤4 concurrent in-session agents, route extras to external CLI) to $lead on both sides. Added the feature-toggle planning discipline (stable identifier, owner, default state, single capability-registry gate, disabled = fully inert across UI/hotkey/watcher/persistence) to $planner and $architect on both sides. This matters because rules that only fire when a specific kind of work happens belong in the relevant role definition where they will actually be read, not buried in the shared catalog.

  • Added Rule 8 to the $bug-hunting common skill: announce launch parameters before non-trivial runs (batch jobs, optimization stages, multi-hour compute, simulations, benchmarks) so diagnostic vs production mode is never silently swapped, and verify expected output artifact existence at the declared destination after — completion notifications are not by themselves evidence of success. This matters because pre-launch parameter omission and post-run notification-trusting are two specific failure modes the corpus surfaced that overlap conceptually with bug-hunting's diagnostic discipline.

  • Introduced the common skills category as a first-class governance concept alongside the role index. Common skills are workflow-focused capabilities that any role or the main conversation can invoke when the skill's description matches the current task; they package reusable methodology, gates, and evidence requirements without owning delivery. This matters because workflow knowledge that used to live as ad hoc per-project Codex skills (for example manual Qt/Windows UI verification, MathType technical-book formatting, systematic bug-hunting via diagnostic logging) now has a named lane in the pack with the same installer, validation, and cross-provider distribution path as roles, instead of being scattered across user-machine .codex/skills/ directories.

  • Documented two common-skill archetypes in shared/AGENTS.shared.md: knowledge-style (loaded into the caller's current context to inform how the caller performs work) and delegate-style (spawnable as a fresh-context subagent that returns one self-contained artifact, also invocable inline via the skill loader). This matters because the two archetypes have meaningfully different invocation semantics — knowledge-style ships SKILL.md plus optional references/ only, while delegate-style additionally carries agents/openai.yaml on Codex and a thin Agent-tool wrapper at .claude/agents/<name>.md on Claude so the main conversation can fork a clean context when delegation is intended.

  • Shipped the first four common skills across all four provider source trees: $windows-gui-manual-testing (delegate-style; Qt and native Windows desktop GUI manual visual verification, now including a screen-capture recipe so the agent can record a fresh video when none exists), $analyzing-video-bugs (knowledge-style; ffmpeg-based frame extraction, scene-change detection, dense sampling around transitions, and native-resolution verification for any UI/animation bug video), $bug-hunting (knowledge-style; systematic runtime-bug investigation via diagnostic logging — log first, never patch on unverified theory, never re-roll on guesses), and $mathtype-book-page (knowledge-style; bring translated technical-book DOCX pages to accepted MathType format with source-PDF authority and a defective-chunk repair reference). This matters because production Codex and Claude installs now distribute these workflows through the standard pack install path, so operators no longer need to manually mirror them between user-machine and project-local directories.

  • Wired explicit cross-references between the three UI/bug skills: $windows-gui-manual-testing owns video capture and hands off to $analyzing-video-bugs for frame analysis; $analyzing-video-bugs accepts any video regardless of who recorded it; $bug-hunting Rule 4 points at the video skills for visual evidence rather than duplicating ffmpeg recipes. This matters because the three skills have overlapping concerns (general bug methodology, visual UI evidence, video frame extraction) and explicit pointers prevent each one from drifting into the others' scope or duplicating recipes.

  • Added explicit common-skill pointers to the most relevant production-line role files: $qa-engineer, $ui-test-engineer, $backend-engineer, $frontend-engineer, and $qt-ui-engineer in both Claude (src.claude/agents/<role>.md) and Codex (src.codex/skills/<role>/SKILL.md). Each role now has a one-line working-rule pointing at the relevant common skill(s) for its primary work surface. This matters because while Skill-tool description matching can in principle discover these skills automatically, an explicit pointer in the role body raises recall in practice and documents the intended pairing of role + skill for new contributors.

  • Extended the Claude installer's pack directory list with skills in both scripts/install-claude.sh and scripts/install-claude.ps1. Codex, Gemini, and Qwen installers already iterated skills/, so the change is Claude-only and reuses the existing per-item install loop, manifest verification, and user-file preservation behavior. This matters because the Claude pack can now ship .claude/skills/<name>/SKILL.md content reachable by the Skill tool from both the main conversation and subagents, without introducing a new copy mechanism or separate validator path.

2026-05-04

Changed

  • Added the shared externalCodexProfile: default | gpt-5.5-fast operator key to the agents-mode contract while keeping every shipped preset on default. This lets operators opt into a Codex fast profile explicitly after provider resolution, while preserving existing production externalProvider: auto behavior: when auto resolves to Codex, default inherits externalModelMode instead of silently changing the model/profile choice. The shared YAML exemplar is now grouped with section comments so the flat installer-compatible shape is easier to read without introducing a nested-schema migration.
  • Tightened release-gate coverage and operator wording for the same agents-mode change. The installer regression now exercises the Codex PowerShell global install path when PowerShell is available, the docs describe Codex pinned-top execution through supported config/profile keys instead of an unsupported raw CLI flag, and the install guide now states that model-only edits to pack-owned built-in Codex agent templates are treated as stale template drift rather than preserved custom overrides.
  • Refreshed the Russian reference layer so shared and provider-local ru/ trees no longer lag the English source references. The evidence-based answer pipeline now has a shared Russian reference plus provider-local compatibility pointers, the Russian workflow and subagent-operating-model references include the current change-classification, external-adapter, completion-reconciliation, and periodic-control semantics, and the Codex Russian operating-model diagram now matches the current sequential Codex runtime model. The Codex pack validator also now checks that every reference Markdown file has a matching Russian counterpart, so future operator-facing reference drift is caught before publication.

2026-05-03

Changed

  • Updated the shipped Codex top-model path from gpt-5.4 to gpt-5.5 across built-in subagent overrides and current model-policy documentation. This matters because the Codex CLI does not expose a documented fresh-model alias in its local help, so Orchestrarium's pinned top-model contract must not leave production agents on a stale model name; Claude still uses the opus-max profile, which resolves through Claude's opus alias.
  • Hardened Codex reinstall behavior for built-in agent override files. Reinstall now refreshes stale Orchestrarium-owned default.toml, worker.toml, and explorer.toml templates even when the stale model string is not the current default, while preserving files whose prompt or structure was actually customized by the user. This matters because pack-owned model and prompt updates should reach long-lived global installs without silently destroying real operator customizations.
  • Added a shipped quality-first production provider-order profile alongside the quiet balanced default. Operators can now set externalPriorityProfile: quality-first when they want maximum result quality from production auto routing: the profile biases near-tie advisory, source-bound, and review lanes toward Codex while preserving Claude-first lanes where the benchmark evidence gives Claude a clearer compact or visual-worker edge. This matters because model-priority tuning is now a named routing profile, not an init-time preset or a hidden host-line default; Gemini and Qwen remain explicit example-only paths.
  • Routed the init-time power-mode preset through the shipped quality-first provider-order profile instead of leaving it on the quiet balanced profile. This matters because power-mode should mean maximum useful result end to end: quality-first provider order plus top model policy, forced delegation, forced MCP, parallel fan-out, and two-opinion advisory/review lanes, while preserving the production-only Codex/Claude provider boundary.
  • Added the first machine-readable work-item execution tracking contract. Active work items can now keep agent-runs.jsonl beside status.md, and scripts/validate-work-item-state.* --work-item <path> checks duplicate run IDs, stale running agents, missing evidence, missing artifacts for PASS, and inconsistent BLOCKED / REVISE gates before closeout. This matters because the existing rule "verify subagents before trusting them" now has an operator-checkable workflow instead of relying only on narrative session summaries.
  • Hardened that validator before publication so PASS artifacts must resolve inside the work-item directory, evidence entries must carry only schema-approved fields with a valid kind plus reference, and required timestamp/run identifiers must satisfy the shared schema length contract. Codex and Claude source validators now run a real schema-plus-validator smoke check instead of only grepping for ledger strings. This matters because machine-readable closeout evidence should not be satisfied by a path that escapes local task memory, by an empty or shape-invalid evidence object, or by source validators that cannot catch schema/implementation drift.
  • Added a ledger append/init helper for local task-memory operators. scripts/agent-run-ledger.* --work-item <path> init prepares legacy work items by adding missing status sections and agent-runs.jsonl, while append writes one event and immediately reuses the shared validator, rolling the file back if the event would make closeout invalid. This matters because real agent tracking now has a practical command path instead of asking operators to hand-edit JSONL during an active delivery flow.
  • Added a periodic active work-item state checker and installed the runtime helper surface with the production Codex and Claude packs. Operators can now run scripts/check-work-items-state.* --root . --stale-hours 24 to scan all active work items for invalid ledgers or stale running agents, while global/project installs copy agent-run-ledger.*, validate-work-item-state.*, and check-work-items-state.* into the provider runtime script directories. This matters because interruption recovery and publication review can verify every active task without requiring the Orchestrarium source checkout.

2026-05-01

Changed

  • Added machine-readable agents-mode contract sources and a shared contract validator for the current scalar keys, provider sets, production priority profile, opinion counts, and init-time preset expansions. The new shared/agents-mode.schema.json, shared/agents-mode.presets.json, and scripts/validate-agents-mode-contract.py are wired into the Codex, Claude, Gemini, and Qwen source validators and backed by a unittest smoke check. This matters because future changes to power-mode, reserve, production auto routing, or example-provider boundaries now have a single executable drift check instead of relying on manual table comparison across YAML, docs, and init helper prose.
  • Moved agents-mode runtime normalization onto the same schema-backed provider/lane policy used by the new contract check. scripts/normalize-agents-mode.py now reads shared/agents-mode.schema.json next to the shared YAML template for production providers, supplemental advisory/review candidates, and provider-scoped scalar defaults, while keeping a compatibility fallback for older standalone layouts. This matters because reinstall/read-time normalization no longer carries a second hardcoded Codex/Claude/reserve universe that can drift from the documented contract.
  • Extended the agents-mode contract validator beyond Markdown tables so it now checks provider init canonical-shape YAML snippets and the correctness-first / power-mode raised opinion-count bullet lists. This matters because init helpers can no longer drift by keeping the table correct while their copied write-shape block or lane-count prose silently goes stale.
  • Added installer-level agents-mode regression coverage that runs the Bash installers for Codex, Claude, Gemini, and Qwen against disposable stale overlays under /.scratch/, then verifies the emitted provider files against the schema-backed profile and count contract. The new check also caught and fixed a Gemini/Qwen Bash installer sed delimiter bug that could break managed AGENTS.md import rewriting during project installs. This matters because the contract is now proven through the real install path, not only through standalone normalizer and documentation checks.
  • Added a generated-doc sync tool for the agents-mode contract. scripts/sync-agents-mode-docs.py --check now verifies that the operator reference and all provider init surfaces match the JSON schema/preset sources for preset tables, raised opinion-count lists, and canonical YAML snippets, while --write refreshes those generated blocks after intentional contract edits. The main contract validator invokes the check directly, so generated docs can no longer drift silently while the schema remains correct.
  • Added an init-time power-mode preset for the hardest tasks where the operator wants maximum useful result rather than merely more speed or more fan-out. The preset combines forced delegation, forced parallelism, forced MCP use, external worker/reviewer preference, pinned top production-provider policy, Opus-max Claude profile on the Codex line, and two-opinion advisory/review lanes while keeping neutral workdirs and production-only auto routing. This matters because complex work now has a named high-power posture without changing the quiet shared default or promoting Gemini/Qwen into production profiles.
  • Aligned the Qwen init preset surface with the full shared preset ladder instead of keeping a shorter expansion summary. Qwen now documents the same key-by-key default, absolute-balance, external-aggressive, correctness-first, power-mode, and max-speed matrix as Gemini and the production packs, including fixed two-opinion advisory/review lanes for correctness-first and power-mode; the Qwen validator now checks this parity. This matters because Qwen remains example-only, but its integration example should be as complete as Gemini and should not drift from the canonical agents-mode contract.
  • Added canonical reserveResolver support to agents-mode so the symbolic reserve profile candidate can be bound without bloating every lane order. Operators can now use reserveResolver: claude-sonnet, claude-wrapper, wrapper:<command>, or disabled; wrapper:<command> is documented as a PATH-resolved command or repo-relative wrapper path, while normalization strips reserve when the resolver is disabled and preserves legacy externalClaudeApiMode: disabled by converting it to reserveResolver: disabled. This matters because advisory/review lanes can intentionally use a weaker or broader reserve transport such as Sonnet or a local wrapper without turning that transport into a production provider or worker/editing fallback.
  • Renamed the old claude-secret profile candidate to symbolic reserve and retired externalClaudeApiMode from the canonical agents-mode schema. New defaults and init/toggle flows now express the supplemental read-only path only through advisory/review externalPriorityProfiles, while normalization maps legacy claude-secret entries to last-position reserve, preserves legacy externalClaudeApiMode: disabled by stripping reserve, and drops the retired scalar from canonical output. This matters because operators can now attach any approved read-only reserve resolver without making the schema look Claude-specific or allowing the reserve path into worker, editing, installer, or publication lanes.
  • Rebased the shipped external provider priority profile on the 2026-05-01 full-v2-hard-r2 benchmark release. The repo now carries a release-backed routing evidence note with the /40 model totals, all 40 score slots, and the final 12 + 1 line-priority read; shared/agents-mode.defaults.yaml, operator docs, Codex/Claude external-dispatch references, and normalization checks now use the consolidated lanes (design.ui-ux-structure, worker.reasoning-constraints, worker.ui-implementation, worker.visual-graphics-visualization, review.security, and review.ui-visual-correctness) instead of the older split UI/decorative/review-visual profile names. This matters because production externalProvider: auto now follows the accepted Codex/Claude professional-fit evidence while still keeping owner roles out of generic external adapters and keeping Gemini/Qwen out of production profiles.
  • Compressed Codex skill frontmatter descriptions into balanced startup metadata and tightened the Codex skill catalog budget from 5000 to 3000 description characters. This preserves role-selection anchors such as Qt, ETL, IaC, SLOs, high-DPI, auth, secrets, CI/CD, and external-provider provenance while keeping detailed role contracts in each skill body. This matters because Codex loads descriptions before selecting skill bodies, and compact metadata reduces startup truncation warnings without degrading role discovery quality.

2026-04-30

Changed

  • Added shared Engineering hygiene rules for mechanism inventory and state-synchronization ownership before introducing new architecture paths. Agents now must identify the existing owner and propagation pipeline for non-trivial state sync, validation, parsing, persistence, routing, scheduling, retry, cache, auth, config, or cross-surface behavior before adding a parallel mechanism; cross-surface state changes must avoid split-brain UI/API/tray/CLI paths, preserve canonical backend semantics, define lifecycle ownership and diagnostics, and verify representative surfaces against the same final state. This matters because architecture fixes should extend the maintained owner instead of creating hidden second paths that bypass events, validation, audit, or user-visible reconciliation.
  • Promoted the global Markdown formula hotfix back into the shared governance source and operating-model references. Human-facing Markdown docs that add or materially revise formulas now must use dollar-delimited inline math by default, avoid multi-line $$...$$ display blocks unless the target renderer is explicitly verified, avoid \sb / \sp compatibility hacks, scan for unbalanced dollars and broken Markdown table pipes, keep formulas out of headings, brace subscripts and superscripts, and state each formula's applicability, assumptions, units or dimensions, variable meanings, and owning source when those facts are not obvious. This matters because scientific and engineering notes should render reliably while making special-case formulas and reduced models visibly scoped.
  • Revised the existing shared "never guess" rule instead of adding a second competing AXIOM. Root-cause, bug-fix, runtime/UI, native API, and "it does not work" claims now require concrete observable data that rules out plausible alternatives before behavior changes proceed; the Claude feedback memory carries the same measurement-first guidance. This matters because faster iteration is only useful when each iteration is anchored in evidence rather than plausible-sounding diagnosis.
  • Clarified the installed runtime-entrypoint split for the production Codex and Claude lines. Root docs now state explicitly that installed Codex AGENTS.md should stay the compact universal minimum while detailed installed role/runtime guidance lives in the installed skill bodies and built-in .codex/agents/*.toml overrides; shared/provider references remain source-tree maintainer canon, and Claude remains the analogous short CLAUDE.md entrypoint plus .claude/agents/*.md role-file model. This matters because maintainers now have one clear place for detailed guidance without re-expanding the installed Codex AGENTS.md, and the production-vs-example routing boundary for Gemini/Qwen is unchanged.
  • Aligned operator-facing README and install documents with the shared terminology discipline by adding concise terms-and-abbreviations sections to the main repo docs, provider runtime layout reference, and provider source-tree READMEs. This matters because the production-vs-example provider boundary, runtime/install surfaces, and mixed CLI/MCP/agents-mode vocabulary should be readable without relying on tribal context.
  • Hardened the Windows PowerShell validation wrappers so installed global Codex and Claude packs can validate from their runtime roots even when the current directory is not inside a git checkout. This makes the documented global install verification path reliable for ordinary Windows operator shells.

2026-04-29

Changed

  • Added an explicit shared verification rule for subagent output: a subagent PASS, summary, or claimed test result is now treated as a claim that the orchestrating owner or next accountable gate must verify against files, diffs, artifacts, logs, command output, or other repo-standard evidence before accepting, forwarding, or using it for completion. This matters because parallel and external helper work should reduce latency without turning agent reports into unverified truth; the Codex, Claude, Gemini, and Qwen validators now check that the governance surface keeps this rule.
  • Added a shared visual-artifact verification rule for generated images, diagrams, drawings, renders, charts, plots, screenshots, CAD/exported drawings, and similar artifacts. These outputs now require actual visual inspection before acceptance, commit, user delivery, or evidentiary use; file existence, generation logs, metadata, hashes, and model claims are not enough. Pack validators now guard both the shared governance rule and the shared operating-model reference where that source is available. This matters because visual artifacts can be syntactically generated but visually blank, clipped, mislabeled, corrupted, or semantically wrong.
  • Added a shared documentation terminology rule for new and materially revised human-facing documents. Documents that use domain terms, role names, provider/model names, workflow labels, acronyms, or potentially unclear English terms now need a final Terms and Abbreviations section, or a localized equivalent, that expands and explains those terms. This matters because mixed English/Russian governance and skill-pack docs should remain readable to operators without forcing them to infer abbreviations from context; existing documents are not mechanically rewritten unless a glossary cleanup pass is explicitly in scope.
  • Tightened the root router default install so pressing Enter now installs only the production Codex/Claude pair, and removed the "all available root installs" menu path that could install Gemini and Qwen together with production packs. Gemini and Qwen remain individually selectable explicit example integrations, but they are no longer reachable through a default-style aggregate installer path. This matters because providers classified as WEAK MODEL / NOT RECOMMENDED must stay out of default installs, not only out of production auto routing.

Fixed

  • Removed the duplicated source-side Gemini and Qwen AGENTS.shared.md copies. shared/AGENTS.shared.md is now the only source-tree governance canon for those example integrations: src.gemini/GEMINI.md and src.qwen/QWEN.md import it directly in the monorepo, while the installers still materialize provider-local runtime AGENTS.md files for installed Gemini and Qwen targets. This matters because source governance updates now have one maintained owner instead of near-identical provider copies that could drift.
  • Fixed the provider installers so repeated installs skip unchanged runtime payload directories or files instead of deleting and copying every pack-owned item on each refresh. Codex now skips unchanged skill directories, Claude now skips unchanged agent subdirectories and command or agent files, Gemini/Qwen now skip unchanged extension payload items, and the example extension READMEs plus installed validators now carry the same not-recommended provider label without confusing source-root README requirements with installed extension README requirements. This matters for global installs whose runtime payload target is a junction into a OneDrive-backed directory: active provider sessions can scan metadata while an install is running, and unnecessarily rewriting unchanged files can surface transient Windows os error 32 file-lock reads on unrelated runtime files. Changed payload items are still replaced, intentional legacy cleanup still runs, and user-added items are still preserved.
  • Fixed Codex skill frontmatter descriptions that used unquoted plain scalars containing :, which made YAML parsers reject affected SKILL.md files such as visualization-engineer. The Codex validator now also checks skill frontmatter as YAML, so future metadata-budget edits cannot pass while still being invalid startup metadata. This matters because Codex and editor tooling read the frontmatter before the skill body, and a single invalid description can break skill discovery even when the rest of the pack validates structurally.

2026-04-28

Changed

  • Reduced Codex skill frontmatter descriptions to compact startup metadata and added a validator budget gate for the Codex skill catalog. This matters because Codex loads all installed skill descriptions before selecting a skill body, so verbose role metadata could trigger startup truncation warnings and hide useful routing cues. Detailed role contracts remain in each SKILL.md body, while validate-skill-pack now fails if descriptions grow past the per-skill or aggregate metadata budget.
  • Hardened installed Codex validation for real global CODEX_HOME layouts. The validator now resolves symlinked skills/ and AGENTS.md without losing the actual ~/.codex/agents/ runtime root, and it keeps preserved user-added skills as non-blocking orphan warnings instead of treating their metadata as Orchestrarium pack-budget failures. This matters because long-lived global installs can keep custom skills and Windows/WSL link layouts while still receiving a strict pack-owned validation gate.
  • Added an explicit file-based prompt delivery rule for provider-backed CLI runs. External consultant, worker, and reviewer launches that carry substantive task prompts now write those prompts to temporary files and feed them through stdin or a provider-supported file-input mechanism, keeping argv limited to flags, model/profile options, and file paths. This matters because long inline prompts are fragile across Windows shells, quoting layers, logs, and process listings; tiny smoke checks and documented provider limitations remain allowed deviations when recorded.
  • Reclassified the secret-backed Claude wrapper from a fallback transport into the separate claude-secret advisory/review profile candidate. The shared defaults now place claude-secret only after primary claude and codex on advisory and review lanes, worker lanes strip it, and the normalizer plus pack validators enforce that Gemini/Qwen stay out of production profiles while claude-secret never becomes a scalar externalProvider. This matters because weaker secret-backed Claude can still provide an additional advisory or review opinion when explicitly enabled, without silently replacing primary Claude or leaking into implementation, editing, installer, publication, or other mutating work.
  • Reframed the root integration surfaces around the current provider posture instead of treating every provider line as equally production-ready. README.md, INSTALL.md, the shared runtime-layout note, the external-worker design spec, and the root router installers now present Codex plus Claude as the shipped production path, keep Gemini and Qwen labeled as explicit WEAK MODEL / NOT RECOMMENDED example integrations, remove the retired production-schema references to gemini-crosscheck and Gemini-specific fallback/workdir knobs, and make the root router expose Qwen only when matching scripts/install-qwen.* entrypoints actually exist. The agents-mode normalizer now also strips Gemini and Qwen out of any existing custom externalPriorityProfiles provider list during read or reinstall normalization, and the pack validators cover that migration plus the Qwen not-recommended label. This matters because maintainers now see the real production-vs-example boundary at the root entrypoints, stale local auto profiles cannot silently preserve Gemini/Qwen production routing, and the install menu no longer promises a Qwen root path before the repo has the necessary launcher scripts.

2026-04-17

Changed

  • Clarified the maintainer contract for the Orchestrarium monorepo itself so runtime gates do not invent a local Codex install by accident. The repo-local AGENTS.md, README.md, and INSTALL.md now state explicitly that this repository is the installer/source tree, that maintainers may intentionally rely on the global Codex install at ~/.codex/, and that a missing local .agents/.agents-mode.yaml inside Orchestrarium/ is not automatically a broken state. This matters because lead- or consultant-driven maintenance inside the monorepo must ask whether to use the global install or to create a repo-local install through the installers, instead of hand-writing .agents/ just to satisfy a gate.
  • Changed the Codex-line first-write default for externalClaudeProfile from sonnet-high to opus-max across the installer seed, init/toggle flows, and reference docs. This matters because new .agents/.agents-mode.yaml files created by Codex install or by Codex-side helper flows now start from the stronger Claude profile automatically, while explicit preset choices and manual overrides can still select sonnet-high where a lighter profile is intentional.
  • Fixed the consultant closeout contract so consultantMode: disabled now really disables consultant usage instead of leaving a hidden lead closeout blocker. Lead, consultant, and operator-reference surfaces across the packs now treat consultant sweeps as explicit or repo-policy-driven advisory work, and a disabled consultant mode now waives consultant closeout rather than forcing a formal batch-close escalation for tasks that did not need consultant input in the first place.
  • Simplified the Claude external transport contract by collapsing the old split externalClaudeSecretMode and externalClaudeApiMode story into one honest wrapper toggle, externalClaudeApiMode. The shared defaults, provider addenda, lead dispatch contracts, helper docs, and validators now treat the named API path as the secret-backed Claude wrapper itself, move that key higher in the canonical agents-mode layout, and annotate inline YAML comments with both allowed values and the default so operators no longer have to reverse-engineer which Claude flag actually owns the wrapper route.
  • Tightened the operator-facing guidance around cost and routing expectations. README and the routing references now state plainly that Orchestrarium is tuned for maximum execution effectiveness rather than minimum token spend, and the visual or Gemini-first routing examples are now described only as explicit provider overrides or repo-local custom profiles instead of hidden shipped defaults. This matters because maintainers and operators now get an upfront warning about rapid token burn on large or heavily fanned-out runs and are less likely to mistake sample routing postures for built-in behavior.
  • Fixed the installer-side agents-mode maintenance contract so reinstall now repairs stale installed overlays instead of silently preserving obsolete pack-owned schema. When an existing .agents-mode.yaml is present, the Codex, Claude, and Gemini installers now normalize it to the current canonical form, preserve effective scalar choices such as consultant/delegation/MCP/workdir preferences, refresh shipped profile and opinion-count blocks to the current pack version, drop retired keys such as externalClaudeSecretMode, keep externalClaudeProfile only on the Codex line, and restore the current inline default comments. This matters because schema/default updates now actually land in long-lived global and project installs without forcing operators to delete their overlay by hand.
  • Changed the overlay-resolution contract so a missing project-local .agents-mode.yaml no longer means "act as if no operator state exists" when the matching global Orchestrarium install is present. Codex, Claude, and Gemini docs plus role surfaces now resolve overlays in the same order: local canonical file, local legacy file, matching global canonical file, then matching global legacy file; read-time normalization happens in the scope that supplied the effective config, and read-only fallback no longer fabricates a project-local override just to make routing work. This matters because globally installed operator defaults now apply honestly across projects until an explicit project-local override is created, instead of leaving agents to report misleading no file or disabled-state results.
  • Added a shared parallelMode: manual | auto | force operator key so parallel helper routing is no longer only implied through external-only surfaces. The canonical agents-mode reference, init/toggle flows, provider operating-model addenda, and the shared defaults now treat parallelMode as the general fan-out rule for any independent helper or subagent lanes, while external opinion counts and brigade routing remain explicit overlays on top. This matters because operators can now force safe parallel work for speed, keep it judgment-based, or keep it explicit-only without conflating that choice with external-provider settings.

2026-04-14

Changed

  • Fixed the Claude secret-backed transport wrappers so they no longer depend on a separate claude-api executable. Both .claude/agents/scripts/invoke-claude-api.ps1 and .sh now read the nearest Claude SECRET.md, export the declared ANTHROPIC_* environment, and then launch plain claude; this matters because the documented Claude fallback path can now run in environments where claude is installed but no extra claude-api shim exists.
  • Synced the operator docs and validation surface to the new wrapper contract. The Claude transport reference, install/help text, and skill-pack validator now describe the secret-backed wrapper as the canonical Claude API path, prefer CLAUDE_BIN as the explicit binary override, and stop claiming that a direct claude-api command must exist on PATH for the repo-local Claude fallback to work.

2026-04-13

Changed

  • Fixed the legacy .agents-mode migration path across all six installers (Codex, Claude, Gemini — PowerShell and Bash). The legacy file variable was set to the same .agents-mode.yaml path as the canonical target, so the migration function could never actually find or migrate an extensionless .agents-mode file. The legacy variable now correctly points to .agents-mode (no extension), so operators with older extensionless overlays get migrated forward into .agents-mode.yaml as documented.
  • Added a shared externalModelMode: runtime-default | pinned-top-pro policy across the live agents-mode schema, operator reference, init surfaces, consultant and external-adapter contracts, and release-facing docs. runtime-default keeps the resolved provider on its own runtime-selected model/profile, while pinned-top-pro means one strongest documented provider-native path plus one bounded same-provider fallback used only for usage-limit or quota exhaustion: Codex uses the configured top model/profile path documented in the current agents-mode reference and allows gpt-5.3-codex-spark only on fully autonomous low-reasoning worker lanes, Claude keeps opus-max and falls back by transport through claude-api instead of dropping to sonnet-high, and Gemini uses gemini-3.1-pro then gemini-3-flash.
  • Tightened the external-execution contract so the new model policy does not get confused with transport choice or host-subagent routing. Claude transport now hangs off the single wrapper toggle externalClaudeApiMode, Codex-line externalClaudeProfile remains the narrower Claude override, and Gemini fallback semantics now apply only under pinned Gemini runs instead of being described as a generic always-on hop.
  • Clarified the fallback semantics so operators do not treat every fallback as the same kind of downgrade. Repo-local policy now records that gpt-5.3-codex-spark and gemini-3-flash are bounded mechanical overflow paths for separate-limit or budget relief only, while claude-api is the accepted economical near-full-strength Claude transport that may be used either after Claude CLI limit exhaustion or immediately through the explicit externalClaudeApiMode: force toggle.
  • Lead-side external launch now has an explicit Windows shell fallback rule without changing the normal launch path or adding a new agents-mode toggle. When a direct external run fails because the native Windows shell is blocked by shell/bootstrap, execution-policy, or environment-policy problems, the lead may retry once through Git-for-Windows Bash / MSYS; the WSL bash.exe stub stays forbidden, and ordinary provider auth/quota/model failures still remain provider failures rather than shell-fallback triggers.

2026-04-12

Changed

  • Added externalGeminiFallbackMode to the canonical agents-mode schema and live operator surfaces. The new default is auto: Gemini-backed external work now targets gemini-3.1-pro first and allows one fallback retry on gemini-3-flash only for quota, limit, capacity, HTTP 429, or RESOURCE_EXHAUSTED-style Gemini failures instead of keeping older pre-Gemini-3 references around.
  • work-items/ is no longer treated as publication-facing tracked repository content in Orchestrarium. The directory remains the local recovery and handoff surface for admitted work on the operator machine, but root .gitignore, repo-local task-memory guidance, and the publication gate now keep it out of tracked git so execution memory no longer leaks into GitHub by default.
  • Project installers now reinforce that local-only task-memory rule instead of relying on operator memory. Codex, Claude, and Gemini project installs now ensure both /.reports/ and /work-items/ are ignored in target repositories, so freshly bootstrapped projects inherit the same local-only recovery boundary the monorepo now expects.
  • agents-mode init now exposes one explicit safe-start preset plus a small family of operator postures instead of implying that one default should cover every workflow. default stays the quiet safe-init path, absolute-balance is the everyday center, and the more aggressive external-aggressive, correctness-first, and max-speed shortcuts now let operators choose external saturation, verification density, or throughput deliberately without confusing that choice with the persisted provider-order profile name balanced.
  • External lane routing is now more explicit about what each provider is actually good at. The canonical lane matrix now separates broad UI modernization from surgical UI cleanup, keeps performance and architecture review as a distinct reviewer lane, and adds worker.systems-performance-implementation so Rust hot paths, media-pipeline work, and other systems/perf-sensitive implementation can route to codex > claude > gemini without overloading the ordinary implementation lane. Shared scalar defaults, the provider universe auto | codex | claude | gemini, and the rule that claude-api is transport-only under the Claude provider all stay unchanged.
  • The install surfaces, root manuals, and provider references now describe the same current routing contract instead of forcing operators to reconstruct it from mixed older prose. README, INSTALL, the canonical agents-mode reference, and the provider addenda now agree that externalProvider: auto is lane/profile-driven, that claude-api is a secondary Claude transport rather than a fourth provider, and that provider-local references are addenda to the live canon rather than a second hidden source of truth.
  • Standalone Codex, Claude, and Gemini entry docs now expose the same release-relevant operator cues as the monorepo entry surfaces. Their README and INSTALL front doors now call out the current init-time preset family and point operators directly at the explicit systems/perf lane in the canonical agents-mode reference, while docs/provider-runtime-layouts.md now names the Gemini and root-router install surfaces among its authoritative inputs so release audits do not rely on an incomplete source list.

2026-04-11

Changed

  • Gemini no longer validates or installs as a decorative lead-only scaffold. The Gemini source tree, standalone pack, and runtime docs now carry the full shared role catalog in skills/, the preview specialist-team layer in agents/, and the same team-template inventory used by the neighboring packs. Gemini validators now fail if that role or team surface is missing, and Gemini installers now materialize .gemini/agents/ in real installs instead of silently shipping only skills/ and commands/.
  • Gemini installs now materialize the shared-governance layer as installed AGENTS.md, and GEMINI.md imports that file through Gemini CLI's standard markdown import mechanism instead of pretending shared governance should be baked into .gemini/settings.json. The source tree still keeps src.gemini/AGENTS.shared.md as the canonical shared module, but install output now matches the cross-provider runtime convention better.
  • Gemini reinstalls now preserve user-side @... imports that live in the managed GEMINI.md import block next to @./AGENTS.md, matching the Claude-side reinstall contract instead of silently dropping local includes on refresh. Project installs also keep an existing root AGENTS.md intact, so repo-owned shared governance can be imported without overwriting user-owned project policy files.
  • The root router installers, install.sh and install.ps1, now install all three provider lines instead of stopping at Codex and Claude. Maintainers can choose Codex, Claude Code, Gemini CLI, or all three from one entrypoint, and the new root scripts/install-gemini.* path uses the same validated Gemini payload shape as the standalone Gemini branch.
  • Added missing PowerShell validator wrappers to all provider source trees. src.codex, src.claude, and src.gemini now each carry a ps1 companion for their source-surface validation script, so Windows maintainers no longer have to drop into Git Bash just to run the canonical pack checks.
  • Shared governance and runtime-layout docs now explicitly require provider/runtime claims to distinguish between official provider behavior, Orchestrarium repo-local conventions, and observed installed behavior. This tightens the review and documentation contract around provider-native surfaces so source-tree patterns or local install choices are less likely to be misreported as official provider guarantees.
  • External routing docs now spell out a fail-fast eligibility gate before any provider or CLI probing. Operators and leads can now see in one place that $consultant stays advisory-only, $external-worker covers the full non-owner worker lane, $external-reviewer covers review and QA, and only owner roles such as $product-manager and $lead remain generic-external-unsupported by default, which prevents wasted runtime investigation when a user explicitly asks for external.
  • agents-mode now supports switchable named external-priority profiles and per-lane multi-opinion counts instead of hiding one fixed provider order behind externalProvider: auto. The shipped balanced profile preserves the quiet defaults, while gemini-crosscheck is the explicit repo-local profile for bringing Gemini into broader non-visual advisory and review second-opinion sets when one external opinion is not enough. This keeps scalar externalProvider overrides intact, keeps claude-api as a Claude transport rather than a fourth provider, and makes the broader Gemini routing policy visible in both docs and live overlays.
  • Removed the retired consultant fallback mode from the live contract across Codex, Claude, and Gemini. consultantMode is now external | internal | disabled, external means external-only rather than “external with internal escape hatch”, the second-opinion toggles no longer expose auto, and pack validators now fail if the old consultantMode: auto or fallback-approved wording resurfaces.
  • Reading agents-mode flags is now a canonical migration point, not just a passive lookup. Codex, Claude, and Gemini runtime docs, init helpers, second-opinion toggles, dispatch contracts, and external adapter surfaces now require comment-free or stale-format overlays to be rewritten into the current canonical layout before their flags are trusted. This means pack upgrades and doc/schema drift can self-heal existing .agents-mode.yaml files during ordinary reads instead of leaving old comments, missing keys, or retired layout fragments silently in place.
  • Parallel external helper support is now explicit instead of being an implicit side effect of scattered routing prose. Shared governance and all three provider packs now state that external concurrency is not capped at one instance per helper or provider, that externalOpinionCounts still governs distinct-provider opinions for a single lane rather than generic helper multiplicity, and that a new first-class brigade surface exists for launching and aggregating one bounded batch of independent external helper runs.
  • External execution reporting now separates adapter-host wrappers from the real provider run. Shared governance, operator docs, consultant docs, external-dispatch contracts, and validators now require Adapter host runtime as a separate field whenever an internal helper only launches the external call, and they explicitly forbid treating that helper as Actual execution path for $external-worker or $external-reviewer. This keeps logs honest when a pack uses an internal wrapper to reach Claude, Codex, or Gemini, and it closes the misleading case where a Codex helper could appear to have already become a Claude execution.
  • External adapter execution is now direct-launch only for provider-backed consultant execution in external mode plus $external-worker and $external-reviewer. Internal agent/helper/subagent host layers are no longer a valid execution shape; if the orchestrating runtime cannot launch the selected provider directly, the route must fail closed or reroute honestly instead of proxying through a fake internal lane.
  • Gemini preview subagent loading no longer trips over a non-agent agents/README.md. Gemini pack docs were moved out of the loader-visible agents/ path, validators now fail if any top-level agents/*.md file lacks YAML frontmatter, and Gemini reinstall explicitly removes the old legacy agents/README.md from installed runtimes so existing ~/.gemini trees self-heal on refresh instead of continuing to throw agent-definition errors.
  • Installers now seed the canonical default agents-mode file directly into the active target for all three provider lines instead of leaving that overlay to ad hoc first-use creation. In the monorepo, the only editable source default now lives in shared/agents-mode.defaults.yaml; local installs seed the project-local file, global installs seed the provider-home file, and the init helpers now review or update that installed default rather than pretending they are always bootstrapping from nothing. This keeps first-run behavior, docs, and reinstall semantics lined up across Codex, Claude, and Gemini without leaving duplicate seed files in each src.<provider>/ tree.
  • Provider-local agents-mode.defaults.yaml files were removed from the monorepo source trees entirely. Main installers now derive their seeded defaults straight from shared/agents-mode.defaults.yaml, Codex applies its one provider-specific externalClaudeProfile addition at install time, and the main validators now fail if those deleted src.<provider>/agents-mode.defaults.yaml files come back. Standalone pack repositories still ship one pack-root default each so they remain self-contained outside the monorepo.
  • Gemini installers now materialize the official Gemini extension package instead of leaving src.gemini/extension/ as a source-only placeholder. Project and global installs both populate .gemini/extensions/orchestrarium-gemini/ with the active gemini-extension.json, extension README, mirrored skills/, agents/, commands/, and runtime GEMINI.md plus AGENTS.md, and Gemini validators now fail if that installed extension surface is missing.
  • Gemini installs no longer mirror the same Orchestrarium payload into top-level .gemini/skills/, .gemini/agents/, and .gemini/commands/ alongside the installed extension. The Gemini runtime now treats .gemini/extensions/orchestrarium-gemini/ as the canonical Orchestrarium payload while leaving the higher-precedence user/workspace tiers free for deliberate overrides, and reinstall now cleans legacy duplicated top-level Gemini items so users do not hit noisy skill or command conflict warnings on startup.
  • External launch rules are now explicit instead of being reconstructible only from audit failures. The canonical operator reference, external-dispatch contracts, install docs, and Claude runtime docs now spell out that PowerShell should use the .ps1 Claude wrapper, Bash / Git Bash should use the .sh wrapper, CLAUDE_API_BIN is the escape hatch when a shell cannot see claude-api, plain unauthenticated Claude CLI runs should switch to the allowed Claude API transport instead of retrying blindly, Codex commit review should use codex review --commit <sha> without a free-form prompt, and very wide neutral-dir release audits should be split into narrower admitted slices.
  • Fixed the Claude PowerShell API wrapper so it no longer depends on ConvertFrom-Json -AsHashtable, which is unavailable in Windows PowerShell 5.1. The wrapper now accepts both -PrintSecretPath and --print-secret-path, resolves claude-api, claude-api.cmd, or claude-api.exe more defensively, and preserves the same repo-local .claude/SECRET.md then ~/.claude/SECRET.md lookup order.

2026-04-10

Added

  • Added a repo-local publication gate at scripts/check-publication-gate.sh and scripts/check-publication-gate.ps1. The gate now combines the existing staged leak scan with an explicit RELEASE_NOTES.md check for release-relevant staged changes, so maintainers no longer have to rely on prose-only repo rules to remember this step before publication.
  • Added canonical RELEASE_NOTES.md tracking for release-relevant tracked changes in the Orchestrarium monorepo. This gives the repository one stable place to describe important behavior, workflow, and governance changes that matter at publication time instead of scattering that context across commit messages and session reports.
  • Made release-note updates mandatory before publication when staged tracked changes affect installed behavior, governance, routing, role contracts, install surface, or developer and operator workflow. This closes a documentation gap in the publication path: significant changes now need an explicit human-readable explanation before release instead of relying on readers to reconstruct impact from diffs alone.
  • Added a Gemini CLI provider-pack scaffold in src.gemini/. The scaffold follows the official Gemini-preferred model with GEMINI.md as the native entrypoint, skills/<name>/SKILL.md as the expertise layer, commands/*.toml as user command shortcuts, and extension/gemini-extension.json as the future MCP and extension boundary.
  • Added a Codex utility skill, $init-project, for first-time project bootstrap. It configures ## Project policies in the root AGENTS.md and initializes .agents/.agents-mode.yaml from the canonical Codex policy catalog and external-dispatch schema, so Codex now has a direct onboarding entry point comparable to Claude's project-init flow instead of making operators piece that setup together manually.

Changed

  • Added explicit task-continuity and execution-continuity language to the canonical agents-mode reference and the Codex, Claude, and Gemini init entrypoints. Initialized projects now document that side requests pause but do not replace the primary task unless the user explicitly reprioritizes, and that accepted phases should continue forward by default until a real gate blocks progression.
  • Completed the first-time init contract across all three provider lines in the main canonical docs. Codex still bootstraps AGENTS.md plus .agents/.agents-mode.yaml, Gemini stays split between native /init and the optional Orchestrarium overlay helper, and Claude init now also bootstraps .claude/.agents-mode.yaml instead of stopping at project policies alone.
  • Migrated the canonical release log away from a single ## Unreleased bucket into dated sections. Future updates now append under the current ## YYYY-MM-DD heading, which keeps publication history chronological without repeatedly rewriting one shared scratch section.
  • Reframed the public repository wording around cross-provider orchestration instead of describing the monorepo only as two current agent packs. README, shared reference guidance, and GitHub About now present Codex and Claude Code as provider-specific packs built on one shared governance and reference core, which makes the repository's intended extension path for future providers such as Gemini much clearer.
  • Fixed the Claude installer so reinstalling the Claude pack no longer drops user-side @... imports that live in the installed .claude/CLAUDE.md import block next to @AGENTS.md. This matters for local customizations such as @memory/...: pack source still owns the installed Claude section, but user-side imports in the target file now survive reinstall instead of being silently erased.
  • Reworked Engineering hygiene application guidance with explicit rule precedence, working definitions, thematic grouping, and repo-local concretization guidance. This keeps the underlying standards intact while making them much easier to apply consistently during implementation and review, especially when multiple rules pull in different directions or when a reader needs to know which repo-local details must still be defined.
  • Synced Codex and Claude publication-safety and operating-model references so the new hygiene structure and release-notes publication gate are documented consistently across both packs. This preserves cross-pack alignment: the installed policy, maintenance overlays, and methodology references now describe the same expectations instead of leaving one surface stale and another current.
  • Clarified the root development overlays and root install/docs wording so they describe Codex and Claude Code as repo-local monorepo maintenance surfaces instead of mixing current pack purpose with stale Claude-side naming. This makes it clearer who reads AGENTS.md and CLAUDE.md, what those files are for, and what the root installer is asking the maintainer to install.
  • Renamed the remaining Claude-side Claudestrator user-facing strings in the installed pack, Claude commands, maintenance references, and install scripts to consistent Claude Code pack wording, while broadening installer detection so already-installed legacy # Claudestrator files still upgrade cleanly. This removes stale branding from maintainer and runtime surfaces without breaking the reinstall path for repositories that still carry the older Claude header.
  • Aligned the remaining Codex-side installer, menu, and validation wording on Codex pack instead of mixing installed-artifact labels with repo/product labels such as Orchestrarium skill-pack. This removes the last user-facing naming asymmetry between the two packs while preserving repo-context wording where it still matters, such as instructions that refer to the Orchestrarium repository root or layout detection.
  • Reframed the local operator config from the retired consultant-only overlay into broader pack-local agents-mode files. The new canonical schema separates consultantMode from two operator-level preferences, delegationMode and mcpMode, so users can persist always-on vs explicit delegation behavior and auto vs force MCP usage without overloading the consultant-only mode field or repeating those instructions in every prompt.
  • Tightened the semantics of the two operator-level toggles. delegationMode now uses manual | auto | force: manual keeps delegation explicit-by-request, auto leaves delegation enabled by routing judgment, and force makes delegation a standing instruction whenever a matching specialist and viable tool path exist. If the requested delegation path is technically unavailable, the agent must report that constraint explicitly instead of silently pretending the forced delegation happened locally. mcpMode remains auto | force, where force means the config itself is an explicit instruction to use relevant available MCP tools instead of leaving MCP usage to optional judgment.
  • Codex-line agents-mode keeps the optional externalClaudeProfile setting so external dispatch to Claude can deliberately choose between sonnet-high and opus-max instead of relying on provider defaults. Claude-line agents-mode drops that field from its canonical schema because Claude-side external dispatch goes to Codex CLI, so preserving a Claude profile there only created inert local state and confused the contract.
  • First-time agents-mode creation now writes the full default shape instead of a partial file. Codex writes consultantMode, externalClaudeApiMode: auto, delegationMode, mcpMode, both external-preference flags, externalProvider, and externalClaudeProfile; Gemini writes the same shared routing keys plus externalClaudeApiMode: auto; Claude writes the shared shape without the Claude-target keys. This makes new workspace state self-describing and removes later need to infer missing settings after bootstrap.
  • agents-mode files now annotate each key with an inline YAML comment listing the allowed values for that setting. This makes the local operator config easier to inspect and edit safely without reopening the reference docs just to remember which modes are valid.
  • Shared governance now makes two verification expectations explicit without splitting truth across duplicate reminders: verify claims against canonical sources or live state before acting on them, and update the owning canonical artifact in the same change whenever behavior, policy, workflow, config schema, or runtime layout changes.
  • Added docs/agents-mode-reference.md as the canonical value-by-value operator reference for consultantMode, externalClaudeApiMode, delegationMode, mcpMode, preferExternalWorker, preferExternalReviewer, shared externalProvider, and Codex-only externalClaudeProfile. Root manuals now point there instead of relying on scattered prose summaries.
  • Extended the shared operator-mode model to cover Gemini explicitly without pretending Gemini is just another copy of the Codex or Claude lines. The reference docs now distinguish Gemini's official runtime bootstrap (/init), official runtime config (.gemini/settings.json), and Orchestrarium's repo-local .gemini/.agents-mode.yaml overlay for shared consultant/delegation/MCP/external-routing semantics.
  • Hardened Codex reinstall merging by wrapping the installed pack-managed AGENTS.md section in explicit begin/end markers instead of guessing the section end from the last non-blank line. Reinstall now has a deterministic machine-readable boundary when preserving user-managed content below the pack block.
  • Added a Gemini-side init-project helper so the scaffold now has an official split bootstrap path: Gemini /init owns GEMINI.md, while the Orchestrarium helper owns .gemini/.agents-mode.yaml when shared cross-provider routing toggles are needed.
  • Generalized provider-backed external dispatch so Codex or Claude can explicitly route consultant or external-adapter work through Gemini CLI instead of being locked to the historical Codex<->Claude pair. Existing defaults are preserved with externalProvider: auto, so current operator workflows stay stable until a repository explicitly opts into Gemini as the external provider.
  • Fixed the Codex installer's AGENTS.md merge behavior so user-managed content such as ## Project policies can survive reinstall instead of being wiped by a full file replace. This matters now that Codex has an explicit project-init flow: operator bootstrap data is meant to live with the target project, not disappear the next time the pack is refreshed.
  • Promoted the duplicated design-only methodology references out of references-codex/ and references-claude/ into one canonical shared/references/ layer, and converted the old pack-local paths into compatibility pointers. This removes avoidable cross-pack drift in documents that were already semantically shared, keeps exact operational commands out of design references, and creates the extension point a future Gemini pack can reuse without minting a third copy set.
  • Tightened the boundaries around that new shared-reference layer. shared/references/README.md now explicitly names periodic-control-matrix as an intentional pack-local exception because it still carries pack/runtime vocabulary, task-memory layout, and runtime-doc links, and the Gemini wording is softened from "plug in a new pack directly" to "reuse the shared layer as a foundation with local overlays where needed." The pack validators now also check that the canonical shared reference files exist and that the per-pack compatibility pointers still target them, so shared-reference drift is caught before publication instead of slipping past structure-only validation.
  • Extracted subagent-operating-model into a shared core plus pack-local addenda instead of leaving two near-duplicate full copies. The new shared core now owns the common blueprint, routing, role, and governance-model text, while references-codex/ and references-claude/ retain only runtime-specific paths, provider dispatch details, execution-model notes, and repository concretization. This reduces drift in one of the densest methodology documents without pretending away the real Claude/Codex differences that still need local addenda.
  • Closed the remaining risk from that extraction by teaching both pack validators to check the ownership boundary of the new split, not just file presence. They now verify that the shared subagent-operating-model stays free of pack-specific runtime concretization, that each pack-local addendum keeps the required runtime-specific semantics such as agents-mode location, external-dispatch direction, periodic-control ownership, and task-memory concretization where applicable, that the addenda stay structurally bounded instead of regrowing into full near-duplicate methodology copies, and that the canonical shared core plus both English addenda still match their normalized fingerprints. The shared core now points periodic-control readers back to the pack-local matrix through the addendum instead of pretending there is a shared matrix file. Root maintenance docs and INSTALL.md were also updated so maintainers can see the same shared-core plus addendum model directly in the repo entry points instead of having to reconstruct it from the extracted reference files alone.
  • Clarified report and plan traceability for provider-backed execution. The canonical session-log contract now requires external or provider-backed runs to record Execution role, Assigned / replaced internal role, Requested provider, Resolved provider, Actual execution path, Model / profile used, and Deviation reason as separate fields instead of collapsing them into ambiguous labels. Codex and Claude external-dispatch contracts now mirror that same execution-record schema, and the reporting rule now treats runtime-default provider selection as Requested provider: internal plus an explicit Resolved provider rather than leaking auto into the artifact. Adapter-driven reviews such as external-reviewer standing in for qa-engineer can therefore be read unambiguously in .reports/ and .plans/.
  • Fixed the publication-safety gate to inspect the actual staged file set instead of the whole index, which removes self-inflicted false positives from the checker scripts' own regex catalogs. The Claude PowerShell wrapper now also resolves its sibling bash checker correctly from src.claude/agents/scripts/, so the same publication gate works again from the repository source tree on Windows instead of pointing at a non-existent installed .claude/... path.
  • Claude-target external dispatch now treats the secret-backed wrapper path as the single named transport toggle under externalClaudeApiMode: disabled | auto | force. auto keeps the ordinary Claude CLI path first and allows one wrapper-backed fallback after CLI/auth/quota-style failures; force starts on the wrapper immediately. The contract still forbids silent profile downgrades or provider switches, so if the allowed Claude path still fails the provider is reported unavailable in the normal way.
  • Added a dedicated provider runtime-layout reference at docs/provider-runtime-layouts.md that splits global and local installed surfaces for Codex, Claude Code, and Gemini. This closes a recurrent source of confusion between monorepo source trees such as src.codex/ and the paths the actual provider runtimes read after installation, and it also records one important Gemini boundary early: Gemini CLI is native around GEMINI.md, .gemini/commands/, and extensions rather than the skills/ or agents/ trees used by the current Codex and Claude packs.

Terms and Abbreviations

  • AGENTS.md: the Codex-readable governance and instruction file installed into a repository or provider home.
  • agent-run-ledger.*: helper script family that initializes legacy work-item ledger files and appends validated agent-runs.jsonl events.
  • agent-runs.jsonl: JSONL execution ledger stored beside status.md for machine-readable work-item state.
  • API: Application Programming Interface, a programmatic contract exposed by a tool, runtime, or service.
  • AXIOM: a top-priority rule label; in this change it refers to a global-only "never guess" rule that was folded into the existing shared discipline instead of copied as a competing section.
  • Bash: a Unix-style command shell used by the .sh installers.
  • CAD: Computer-Aided Design, software and file formats used for technical drawings, geometry, or engineered layouts.
  • CLI: Command-Line Interface, a terminal command surface such as codex, claude, or gemini.
  • Codex: the OpenAI Codex runtime and provider pack maintained by this repository.
  • Claude Code: Anthropic's Claude Code runtime and the matching provider pack maintained by this repository.
  • data point: one concrete observed value, log line, field, return code, screenshot fact, or command result used as evidence.
  • Gemini CLI: Google's Gemini command-line runtime and the example provider pack maintained by this repository.
  • hash: a deterministic digest of file bytes or content; useful for identity checks but not a substitute for visual inspection.
  • HTTP: Hypertext Transfer Protocol, the web protocol whose status codes include quota and rate-limit signals such as 429.
  • INSTALL.md: this repository's installation guide.
  • JSONL: JSON Lines; one JSON object per line, used here for append-only execution events.
  • junction: a Windows directory link that redirects one path to another path on disk.
  • Markdown: a lightweight markup format used for repository documentation.
  • MCP: Model Context Protocol, a protocol used to expose tools and resources to agent runtimes.
  • metadata: descriptive file or runtime information such as dimensions, timestamps, MIME type, or generator fields.
  • native API: an operating-system or runtime interface called directly by code, where return codes and structure fields often carry the evidence needed for debugging.
  • MSYS: a Windows POSIX-style shell/runtime family commonly used by Git Bash and similar tooling.
  • PATH: the operating-system environment variable used to find executable commands.
  • PASS: a gate or validator result meaning the checked artifact may proceed.
  • QA: Quality Assurance, verification work focused on tests, regressions, and acceptance criteria.
  • Qwen Code: the Qwen command-line runtime and the example provider pack maintained by this repository.
  • README.md: the primary human-facing overview document for a repository or pack.
  • root router: the top-level install.sh or install.ps1 entrypoint that dispatches to pack-specific installers.
  • SECRET.md: a local-only file used for secret-backed Claude transport configuration; it must not be published.
  • SKILL.md: the entrypoint file for a Codex skill definition.
  • status.md: human-readable recovery summary for the active work item.
  • OneDrive-backed directory: a directory stored under Microsoft OneDrive synchronization, where sync and hydration behavior can affect file access timing.
  • TeX: a typesetting system and math-notation language used by many Markdown math renderers.
  • UI: User Interface, the visible or interactive surface presented to a user.
  • visual artifact: an image, diagram, drawing, render, chart, plot, screenshot, CAD/exported drawing, or similar visible output.
  • WEAK MODEL / NOT RECOMMENDED: the repository's classification for example-only providers that must stay out of production auto routing.
  • WSL: Windows Subsystem for Linux, a Linux compatibility environment on Windows.
  • YAML: YAML Ain't Markup Language, the structured data format used by files such as .agents-mode.yaml.
  • YYYY-MM-DD: date format with four-digit year, two-digit month, and two-digit day.