This file is the canonical release log for tracked Orchestrarium monorepo changes that matter at publication time.
Update RELEASE_NOTES.md in the same change whenever tracked modifications affect installed behavior, governance, routing, role contracts, install surface, developer or operator workflow, or other release-relevant user-facing expectations.
Release-note entries should explain improvements, not just name them. For each release-relevant entry, state what changed, why it matters, which user or operator workflow is affected, and any important preserved behavior, boundary, or exemption when that context helps the reader understand the impact.
Keep the log in reverse-chronological ## YYYY-MM-DD sections. Add new explanatory bullets under the current date heading, or create today's date heading if it is missing. Do not keep a long-lived ## Unreleased bucket or silently rewrite older dated sections in place.
Do not add entries for purely local-only hygiene edits such as formatting, link fixes, report-only churn, scratch cleanup, archive moves, or non-semantic wording cleanup.
-
Added a new-session orientation guide so maintainers do not have to rediscover the source/runtime boundary every session.
docs/new-session-guide.mdis now the first-read guide for Orchestrarium maintenance sessions, linked fromREADME.md,docs/README.md,AGENTS.md, andCLAUDE.md. It states the source-first rule directly: tracked Orchestrarium source is canon, while~/.codex/,~/.claude/, project.agents/, and project.claude/are installed runtime outputs to sync only after the owning source file is fixed. It also maps the main source owners, runtime surfaces, common maintenance patterns, validation commands, and "do not" cases such as patching Claude-specific Agent mechanics into Codex by analogy. Why it matters: recent incidents came from starting in stale installed surfaces or having to re-explain that this monorepo is the primary source; the guide makes that orientation durable for new sessions. -
Conversational terminology discipline added to the shared governance spine. The existing
Documentation terminology disciplinerule required a## Terms and Abbreviationssection only in human-facing DOCS; the main conversation's live CHAT replies still emitted unexplained jargon, acronyms, and domain/role/provider/tool terms, which the user reported as unreadable ("в чате mainconv постоянно сыплет ими и ничего не поймешь что он говорит"). The rule inshared/AGENTS.shared.mdnow also binds term-dense conversational replies: expand jargon inline or list the terms. It was added as a compact clause to the spine's existing rule rather than as a new card or a cap bump — per the spine validator's own documented policy (scripts/validate-agents-spine.py), a genuinely new always-on rule is the trigger to keep the operational card compact, not to raise the size ceiling; the spine went 39,815 → 39,885 chars, still 15 under the 39,900 cap. Why it matters: governance previously made published docs readable to outside readers but left the live working conversation dense with insider shorthand, so the human steering the work could not follow it.
- Codified "read broad, change narrow" in the shared governance spine. The minimization disciplines (
Change-surface minimization,smallest safe reversible change) govern what a change touches, not what it reads — but that boundary was implicit and was being misapplied as license to read code narrowly (a few lines at a time) to conserve context, which under-reads and yields wrong changes. TheReadability and local reasoningrule inshared/AGENTS.shared.mdnow states it explicitly: "Before changing, read what the change needs, not a narrow slice; minimization bounds the change, not the read." It was added as a compact clause with no cap bump (per the spine validator's own policy a genuinely new always-on rule keeps its operational card compact rather than raising the size ceiling); the adjacentsecondary checksline was tightened to recover headroom, leaving the spine at 39,887 / 39,900. Why it matters: correctness depends on reading enough surrounding context — the whole function, its call sites and dependents — so reading too little to save tokens is a false economy, while the minimization rules correctly bound only the change surface, not the reading.
-
Made "stop closeout, work" a hard continuation trigger instead of an acknowledgement trap. The existing continuity rules already said a passed slice is not goal completion and that
дальше/go/продолжайcontinue the next incomplete item, but they did not explicitly cover user corrections such as "завязывай с closeout, работай". Agents could therefore answer with a bare acknowledgement like "OK, closeout removed" and still stop. Shared governance now treatsstop closeout,завязывай с closeout,работай, and equivalent continue-working corrections as instructions to immediately take the next concrete action inside the active task unless genuinely blocked. The rule is mirrored into the lead operating surfaces, agents-mode continuity reference, shared subagent operating model, and new-session guide. Why it matters: the user should not have to spend every new session re-teaching Codex that a closeout correction means continue working, not produce another meta-summary. -
Made context compaction a first-class continuity trigger instead of a license to summarize and stop. The shared spine already said side requests do not replace the primary task and that partial batches are not completion, but it did not name the context-compaction/resume-from-summary transition explicitly. In practice that allowed an agent to wake up after a compacted summary, trust stale sliced evidence, emit a final-style status, or stop instead of restoring the active task. The shared
AGENTS.shared.mdprimary-task rule now says context compaction may interrupt but not replace the task; on resume from a summary, the agent must restore the primary task, next unchecked step, and open evidence gates before acting. The same rule is mirrored into the lead operating surfaces and theagents-modecontinuity references for Codex, Claude, Gemini, and Qwen. Why it matters: a compacted summary is a recovery artifact, not a closeout signal; required gates such as full-window crop validation must survive the context transition. -
Removed the Qt-only bias from
$windows-gui-manual-testingand made crop validation an early hard gate across the visual-video workflow. The Codex UI metadata exposed the common skill as "Qt Windows GUI Manual Testing" and its default prompt asked for "a Qt desktop UI", while the skill description and shared common-skill index also led with Qt wording. That made the skill look toolkit-specific even though its workflow already covered native Windows, WebView/native-child, and other desktop surfaces. The Codexagents/openai.yamlnow names it "Windows GUI Manual Testing", the provider skill descriptions and overview text name Windows desktop GUI broadly, and the shared index/README stop describing it as Qt-first. Separately, the crop rule that previously lived deep inside$windows-gui-manual-testingstep 3 is now a top-level hard crop gate in both$windows-gui-manual-testingand$analyzing-video-bugs: before judging cropped screenshots, videos, or extracted frames as full-window evidence, the agent must verify the crop against an uncropped full-frame/full-desktop image and confirm the full target window is present, including the top-right close button for normal overlapped windows; clipped crops must be discarded and re-extracted. The live~/.codex/skills/copies were updated with the same wording so current Codex sessions see the corrected metadata before the next reinstall. Why it matters: the reported MarkMello failures were not caused by missing visual methodology — the methodology existed — but by trigger/UI wording and buried crop validation that let agents take a "convenient" crop before proving it was valid. -
Closed the UNC machine-local-path leak gap and fixed the U+2026 ellipsis-placeholder false positive in the shared allowlist owner. Two follow-ups against
check-machine-local-path.py(both packs, kept byte-identical) — the single allowlist owner consumed by themachine-local-pathPreToolUse hook directly and by the publication leak-scanner via in-process import offind_machine_paths, so each change applies to both surfaces at once. (1) UNC host/share homes now block. The owner previously detected only drive-letter and POSIX/MSYS homes, so a concrete UNC home such as\\host\Users\<name>or\\server\share\...\Users\<name>written into a tracked file slipped past both the hook warning and the scanner block — the gap the prior scanner entry recorded as a tracked follow-up. A new left-anchored_PATTERNSentry now flags genuine UNC user homes (host or host+share, including deep shares) while a deliberate left-anchor plus host-label requirement keeps it from self-matching a drive-prefixed escaped Windows path literal such as the JSON"C:\\Users\\test"(a doubled\\Users\\with no host label before it is not a UNC path); the\\?\and\\.\device namespaces are excluded by a negative lookahead because their embedded drive form is already caught by the drive-letter pattern. Accepted residual under-match (left out to avoid device-namespace / URL false positives):\\?\UNC\host\share\...and forward-slash//host/Users/.... (2) U+2026 ellipsis placeholder no longer false-positives. A documentation path segment written with the Unicode horizontal ellipsis U+2026 instead of an ASCII...run — e.g.C:\Users\...in backslash, forward-slash, or bare form, or a mixed dot+ellipsis segment — was treated as a concrete account name and flagged; the placeholder filter now accepts the U+2026 character (and any dot/ellipsis mix) as an ellipsis placeholder, matching the existing ASCII...behavior. The no-Python fallback regex incheck-publication-safety.shis intentionally NOT extended for UNC (it is a rare degraded path that already over-blocks). Covered by new gate-safe regression rows in bothtests/test_machine_local_path_hook.pyandtests/test_publication_safety_scanner.py(UNC and ellipsis fixtures are assembled from runtime fragments so the test sources never self-trip the scanner). Why it matters: the hook and the publication gate now catch a real machine-local-path leak class they previously missed (UNC homes), while stopping a false alarm on a legitimate documentation placeholder, so the gate stays both stricter where it should block and quieter where it should pass. -
Publication leak-scanner now allowlists placeholder paths — and in doing so closed three real machine-local-path leak gaps it had.
check-publication-safety.sh(both packs) blockedgit pushwhenever a staged file containedC:\Users\<name>/C:\Users\...— placeholder documentation (the machine-local-path hook's own examples in INSTALL.md / RELEASE_NOTES.md / AGENTS.codex.md) that is safe and already public — because its[A-Za-z]:\\Users\\pattern matched the full content of staged files with no allowlist. It recurred on every commit touching those docs and forced a separate security-reviewer clearance each time. The scanner now routes path-pattern candidate lines through the existingcheck-machine-local-path.pyallowlist (find_machine_paths, resolved relative to each scanner's own location — single source of truth, no duplicated token list), so placeholders pass while concrete machine-local paths still block; non-path patterns (secrets, tokens, transcript markers) stay an unconditional block path; and if Python is unavailable it falls back to a refined regex plus a degraded-mode stderr notice, never fail-open (fail-safe = over-block). A security-engineer designed the change (approach B) and a security-reviewer gated it. In the process it closed three leak gaps the old scanner silently had: (1) forward-slash and leading-slash home forms (C:/Users/,/home/,/Users/,/c/Users/) were never matched on Windows because the bundled MSYS bash rewrites leading-/arguments beforegit grepsees them — the in-process Python matcher is immune; (2) workstation dev roots (D:\dev\, even the repository's own root) were not detected at all; (3) lowercasec:\users\slipped past the case-sensitive pattern. Covered by a new gate-safe regression suitetests/test_publication_safety_scanner.py(leak strings are assembled at runtime so the test file does not self-trip the scanner); the pre-existingtests/test_machine_local_path_hook.pywas given the same gate-safe treatment because the stronger scanner would otherwise flag its legitimate test data. UNC paths (\\host\Users\...) were a known remaining gap in the shared allowlist owner at the time of this change; that follow-up is now closed by the UNC + U+2026 entry above. Why it matters: the gate stops crying wolf on safe placeholder docs (a gate that fails on every commit gets bypassed) while actually catching more real machine-local-path leaks than before.
-
Backported the
$mathtype-book-pagecommon skill's new "Full-Text Coverage Gate" from active Codex downstream use into the canonical pack source. The user iterates on this skill in their live~/.codex/skills/mathtype-book-page/install; this change pulls those edits back so other adopters inherit them on next install instead of rediscovering the gap. What changed:SKILL.mdgained a## Full-Text Coverage Gatesection (verify the translated DOCX covers all source text — no silently dropped paragraphs, sections, or list items — before claiming PASS) and its description trigger list gained "full source-text coverage";references/defective_chunk_repair.mdgained a### Coverage Lanesubsection; the Codexagents/openai.yamlshort_descriptionbecame "Full-coverage MathType…" and itsdefault_promptnow names full source-text coverage as a verification step. Scope: applied tosrc.codex/skills/mathtype-book-page/(all three files, exact backport of the edited global) andsrc.claude/skills/mathtype-book-page/(SKILL.md+references/; Claude has noagents/openai.yaml), keeping the Claude and Codex copies byte-identical per the common-skill dual-shipping contract. Gemini and Qwen copies are intentionally deferred — they sit on an older base (also missing the earlier Typography Normalization / Heading Case sections) and are scheduled for a separate reconciliation pass consistent with their secondary-provider priority; this is a tracked follow-up, not an oversight. This matters because full-text coverage is a load-bearing correctness gate for book-page translation: without it a chunk can pass formula, typography, and figure checks while silently omitting source paragraphs, and the defect is invisible to every downstream check that does not compare against the source PDF's full text. -
Backported active-use methodology into three more common skills, after a pre-install divergence gate caught the repo lagging the live installs. Before re-installing the packs globally, a per-skill comparison of the repo source against the live
~/.codex/skills/install found the repo was behind on ~59 lines of methodology the user had iterated downstream but never backported. Merged into both packs ($analyzing-video-bugsand$mathtype-book-pagestay byte-identical across packs;$windows-gui-manual-testingdiffers only in the Claude/Codex description provider word):$windows-gui-manual-testinggained the full Windows capture recipe (top-level HWND +GetWindowRect+ virtual-screen metrics +DwmGetWindowAttributeextended-frame bounds, UIA preflight with the close-button crop anchor, a verification-frame-before-repro discipline, multi-monitor / high-DPI handling, and the$procId-not-$pidPowerShell gotcha) while keeping the repo's Qt-aware description and intro;$analyzing-video-bugsgained 4fps-coarse-then-60fps-dense extraction, multi-monitor target-window cropping, contact-sheet building, and the Windows-pattern_type globcaveat, plus a fixed dangling§ Capture videocross-reference (now points at the real step);$mathtype-book-pagegained connector-prose-between-display-formulas and inline-condition punctuation / OCR-residue guidance. The two visual skills ($windows-gui-manual-testing,$analyzing-video-bugs) join$bug-huntingas one cross-referenced family ($bug-huntingRule 4 ⇄ the two visual skills' Related sections and shared decision tree);$mathtype-book-pagestays a separate category and is not linked into that family. Why it matters: skills are a two-way channel — they get iterated both in active use and in the repo — so a blind repo→global install silently downgrades any un-backported downstream edits. The pre-install gate makes that divergence visible per skill and direction, and this backport reconciles the repo with both global installs rather than clobbering them. -
Hardened the external-review invocation contract in
src.codex/skills/consultant/SKILL.md. It now forbids--permission-mode planfor a non-interactive Claude review (plan mode makes the reviewer present a plan/handoff instead of emitting the verdict; underclaude -pthe captured stdout is then only the handoff turn, so the actual review never lands in the file) and requires a single-turn self-contained verdict betweenBEGIN_REVIEW/END_REVIEWplus a post-run check that the captured file actually contains the verdict. This matters because a real review dispatched with--permission-mode plansilently returned no usable verdict (the captured file held only "review complete — waiting for your direction") and exit-0 was nearly accepted as PASS; the rule closes the recurring review-output-contract failure (see the 2026-05-28 kosyak). Affected workflow: any operator dispatching a Claude reviewer via CLI. -
Regenerated the stale
subagent-operating-model.mdnormalized fingerprint in the Claude validator (src.claude/agents/scripts/validate-skill-pack.sh,1a8291b9…→f666dc58…). The shared doc was updated in an earlier feature commit but the Claude validator's expected hash was not regenerated in that commit (the Codex validator already carried the correct hash), sovalidate-skill-pack.shhad been failing that one check. Both validators now pass; no doc content changed — only the stale expected hash. -
Shrank the shared governance spine below the Claude Code 40,000-char context warning without losing any enforceable rule.
shared/AGENTS.shared.md(installed verbatim asAGENTS.md, so always loaded by the main conversation) was 61,905 chars and tripped Claude Code's per-file performance warning; it is now ~39,815. Method: each rule's elaboration (rationale, worked examples, recovery procedures, the full glossary, common-skill layout, delegation prose, the full verification-discipline prose) moved to verbatim on-demand extracts under the newshared/references/spine/directory, while the compact operational form of every rule — triggers, fail-closed conditions, banned-phrase lists, gate names, probe commands, status labels — stays in the always-loaded spine and is self-sufficient for compliance on its own. A newscripts/validate-agents-spine.py— now run on everypytest tests/viatests/test_agents_spine_validator.py, not only as a manual CLI — enforces the size cap (39,900, a guard band below the warning) plus a 63-token "lose-nothing" protection manifest. Honest scope: the manifest is a token-presence + size check that fails if a pinned enforcement token leaves the spine, a reference pointer dies, a discipline card is orphaned, or the cap is exceeded; it is NOT a semantic-binding check and would not by itself catch a rule reworded from MUST to optional while its pinned token stays present. The existing pack validators independently require key formula/terminology/verification phrases to remain in the installedAGENTS.mditself. Verified by a three-angle review loop (extracts confirmed superset copies — the spine compacts the prose, so they are not byte-identical; both pack validators 344/344 and 408/0; full suite green) which caught and forced restoration of ~10 enforceable specifics the first cut had over-compressed (e.g. thegit reset --hard"and nothing depends on them" recovery guard, the split-brain UI-observability clause, theAskUserQuestionverification mechanism). Why it matters: the always-loaded governance no longer degrades main-conversation performance, and new governance DETAIL now lands in extracts rather than re-growing the spine. Boundaries:shared/references/spine/extracts are English-only depth for maintainers and are not copied to install targets — the installed spine is self-sufficient, and the spine carries a single note pointing maintainers to the repo-source extracts rather than per-rule pointers that would be dead at install sites. -
Backported the
$bug-huntingcommon skill's two-sided "Rule 0" from the user's live global Claude install into the canonical pack source (bothsrc.claude/skills/bug-hunting/andsrc.codex/skills/bug-hunting/, kept byte-identical per the dual-shipping contract). Rule 0 now binds at BOTH ends of every fix: prove the CAUSE with a runtime log before editing (unchanged), AND prove the FIX is perfectly correct with a clean runtime log after editing — re-run with diagnostics still in, confirm the symptom is gone and no adjacent behavior regressed; "it should work now" / "this is the right fix" is never confirmation, and a fix asserted correct without a runtime log proving it is a Rule 0 violation as severe as patching on an unverified cause. Why it matters: the original Rule 0 only guarded the cause side, so a correctly-diagnosed bug could still ship an unverified fix and be called "done"; this closes the after-the-fix gap. -
Fixed the bugfix-discipline PreToolUse hook's long-session false positives (
check-bugfix-discipline.py+hook_common.py, both packs, kept byte-identical). Root cause (verified against a real transcript): the hook matched bug-trigger phrases against the most recentrole=usertranscript entry, but in Claude Codetool_resultblocks and harness injections (<system-reminder>,<task-notification>) are also recorded underrole=user— so in a tool-heavy turn the "last user message" was a tool_result or notification dense with trigger words (fix,Error:,broken,regression), firing the guard on legitimate edits (it false-fired ~15× in one session). Fix: a newlast_genuine_user_message()walks the transcript in reverse, skips entries that carry no human-typed text (pure tool_result / pure injection) and strips<system-reminder>/<task-notification>/<local-command-stdout>spans, so triggers match only what the human actually typed. This also fixes a latent false-NEGATIVE (a real bug report buried behind many tool_result entries was never seen). Separately fixed the override escape that never worked: the deny message tells the model to place[skip-bugfix-discipline]in its own reply, but the hook only scanned the user message for the marker — it now scans the assistant's current-turn text too. The marker (and the discipline signals) are matched against assistant-authored text ONLY, not tool output: 8+ tracked repo files literally contain the[skip-bugfix-discipline]string, so a broad scan would have let the guard disable itself whenever a session read one of those files in a bug-report turn. Also added real Codex transcript-shape support: Codex rollouts nest the message underpayloadwithinput_textblocks ({"type":"response_item","payload":{"type":"message","role":"user",...}}), which the shared helpers did not recognize — so the Codex copy of the hook had been inert on real transcripts;is_user_message/extract_text/extract_user_typed_textnow handle the payload shape (and correctly exclude Codexfunction_call_outputtool I/O). Covered bytests/test_bugfix_discipline_hook.py(17 cases incl. tool_result/system-reminder/task-notification pollution, the buried-bug-report false negative, the override-in-assistant-reply vs override-in-tool-output distinction, a discipline-signal-in-tool-output non-bypass, and the Codex payload shape). The improved hook is restored to live configs on the next pack install (it had been temporarily removed during implementation because the old version false-fired).
-
--tool-matcherparameter onscripts/install-hypothesis-hook.py. The structural-hook installer can now register a hook with a custom PreToolUse matcher (default unchanged:Edit|Write|NotebookEdit|apply_patch), so a hook can fire on a different tool set without disturbing the existing entries. Idempotency still keys on the per-hook--script-marker, so multiple hooks coexist as independent entries. This matters because it is the enabling mechanism for hooks that must watch tools beyond the code-mutating set. -
machine-local-pathPreToolUse audit hook (auto-installed, warn-only). Warns when a machine-local absolute path — a concrete user home (C:\Users\<name>) or workstation dev root (D:\dev\...) — is written into a non-.scratch/tracked file (themachine-local-path-provenancesmell), while allowing placeholders (<you>,%USERPROFILE%,${CLAUDE_PROJECT_DIR},$HOME, and common example tokens). It matches the edit's owntool_input(not session context, so it does not false-positive on unrelated discussion), emits a UTF-8 stderr warning, never blocks (AUDIT mode — promotion to a blockingdenyis a separate reviewed step once the false-positive rate is measured), and fails open. Per the source-hygiene rule the hook lives in the new typedagents/hooks/(Claude) /skills/lead/hooks/(Codex) directory;hook_common.pystays in the grandfatheredscripts/dir and is imported from there. Covered by the existing--no-hypothesis-hookopt-out. A companionno-trash-in-repoaudit hook ships alongside it (also auto-installed, warn-only, fail-open, in the same typedhooks/dir, registered with aBash-inclusive matcher so it sees thegit worktree addcommand). It is the stray-artifact guard: it warns when a Bash command confidently runsgit worktree add— the unrequested-worktree side effect — and stays silent ongit worktree list/remove/prune,git add(notgit worktree add),gitinside a quoted string, and non-git commands; the parser is shell-aware (shlex tokenization, command-position tracking across&&/;/|/(, env-assignment-prefix and git-global-option skipping) and fails open on any tokenizer error. Two earlier same-day iterations were discarded before this one: a first version flagged personal-workflow dir NAMES (dev/,notes/, …) and false-fired on ordinary structure, and a second narrowed the list tokosyaks/mistake-log— but name-matching was the wrong axis entirely, because those are the user's personal-process vocabulary, not names the agent (the actor a PreToolUse hook guards) ever creates, so the hook never fired on real agent behavior. The redesign (recommended by a Codexgpt-5.5xhighdesign consultation) keys on the OPERATION instead, which is what the user actually reported (the agent creating worktrees it was not asked for). Deferred: the ClaudeAgenttool'sisolation: "worktree"form (needs a captured PreToolUse envelope to confirm the field shape). Dropped as too FP-prone for a warn-only signal: outside-repo writes (a static allow-list floods on legitimate installs/temp/global-config/memory writes) and arbitrary in-repo trash (no reliable non-name signal — that stays governance). The filename and install-markercheck-no-trash-in-repoare retained for install continuity (a rename tocheck-stray-artifactis a tracked follow-up). Covered bytests/test_no_trash_hook.py(19 cases: everygit worktree addform incl.cd-chained /-C-prefixed / env-prefixed / subshell / absolute-git, plus the silent cases and fail-open). Why it matters: a warn-only AUDIT signal only earns its keep if it almost never false-fires AND actually fires on the real problem — the name-based versions could never fire on the reported behavior (they keyed on the wrong actor's vocabulary), while operation-based worktree detection catches the concrete unrequested side-effect with a shell-aware parser that under-warns rather than false-fires on exotic constructs.
- Added two governance rules to
shared/AGENTS.shared.md### Scope and ownership discipline: Directory-level entity separation (organize source directories around one primary entity type + ownership/lifecycle contract, classify by owner/lifecycle + I/O contract before placement, build output / caches /__pycache__/.scratch/exempt; grandfathered exception clause for documented co-located dirs) and Trash hygiene and archival (delete or archive obsolete files within the admitted change surface and subject toWorktree safety; archival path with breadcrumb when deletion loses recoverable history). Codex$architecture-reviewerdesigned the directory rule wording across a two-stage delegation chain: first critique of main conv's draft (numeric≥3 entity typesthreshold flagged as gameable, "surface convenience" wording flagged as undefined, examples list flagged as too repo-specific for shared governance), then counter-proposal with classification test (owner/lifecycle + I/O contract signal pair) replacing the numeric threshold. The trash hygiene clause came from user-stated principle (verbatim: "не оставлять мусор, хотя бы складывать его в архив"). Plan went through two review rounds (bkjese79w→ REVISE on missingshared/references/update + Claude-overlay-reachability + Worktree-safety guard;beztgful6→ REVISE-1 onshared/references/README.mdscope clarification because the new source-hygiene doc intentionally names pack/source paths which the existing README scope excluded; v3 applied each finding's required fix and landed text-only). New shared design narrative + grandfathered-exception text now lives atshared/references/repository-source-hygiene.md+ RU mirrorshared/references/ru/repository-source-hygiene.md. Both repo-overlay docs (AGENTS.mdCodex-maintainer overlay +CLAUDE.mdClaude-maintainer overlay) carry one-line pointers at the shared ref doc so future maintainers from either side see the rule + exceptions without text duplication. Three deliberately co-located directories (src.claude/agents/scripts/,src.codex/skills/lead/scripts/, repo-rootscripts/) are grandfathered exceptions for user-copy / install-script convenience: install scripts hardcode current source-tree paths and destination paths, hooks.json / settings.json point at current command paths, user docs reference current paths, and forcing a refactor on every existing operator install does not justify the per-entity-type filesystem cleanup gain since the existing naming convention (check-*/invoke-*/validate-*/install-*/ etc.) already provides per-entity-type discoverability. The exception is grandfathered, not extensible — future new entity types (new hook, new wrapper, new validator) MUST follow the shared rule (typed subdirectory or canonical home), not extend the legacy co-located dirs. Both rules land text-only: no repo refactor, no file moves, no installer or hooks.json changes, no Codex re-trust, no live install propagation. This matters because the repo previously had no governance rule for directory-level entity organization — the closest existing rules (Ownership / extension-seam hygiene,Change-surface minimization,Interface and encapsulation hygiene) all govern code-level / commit-level / role-level ownership but did not address how files cluster in filesystem directories; without a rule, future maintainers had no principled basis for "does this new script belong inscripts/or its ownscripts/<category>/?" and naturally defaulted to junk-drawer accretion.
- Rewired the hook from
git pushtrigger to pre-fix Edit/Write/apply_patch trigger; renamedcheck-hypothesis-disclosure→check-bugfix-discipline. The old hook was conceptually misplaced — it fired ongit pushto check whether the HEAD commit message containedVERIFIED:/ASSUMPTION (UNVERIFIED)markers. As the user pointed out repeatedly during the rewire conversation, git has nothing to do with hypothesis verification: hypothesis discipline lives in thinking before editing, which is upstream of any publication mechanism. By the timegit pushfires, any unverified-hypothesis edit has already happened and committed — the harm is done, the hook just adds late ceremony around a commit-message format check that's trivially bypassable (write a fakeVERIFIED:line or use a whitelisted commit type prefix). The previous design was theatre dressed up in "third structural enforcement layer" language. The new hook catches the actual failure mode: the model is about to invokeEdit/Write/NotebookEdit(or Codex'sapply_patch) in response to a user message that contains a bug-report or change-request signal (fix,change,broken,не работает,исправь,пофикси,поменяй, traceback,Error:, etc.), but did NOT first invoke/agents-bugfixor otherwise capture diagnostic data. The hook reads the PreToolUse envelope'stranscript_path, parses the recent JSONL transcript, and: if last user message contains no bug-trigger phrase → exit 0; if it contains the per-turn override marker[skip-bugfix-discipline]→ exit 0; if the current turn shows discipline signals (/agents-bugfixinvocation,agents-bugfixskill load, text containingdiagnostic/hypothesis/reproducing/VERIFIED:) → exit 0; otherwise emit a structuredpermissionDecision: "deny"payload that tells the model exactly how to comply (invoke skill, capture diagnostic data, or use override marker). The trigger phrase list is deliberately broad — it includes routine change-request verbs (fix,change,исправь,пофикси,поменяй) because those are the most common user requests on which the model jumps to first hypothesis without diagnostics; false positives on legitimate non-bug change requests (e.g. "fix this typo" — really a docs edit) are handled by the per-turn[skip-bugfix-discipline]override marker. Friction is by design. Files changed: newsrc.{codex,claude}/.../check-bugfix-discipline.{py,sh,ps1}(python brain + thin Bash/PowerShell wrappers); deleted oldcheck-hypothesis-disclosure.{sh,ps1}from both packs; updatedscripts/install-hypothesis-hook.pymatcher fromBash(withif: "Bash(git push *)"filter) toEdit|Write|NotebookEdit|apply_patchregex, SCRIPT_MARKER renamed;scripts/install-{claude,codex}.{sh,ps1}paths updated to new script name; Codex+Windows entry switched from brokenbash 'C:\...'(which silently resolved to WSL bash → "No such file or directory" → exit 1 on every Bash call) to nativepowershell.exe -NoProfile -ExecutionPolicy Bypass -File '<abs>\check-bugfix-discipline.ps1'; Bootstrap blocks insrc.claude/CLAUDE.mdandsrc.codex/AGENTS.codex.mdrewritten for the new trigger model;INSTALL.mdstructural-enforcement section updated; bothshared/references/evidence-based-answer-pipeline.mdand its Russian mirror rewritten — the three-layer pre-fix/pre-commit/pre-push audit chain became a two-layer pre-fix/pre-commit one (with explicit note that pre-push enforcement was the wrong design). Tests updated for the new entry shape (18 pass + 1 Windows-symlink skip). The--no-hypothesis-hookflag andORCHESTRARIUM_NO_HYPOTHESIS_HOOKenv var names retain the legacy prefix for back-compat with operators who scripted them. Operator action required after install: Codex marks the modified hook as untrusted (new script content hash + new command shape both trigger Codex's "modified since last trusted" gate), so users must re-trust the entry via the interactivecodexTUI hook browser once before the new hook fires; Claude Code has no analogous trust gate and fires immediately. This matters because the previous hook was actively harmful on Codex+Windows installs with WSL on PATH (every Bash tool call failed with "hook exited with code 1"), and the conceptual mistake of tying hypothesis discipline to git push meant the discipline never actually caught the failure mode it was supposed to (model jumping to Edit without thinking) — now it does.
-
Fail-open hardening of the
check-hypothesis-disclosure.shhook script in both packs after a real-world bug surfaced from a Codex CLI session: the user reportedPreToolUse hook (failed) — error: hook exited with code 1firing on every Bash invocation, blocking the whole session. Empirical reproduction this session confirmed the root cause — the canonical script'spython3 -c 'import json,sys;d=json.load(sys.stdin);print(...)'raisesJSONDecodeErrorwhen stdin is empty or non-JSON; combined withset -euo pipefailat the script top, the pipefail propagation triggers exit 1 before the script reaches its self-filter that would have allowed non-git pushBash calls to pass through. Claude Code's hook entry carries anif: "Bash(git push *)"permission-rule filter that limits firing to actual push attempts, so the bug never surfaced for Claude users — but Codex CLI's hook matcher is regex-on-tool-name only (no analogousif:filter), so the script fires on every Bash tool call in Codex, and any one with bad/empty stdin crashes the entire session's tool dispatch. The fix adds two defensive layers insrc.codex/skills/lead/scripts/check-hypothesis-disclosure.shandsrc.claude/agents/scripts/check-hypothesis-disclosure.sh(mirrored): (1) the inlinepython3 -ccall now wrapsjson.loads(sys.stdin.read())intry/except Exceptionso any parse failure setsd = {}instead of raising — python always exits 0; (2) shell-level2>/dev/nullswallows any python stderr and|| command_str=""belt-and-suspenders on the command-substitution falls back to empty string if python itself somehow errors (missing binary, PATH issue, OOM). Whencommand_strends up empty, the existinggit pushregex check below fails to match and the script exits 0 (the pass-through branch). The hook only ever blocks when a behaviour-changing commit lacksVERIFIED:/ASSUMPTION (UNVERIFIED)markers; any bad input gracefully degrades to no-op. The.ps1siblings of both scripts were audited and already handle bad stdin correctly ([string]::IsNullOrWhiteSpace($stdinText)early exit +try/catchonConvertFrom-Json); no.ps1change needed. Empirical verification this session: 5-case test (empty stdin viaecho "", completely empty via< /dev/null, malformed JSON, valid non-git-push envelope, valid git-push envelope) — all exit 0 on the patched script, where the unpatched script exited 1 on the first three. The capture-investigation that was supposed to record Codex's actual stdin envelope was blocked by Codex's trust-refusal of the temporarily-installed logging wrapper (Codex hashes file content for trust, not just command text); the fail-open fix is the right answer regardless of envelope shape because it gracefully handles every malformed/missing-input case. Operator action required after install: Codex CLI marks the modified hook script as untrusted (per Codex's"Modified since last trusted - review required"mechanism), so users must re-trust the entry via the Codex interactive TUI hook browser once before the fixed script fires; Claude Code's hook does not have an analogous trust gate. -
Repaired the
invoke-codex-prompt.ps1andinvoke-claude-prompt.ps1prompt-orchestration wrappers for default Windows hosts. The wrappers are the canonical external-CLI prompt-orchestration surface for the Claude pack on Windows, but were silently broken on the most common install topology (defaultpowershell.exehost launching the wrapper + npm/nvm4w-installed Codex or Claude CLI resolving to a.ps1shim) — a recent dispatch attempt this session that bypassed the wrappers and used inline& codexinstead surfaced the regression. Four compounding bugs identified empirically through sequential smoke testing and fixed in one coordinated change: (1)System.Diagnostics.ProcessStartInfo.ArgumentList(used to build the child argv) is a .NET Core 2.1+ / PowerShell 7+ API; on Windows PowerShell 5.1 (the defaultpowershell.exeinterpreter) the property isnull, and.Add(...)raisesInvokeMethodOnNull. (2)[System.Diagnostics.Process]::StartwithUseShellExecute = $falsecan only launch native.exebinaries; whenGet-Command codex/clauderesolves to a.ps1shim (the npm/nvm4w default),Startthrows "not a valid application for this OS platform". (3) With the wrapper's top-of-file$ErrorActionPreference = 'Stop'inherited into the native call, any byte the child process writes to stderr — including normal progress messages like Codex's "Reading prompt from stdin..." — becomes a fatalNativeCommandErrorthat kills the wrapper before$LASTEXITCODEcan be captured, leaving.out/.errfiles at 0 bytes. (4) PowerShell 5.1's1>/2>redirection writes UTF-16 LE with BOM viaOut-File's default Unicode encoding; PowerShell 7+ writes UTF-8 — cross-version inconsistency that produces UTF-16 captured artifacts on the default Windows install and breaks any downstream consumer that expects UTF-8 bytes. The coordinated fix in both wrappers replacesProcessStartInfo/[Process]::Startwith PowerShell's native call operator& $cmd @args(which handles.exe/.cmd/.ps1shim resolution on both PS 5.1 + PS 7+, mirroring the pattern already proven ininvoke-claude-api.ps1line 221), sets$OutputEncoding+ both Console encodings to UTF-8 no-BOM around the native call to fix the pipeline encoding, relaxes$ErrorActionPreferencetoContinuearound the native call only (restored infinally) so child stderr does not crash the script, and adds a post-process step that re-encodes the captured.out/.errfiles viaGet-Content -Raw+[System.IO.File]::WriteAllText(..., [UTF8Encoding]::new($false))so the artifacts are UTF-8 no-BOM regardless of which PowerShell host launched the wrapper. The stdin-write path was also switched fromSet-Content -Encoding UTF8(which adds a BOM on PS 5.1) to explicit[System.IO.File]::WriteAllText(..., [UTF8Encoding]::new($false))so a prompt body provided via piped stdin doesn't get a leading BOM that some CLIs interpret as content. End-to-end smoke verified on Windows:powershell -File invoke-codex-prompt.ps1against a realcodex.ps1shim returns exit 0 with the codex response in.outas plain UTF-8. The.shsiblings of both wrappers were not affected and required no change — they invoke through the POSIX shell which handles shim resolution and stream redirection at the shell layer. This matters because the wrapper IS the canonical contract for file-based external-CLI prompt delivery on the Claude pack, and a silent regression to non-functional means any operator dispatching an external review or worker through the documented pattern on Windows would have been forced into inline workarounds (as happened this session for the 6th hypothesis-disclosure-doc review) that lose the file-based prompt persistence and capture guarantees the wrapper exists to provide. -
Second-pass hardening of the
invoke-{codex,claude}-prompt.ps1wrappers addressing every finding from the full Codex security review + full Codex architecture review of the first-pass wrapper fix (commitff821d3above). Four distinct defect classes fixed in one coordinated patch to both.ps1wrappers, each with empirical reproduction recorded in.scratch/security-review-20260517/: (F1, MEDIUM)TopicSlugwas concatenated into captured filenames in$outputDirwithout validation; a caller-supplied..\..\tracked-leakresolved outside.scratch/codex-promptsandlegit:hiddenexposed NTFS Alternate Data Stream syntax — fixed by rejecting.., path separators (/,\), drive/ADS separator (:), Windows-invalid filename chars, NUL, and overlong (>64 char) slugs at the boundary. (F2, MEDIUM)--PromptFilevalidatedTest-Path -PathType Leaf(returns True for symlinks pointing at regular files) and thenCopy-Item -LiteralPathfollowed the symlink to the target — the reviewer pointed a symlink at aSECRET_PROBEfile and watched the wrapper persist and forward its contents to the external CLI. Fixed by checkingGet-Item -Forcefor theReparsePointattribute and refusing the input with a clear FAIL message naming the path. (F3, MEDIUM) prompt directory inherited parent ACLs (empiricallyAuthenticated Users Modify+Users ReadAndExecuteon.scratch), and timestamp-only filenames were predictable enough for a same-machine racer to anticipate captured paths and either pre-create symlinks at those paths or race-modify the captured content between native-call exit and the post-process re-encode loop — fixed by adding 8 hex chars of GUID entropy to the slug (-{8-hex}suffix appended after the timestamp) and by setting a restrictive ACL on the prompt directory when the wrapper just created it (disable inheritance, grant only the current user; skip ACL hardening if the directory already existed so we don't tighten a path the caller may share, and warn-but-proceed on non-Windows / sandboxed filesystems where the ACL operation legitimately fails). (A1, BLOCKING architecture finding) the first-pass fix usedGet-Content -Raw -LiteralPath $promptPathto reread the prompt body before piping to the CLI; on Windows PowerShell 5.1 this reads UTF-8 no-BOM files via the system ANSI codepage and corrupts non-ASCII text — empirically the arch reviewer sentПривет(codepoints 1055,1088,1080,1074,1077,1090) and the child saw garbled (1056,1119,1057,1026,1056,1105,...). Additionally the prior$ErrorActionPreference = 'Continue'scope coveredGet-Content(wrapper code, which should stay strict) not only the native call — fixed by reading the prompt body via[System.IO.File]::ReadAllText($promptPath, [System.Text.UTF8Encoding]::new($false))before relaxing the error preference, and piping the resulting variable into the native call. End-to-end smoke verified on Windows:powershell -File invoke-codex-prompt.ps1with a Cyrillic prompt (Ответь...ПРИВЕТ) returns exit 0 with.outcontaining exactlyПРИВЕТin plain UTF-8, no garbling. F1 also verified by 3 inline reject-tests (path traversal, ADS syntax, overlong slug — all exited 1 with the expected FAIL message). This matters because every operator who dispatches an external review or worker through the documented wrapper pattern on Windows now gets the same filesystem-boundary defences that the API-side wrapper has had for months, and non-ASCII prompt content reaches the CLI unmangled regardless of which PowerShell host launched the wrapper. -
Validator cap bumps to accommodate the just-landed common-skill description and Bootstrap "two trigger moments" extension — three caps in
src.codex/skills/lead/scripts/validate-skill-pack.shbumped with documented rationale per the existing "visible decision rather than silent budget growth" convention. (1)CODEX_SKILL_DESCRIPTION_MAX_CHARSbumped180 -> 440for the$mathtype-book-pagelong-form 10-trigger description adopted from active downstream use (416 chars actual; the prior 180-char cap was a pre-common-skills UX guideline); future common skills should still aim for ≤180 chars where the trigger surface is narrow, and this cap is the wide-trigger allowance rather than a license for verbose role-style descriptions. (2)CODEX_SKILL_DESCRIPTION_TOTAL_MAX_CHARSbumped3500 -> 3700to fit the same long-form description in the total budget (3595 chars actual; headroom ~105 chars for one future common skill at modest length). (3) Codex AGENTS.md pack section line budget bumped360 -> 380to accommodate the Bootstrap restructuring from one trigger moment to two (366 lines actual; +14 ceiling preserves headroom for the next governance addition). Each bump comment names the predecessor cap, the version, the driving change, and the expected headroom for next addition so future bisects can reconstruct the policy intent without spelunking commit messages. Pack validator now passes 406 / 0 / 0 against the merged Codex AGENTS.md and skill metadata.
-
Added the passive-polling
Stophook beside the existing bugfix-disciplinePreToolUsehook. The newcheck-passive-polling-stophook closes the user-reported failure mode where the assistant ends a turn withжду ответа бота (3-5 мин)or equivalent passive async-wait language without checking current time, bot/review/CI/job status, task output, or captured logs. The implementation mirrors the existing hook family shape instead of building a monolith: shared envelope/transcript helpers now live inhook_common.py, whilecheck-bugfix-discipline.pyandcheck-passive-polling-stop.pyremain separate rule brains because their hook contracts differ (permissionDecision: denyforPreToolUse, top-leveldecision: blockforStop). The Stop hook readslast_assistant_messagefrom the Stop envelope, exits onstop_hook_active=true, uses two-tier phrase detection (strong phrases likeжду ответа бота,waiting for review,CI pending,3-5 мин; weak phrases likewaiting fororждуonly when paired with an async noun or time estimate), exempts user handoffs such aswaiting for your response/жду твоего подтверждения, and supports the per-stop[acknowledge-passive-stop]override. To prevent no-op bypasses, it accepts only relevant current-turn probes (date,Get-Date,gh pr view,gh run list,gh api,curl, process/task output, output/log/task reads) and rejects irrelevant calls such asBash: true. Installers now write both hook entries into Claudesettings.jsonand Codexhooks.json; the shared installer gained--hook-eventand--script-markersoPreToolUseandStopentries are idempotent and independently removable. The Codex AGENTS.md validator cap was bumped380 -> 420with an in-script rationale because the installed platform section now documents two hook events, two override markers, independent removal commands, and the Stop JSON shape. Codex users must trust both entries in the interactive TUI after install or upgrade; Claude fires them immediately. This matters because the earlier text rule and memory feedback did not stop the passive-wait behavior, and the structural guard now fires exactly at turn termination, where the failure occurs. -
Added
gpt-5.3-codex-sparkas a 4th allowed value forexternalCodexProfileinshared/agents-mode.schema.json(wasdefault | gpt-5.5-fast | gpt-5.5-xhigh, now adds the spark variant). The shipped default staysgpt-5.5-xhighand no preset is rewired to spark — this addition is purely the allowed-value list growing so operators can opt in by settingexternalCodexProfile: gpt-5.3-codex-sparkin their~/.codex/.agents-mode.yamlor~/.claude/.agents-mode.yaml. Spark is a supplement, not a replacement for thegpt-5.5-*profiles — the user-stated framing was explicitly "это не замена gpt 5.5, дополнение для экономии и скорости", parallel to "sonnet не замена opus, а для скорости и экономии". Use case: very routine non-thinking mechanical tasks where mainconv (or an external-worker invocation) wants to save the main-tier token budget — spark has separate rate limits from thegpt-5.5family, so calling it doesn't burn the main-tier quota. Mandatory constraint (user emphasized as load-bearing): spark output requires verification by a heavier tier before any commit/push, because spark trades depth for speed; treat its output as "implemented, not yet verified" by default. Theshared/agents-mode.defaults.yamlcomment was updated to mention the new value and the supplement framing; both the English (shared/references/subagent-operating-model.md) and Russian (shared/references/ru/subagent-operating-model.md) operating-model references gained the same wording in theirexternalCodexProfileenumeration paragraphs. The validator atsrc.codex/skills/lead/scripts/validate-skill-pack.shwas not bumped (the existing checks pass against the additional enum value). This matters because mainconv routing decisions now have an explicit allowed-value path for "this is purely mechanical and maximally clear" tasks — previously the only knob belowgpt-5.5-xhighwasgpt-5.5-fast(same family, different reasoning effort), which still burns the maingpt-5.5rate budget. -
Merged 73 net lines of workflow improvements into the
$mathtype-book-pagecommon skill from active downstream use, propagating refinements back to the canonical pack source so all adopters benefit on next install. Both pack copies (src.codex/skills/mathtype-book-page/SKILL.mdandsrc.claude/skills/mathtype-book-page/SKILL.md) updated identically per the common-skill dual-shipping contract. Additions: an expanded skill description with 10 specific trigger contexts (improves Skill-tool description matching); a new "Typography Normalization Against Exemplar (LOAD-BEARING)" section documenting the 5-step procedure for reconciling chunk-levelword/styles.xmlagainst an accepted exemplar when an OMML→DOCX pipeline emits non-book defaults (Arial 10.5pt body instead of TNR 11pt, oversized headings, missing pandoc-specific styles); a new "Heading Case Normalization (Russian sentence case)" section documenting the 6-step procedure for converting ALL CAPS Russian headings to production-book sentence case with multi-run text-run merge/redistribute and<w:rPr>preservation; 8 new defect-pattern table rows under "Common defect patterns" (body font/size drift vs exemplar, body page label banners, ALL CAPS section headings, multi-number formula cell concatenation, multi-row OLE display untouched after multi-number split, stale manifest status, skipping no-Word cleanup sweep before promote, render gate skipped after no-Word edits); 5 new closing checklist items 20-24 (exemplar styles.xml audit, heading-case audit, body-flow page label audit, manifest gate-state honesty audit, per-chunk skill review record audit); and a new "Defect Class Index" section enumerating defect classes A-M (12 classes at initial merge; later extended to 13 with Class N — Code listing provenance pointer — as the operational form of the newResults-table provenance discipline) with surface description and repair lane for each. Per the "No mechanical application" rule inCLAUDE.md, four downstream-project-specific references (empirical citation date, manifest status enum, internal doc path, RF-engineering acronym example list) were generalized for canonical pack consumption so other projects can adopt the procedures without inheriting one project's specific schema. This matters because the skill was actively used in a translation project and accumulated empirical procedure improvements that other projects adopting the skill should inherit — leaving the workflow improvements only in a downstream user's local~/.codex/skills/directory would mean every subsequent adopter would have to rediscover the same defect patterns and re-derive the same procedures. -
Extended the
$mathtype-book-pagecommon skill with a new "Listing Provenance Pointer (Class N)" section that operationalizes the just-landedResults-table provenance disciplinefor the "program listing" surface, plus a matching Class N row in the Defect Class Index and a new checklist item 25. Program listings printed inside a translated technical-book chunk are OCR-cleaned samples from the source book; the verified, compilation-tested source typically lives separately in the project's verified-source root. A chunk that prints or narratively references a program without a provenance pointer is now classified as a Class N defect — a future reader cannot reproduce the run or distinguish the printed code from a cross-validated translation. The new section documents the 3-line provenance block (Verified source: <path> (<compile invocation>)+Input artifact: <path>+Output: <path> (<one-line numeric summary>)), the cross-language cross-validation form (when a chapter has e.g. an original Turbo Pascal and a Fortran restoration, both source paths + compile invocations + cross-validation row count + max absolute difference belong in the block), the find heuristic (grepword/document.xmlfor monospace run properties + narrative program identifiers + named convention markersЛистинг N/Program N/Listing N), and the skip rule (numerical data tables or trivial illustrative fragments without verified-code counterparts). Project-specific references in the upstream Itoh2 version of the section (chapter-to-path coverage map for one RF-engineering book, specific compile invocations, internal manifest paths) were generalized for canonical-pack consumption per the "No mechanical application" rule. Both pack copies updated identically per the common-skill dual-shipping invariant. -
Updated the canonical evidence-pipeline reference at
shared/references/evidence-based-answer-pipeline.md(plus its Russian mirror atshared/references/ru/evidence-based-answer-pipeline.md) to reflect the newPre-fix diagnostic gateandResults-table provenance disciplinerules added in commit5634d61, addressing the architecture reviewer's mandatory finding that governance/protocol/gate/routing semantic changes MUST updateshared/references/perAGENTS.md:18. The "Relevance to our governance" section now lists Pre-fix diagnostic gate, Hypothesis disclosure discipline, Evidence-citation discipline, Results-table provenance discipline, and Visual artifact verification discipline alongside the prior Ambiguity resolution / Evidence-based completion / Failure transparency / Treat-external-content-as-untrusted entries. A new closing paragraph names the three structural enforcement points for code-bearing work — pre-fix (Bootstrap steps 1-3 before firstEdit/Writein bug-report context), pre-commit (all 5 Bootstrap steps before commit), and pre-push (auto-installedcheck-hypothesis-disclosurehook ongit push) — and explains the audit chain: wasted edit cycles caught at pre-fix, wasted commits caught at pre-commit, wasted shared-state writes caught at pre-push. This matters because the shared reference doc is the canonical methodology source-of-truth for the evidence-citation discipline that both packs and any external consumer would read to understand "how does Orchestrarium operationalize verify-before-claim"; without this update, that doc would have silently lagged the rule catalogue. -
Added two new globally-binding governance rules to
shared/AGENTS.shared.mdand extended the per-pack Bootstrap blocks insrc.claude/CLAUDE.mdandsrc.codex/AGENTS.codex.mdto fire at an additional pre-fix trigger moment in addition to the existing pre-commit trigger. New rule 1:Results-table provenance discipline— every table of computed results in repository documentation, reports, scientific docs, generated output, or release artifacts must cite, near the table, the provenance triad of (a) the formula/model/named procedure that produced the numbers, (b) the code/script/notebook path that ran the computation (withfile:lineor section anchor when the computation is one of many in the file), and (c) the input artifacts the computation consumed (file paths, dataset versions, parameter values, fixed seeds, environment markers). Fires only for computed values (any numeric output of script/formula/simulation/fit/regression/model/calibration/benchmark), not for source-data tables copied from external datasets which only need source citation. The discipline closes the failure mode where numbers exist in documentation but the computation provenance is lost, so reviewers cannot independently audit, reproduce, or distinguish the values from fabrication. For one-shot computations not preserved as a script, name the procedure inline and either commit a minimal reproduction script or label the cellsASSUMPTION (UNVERIFIED — one-shot computation not preserved). New rule 2:Pre-fix diagnostic gate— when a session encounters a bug report, runtime failure, error trace, regression, "не работает" claim, or any user message naming a defect in behavior, steps 1-3 of the existingAmbiguity resolution discipline+Hypothesis disclosure discipline(capture concrete observable data → form a hypothesis chain → verify each link or labelASSUMPTION (UNVERIFIED)) must complete before the firstEdit/Write/NotebookEdit/ file-write /apply_patchtool call. The trigger is the start-of-fix-attempt moment, not the commit moment — the existing Bootstrap fires at commit time and the auto-installed hook fires at push time, but neither catches mainconv between "user reported a bug" and "main conv started editing files on a guess about what the bug is". This closes the recurring failure pattern (recorded in user memory asfeedback_pre_fix_disciplineand surfaced this session as "main conv keeps fixing without confirming hypotheses outside/agents-bugfixflow") where the discipline existed in the rule catalog but wasn't catching the session reliably at the moment that actually matters. Bootstrap structural update: both per-pack Bootstrap blocks now have a "two trigger moments" preamble explicitly naming (a) pre-fix (steps 1-3 before any code-mutating tool call in a bug-report context, regardless of whether/agents-bugfixwas invoked) and (b) pre-commit (all 5 steps), and the violation-triggers list grew a new "Pre-fix triggers" sub-block listing specific shortcut patterns ("I see the bug, let me edit X" without symptom citation, "the fix is to add Y" without verified hypothesis, starting an Edit/Write in a bug-report context with no diagnostic data captured). The combined effect is that the same hypothesis-disclosure discipline is now auditable at every level of action irreversibility: pre-fix (in-session guard against wasted edit cycles), pre-commit (Bootstrap, guards against wasted commits), and pre-push (the auto-installed hook, guards against wasted shared-state writes). This matters because the discipline already existed structurally — what was missing was the structural enforcement at the moment when an unverified hypothesis most often produces wasted work, which is the gap between "user described a bug" and "main conv started editing files in response".
- Reframed the
externalCodexProfile: gpt-5.5-fastdocumentation across both packs, both example provider trees, the shared docs (docs/agents-mode-reference.md,docs/external-worker-design.md), and the public-facingREADME.mdandINSTALL.mdto make explicit that "fast" is a model-tier choice (the underlying Codex model variant), not an effort downgrade —model_reasoning_effortstaysxhighon this profile. Earlier wording such as "explicitly requests a Codex fast profile" or "speed override" was ambiguous and could be read as implying thatgpt-5.5-fastswapped reasoning effort belowxhigh; the corrected wording now states "selects the fast Codex model tier (model variant only — reasoning_effort still staysxhigh, this is not an effort downgrade)" at every documented occurrence. This matters because operators who pickedgpt-5.5-fastfor themax-speedpreset or via explicit override were getting two independent properties they could not previously distinguish from the docs alone — a faster model variant and the samexhighreasoning effort as the best-effort profile — and the old wording made the trade-off look worse than it actually is, biasing operators towarddefaultorgpt-5.5-xhighwhengpt-5.5-fastwould have been the better choice for their lane.
-
Refined the Codex+Windows skip implementation per the fourth architecture re-review's two REVISE findings. (1) Windows skip is now BEFORE Python resolution in
scripts/install-codex.sh— previously the case statement was inside theelsebranch ofif [ ! -f "$hook_installer" ], so a Windows host without Python would hard-fail with "FAIL: python or python3 is required" instead of cleanly skipping. The case statement moved up to the outer block so the MINGW/MSYS/CYGWIN branch is evaluated first; the POSIX branch keeps the Python check. (2) Skip messages and docs are now path-aware — the previous WARN messages always pointed at~/.codex/skills/...and~/.codex/hooks.json(global paths), incorrect for--targetproject-local installs which place skills under<project>/.agents/skills/and hooks.json under<project>/.codex/hooks.json. Bash installer now interpolates$TARGETand$AGENTS_ROOTinto the message and switches phrasing based on$MODE; PowerShell installer uses$TargetRootand$AgentsRootsimilarly.src.codex/AGENTS.codex.mddocuments both global and project-local paths explicitly. New regression testtest_codex_windows_skip_does_not_load_target_fileproves the helper exits BEFORE any filesystem I/O when given Codex+Windows + a deliberately bogus target path that would fail any read or write attempt. Tests now 17 pass + 1 skip. -
Reverted the previous "skip Codex hook auto-install on Windows" defensive design and now auto-install the Codex hook entry on all platforms, including Windows. The earlier skip was an over-correction: empirical testing showed that the real blocker is not Windows-specific shell ambiguity but Codex's universal hook trust requirement (Codex marks every newly-installed or modified hook as "untrusted" until the user explicitly trusts it via the interactive
codexTUI, on both POSIX and Windows). The installer cannot satisfy this requirement programmatically because Codex does not currently expose a trust API — only managed-hook bundles deployed viarequirements.tomlwithmanaged_dir/windows_managed_dirbypass the trust prompt, and that is out of scope for a pack-level installer. With the trust step explicitly documented as the user's responsibility, the installer's job becomes "write the hook entry and the script"; the user runscodexonce and trusts the entry via the on-screen hook browser. Codex+Windows now uses the samebash <quoted_sh_path>shell form as Codex+POSIX (Codex hook schema does not supportargsexec form orshellfield perhttps://developers.openai.com/codex/hooks, so shell form is the only option). The Windows form assumes Codex's hook interpreter can locatebashon PATH — typical when Git Bash is installed alongside Codex, but if a user's runtime uses a different shell they can edit the persistedhooks.jsonentry. The marker-based merge is for entry identification, not edit preservation: the installer's idempotent re-install keys on thecheck-hypothesis-disclosuresubstring marker only to find the pack's entry across upgrades and to prevent duplicate entries from accumulating; on every reinstall it then normalizes the matched entry back to the canonicalbash <script.sh>form, so a hand-edited command shape will not survive the next install. To keep a manual override, users must pass--no-hypothesis-hook(PowerShell-NoHypothesisHook) on every future install or setORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1in the environment so the installer skips the hook step entirely. The opt-out env var does not block--remove, so legacy cleanup remains possible while a standing opt-out is in effect. End-to-end smoke test on Windows confirmedpowershell -File scripts/install-codex.ps1 -Globalnow writes~/.codex/hooks.jsonwith the bash shell-form entry. Tests grew to 17 (16 pass + 1 Windows-symlink skip);test_codex_windows_is_skipped_with_warnreplaced bytest_codex_windows_writes_entry_with_bash_formandtest_codex_windows_creates_parent_directoryto lock in the new behavior.src.codex/AGENTS.codex.mdandINSTALL.mddocument the manual trust step prominently so users are not surprised by the silent-skip-until-trusted behavior of Codex hooks. Empirical investigation evidence: Codex 0.130.0 binary strings include "New hook - review required", "Modified since last trusted - review required", "Trust to view hooks; to trust; to toggle", "Managed hooks are always on", confirming the trust mechanism applies to all platforms; a probe hook configured at~/.codex/hooks.jsonwith full bash form was empirically verified to load but not fire fromcodex execbecause of the untrusted state. -
Closed the Codex-Windows residual finding from the third architecture re-review by skipping the Codex hook auto-install on Windows hosts entirely with an explicit WARN. The third re-review correctly identified that the Codex Windows path was still hypothesis-bearing: Codex hooks docs (
https://developers.openai.com/codex/hooks) do not document Windows default shell selection, do not supportargsexec form, and do not support ashellfield. The previous attempt to emitpowershell -NoProfile -ExecutionPolicy Bypass -File <shlex.quote(path)>was unverifiable on multiple axes (POSIXshlex.quoteis not PowerShell-equivalent quoting; Git Bash POSIX path namespace/c/Users/...is not invocable bypowershell -File). Applied the no-kostyl rule to my own work: rather than silently emitting an unverified command string, the helper now refuses the Codex+Windows lane with a WARN message explaining the limitation. The Claude pack's auto-install on Windows is unaffected and uses the verified native PowerShell exec form. Implementation:install-hypothesis-hook.pyexits early with SKIP+WARN when--platform codex --host-os windows(unless--remove, which still works so users can clean up legacy entries); bothinstall-codex.{sh,ps1}skip the helper call on Windows hosts with the same explanatory message. The hook script itself is still installed to~/.codex/skills/lead/scripts/so a user who has verified their Codex Windows shell behavior can manually configure~/.codex/hooks.json. New regression tests:test_codex_windows_is_skipped_with_warn(verifies no file write + WARN on stderr) andtest_codex_windows_remove_still_works(legacy entry cleanup remains possible).src.codex/AGENTS.codex.mddocuments the Windows limitation explicitly. Tests grew to 17 (16 pass + 1 Windows-symlink skip by design). End-to-end smoke test on Windows confirmedbash scripts/install-codex.sh --globalnow prints the SKIP message and does NOT modify~/.codex/hooks.json. -
Closed the remaining architecture-review finding on commit
cd09cdb's Windows portability by adopting Claude Code's documented exec form for the hypothesis-disclosure hook entry — native cross-platform invocation with no Git Bash dependency on Windows.scripts/install-hypothesis-hook.pynow accepts a--host-os {posix,windows}flag and emits per-platform: Claude+POSIX uses exec formcommand: "bash", args: ["<sh_path>"]; Claude+Windows uses native PowerShell exec formcommand: "powershell", args: ["-NoProfile", "-ExecutionPolicy", "Bypass", "-File", "<ps1_path>"]; Codex stays on shell form (Codex hook schema perhttps://developers.openai.com/codex/hooksdoes not supportargsorshellselection) but now also emits the powershell command string on Windows hosts. Exec form is documented as "preferred for portable cross-platform hooks" by Claude per hooks reference and eliminates shell-quoting concerns entirely (eachargselement passes through as a literal argument with no shell interpretation). All four installers (scripts/install-{claude,codex}.{sh,ps1}) now detect host OS: Bash variants checkuname -sforMINGW*/MSYS*/CYGWIN*to enable the Windows path; PowerShell variants always pass--host-os windowsand reference the.ps1script. The marker-search logic in the merge script was extended to recognize the marker substringcheck-hypothesis-disclosurein bothcommand(legacy/Codex shell form) andargs[k](Claude exec form), so legacy entries from earlier installs are automatically collapsed into the new exec form on next reinstall without leaving stale duplicates. Tests grew from 12 → 16 (15 pass + 1 Windows-symlink skip): addedtest_install_claude_windows_powershell_exec_form,test_install_codex_posix_shell_form,test_install_codex_windows_powershell_shell_form, and split the command-injection test into separate Claude-exec-form and Codex-shell-form variants. End-to-end smoke test on Windows under Git Bash confirmed the real~/.claude/settings.jsonnow contains the PowerShell exec form. The architecture re-review ofcd09cdb("Leaving native shell:powershell/args for later is not acceptable for this gate") is now fully closed; the security re-review was already PASS. -
Hardened
scripts/install-hypothesis-hook.pyand the installer scripts against findings from Codex code review and full Codex security review of commit79aa5eb. Seven distinct findings addressed in one cohesive follow-up commit: (1) HIGH — command injection via unquoted--script-path: empirically demonstrated by the security reviewer with--script-path '/tmp/safe path; echo PWNED'producing"command": "bash /tmp/safe path; echo PWNED". Now usesshlex.quote()on the script path before embedding into the JSONcommandfield, defending against paths containing spaces, semicolons,$(...), quotes, or backslashes. (2) HIGH — silent fail when Python is missing: installers now fail closed withFAIL: python or python3 is required...instead of silently skipping the hook install (leaving the user thinking the hook is active). PowerShell variants check$LASTEXITCODEafter the merge-script call and propagate non-zero exits. (3) MEDIUM — duplicate marker entries not cleaned:find_our_entry_indices()now returns ALL marker matches;install()collapses duplicates to a single entry,--removedeletes every match. (4) MEDIUM — opt-out env var blocked--remove:ORCHESTRARIUM_NO_HYPOTHESIS_HOOKcheck now bypasses--remove, so a standing opt-out can still uninstall an existing hook entry. (5) MEDIUM — no regression tests for the merge logic: addedtests/test_install_hypothesis_hook.pywith 12 tests covering empty-install, idempotent reinstall, user-key preservation, duplicate handling, opt-out semantics (block install / allow remove), command-injection defense, malformed-but-valid JSON ({"hooks": {"PreToolUse": [{"hooks": 5}]}}), symlink-target refusal, and invalid-JSON fail-closed. 11/12 pass on Windows (symlink test requires elevated privileges and is skipped by design on Windows). (6) LOW — symlink/TOCTOU clobber risk: writer now refuses to write through a symlinked target (Path.is_symlink()check) and uses atomic write viatempfile.mkstemp()+os.replace()in the same directory. (7) LOW — TypeError on malformedentry["hooks"]: per-entryhooksfield is now validated as a list before iteration; non-list values are skipped gracefully instead of crashing the merge helper. Architecture review's structural recommendation (useshell: "powershell"+argsfor Windows-native invocation, per the documented Claude Code hooks reference) is acknowledged as a future improvement; the current bash form works on macOS, Linux, and Windows under Git Bash, and the shlex-quoting closes the actual injection vector. -
Auto-installed the hypothesis-disclosure PreToolUse hook in all four installer scripts (
scripts/install-{claude,codex}.{sh,ps1}), reversing the previous "opt-in only" design that the user correctly identified as kostyl (avoidance of the settings.json merge problem rather than a real fix). New helperscripts/install-hypothesis-hook.pyperforms idempotent JSON-merge: reads existingsettings.json(Claude) orhooks.json(Codex), finds our entry bycheck-hypothesis-disclosuresubstring marker in thecommandfield, inserts or updates in place, and preserves all other user keys and other hooks. Five smoke tests verified the merge logic: install into empty file produces correct structure; re-install is no-op; user keys (env, attribution, permissions, enabledPlugins, other hook lanes likeStop) are preserved on insert;--removeflag cleans only our entry;ORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1env var blocks the install entirely. Installers added a--no-hypothesis-hookCLI flag (-NoHypothesisHookon PowerShell) for explicit opt-out at install time; the env var is the runtime opt-out for users who want the installer to skip the step on every invocation. The hook is installed in both--globalmode (~/.claude/settings.json,~/.codex/hooks.json) and--targetproject mode (<project>/.claude/settings.json,<project>/.codex/hooks.json). End-to-end test against a real settings.json with pre-existing user keys (model, theme, permissions with custom Bash allow rules, enabledPlugins block) confirmed user state is preserved byte-for-byte except for the appended hook entry. Gemini and Qwen example packs do not get the hook because their runtimes don't expose comparable hook surfaces; the Bootstrap text rule remains binding on all platforms regardless. CLAUDE.md, AGENTS.codex.md, and INSTALL.md docs updated from "opt-in" framing to "auto-installed by default, opt out with --no-hypothesis-hook or ORCHESTRARIUM_NO_HYPOTHESIS_HOOK=1". This matters because the previous "opt-in" framing was a documentation kostyl that hid the design avoidance — the installer's job is to make the pack work out of the box, not ship a script in a passive directory and ask the user to wire it up themselves. -
Extended the Hypothesis disclosure discipline with a "Fix means correct logic, not workaround (no kostyl)" clause that defines what counts as a real fix versus what counts as a workaround that hides the symptom. A "fix" must name the root cause (a specific function, contract, invariant, or boundary that produced the wrong behavior) and correct it; a workaround silences a visible failure mode without changing the underlying logic. Explicit examples of kostyl enumerated in the rule: catch-and-swallow error handling that suppresses a diagnostic signal, defensive checks added without understanding why the original input was malformed, type assertions that silence the type contract, fallback values that mask missing-data bugs, renaming a misleading identifier without correcting the logic it labels, adding a retry loop around a non-idempotent operation, hardcoding a value to dodge a configuration-resolution bug,
try/except: passaround recurring failures, and log filtering to hide real errors. Two operational tests apply before commit: (1) root-cause naming — the commit message must name the root cause in concrete terms, not just describe the suppressed symptom; (2) symptom-only suppression detection — if the fix removes a visible failure mode without changing the underlying logic and there is no separate verified hypothesis that the suppression itself is the correct behavior (legitimate boundary fallback, intentional graceful degradation per a documented contract), it is a kostyl. Kostyl is permitted only as an explicitly labeledWORKAROUNDcommit that (a) names the root cause separately, (b) states scope and lifetime, and (c) discloses that the underlying defect is unfixed. The Bootstrap blocks insrc.claude/CLAUDE.mdandsrc.codex/AGENTS.codex.mdcarry a new step 4.5 ("no-kostyl check") that pairs with step 4's scope proportionality check, so the operational checklist now reads: diagnostic data → hypothesis inventory → verify-each-or-label → scope proportionality → no-kostyl check → recovery readiness. Codex AGENTS.md pack section line budget insrc.codex/skills/lead/scripts/validate-skill-pack.shwas bumped 340 → 360 to accommodate the new step (~2 lines actual, ~18 lines headroom). This matters because the existingBug-fix scopeandFailure transparencyrules covered some of this implicitly, but there was no explicit anti-kostyl clause that named the failure pattern; the user's complaint that "main conv keeps making fixes by hypothesis without diagnostics" and that "fix needs to mean implementation of correct logic, not workaround/hiding" required the rule to make the test operational and the violation pattern named. -
Extended shared cross-pack global
agents-mode.yamlsymmetry to all four pack installers and all read-order documentation surfaces. Gemini and Qwen installers (scripts/install-gemini.{sh,ps1},scripts/install-qwen.{sh,ps1}) now also create/normalize~/.agents-mode.yamlduring--globalinstall, matching the Claude and Codex behavior. 4-way idempotency verified empirically: SHA-256 of~/.agents-mode.yamlremained identical across sequential--globalinstalls from all four packs (Claude → Codex → Gemini → Qwen). 13 documentation surfaces insrc.gemini/skills/andsrc.qwen/skills/(consultant, external-brigade, external-reviewer, external-worker, init-project, lead/external-dispatch, second-opinion in each pack) updated to the same per-key read-order wording as the production-line packs. A repo-widegrep "fall back to global"now returns zero remaining instances, confirming consistent governance across all four packs. This matters because the previous commit (0f749db) was production-line-only; example-pack documentation drift would have meant a user who readssrc.gemini/skills/consultant/SKILL.mdwould see whole-file fallback while everyone else sees per-key composition — exactly the kind of governance contradiction the rule "do not let one surface silently redefine thin upstream artifacts" exists to prevent. -
Added a shared cross-pack global
agents-mode.yamllayer at~/.agents-mode.yaml(alongside~/.claude.json), serving as the single source of truth shared between Claude Code and Codex CLI. The new layer fits into the config-resolution stack with per-key precedence: project-local.agents-mode.yaml> pack-local global~/.claude/.agents-mode.yamlor~/.codex/.agents-mode.yaml> shared cross-pack global~/.agents-mode.yaml> built-in defaults. Each key resolves to the highest layer that defines it; layers compose rather than replacing each other wholesale. All four installers (scripts/install-claude.{sh,ps1}andscripts/install-codex.{sh,ps1}) now callsync_agents_mode_filefor the shared global target during--globalinstall only; project-target installs are unaffected. Behavior is normalize-not-overwrite: the file is created on first install if absent, and re-normalized to current canonical format on subsequent installs while preserving user-customized values. Cross-pack idempotency was verified empirically — Claude install creates the file, Codex install on the same file produces SHA-256-identical content (byte-for-byte). Read-order documentation updated insrc.claude/CLAUDE.md,src.codex/AGENTS.codex.md, bothexternal-dispatch.mdcontracts,src.codex/skills/lead/subagent-contracts.md,INSTALL.md, anddocs/agents-mode-reference.md. This matters because pack-local globals (~/.claude/.agents-mode.yamland~/.codex/.agents-mode.yaml) could previously drift between each other (each installer wrote independently), and there was no place to express cross-pack-shared operator preferences without duplicating them. The shared layer becomes the single point of truth for common settings; pack-local globals stay as platform-specific overrides where needed. -
Corrected two findings in commit
58b9273's structural-enforcement design. (A) The Claude hook config example used a relative path (bash .claude/agents/scripts/check-hypothesis-disclosure.sh) which, per the official Claude Code hooks docs, is unreliable because the hook runs with the session's current working directory, not the directory ofsettings.json. Recommended forms now use$HOME/...for global hooks and${CLAUDE_PROJECT_DIR}/...for project-local hooks. (B) The previous note insrc.codex/AGENTS.codex.mdclaimed "Codex CLI does not currently expose an analogous hook surface" — this was a hypothesis-driven kosyak. Direct WebFetch of https://developers.openai.com/codex/hooks verified that Codex CLI exposes aPreToolUsehook surface (stable featurehooks, default-on) with the same JSON envelope (stdin) + structured deny (hookSpecificOutput.permissionDecision = "deny"or exit code 2 + stderr) protocol as Claude Code. The Codex pack now ships the same hook script atsrc.codex/skills/lead/scripts/check-hypothesis-disclosure.{sh,ps1}(smoke-tested against Codex envelope shape — same JSON shape so the existing script works), andsrc.codex/AGENTS.codex.mdcarries the corrected hook config example using~/.codex/hooks.json(Codex matcher is regex on tool name only, so the script self-filters by parsingtool_input.command).INSTALL.mdwas updated to cover both packs symmetrically. -
Added an opt-in structural enforcement hook for the
Hypothesis disclosure disciplinerule on Claude Code clients. New scriptssrc.claude/agents/scripts/check-hypothesis-disclosure.shand.ps1machine-check the HEAD commit message beforegit push: behavior-changing commit types (feat/fix/refactor) must carry aVERIFIED:orASSUMPTION (UNVERIFIED)marker in the body, while whitelisted types (docs/chore/style/merge/ci/build/perf/test/revert) pass through unchecked. The hook is wired as a Claude CodePreToolUsehook with theif: "Bash(git push *)"permission-rule filter so it only fires on actual push attempts. It denies with a structured JSON payload (hookSpecificOutput.permissionDecision = "deny"+ actionable reason) that surfaces back to Claude and the user. The pack ships the script but does not modify.claude/settings.json; opt-in instructions are documented insrc.claude/CLAUDE.md(section "Optional structural enforcement") andINSTALL.md. Cross-platform parity: both the Bash and PowerShell scripts were smoke-tested against the four positive/negative cases (HEAD with disclosure → allow; non-push command → allow; behavior-changing commit without disclosure → deny with reason; whitelisted-type commit → allow; behavior-changing commit with disclosure → allow). Codex CLI does not currently expose an analogous shell-tool hook surface; Codex-side structural enforcement is a follow-up when the runtime exposes a hookable layer. The text rule and Bootstrap block remain binding on both platforms regardless of hook installation — the hook is a structural backstop for sessions where the text alone is insufficient, which is exactly the residual risk the QA review ofe36fb67flagged. -
Introduced the Hypothesis disclosure discipline as a globally-binding governance rule in
shared/AGENTS.shared.md(under "Verification and decision discipline"), plus an operational Bootstrap block at the top ofsrc.claude/CLAUDE.mdandsrc.codex/AGENTS.codex.md. The rule requires every fix or implementation commit to (1) decompose the proposed change into distinct interpretive claims, (2) verify each claim via the fourEvidence-citation disciplinecategories — or via direct user confirmation for user-intent claims — or explicitly label itASSUMPTION (UNVERIFIED)in the commit body with the verification step that would resolve it (silent unverified claims are violations), (3) keep scope proportionate to what the verified hypothesis actually requires (wider scope is allowed only when the hypothesis itself names it), (4) prefergit reset --hard HEAD~Novergit revertwhen a hypothesis is later invalidated on a local-only commit, and (5) treat named shortcut phrases (most likely means,presumably,I believe it refers to,this should map to,extrapolating from,based on training data,while I'm here let me also) as forbidden when used as the load-bearing reasoning for a commit, though they remain acceptable in open exploration. The Bootstrap block is the operational 5-step checklist (diagnostic data → hypothesis inventory → verify-each-or-label → scope proportionality → recovery readiness) plus a violation-trigger list naming the banned phrases as the activation signal. The Codex pack section line budget insrc.codex/skills/lead/scripts/validate-skill-pack.shwas bumped 300 → 340 to accommodate the Bootstrap (~32 lines). This matters because the existingAmbiguity resolution disciplineandEvidence-citation disciplinestate the principle ("do not guess; verify") and provide the evidence categories, but they did not require disclosure of the hypothesis chain itself — a commit could pass evidence-citation while still resting on an unverified interpretive leap hidden inside a confident-sounding claim. The new rule makes the hypothesis chain an explicit artifact reviewers can audit. -
Added
externalCodexProfile: gpt-5.5-xhighas the shipped best-effort default and as the consultant-lane mandatory profile, restoring symmetry with the long-shippedexternalClaudeProfile: opus-maxdefault. Schema (shared/agents-mode.schema.json) now listsgpt-5.5-xhighalongsidedefaultandgpt-5.5-fast, and the shipped scalar default flips fromdefaulttogpt-5.5-xhigh. Per-preset assignments inshared/agents-mode.presets.jsonnow correspond to each preset's stated intent: best-effort presets (default,correctness-first,power-mode) shipgpt-5.5-xhigh; balanced presets (absolute-balance,external-aggressive) keepdefault(which inheritsexternalModelMode); the maximum-speed preset (max-speed) shipsgpt-5.5-fast, which selects the fast Codex model tier (model variant only — reasoning_effort still staysxhigheven on this profile, so it is not an effort downgrade ofgpt-5.5-xhigh, just a faster underlying model). Theinvoke-codex-prompt.sh/.ps1prompt-orchestration wrappers now include-c model_reasoning_effort=xhighin their default Codex flag set, so a default wrapper invocation honors the best-effort contract automatically. Consultant role files (src.claude/agents/consultant.mdandsrc.codex/skills/consultant/SKILL.md) make the rule unconditional: consultant Codex calls always usegpt-5.5-xhighregardless of operator-set profile orexternalModelMode, mirroring the consultant Claude rule that always uses--model opus --effort max. External-worker, external-reviewer, and lead/dispatch contracts on both pack sides now document the third allowed value and its semantic. This matters because the previous shipped default (externalCodexProfile: default+externalModelMode: runtime-default) silently downgraded consultant Codex memos to the runtime default (high), and re-dispatch on failure repeated the downgrade — the user reported the consultant landing onhighinstead ofxhighand the fix restores symmetric best-effort across the Codex and Claude consultant paths, with the wrapper default and the consultant-lane unconditional rule providing two independent enforcement layers. -
Added a
## Coexistence with the superpowers pluginsection tosrc.claude/CLAUDE.mdclarifying how Orchestrator templates and superpowers skills compose. superpowers skills shape main conv's process discipline (HOW to think and work — brainstorming, systematic-debugging, verification-before-completion, writing-plans, TDD, subagent-driven-development), Orchestrator templates shape delegation routing (WHO does what and through which gate — quick-fix, research, review, full-delivery, etc.). Standard composition: process skill first if applicable (e.g. brainstorming before full-delivery; systematic-debugging before quick-fix), then Orchestrator template for delegation, then subagents may invoke common-skills via Skill tool inside their own context. The section includes a per-skill overlap table (brainstorming ↔ research/full-delivery, systematic-debugging ↔ quick-fix+bug-hunting, writing-plans ↔ planner, verification-before-completion ↔ qa-engineer, TDD ↔ implementer roles, subagent-driven-development ↔ team templates, requesting/receiving-code-review ↔ reviewer roles) and a quick decision rubric for "do I invoke a superpowers skill or pick a template?". This matters because operators running both packs together previously had no documented integration contract, and the resulting role-routing ambiguity in main conv was the user's stated concern: "mainconv claude unclear when it takes which roles, especially when using superpowers". -
Closed a recurring lazy-verification failure mode where a subagent would infer "provider unavailable" from the absence of a repo-side wrapper script and silently substitute its own opinion for the requested external one. The fix covers the three-level failure pattern the consultant's own post-mortem surfaced: (1) lazy verification — never ran
command -v codexdespite the 5-second cost; (2) pattern-match bias — extrapolated one missing wrapper artifact to "the whole subsystem is broken"; (3) curiosity failure — observed dispatch rounds returnResolved provider: noneand accepted that as proof of unavailability rather than as evidence the routing layer itself had a defect to investigate. Three coordinated changes: (a) newActive-availability probe disciplineinshared/AGENTS.shared.mdrequiring availability claims to be backed by a direct probe (command -v,which,Get-Command,Test-Path,curl -I) executed in the current session, with indirect inferences explicitly forbidden and anomalous orchestration output (Resolved provider: nonewith a profile that lists a primary) explicitly framed as a routing-investigation signal rather than evidence of unavailability; (b)Bootstrap — first actionblock prepended to bothsrc.claude/agents/consultant.mdandsrc.codex/skills/consultant/SKILL.mdordering the consultant role to verify provider availability as a real shell call, write the prompt to a temporary file, shell out via the prompt-orchestration wrapper, and treat any response drafted without an actual tool call as a discipline violation; (c) new file-based prompt-orchestration wrapperssrc.claude/agents/scripts/invoke-codex-prompt.sh/.ps1andinvoke-claude-prompt.sh/.ps1that encapsulate the sharedExternal CLI prompt deliveryrule in one invocation — they probe provider availability, persist the prompt body to.scratch/<provider>-prompts/<topic>-<timestamp>.md, run the CLI with prompt redirected from that file, capture stdout/stderr to sibling files, and propagate the exit code. The wrappers carry no SECRET.md, no env injection, and no auth-mode switching; the existinginvoke-claude-api.*wrapper remains the separate secret-backed transport used only whenreserveResolver: claude-wrapperresolves through it. This matters because the failure pattern recurs across the kosyak corpus and a generic "verify before claiming" rule was not enough; the discipline now has a specific named trigger for availability claims, a structural enforcement order in the consultant role itself, and an automation surface that removes the per-invocation file/redirect chain construction that subagents tend to skip under pressure. -
Added slash-command auto-invocation contract so main conversation can apply the right entry-point command's flow when the user describes the intent without typing the slash command. Each of the six entry-point commands (
/agents-bugfix,/agents-implement,/agents-design,/agents-research,/agents-review,/agents-refactor) now carries a## When to auto-invokeblock at the top of its file listing trigger phrases, intent patterns, and explicit do-not-auto-invoke exceptions;src.claude/CLAUDE.mdadds a## Slash command auto-invocationdispatch index that maps intent signals to command files, with resolution rules for ambiguous matches and an explicit fallback to the template decision tree when no auto-invoke trigger fires. Auto-invocation requires main conv to announce the routing decision in its first response so the user can override if the auto-routing was wrong. This matters because the slash commands previously had no documented signal for when to invoke them outside the user typing the command explicitly, and the user's stated concern was that main conv could not auto-route ("важно, чтобы mainconv мог сам их вызывать, если я забуду"). The fix lives in the existing command surface — no new auto-trigger skill (which would compete with superpowers' own routing) and no settings.json hook (which would land outside the pack). -
Expanded
shared/AGENTS.shared.mdwith eight new globally-binding rules and five extensions to existing rules, distilled from a 20-incident post-mortem corpus and a personal rules collection. New rules cover wire-shape verification before cross-component integration (most common failure pattern in the corpus, three same-day incidents), operational-contract scope (status codes, log severity, alert paging are part of the contract), destructive-default polarity (Apply=truenotDryRun=true), guard-precondition discipline (defensive checks must verify their own inputs, not just arithmetic on present values), all-return-paths discipline for cross-cut fixes (slow path AND cache-hit fast path AND inner re-check), polling-anchor discipline (anchor on system's authoritative latest state, not self-derived filter), race-window assertion discipline (engineer windows deterministically large, not nondeterministically tiny), end-to-end channel verification for durability and observability fixes (trace producer → receiver, not just producer-side green), and user-repetition / visual-reference fidelity (repetition = high-confidence signal of misread intent; visual = ground truth). Existing rules extended:Ambiguity resolution disciplinenow lists four evidence categories (file:line, installed-dependency surface, official documentation, smoke test) plus banned correctness-drivers (should work,probably, etc.);Evidence-based completionnow includes background-task notification ≠ artifact verification and authorization-fidelity (do not substitute weaker verification for the authorized surface);Markdown formula rendering formatnow requires traceability of each input quantity, classification of each formula (analytical exact / common generalized / approximation / fit / numerical policy), and preservation of formula semantics during readability-only edits;External CLI prompt deliverynow includes long-run checkpoint discipline (Claude opus/max ≥30 min is a checkpoint not a failure, use 45-60 min timeouts or background processes with periodic polling); core delegation completion-reconciliation now includes roadmap-slice ≠ final answer and user continuation signals (дальше,go,продолжай). -
Added role-specific governance bullets pointing at the cross-platform/cross-language port discipline (Go regex vs PS1 case sensitivity, POSIX reparenting vs Windows parent-alive, CSV parser strictness) in
$platform-engineer,$toolchain-engineer, and$backend-engineeracross both Claude and Codex source trees. Added the subagent thread-limit discipline (practical ≤4 concurrent in-session agents, route extras to external CLI) to$leadon both sides. Added the feature-toggle planning discipline (stable identifier, owner, default state, single capability-registry gate, disabled = fully inert across UI/hotkey/watcher/persistence) to$plannerand$architecton both sides. This matters because rules that only fire when a specific kind of work happens belong in the relevant role definition where they will actually be read, not buried in the shared catalog. -
Added Rule 8 to the
$bug-huntingcommon skill: announce launch parameters before non-trivial runs (batch jobs, optimization stages, multi-hour compute, simulations, benchmarks) so diagnostic vs production mode is never silently swapped, and verify expected output artifact existence at the declared destination after — completion notifications are not by themselves evidence of success. This matters because pre-launch parameter omission and post-run notification-trusting are two specific failure modes the corpus surfaced that overlap conceptually with bug-hunting's diagnostic discipline. -
Introduced the
common skillscategory as a first-class governance concept alongside the role index. Common skills are workflow-focused capabilities that any role or the main conversation can invoke when the skill's description matches the current task; they package reusable methodology, gates, and evidence requirements without owning delivery. This matters because workflow knowledge that used to live as ad hoc per-project Codex skills (for example manual Qt/Windows UI verification, MathType technical-book formatting, systematic bug-hunting via diagnostic logging) now has a named lane in the pack with the same installer, validation, and cross-provider distribution path as roles, instead of being scattered across user-machine.codex/skills/directories. -
Documented two common-skill archetypes in
shared/AGENTS.shared.md:knowledge-style(loaded into the caller's current context to inform how the caller performs work) anddelegate-style(spawnable as a fresh-context subagent that returns one self-contained artifact, also invocable inline via the skill loader). This matters because the two archetypes have meaningfully different invocation semantics — knowledge-style ships SKILL.md plus optionalreferences/only, while delegate-style additionally carriesagents/openai.yamlon Codex and a thin Agent-tool wrapper at.claude/agents/<name>.mdon Claude so the main conversation can fork a clean context when delegation is intended. -
Shipped the first four common skills across all four provider source trees:
$windows-gui-manual-testing(delegate-style; Qt and native Windows desktop GUI manual visual verification, now including a screen-capture recipe so the agent can record a fresh video when none exists),$analyzing-video-bugs(knowledge-style; ffmpeg-based frame extraction, scene-change detection, dense sampling around transitions, and native-resolution verification for any UI/animation bug video),$bug-hunting(knowledge-style; systematic runtime-bug investigation via diagnostic logging — log first, never patch on unverified theory, never re-roll on guesses), and$mathtype-book-page(knowledge-style; bring translated technical-book DOCX pages to accepted MathType format with source-PDF authority and a defective-chunk repair reference). This matters because production Codex and Claude installs now distribute these workflows through the standard pack install path, so operators no longer need to manually mirror them between user-machine and project-local directories. -
Wired explicit cross-references between the three UI/bug skills:
$windows-gui-manual-testingowns video capture and hands off to$analyzing-video-bugsfor frame analysis;$analyzing-video-bugsaccepts any video regardless of who recorded it;$bug-huntingRule 4 points at the video skills for visual evidence rather than duplicating ffmpeg recipes. This matters because the three skills have overlapping concerns (general bug methodology, visual UI evidence, video frame extraction) and explicit pointers prevent each one from drifting into the others' scope or duplicating recipes. -
Added explicit common-skill pointers to the most relevant production-line role files:
$qa-engineer,$ui-test-engineer,$backend-engineer,$frontend-engineer, and$qt-ui-engineerin both Claude (src.claude/agents/<role>.md) and Codex (src.codex/skills/<role>/SKILL.md). Each role now has a one-line working-rule pointing at the relevant common skill(s) for its primary work surface. This matters because while Skill-tool description matching can in principle discover these skills automatically, an explicit pointer in the role body raises recall in practice and documents the intended pairing of role + skill for new contributors. -
Extended the Claude installer's pack directory list with
skillsin bothscripts/install-claude.shandscripts/install-claude.ps1. Codex, Gemini, and Qwen installers already iteratedskills/, so the change is Claude-only and reuses the existing per-item install loop, manifest verification, and user-file preservation behavior. This matters because the Claude pack can now ship.claude/skills/<name>/SKILL.mdcontent reachable by theSkilltool from both the main conversation and subagents, without introducing a new copy mechanism or separate validator path.
- Added the shared
externalCodexProfile: default | gpt-5.5-fastoperator key to theagents-modecontract while keeping every shipped preset ondefault. This lets operators opt into a Codex fast profile explicitly after provider resolution, while preserving existing productionexternalProvider: autobehavior: whenautoresolves to Codex,defaultinheritsexternalModelModeinstead of silently changing the model/profile choice. The shared YAML exemplar is now grouped with section comments so the flat installer-compatible shape is easier to read without introducing a nested-schema migration. - Tightened release-gate coverage and operator wording for the same
agents-modechange. The installer regression now exercises the Codex PowerShell global install path when PowerShell is available, the docs describe Codex pinned-top execution through supported config/profile keys instead of an unsupported raw CLI flag, and the install guide now states that model-only edits to pack-owned built-in Codex agent templates are treated as stale template drift rather than preserved custom overrides. - Refreshed the Russian reference layer so shared and provider-local
ru/trees no longer lag the English source references. The evidence-based answer pipeline now has a shared Russian reference plus provider-local compatibility pointers, the Russian workflow and subagent-operating-model references include the current change-classification, external-adapter, completion-reconciliation, and periodic-control semantics, and the Codex Russian operating-model diagram now matches the current sequential Codex runtime model. The Codex pack validator also now checks that every reference Markdown file has a matching Russian counterpart, so future operator-facing reference drift is caught before publication.
- Updated the shipped Codex top-model path from
gpt-5.4togpt-5.5across built-in subagent overrides and current model-policy documentation. This matters because the Codex CLI does not expose a documented fresh-model alias in its local help, so Orchestrarium's pinned top-model contract must not leave production agents on a stale model name; Claude still uses theopus-maxprofile, which resolves through Claude'sopusalias. - Hardened Codex reinstall behavior for built-in agent override files. Reinstall now refreshes stale Orchestrarium-owned
default.toml,worker.toml, andexplorer.tomltemplates even when the stale model string is not the current default, while preserving files whose prompt or structure was actually customized by the user. This matters because pack-owned model and prompt updates should reach long-lived global installs without silently destroying real operator customizations. - Added a shipped
quality-firstproduction provider-order profile alongside the quietbalanceddefault. Operators can now setexternalPriorityProfile: quality-firstwhen they want maximum result quality from productionautorouting: the profile biases near-tie advisory, source-bound, and review lanes toward Codex while preserving Claude-first lanes where the benchmark evidence gives Claude a clearer compact or visual-worker edge. This matters because model-priority tuning is now a named routing profile, not an init-time preset or a hidden host-line default; Gemini and Qwen remain explicit example-only paths. - Routed the init-time
power-modepreset through the shippedquality-firstprovider-order profile instead of leaving it on the quietbalancedprofile. This matters becausepower-modeshould mean maximum useful result end to end: quality-first provider order plus top model policy, forced delegation, forced MCP, parallel fan-out, and two-opinion advisory/review lanes, while preserving the production-only Codex/Claude provider boundary. - Added the first machine-readable work-item execution tracking contract. Active work items can now keep
agent-runs.jsonlbesidestatus.md, andscripts/validate-work-item-state.* --work-item <path>checks duplicate run IDs, stale running agents, missing evidence, missing artifacts forPASS, and inconsistentBLOCKED/REVISEgates before closeout. This matters because the existing rule "verify subagents before trusting them" now has an operator-checkable workflow instead of relying only on narrative session summaries. - Hardened that validator before publication so
PASSartifacts must resolve inside the work-item directory, evidence entries must carry only schema-approved fields with a valid kind plus reference, and required timestamp/run identifiers must satisfy the shared schema length contract. Codex and Claude source validators now run a real schema-plus-validator smoke check instead of only grepping for ledger strings. This matters because machine-readable closeout evidence should not be satisfied by a path that escapes local task memory, by an empty or shape-invalid evidence object, or by source validators that cannot catch schema/implementation drift. - Added a ledger append/init helper for local task-memory operators.
scripts/agent-run-ledger.* --work-item <path> initprepares legacy work items by adding missing status sections andagent-runs.jsonl, whileappendwrites one event and immediately reuses the shared validator, rolling the file back if the event would make closeout invalid. This matters because real agent tracking now has a practical command path instead of asking operators to hand-edit JSONL during an active delivery flow. - Added a periodic active work-item state checker and installed the runtime helper surface with the production Codex and Claude packs. Operators can now run
scripts/check-work-items-state.* --root . --stale-hours 24to scan all active work items for invalid ledgers or stale running agents, while global/project installs copyagent-run-ledger.*,validate-work-item-state.*, andcheck-work-items-state.*into the provider runtime script directories. This matters because interruption recovery and publication review can verify every active task without requiring the Orchestrarium source checkout.
- Added machine-readable
agents-modecontract sources and a shared contract validator for the current scalar keys, provider sets, production priority profile, opinion counts, and init-time preset expansions. The newshared/agents-mode.schema.json,shared/agents-mode.presets.json, andscripts/validate-agents-mode-contract.pyare wired into the Codex, Claude, Gemini, and Qwen source validators and backed by a unittest smoke check. This matters because future changes topower-mode,reserve, productionautorouting, or example-provider boundaries now have a single executable drift check instead of relying on manual table comparison across YAML, docs, and init helper prose. - Moved
agents-moderuntime normalization onto the same schema-backed provider/lane policy used by the new contract check.scripts/normalize-agents-mode.pynow readsshared/agents-mode.schema.jsonnext to the shared YAML template for production providers, supplemental advisory/review candidates, and provider-scoped scalar defaults, while keeping a compatibility fallback for older standalone layouts. This matters because reinstall/read-time normalization no longer carries a second hardcoded Codex/Claude/reserve universe that can drift from the documented contract. - Extended the
agents-modecontract validator beyond Markdown tables so it now checks provider init canonical-shape YAML snippets and thecorrectness-first/power-moderaised opinion-count bullet lists. This matters because init helpers can no longer drift by keeping the table correct while their copied write-shape block or lane-count prose silently goes stale. - Added installer-level
agents-moderegression coverage that runs the Bash installers for Codex, Claude, Gemini, and Qwen against disposable stale overlays under/.scratch/, then verifies the emitted provider files against the schema-backed profile and count contract. The new check also caught and fixed a Gemini/Qwen Bash installerseddelimiter bug that could break managedAGENTS.mdimport rewriting during project installs. This matters because the contract is now proven through the real install path, not only through standalone normalizer and documentation checks. - Added a generated-doc sync tool for the
agents-modecontract.scripts/sync-agents-mode-docs.py --checknow verifies that the operator reference and all provider init surfaces match the JSON schema/preset sources for preset tables, raised opinion-count lists, and canonical YAML snippets, while--writerefreshes those generated blocks after intentional contract edits. The main contract validator invokes the check directly, so generated docs can no longer drift silently while the schema remains correct. - Added an init-time
power-modepreset for the hardest tasks where the operator wants maximum useful result rather than merely more speed or more fan-out. The preset combines forced delegation, forced parallelism, forced MCP use, external worker/reviewer preference, pinned top production-provider policy, Opus-max Claude profile on the Codex line, and two-opinion advisory/review lanes while keeping neutral workdirs and production-onlyautorouting. This matters because complex work now has a named high-power posture without changing the quiet shared default or promoting Gemini/Qwen into production profiles. - Aligned the Qwen init preset surface with the full shared preset ladder instead of keeping a shorter expansion summary. Qwen now documents the same key-by-key
default,absolute-balance,external-aggressive,correctness-first,power-mode, andmax-speedmatrix as Gemini and the production packs, including fixed two-opinion advisory/review lanes forcorrectness-firstandpower-mode; the Qwen validator now checks this parity. This matters because Qwen remains example-only, but its integration example should be as complete as Gemini and should not drift from the canonical agents-mode contract. - Added canonical
reserveResolversupport toagents-modeso the symbolicreserveprofile candidate can be bound without bloating every lane order. Operators can now usereserveResolver: claude-sonnet,claude-wrapper,wrapper:<command>, ordisabled;wrapper:<command>is documented as a PATH-resolved command or repo-relative wrapper path, while normalization stripsreservewhen the resolver is disabled and preserves legacyexternalClaudeApiMode: disabledby converting it toreserveResolver: disabled. This matters because advisory/review lanes can intentionally use a weaker or broader reserve transport such as Sonnet or a local wrapper without turning that transport into a production provider or worker/editing fallback. - Renamed the old
claude-secretprofile candidate to symbolicreserveand retiredexternalClaudeApiModefrom the canonicalagents-modeschema. New defaults and init/toggle flows now express the supplemental read-only path only through advisory/reviewexternalPriorityProfiles, while normalization maps legacyclaude-secretentries to last-positionreserve, preserves legacyexternalClaudeApiMode: disabledby strippingreserve, and drops the retired scalar from canonical output. This matters because operators can now attach any approved read-only reserve resolver without making the schema look Claude-specific or allowing the reserve path into worker, editing, installer, or publication lanes. - Rebased the shipped external provider priority profile on the
2026-05-01 full-v2-hard-r2benchmark release. The repo now carries a release-backed routing evidence note with the/40model totals, all 40 score slots, and the final12 + 1line-priority read;shared/agents-mode.defaults.yaml, operator docs, Codex/Claude external-dispatch references, and normalization checks now use the consolidated lanes (design.ui-ux-structure,worker.reasoning-constraints,worker.ui-implementation,worker.visual-graphics-visualization,review.security, andreview.ui-visual-correctness) instead of the older split UI/decorative/review-visual profile names. This matters because productionexternalProvider: autonow follows the accepted Codex/Claude professional-fit evidence while still keeping owner roles out of generic external adapters and keeping Gemini/Qwen out of production profiles. - Compressed Codex skill frontmatter descriptions into balanced startup metadata and tightened the Codex skill catalog budget from 5000 to 3000 description characters. This preserves role-selection anchors such as Qt, ETL, IaC, SLOs, high-DPI, auth, secrets, CI/CD, and external-provider provenance while keeping detailed role contracts in each skill body. This matters because Codex loads descriptions before selecting skill bodies, and compact metadata reduces startup truncation warnings without degrading role discovery quality.
- Added shared Engineering hygiene rules for mechanism inventory and state-synchronization ownership before introducing new architecture paths. Agents now must identify the existing owner and propagation pipeline for non-trivial state sync, validation, parsing, persistence, routing, scheduling, retry, cache, auth, config, or cross-surface behavior before adding a parallel mechanism; cross-surface state changes must avoid split-brain UI/API/tray/CLI paths, preserve canonical backend semantics, define lifecycle ownership and diagnostics, and verify representative surfaces against the same final state. This matters because architecture fixes should extend the maintained owner instead of creating hidden second paths that bypass events, validation, audit, or user-visible reconciliation.
- Promoted the global Markdown formula hotfix back into the shared governance source and operating-model references. Human-facing Markdown docs that add or materially revise formulas now must use dollar-delimited inline math by default, avoid multi-line
$$...$$display blocks unless the target renderer is explicitly verified, avoid\sb/\spcompatibility hacks, scan for unbalanced dollars and broken Markdown table pipes, keep formulas out of headings, brace subscripts and superscripts, and state each formula's applicability, assumptions, units or dimensions, variable meanings, and owning source when those facts are not obvious. This matters because scientific and engineering notes should render reliably while making special-case formulas and reduced models visibly scoped. - Revised the existing shared "never guess" rule instead of adding a second competing AXIOM. Root-cause, bug-fix, runtime/UI, native API, and "it does not work" claims now require concrete observable data that rules out plausible alternatives before behavior changes proceed; the Claude feedback memory carries the same measurement-first guidance. This matters because faster iteration is only useful when each iteration is anchored in evidence rather than plausible-sounding diagnosis.
- Clarified the installed runtime-entrypoint split for the production Codex and Claude lines. Root docs now state explicitly that installed Codex
AGENTS.mdshould stay the compact universal minimum while detailed installed role/runtime guidance lives in the installed skill bodies and built-in.codex/agents/*.tomloverrides; shared/provider references remain source-tree maintainer canon, and Claude remains the analogous shortCLAUDE.mdentrypoint plus.claude/agents/*.mdrole-file model. This matters because maintainers now have one clear place for detailed guidance without re-expanding the installed CodexAGENTS.md, and the production-vs-example routing boundary for Gemini/Qwen is unchanged. - Aligned operator-facing README and install documents with the shared terminology discipline by adding concise terms-and-abbreviations sections to the main repo docs, provider runtime layout reference, and provider source-tree READMEs. This matters because the production-vs-example provider boundary, runtime/install surfaces, and mixed CLI/MCP/agents-mode vocabulary should be readable without relying on tribal context.
- Hardened the Windows PowerShell validation wrappers so installed global Codex and Claude packs can validate from their runtime roots even when the current directory is not inside a git checkout. This makes the documented global install verification path reliable for ordinary Windows operator shells.
- Added an explicit shared verification rule for subagent output: a subagent
PASS, summary, or claimed test result is now treated as a claim that the orchestrating owner or next accountable gate must verify against files, diffs, artifacts, logs, command output, or other repo-standard evidence before accepting, forwarding, or using it for completion. This matters because parallel and external helper work should reduce latency without turning agent reports into unverified truth; the Codex, Claude, Gemini, and Qwen validators now check that the governance surface keeps this rule. - Added a shared visual-artifact verification rule for generated images, diagrams, drawings, renders, charts, plots, screenshots, CAD/exported drawings, and similar artifacts. These outputs now require actual visual inspection before acceptance, commit, user delivery, or evidentiary use; file existence, generation logs, metadata, hashes, and model claims are not enough. Pack validators now guard both the shared governance rule and the shared operating-model reference where that source is available. This matters because visual artifacts can be syntactically generated but visually blank, clipped, mislabeled, corrupted, or semantically wrong.
- Added a shared documentation terminology rule for new and materially revised human-facing documents. Documents that use domain terms, role names, provider/model names, workflow labels, acronyms, or potentially unclear English terms now need a final
Terms and Abbreviationssection, or a localized equivalent, that expands and explains those terms. This matters because mixed English/Russian governance and skill-pack docs should remain readable to operators without forcing them to infer abbreviations from context; existing documents are not mechanically rewritten unless a glossary cleanup pass is explicitly in scope. - Tightened the root router default install so pressing Enter now installs only the production Codex/Claude pair, and removed the "all available root installs" menu path that could install Gemini and Qwen together with production packs. Gemini and Qwen remain individually selectable explicit example integrations, but they are no longer reachable through a default-style aggregate installer path. This matters because providers classified as
WEAK MODEL / NOT RECOMMENDEDmust stay out of default installs, not only out of productionautorouting.
- Removed the duplicated source-side Gemini and Qwen
AGENTS.shared.mdcopies.shared/AGENTS.shared.mdis now the only source-tree governance canon for those example integrations:src.gemini/GEMINI.mdandsrc.qwen/QWEN.mdimport it directly in the monorepo, while the installers still materialize provider-local runtimeAGENTS.mdfiles for installed Gemini and Qwen targets. This matters because source governance updates now have one maintained owner instead of near-identical provider copies that could drift. - Fixed the provider installers so repeated installs skip unchanged runtime payload directories or files instead of deleting and copying every pack-owned item on each refresh. Codex now skips unchanged skill directories, Claude now skips unchanged agent subdirectories and command or agent files, Gemini/Qwen now skip unchanged extension payload items, and the example extension READMEs plus installed validators now carry the same not-recommended provider label without confusing source-root README requirements with installed extension README requirements. This matters for global installs whose runtime payload target is a junction into a OneDrive-backed directory: active provider sessions can scan metadata while an install is running, and unnecessarily rewriting unchanged files can surface transient Windows
os error 32file-lock reads on unrelated runtime files. Changed payload items are still replaced, intentional legacy cleanup still runs, and user-added items are still preserved. - Fixed Codex skill frontmatter descriptions that used unquoted plain scalars containing
:, which made YAML parsers reject affectedSKILL.mdfiles such asvisualization-engineer. The Codex validator now also checks skill frontmatter as YAML, so future metadata-budget edits cannot pass while still being invalid startup metadata. This matters because Codex and editor tooling read the frontmatter before the skill body, and a single invalid description can break skill discovery even when the rest of the pack validates structurally.
- Reduced Codex skill frontmatter descriptions to compact startup metadata and added a validator budget gate for the Codex skill catalog. This matters because Codex loads all installed skill descriptions before selecting a skill body, so verbose role metadata could trigger startup truncation warnings and hide useful routing cues. Detailed role contracts remain in each
SKILL.mdbody, whilevalidate-skill-packnow fails if descriptions grow past the per-skill or aggregate metadata budget. - Hardened installed Codex validation for real global
CODEX_HOMElayouts. The validator now resolves symlinkedskills/andAGENTS.mdwithout losing the actual~/.codex/agents/runtime root, and it keeps preserved user-added skills as non-blocking orphan warnings instead of treating their metadata as Orchestrarium pack-budget failures. This matters because long-lived global installs can keep custom skills and Windows/WSL link layouts while still receiving a strict pack-owned validation gate. - Added an explicit file-based prompt delivery rule for provider-backed CLI runs. External consultant, worker, and reviewer launches that carry substantive task prompts now write those prompts to temporary files and feed them through stdin or a provider-supported file-input mechanism, keeping argv limited to flags, model/profile options, and file paths. This matters because long inline prompts are fragile across Windows shells, quoting layers, logs, and process listings; tiny smoke checks and documented provider limitations remain allowed deviations when recorded.
- Reclassified the secret-backed Claude wrapper from a fallback transport into the separate
claude-secretadvisory/review profile candidate. The shared defaults now placeclaude-secretonly after primaryclaudeandcodexon advisory and review lanes, worker lanes strip it, and the normalizer plus pack validators enforce that Gemini/Qwen stay out of production profiles whileclaude-secretnever becomes a scalarexternalProvider. This matters because weaker secret-backed Claude can still provide an additional advisory or review opinion when explicitly enabled, without silently replacing primary Claude or leaking into implementation, editing, installer, publication, or other mutating work. - Reframed the root integration surfaces around the current provider posture instead of treating every provider line as equally production-ready.
README.md,INSTALL.md, the shared runtime-layout note, the external-worker design spec, and the root router installers now present Codex plus Claude as the shipped production path, keep Gemini and Qwen labeled as explicitWEAK MODEL / NOT RECOMMENDEDexample integrations, remove the retired production-schema references togemini-crosscheckand Gemini-specific fallback/workdir knobs, and make the root router expose Qwen only when matchingscripts/install-qwen.*entrypoints actually exist. Theagents-modenormalizer now also strips Gemini and Qwen out of any existing customexternalPriorityProfilesprovider list during read or reinstall normalization, and the pack validators cover that migration plus the Qwen not-recommended label. This matters because maintainers now see the real production-vs-example boundary at the root entrypoints, stale local auto profiles cannot silently preserve Gemini/Qwen production routing, and the install menu no longer promises a Qwen root path before the repo has the necessary launcher scripts.
- Clarified the maintainer contract for the Orchestrarium monorepo itself so runtime gates do not invent a local Codex install by accident. The repo-local
AGENTS.md,README.md, andINSTALL.mdnow state explicitly that this repository is the installer/source tree, that maintainers may intentionally rely on the global Codex install at~/.codex/, and that a missing local.agents/.agents-mode.yamlinsideOrchestrarium/is not automatically a broken state. This matters because lead- or consultant-driven maintenance inside the monorepo must ask whether to use the global install or to create a repo-local install through the installers, instead of hand-writing.agents/just to satisfy a gate. - Changed the Codex-line first-write default for
externalClaudeProfilefromsonnet-hightoopus-maxacross the installer seed, init/toggle flows, and reference docs. This matters because new.agents/.agents-mode.yamlfiles created by Codex install or by Codex-side helper flows now start from the stronger Claude profile automatically, while explicit preset choices and manual overrides can still selectsonnet-highwhere a lighter profile is intentional. - Fixed the consultant closeout contract so
consultantMode: disablednow really disables consultant usage instead of leaving a hidden lead closeout blocker. Lead, consultant, and operator-reference surfaces across the packs now treat consultant sweeps as explicit or repo-policy-driven advisory work, and a disabled consultant mode now waives consultant closeout rather than forcing a formal batch-close escalation for tasks that did not need consultant input in the first place. - Simplified the Claude external transport contract by collapsing the old split
externalClaudeSecretModeandexternalClaudeApiModestory into one honest wrapper toggle,externalClaudeApiMode. The shared defaults, provider addenda, lead dispatch contracts, helper docs, and validators now treat the named API path as the secret-backed Claude wrapper itself, move that key higher in the canonicalagents-modelayout, and annotate inline YAML comments with both allowed values and the default so operators no longer have to reverse-engineer which Claude flag actually owns the wrapper route. - Tightened the operator-facing guidance around cost and routing expectations. README and the routing references now state plainly that Orchestrarium is tuned for maximum execution effectiveness rather than minimum token spend, and the visual or Gemini-first routing examples are now described only as explicit provider overrides or repo-local custom profiles instead of hidden shipped defaults. This matters because maintainers and operators now get an upfront warning about rapid token burn on large or heavily fanned-out runs and are less likely to mistake sample routing postures for built-in behavior.
- Fixed the installer-side
agents-modemaintenance contract so reinstall now repairs stale installed overlays instead of silently preserving obsolete pack-owned schema. When an existing.agents-mode.yamlis present, the Codex, Claude, and Gemini installers now normalize it to the current canonical form, preserve effective scalar choices such as consultant/delegation/MCP/workdir preferences, refresh shipped profile and opinion-count blocks to the current pack version, drop retired keys such asexternalClaudeSecretMode, keepexternalClaudeProfileonly on the Codex line, and restore the current inline default comments. This matters because schema/default updates now actually land in long-lived global and project installs without forcing operators to delete their overlay by hand. - Changed the overlay-resolution contract so a missing project-local
.agents-mode.yamlno longer means "act as if no operator state exists" when the matching global Orchestrarium install is present. Codex, Claude, and Gemini docs plus role surfaces now resolve overlays in the same order: local canonical file, local legacy file, matching global canonical file, then matching global legacy file; read-time normalization happens in the scope that supplied the effective config, and read-only fallback no longer fabricates a project-local override just to make routing work. This matters because globally installed operator defaults now apply honestly across projects until an explicit project-local override is created, instead of leaving agents to report misleadingno fileor disabled-state results. - Added a shared
parallelMode: manual | auto | forceoperator key so parallel helper routing is no longer only implied through external-only surfaces. The canonicalagents-modereference, init/toggle flows, provider operating-model addenda, and the shared defaults now treatparallelModeas the general fan-out rule for any independent helper or subagent lanes, while external opinion counts and brigade routing remain explicit overlays on top. This matters because operators can now force safe parallel work for speed, keep it judgment-based, or keep it explicit-only without conflating that choice with external-provider settings.
- Fixed the Claude secret-backed transport wrappers so they no longer depend on a separate
claude-apiexecutable. Both.claude/agents/scripts/invoke-claude-api.ps1and.shnow read the nearest ClaudeSECRET.md, export the declaredANTHROPIC_*environment, and then launch plainclaude; this matters because the documented Claude fallback path can now run in environments whereclaudeis installed but no extraclaude-apishim exists. - Synced the operator docs and validation surface to the new wrapper contract. The Claude transport reference, install/help text, and skill-pack validator now describe the secret-backed wrapper as the canonical Claude API path, prefer
CLAUDE_BINas the explicit binary override, and stop claiming that a directclaude-apicommand must exist on PATH for the repo-local Claude fallback to work.
- Fixed the legacy
.agents-modemigration path across all six installers (Codex, Claude, Gemini — PowerShell and Bash). The legacy file variable was set to the same.agents-mode.yamlpath as the canonical target, so the migration function could never actually find or migrate an extensionless.agents-modefile. The legacy variable now correctly points to.agents-mode(no extension), so operators with older extensionless overlays get migrated forward into.agents-mode.yamlas documented. - Added a shared
externalModelMode: runtime-default | pinned-top-propolicy across the liveagents-modeschema, operator reference, init surfaces, consultant and external-adapter contracts, and release-facing docs.runtime-defaultkeeps the resolved provider on its own runtime-selected model/profile, whilepinned-top-promeans one strongest documented provider-native path plus one bounded same-provider fallback used only for usage-limit or quota exhaustion: Codex uses the configured top model/profile path documented in the current agents-mode reference and allowsgpt-5.3-codex-sparkonly on fully autonomous low-reasoning worker lanes, Claude keepsopus-maxand falls back by transport throughclaude-apiinstead of dropping tosonnet-high, and Gemini usesgemini-3.1-prothengemini-3-flash. - Tightened the external-execution contract so the new model policy does not get confused with transport choice or host-subagent routing. Claude transport now hangs off the single wrapper toggle
externalClaudeApiMode, Codex-lineexternalClaudeProfileremains the narrower Claude override, and Gemini fallback semantics now apply only under pinned Gemini runs instead of being described as a generic always-on hop. - Clarified the fallback semantics so operators do not treat every fallback as the same kind of downgrade. Repo-local policy now records that
gpt-5.3-codex-sparkandgemini-3-flashare bounded mechanical overflow paths for separate-limit or budget relief only, whileclaude-apiis the accepted economical near-full-strength Claude transport that may be used either after Claude CLI limit exhaustion or immediately through the explicitexternalClaudeApiMode: forcetoggle. - Lead-side external launch now has an explicit Windows shell fallback rule without changing the normal launch path or adding a new
agents-modetoggle. When a direct external run fails because the native Windows shell is blocked by shell/bootstrap, execution-policy, or environment-policy problems, the lead may retry once through Git-for-Windows Bash / MSYS; the WSLbash.exestub stays forbidden, and ordinary provider auth/quota/model failures still remain provider failures rather than shell-fallback triggers.
- Added
externalGeminiFallbackModeto the canonicalagents-modeschema and live operator surfaces. The new default isauto: Gemini-backed external work now targetsgemini-3.1-profirst and allows one fallback retry ongemini-3-flashonly for quota, limit, capacity, HTTP429, orRESOURCE_EXHAUSTED-style Gemini failures instead of keeping older pre-Gemini-3 references around. work-items/is no longer treated as publication-facing tracked repository content in Orchestrarium. The directory remains the local recovery and handoff surface for admitted work on the operator machine, but root.gitignore, repo-local task-memory guidance, and the publication gate now keep it out of tracked git so execution memory no longer leaks into GitHub by default.- Project installers now reinforce that local-only task-memory rule instead of relying on operator memory. Codex, Claude, and Gemini project installs now ensure both
/.reports/and/work-items/are ignored in target repositories, so freshly bootstrapped projects inherit the same local-only recovery boundary the monorepo now expects. agents-modeinit now exposes one explicit safe-start preset plus a small family of operator postures instead of implying that one default should cover every workflow.defaultstays the quiet safe-init path,absolute-balanceis the everyday center, and the more aggressiveexternal-aggressive,correctness-first, andmax-speedshortcuts now let operators choose external saturation, verification density, or throughput deliberately without confusing that choice with the persisted provider-order profile namebalanced.- External lane routing is now more explicit about what each provider is actually good at. The canonical lane matrix now separates broad UI modernization from surgical UI cleanup, keeps performance and architecture review as a distinct reviewer lane, and adds
worker.systems-performance-implementationso Rust hot paths, media-pipeline work, and other systems/perf-sensitive implementation can route tocodex > claude > geminiwithout overloading the ordinary implementation lane. Shared scalar defaults, the provider universeauto | codex | claude | gemini, and the rule thatclaude-apiis transport-only under the Claude provider all stay unchanged. - The install surfaces, root manuals, and provider references now describe the same current routing contract instead of forcing operators to reconstruct it from mixed older prose. README, INSTALL, the canonical
agents-modereference, and the provider addenda now agree thatexternalProvider: autois lane/profile-driven, thatclaude-apiis a secondary Claude transport rather than a fourth provider, and that provider-local references are addenda to the live canon rather than a second hidden source of truth. - Standalone Codex, Claude, and Gemini entry docs now expose the same release-relevant operator cues as the monorepo entry surfaces. Their README and INSTALL front doors now call out the current init-time preset family and point operators directly at the explicit systems/perf lane in the canonical
agents-modereference, whiledocs/provider-runtime-layouts.mdnow names the Gemini and root-router install surfaces among its authoritative inputs so release audits do not rely on an incomplete source list.
- Gemini no longer validates or installs as a decorative lead-only scaffold. The Gemini source tree, standalone pack, and runtime docs now carry the full shared role catalog in
skills/, the preview specialist-team layer inagents/, and the same team-template inventory used by the neighboring packs. Gemini validators now fail if that role or team surface is missing, and Gemini installers now materialize.gemini/agents/in real installs instead of silently shipping onlyskills/andcommands/. - Gemini installs now materialize the shared-governance layer as installed
AGENTS.md, andGEMINI.mdimports that file through Gemini CLI's standard markdown import mechanism instead of pretending shared governance should be baked into.gemini/settings.json. The source tree still keepssrc.gemini/AGENTS.shared.mdas the canonical shared module, but install output now matches the cross-provider runtime convention better. - Gemini reinstalls now preserve user-side
@...imports that live in the managedGEMINI.mdimport block next to@./AGENTS.md, matching the Claude-side reinstall contract instead of silently dropping local includes on refresh. Project installs also keep an existing rootAGENTS.mdintact, so repo-owned shared governance can be imported without overwriting user-owned project policy files. - The root router installers,
install.shandinstall.ps1, now install all three provider lines instead of stopping at Codex and Claude. Maintainers can choose Codex, Claude Code, Gemini CLI, or all three from one entrypoint, and the new rootscripts/install-gemini.*path uses the same validated Gemini payload shape as the standalone Gemini branch. - Added missing PowerShell validator wrappers to all provider source trees.
src.codex,src.claude, andsrc.gemininow each carry aps1companion for their source-surface validation script, so Windows maintainers no longer have to drop into Git Bash just to run the canonical pack checks. - Shared governance and runtime-layout docs now explicitly require provider/runtime claims to distinguish between official provider behavior, Orchestrarium repo-local conventions, and observed installed behavior. This tightens the review and documentation contract around provider-native surfaces so source-tree patterns or local install choices are less likely to be misreported as official provider guarantees.
- External routing docs now spell out a fail-fast eligibility gate before any provider or CLI probing. Operators and leads can now see in one place that
$consultantstays advisory-only,$external-workercovers the full non-owner worker lane,$external-reviewercovers review and QA, and only owner roles such as$product-managerand$leadremain generic-external-unsupported by default, which prevents wasted runtime investigation when a user explicitly asks forexternal. agents-modenow supports switchable named external-priority profiles and per-lane multi-opinion counts instead of hiding one fixed provider order behindexternalProvider: auto. The shippedbalancedprofile preserves the quiet defaults, whilegemini-crosscheckis the explicit repo-local profile for bringing Gemini into broader non-visual advisory and review second-opinion sets when one external opinion is not enough. This keeps scalarexternalProvideroverrides intact, keepsclaude-apias a Claude transport rather than a fourth provider, and makes the broader Gemini routing policy visible in both docs and live overlays.- Removed the retired consultant fallback mode from the live contract across Codex, Claude, and Gemini.
consultantModeis nowexternal | internal | disabled,externalmeans external-only rather than “external with internal escape hatch”, thesecond-opiniontoggles no longer exposeauto, and pack validators now fail if the oldconsultantMode: autoor fallback-approved wording resurfaces. - Reading
agents-modeflags is now a canonical migration point, not just a passive lookup. Codex, Claude, and Gemini runtime docs, init helpers, second-opinion toggles, dispatch contracts, and external adapter surfaces now require comment-free or stale-format overlays to be rewritten into the current canonical layout before their flags are trusted. This means pack upgrades and doc/schema drift can self-heal existing.agents-mode.yamlfiles during ordinary reads instead of leaving old comments, missing keys, or retired layout fragments silently in place. - Parallel external helper support is now explicit instead of being an implicit side effect of scattered routing prose. Shared governance and all three provider packs now state that external concurrency is not capped at one instance per helper or provider, that
externalOpinionCountsstill governs distinct-provider opinions for a single lane rather than generic helper multiplicity, and that a new first-class brigade surface exists for launching and aggregating one bounded batch of independent external helper runs. - External execution reporting now separates adapter-host wrappers from the real provider run. Shared governance, operator docs, consultant docs, external-dispatch contracts, and validators now require
Adapter host runtimeas a separate field whenever an internal helper only launches the external call, and they explicitly forbid treating that helper asActual execution pathfor$external-workeror$external-reviewer. This keeps logs honest when a pack uses an internal wrapper to reach Claude, Codex, or Gemini, and it closes the misleading case where a Codex helper could appear to have already become a Claude execution. - External adapter execution is now direct-launch only for provider-backed consultant execution in
externalmode plus$external-workerand$external-reviewer. Internal agent/helper/subagent host layers are no longer a valid execution shape; if the orchestrating runtime cannot launch the selected provider directly, the route must fail closed or reroute honestly instead of proxying through a fake internal lane. - Gemini preview subagent loading no longer trips over a non-agent
agents/README.md. Gemini pack docs were moved out of the loader-visibleagents/path, validators now fail if any top-levelagents/*.mdfile lacks YAML frontmatter, and Gemini reinstall explicitly removes the old legacyagents/README.mdfrom installed runtimes so existing~/.geminitrees self-heal on refresh instead of continuing to throw agent-definition errors. - Installers now seed the canonical default
agents-modefile directly into the active target for all three provider lines instead of leaving that overlay to ad hoc first-use creation. In the monorepo, the only editable source default now lives inshared/agents-mode.defaults.yaml; local installs seed the project-local file, global installs seed the provider-home file, and the init helpers now review or update that installed default rather than pretending they are always bootstrapping from nothing. This keeps first-run behavior, docs, and reinstall semantics lined up across Codex, Claude, and Gemini without leaving duplicate seed files in eachsrc.<provider>/tree. - Provider-local
agents-mode.defaults.yamlfiles were removed from the monorepo source trees entirely. Main installers now derive their seeded defaults straight fromshared/agents-mode.defaults.yaml, Codex applies its one provider-specificexternalClaudeProfileaddition at install time, and the main validators now fail if those deletedsrc.<provider>/agents-mode.defaults.yamlfiles come back. Standalone pack repositories still ship one pack-root default each so they remain self-contained outside the monorepo. - Gemini installers now materialize the official Gemini extension package instead of leaving
src.gemini/extension/as a source-only placeholder. Project and global installs both populate.gemini/extensions/orchestrarium-gemini/with the activegemini-extension.json, extension README, mirroredskills/,agents/,commands/, and runtimeGEMINI.mdplusAGENTS.md, and Gemini validators now fail if that installed extension surface is missing. - Gemini installs no longer mirror the same Orchestrarium payload into top-level
.gemini/skills/,.gemini/agents/, and.gemini/commands/alongside the installed extension. The Gemini runtime now treats.gemini/extensions/orchestrarium-gemini/as the canonical Orchestrarium payload while leaving the higher-precedence user/workspace tiers free for deliberate overrides, and reinstall now cleans legacy duplicated top-level Gemini items so users do not hit noisy skill or command conflict warnings on startup. - External launch rules are now explicit instead of being reconstructible only from audit failures. The canonical operator reference, external-dispatch contracts, install docs, and Claude runtime docs now spell out that PowerShell should use the
.ps1Claude wrapper, Bash / Git Bash should use the.shwrapper,CLAUDE_API_BINis the escape hatch when a shell cannot seeclaude-api, plain unauthenticated Claude CLI runs should switch to the allowed Claude API transport instead of retrying blindly, Codex commit review should usecodex review --commit <sha>without a free-form prompt, and very wide neutral-dir release audits should be split into narrower admitted slices. - Fixed the Claude PowerShell API wrapper so it no longer depends on
ConvertFrom-Json -AsHashtable, which is unavailable in Windows PowerShell 5.1. The wrapper now accepts both-PrintSecretPathand--print-secret-path, resolvesclaude-api,claude-api.cmd, orclaude-api.exemore defensively, and preserves the same repo-local.claude/SECRET.mdthen~/.claude/SECRET.mdlookup order.
- Added a repo-local publication gate at
scripts/check-publication-gate.shandscripts/check-publication-gate.ps1. The gate now combines the existing staged leak scan with an explicitRELEASE_NOTES.mdcheck for release-relevant staged changes, so maintainers no longer have to rely on prose-only repo rules to remember this step before publication. - Added canonical
RELEASE_NOTES.mdtracking for release-relevant tracked changes in the Orchestrarium monorepo. This gives the repository one stable place to describe important behavior, workflow, and governance changes that matter at publication time instead of scattering that context across commit messages and session reports. - Made release-note updates mandatory before publication when staged tracked changes affect installed behavior, governance, routing, role contracts, install surface, or developer and operator workflow. This closes a documentation gap in the publication path: significant changes now need an explicit human-readable explanation before release instead of relying on readers to reconstruct impact from diffs alone.
- Added a Gemini CLI provider-pack scaffold in
src.gemini/. The scaffold follows the official Gemini-preferred model withGEMINI.mdas the native entrypoint,skills/<name>/SKILL.mdas the expertise layer,commands/*.tomlas user command shortcuts, andextension/gemini-extension.jsonas the future MCP and extension boundary. - Added a Codex utility skill,
$init-project, for first-time project bootstrap. It configures## Project policiesin the rootAGENTS.mdand initializes.agents/.agents-mode.yamlfrom the canonical Codex policy catalog and external-dispatch schema, so Codex now has a direct onboarding entry point comparable to Claude's project-init flow instead of making operators piece that setup together manually.
- Added explicit task-continuity and execution-continuity language to the canonical
agents-modereference and the Codex, Claude, and Gemini init entrypoints. Initialized projects now document that side requests pause but do not replace the primary task unless the user explicitly reprioritizes, and that accepted phases should continue forward by default until a real gate blocks progression. - Completed the first-time init contract across all three provider lines in the main canonical docs. Codex still bootstraps
AGENTS.mdplus.agents/.agents-mode.yaml, Gemini stays split between native/initand the optional Orchestrarium overlay helper, and Claude init now also bootstraps.claude/.agents-mode.yamlinstead of stopping at project policies alone. - Migrated the canonical release log away from a single
## Unreleasedbucket into dated sections. Future updates now append under the current## YYYY-MM-DDheading, which keeps publication history chronological without repeatedly rewriting one shared scratch section. - Reframed the public repository wording around
cross-providerorchestration instead of describing the monorepo only as two current agent packs. README, shared reference guidance, and GitHub About now present Codex and Claude Code as provider-specific packs built on one shared governance and reference core, which makes the repository's intended extension path for future providers such as Gemini much clearer. - Fixed the Claude installer so reinstalling the Claude pack no longer drops user-side
@...imports that live in the installed.claude/CLAUDE.mdimport block next to@AGENTS.md. This matters for local customizations such as@memory/...: pack source still owns the installed Claude section, but user-side imports in the target file now survive reinstall instead of being silently erased. - Reworked
Engineering hygieneapplication guidance with explicit rule precedence, working definitions, thematic grouping, and repo-local concretization guidance. This keeps the underlying standards intact while making them much easier to apply consistently during implementation and review, especially when multiple rules pull in different directions or when a reader needs to know which repo-local details must still be defined. - Synced Codex and Claude publication-safety and operating-model references so the new hygiene structure and release-notes publication gate are documented consistently across both packs. This preserves cross-pack alignment: the installed policy, maintenance overlays, and methodology references now describe the same expectations instead of leaving one surface stale and another current.
- Clarified the root development overlays and root install/docs wording so they describe Codex and Claude Code as repo-local monorepo maintenance surfaces instead of mixing current pack purpose with stale Claude-side naming. This makes it clearer who reads
AGENTS.mdandCLAUDE.md, what those files are for, and what the root installer is asking the maintainer to install. - Renamed the remaining Claude-side
Claudestratoruser-facing strings in the installed pack, Claude commands, maintenance references, and install scripts to consistentClaude Code packwording, while broadening installer detection so already-installed legacy# Claudestratorfiles still upgrade cleanly. This removes stale branding from maintainer and runtime surfaces without breaking the reinstall path for repositories that still carry the older Claude header. - Aligned the remaining Codex-side installer, menu, and validation wording on
Codex packinstead of mixing installed-artifact labels with repo/product labels such asOrchestrarium skill-pack. This removes the last user-facing naming asymmetry between the two packs while preserving repo-context wording where it still matters, such as instructions that refer to the Orchestrarium repository root or layout detection. - Reframed the local operator config from the retired consultant-only overlay into broader pack-local
agents-modefiles. The new canonical schema separatesconsultantModefrom two operator-level preferences,delegationModeandmcpMode, so users can persist always-on vs explicit delegation behavior andautovsforceMCP usage without overloading the consultant-onlymodefield or repeating those instructions in every prompt. - Tightened the semantics of the two operator-level toggles.
delegationModenow usesmanual | auto | force:manualkeeps delegation explicit-by-request,autoleaves delegation enabled by routing judgment, andforcemakes delegation a standing instruction whenever a matching specialist and viable tool path exist. If the requested delegation path is technically unavailable, the agent must report that constraint explicitly instead of silently pretending the forced delegation happened locally.mcpModeremainsauto | force, whereforcemeans the config itself is an explicit instruction to use relevant available MCP tools instead of leaving MCP usage to optional judgment. - Codex-line
agents-modekeeps the optionalexternalClaudeProfilesetting so external dispatch to Claude can deliberately choose betweensonnet-highandopus-maxinstead of relying on provider defaults. Claude-lineagents-modedrops that field from its canonical schema because Claude-side external dispatch goes to Codex CLI, so preserving a Claude profile there only created inert local state and confused the contract. - First-time
agents-modecreation now writes the full default shape instead of a partial file. Codex writesconsultantMode,externalClaudeApiMode: auto,delegationMode,mcpMode, both external-preference flags,externalProvider, andexternalClaudeProfile; Gemini writes the same shared routing keys plusexternalClaudeApiMode: auto; Claude writes the shared shape without the Claude-target keys. This makes new workspace state self-describing and removes later need to infer missing settings after bootstrap. agents-modefiles now annotate each key with an inline YAML comment listing the allowed values for that setting. This makes the local operator config easier to inspect and edit safely without reopening the reference docs just to remember which modes are valid.- Shared governance now makes two verification expectations explicit without splitting truth across duplicate reminders: verify claims against canonical sources or live state before acting on them, and update the owning canonical artifact in the same change whenever behavior, policy, workflow, config schema, or runtime layout changes.
- Added
docs/agents-mode-reference.mdas the canonical value-by-value operator reference forconsultantMode,externalClaudeApiMode,delegationMode,mcpMode,preferExternalWorker,preferExternalReviewer, sharedexternalProvider, and Codex-onlyexternalClaudeProfile. Root manuals now point there instead of relying on scattered prose summaries. - Extended the shared operator-mode model to cover Gemini explicitly without pretending Gemini is just another copy of the Codex or Claude lines. The reference docs now distinguish Gemini's official runtime bootstrap (
/init), official runtime config (.gemini/settings.json), and Orchestrarium's repo-local.gemini/.agents-mode.yamloverlay for shared consultant/delegation/MCP/external-routing semantics. - Hardened Codex reinstall merging by wrapping the installed pack-managed
AGENTS.mdsection in explicit begin/end markers instead of guessing the section end from the last non-blank line. Reinstall now has a deterministic machine-readable boundary when preserving user-managed content below the pack block. - Added a Gemini-side
init-projecthelper so the scaffold now has an official split bootstrap path: Gemini/initownsGEMINI.md, while the Orchestrarium helper owns.gemini/.agents-mode.yamlwhen shared cross-provider routing toggles are needed. - Generalized provider-backed external dispatch so Codex or Claude can explicitly route consultant or external-adapter work through Gemini CLI instead of being locked to the historical Codex<->Claude pair. Existing defaults are preserved with
externalProvider: auto, so current operator workflows stay stable until a repository explicitly opts into Gemini as the external provider. - Fixed the Codex installer's
AGENTS.mdmerge behavior so user-managed content such as## Project policiescan survive reinstall instead of being wiped by a full file replace. This matters now that Codex has an explicit project-init flow: operator bootstrap data is meant to live with the target project, not disappear the next time the pack is refreshed. - Promoted the duplicated design-only methodology references out of
references-codex/andreferences-claude/into one canonicalshared/references/layer, and converted the old pack-local paths into compatibility pointers. This removes avoidable cross-pack drift in documents that were already semantically shared, keeps exact operational commands out of design references, and creates the extension point a future Gemini pack can reuse without minting a third copy set. - Tightened the boundaries around that new shared-reference layer.
shared/references/README.mdnow explicitly namesperiodic-control-matrixas an intentional pack-local exception because it still carries pack/runtime vocabulary, task-memory layout, and runtime-doc links, and the Gemini wording is softened from "plug in a new pack directly" to "reuse the shared layer as a foundation with local overlays where needed." The pack validators now also check that the canonical shared reference files exist and that the per-pack compatibility pointers still target them, so shared-reference drift is caught before publication instead of slipping past structure-only validation. - Extracted
subagent-operating-modelinto a shared core plus pack-local addenda instead of leaving two near-duplicate full copies. The new shared core now owns the common blueprint, routing, role, and governance-model text, whilereferences-codex/andreferences-claude/retain only runtime-specific paths, provider dispatch details, execution-model notes, and repository concretization. This reduces drift in one of the densest methodology documents without pretending away the real Claude/Codex differences that still need local addenda. - Closed the remaining risk from that extraction by teaching both pack validators to check the ownership boundary of the new split, not just file presence. They now verify that the shared
subagent-operating-modelstays free of pack-specific runtime concretization, that each pack-local addendum keeps the required runtime-specific semantics such asagents-modelocation, external-dispatch direction, periodic-control ownership, and task-memory concretization where applicable, that the addenda stay structurally bounded instead of regrowing into full near-duplicate methodology copies, and that the canonical shared core plus both English addenda still match their normalized fingerprints. The shared core now points periodic-control readers back to the pack-local matrix through the addendum instead of pretending there is a shared matrix file. Root maintenance docs andINSTALL.mdwere also updated so maintainers can see the same shared-core plus addendum model directly in the repo entry points instead of having to reconstruct it from the extracted reference files alone. - Clarified report and plan traceability for provider-backed execution. The canonical session-log contract now requires external or provider-backed runs to record
Execution role,Assigned / replaced internal role,Requested provider,Resolved provider,Actual execution path,Model / profile used, andDeviation reasonas separate fields instead of collapsing them into ambiguous labels. Codex and Claude external-dispatch contracts now mirror that same execution-record schema, and the reporting rule now treats runtime-default provider selection asRequested provider: internalplus an explicitResolved providerrather than leakingautointo the artifact. Adapter-driven reviews such asexternal-reviewerstanding in forqa-engineercan therefore be read unambiguously in.reports/and.plans/. - Fixed the publication-safety gate to inspect the actual staged file set instead of the whole index, which removes self-inflicted false positives from the checker scripts' own regex catalogs. The Claude PowerShell wrapper now also resolves its sibling bash checker correctly from
src.claude/agents/scripts/, so the same publication gate works again from the repository source tree on Windows instead of pointing at a non-existent installed.claude/...path. - Claude-target external dispatch now treats the secret-backed wrapper path as the single named transport toggle under
externalClaudeApiMode: disabled | auto | force.autokeeps the ordinary Claude CLI path first and allows one wrapper-backed fallback after CLI/auth/quota-style failures;forcestarts on the wrapper immediately. The contract still forbids silent profile downgrades or provider switches, so if the allowed Claude path still fails the provider is reported unavailable in the normal way. - Added a dedicated provider runtime-layout reference at
docs/provider-runtime-layouts.mdthat splitsglobalandlocalinstalled surfaces for Codex, Claude Code, and Gemini. This closes a recurrent source of confusion between monorepo source trees such assrc.codex/and the paths the actual provider runtimes read after installation, and it also records one important Gemini boundary early: Gemini CLI is native aroundGEMINI.md,.gemini/commands/, and extensions rather than theskills/oragents/trees used by the current Codex and Claude packs.
AGENTS.md: the Codex-readable governance and instruction file installed into a repository or provider home.agent-run-ledger.*: helper script family that initializes legacy work-item ledger files and appends validatedagent-runs.jsonlevents.agent-runs.jsonl: JSONL execution ledger stored besidestatus.mdfor machine-readable work-item state.API: Application Programming Interface, a programmatic contract exposed by a tool, runtime, or service.AXIOM: a top-priority rule label; in this change it refers to a global-only "never guess" rule that was folded into the existing shared discipline instead of copied as a competing section.Bash: a Unix-style command shell used by the.shinstallers.CAD: Computer-Aided Design, software and file formats used for technical drawings, geometry, or engineered layouts.CLI: Command-Line Interface, a terminal command surface such ascodex,claude, orgemini.Codex: the OpenAI Codex runtime and provider pack maintained by this repository.Claude Code: Anthropic's Claude Code runtime and the matching provider pack maintained by this repository.data point: one concrete observed value, log line, field, return code, screenshot fact, or command result used as evidence.Gemini CLI: Google's Gemini command-line runtime and the example provider pack maintained by this repository.hash: a deterministic digest of file bytes or content; useful for identity checks but not a substitute for visual inspection.HTTP: Hypertext Transfer Protocol, the web protocol whose status codes include quota and rate-limit signals such as429.INSTALL.md: this repository's installation guide.JSONL: JSON Lines; one JSON object per line, used here for append-only execution events.junction: a Windows directory link that redirects one path to another path on disk.Markdown: a lightweight markup format used for repository documentation.MCP: Model Context Protocol, a protocol used to expose tools and resources to agent runtimes.metadata: descriptive file or runtime information such as dimensions, timestamps, MIME type, or generator fields.native API: an operating-system or runtime interface called directly by code, where return codes and structure fields often carry the evidence needed for debugging.MSYS: a Windows POSIX-style shell/runtime family commonly used by Git Bash and similar tooling.PATH: the operating-system environment variable used to find executable commands.PASS: a gate or validator result meaning the checked artifact may proceed.QA: Quality Assurance, verification work focused on tests, regressions, and acceptance criteria.Qwen Code: the Qwen command-line runtime and the example provider pack maintained by this repository.README.md: the primary human-facing overview document for a repository or pack.root router: the top-levelinstall.shorinstall.ps1entrypoint that dispatches to pack-specific installers.SECRET.md: a local-only file used for secret-backed Claude transport configuration; it must not be published.SKILL.md: the entrypoint file for a Codex skill definition.status.md: human-readable recovery summary for the active work item.OneDrive-backed directory: a directory stored under Microsoft OneDrive synchronization, where sync and hydration behavior can affect file access timing.TeX: a typesetting system and math-notation language used by many Markdown math renderers.UI: User Interface, the visible or interactive surface presented to a user.visual artifact: an image, diagram, drawing, render, chart, plot, screenshot, CAD/exported drawing, or similar visible output.WEAK MODEL / NOT RECOMMENDED: the repository's classification for example-only providers that must stay out of productionautorouting.WSL: Windows Subsystem for Linux, a Linux compatibility environment on Windows.YAML: YAML Ain't Markup Language, the structured data format used by files such as.agents-mode.yaml.YYYY-MM-DD: date format with four-digit year, two-digit month, and two-digit day.