From 2bc52d6bb1b323fa68d590bec6295173f21eaec7 Mon Sep 17 00:00:00 2001
From: fullsend-code <278716306+fullsend-ai-coder[bot]@users.noreply.github.com>
Date: Thu, 2 Jul 2026 07:55:52 +0000
Subject: [PATCH] docs(#2802): standardize tier terminology with descriptive prefixes

Replace bare "Tier N" references across documentation, ADRs, problem
docs, and agent prompts/skills with descriptive-prefix forms to
eliminate ambiguity between the three distinct tier systems in this
codebase:

- "credential delivery tier" for the ADR 0025 four-tier credential
  model (tiers 1-4: prefetch, providers, REST server, host files)
- "intent authorization tier" for the intent-representation.md
  four-tier change authorization model (tiers 0-3: standing rules,
  tactical, strategic, organizational)
- "configuration tier" for the three-tier config inheritance model
  (upstream defaults, org config, per-repo overrides)

Add a terminology convention entry to AGENTS.md documenting the three
tier systems and the requirement to always use a descriptive prefix.

Files updated across 28 docs, ADRs, problem docs, skills, and agent
prompts. External tier references (e.g. "GitLab Free tier") are left
unchanged. Plan docs with already-contextual "fallback tier" references
are also unchanged.

Note: pre-commit and make lint could not run due to sandbox network
restrictions preventing shellcheck installation. The post-script runs
an authoritative pre-commit check on the runner.
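The convention described above lends itself to a mechanical check. The following is a minimal sketch, not part of this patch: the function name `find_bare_tiers` and the matching rule are assumptions made for illustration, and it only catches the numbered form ("Tier 2", "tiers 2 and 4"), not every unprefixed use of the word "tier".

```python
import re

# Descriptive prefixes required by the convention in this commit message.
PREFIXES = ("credential delivery", "intent authorization", "configuration")

# Matches "tier 1", "Tier 2", "tiers 2" and similar. External references such
# as "GitLab Free tier" carry no number, so they never match.
TIER_RE = re.compile(r"\btiers?\s+\d+\b", re.IGNORECASE)


def find_bare_tiers(text):
    """Return (line_number, line) pairs containing a numbered tier
    reference that is not directly preceded by a descriptive prefix.

    Hypothetical helper for illustration; the repository may lint this
    differently or not at all.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TIER_RE.finditer(line):
            before = line[:match.start()].lower().rstrip()
            if before.endswith(PREFIXES):
                continue  # properly prefixed, e.g. "credential delivery tier 2"
            hits.append((lineno, line.strip()))
            break  # report each offending line once
    return hits


if __name__ == "__main__":
    sample = "This is the Tier 1 model.\nUse credential delivery tier 2 here."
    for lineno, line in find_bare_tiers(sample):
        print(f"{lineno}: {line}")
```

A check like this could run over the Markdown files listed in the diffstat; anything it reports would still need a human to decide which of the three prefixes applies.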
Closes #2802
---
 AGENTS.md                                     | 12 +++
 ...redential-delivery-for-sandboxed-agents.md | 16 ++--
 ...030-openshell-sandbox-interaction-model.md |  6 +-
 ...-safe-push-wrapper-for-sandboxed-agents.md | 28 +++----
 docs/ADRs/0033-per-repo-installation-mode.md  |  2 +-
 docs/ADRs/0035-layered-content-resolution.md  |  2 +-
 docs/ADRs/0046-host-side-api-server-design.md |  6 +-
 ...-deprecate-customized-directory-overlay.md |  2 +-
 docs/architecture.md                          |  8 +-
 docs/guides/user/customizing-agents.md        |  2 +-
 docs/landscape.md                             |  8 +-
 docs/problems/agent-architecture.md           | 10 +--
 docs/problems/agent-compatible-code.md        |  2 +-
 docs/problems/applied/konflux-ci/README.md    |  2 +-
 docs/problems/architectural-invariants.md     | 14 ++--
 docs/problems/code-review.md                  | 16 ++--
 docs/problems/contributor-guidance.md         | 28 +++----
 docs/problems/debugging.md                    |  2 +-
 docs/problems/downstream-upstream.md          |  2 +-
 docs/problems/governance.md                   |  6 +-
 docs/problems/intent-representation.md        | 80 +++++++++----------
 docs/problems/platform-nativeness.md          |  2 +-
 docs/problems/production-feedback.md          |  4 +-
 docs/problems/security-threat-model.md        | 14 ++--
 docs/problems/testing-agents.md               | 16 ++--
 .../fullsend-repo/skills/code-review/SKILL.md |  6 +-
 .../fullsend-repo/skills/pr-review/SKILL.md   |  2 +-
 .../pr-review/sub-agents/intent-coherence.md  |  2 +-
 28 files changed, 156 insertions(+), 144 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 2f07b1545..0936a8726 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -144,3 +144,15 @@ least-frequently-run agents.
 - **CODEOWNERS files are always human-owned.** Agents cannot modify their own guardrails.
 - **The repo is the coordinator.** No coordinator agent — branch protection, CODEOWNERS, and status checks are the coordination layer.
 - **Organization-specific content is cordoned.** Core problem docs are general; applied considerations live in `docs/problems/applied/`.
+
+## Terminology: tier conventions
+
+The term "tier" is used in multiple distinct contexts across this codebase. Always use a descriptive prefix to avoid ambiguity:
+
+| Prefix | Meaning | Defined in |
+|---|---|---|
+| **credential delivery tier** | The four-tier model for how agents receive credentials: (1) prefetch + post-process, (2) providers + L7, (3) host-side REST server, (4) host files | [ADR 0025](docs/ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md) |
+| **intent authorization tier** | The four-tier model for change authorization: (0) standing rules, (1) tactical/issue, (2) strategic, (3) organizational | [intent-representation.md](docs/problems/intent-representation.md) |
+| **configuration tier** | The three-tier inheritance model for agent configuration: upstream defaults → org config → per-repo overrides | [ADR 0035](docs/ADRs/0035-layered-content-resolution.md) |
+
+**Do not** use bare "Tier N" or "tier" without a prefix — the same number means different things in different contexts (e.g., "Tier 2" could be provider-based credential delivery or strategic intent authorization). External tier references (e.g., "GitLab Free tier", "GitHub plan tiers") are exempt from this convention.
diff --git a/docs/ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md b/docs/ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md
index a41264670..bba7a4faa 100644
--- a/docs/ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md
+++ b/docs/ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md
@@ -24,7 +24,7 @@ Accepted (extends [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md))
 [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md) established a two-tier credential isolation model: prefetch + post-process as the default, and a host-side REST server with L7 enforcement as the fallback. The REST server requires per-endpoint proxy code, input validation, response sanitization, and server lifecycle management — cost that scales linearly with each new external service.
Two problems motivate extending the model: reducing the proxy maintenance burden for services where it is unnecessary, and formalizing how agents get fine-grained operation control over external APIs — not just credential isolation, but capability scoping. -OpenShell's native provider system addresses the first problem. Providers inject credentials as opaque placeholder tokens that the gateway proxy swaps for real values at the HTTP layer, so credentials never enter the sandbox. The `fullsend run` command already supports providers: the harness layer loads provider definitions from the agent's `providers/` directory, creates them on the gateway via `openshell provider create`, and passes them to sandbox creation — so the infrastructure for tiers 2 and 4 is operational today. Tier 3 (host-side REST server) is not yet implemented. L7 egress policies add two enforcement axes: HTTP method + path restrictions, and binary-level restrictions — the proxy identifies the calling binary via `/proc/pid/exe` and walks the process tree, so policies can restrict which executables may reach each endpoint (see [openshell-policy-bypass experiment](https://github.com/fullsend-ai/experiments/pull/5) for validation). Together, providers and L7 policies replace the REST server for services with static API key/token auth, with no custom proxy code. +OpenShell's native provider system addresses the first problem. Providers inject credentials as opaque placeholder tokens that the gateway proxy swaps for real values at the HTTP layer, so credentials never enter the sandbox. The `fullsend run` command already supports providers: the harness layer loads provider definitions from the agent's `providers/` directory, creates them on the gateway via `openshell provider create`, and passes them to sandbox creation — so the infrastructure for credential delivery tiers 2 and 4 is operational today. Credential delivery tier 3 (host-side REST server) is not yet implemented. 
L7 egress policies add two enforcement axes: HTTP method + path restrictions, and binary-level restrictions — the proxy identifies the calling binary via `/proc/pid/exe` and walks the process tree, so policies can restrict which executables may reach each endpoint (see [openshell-policy-bypass experiment](https://github.com/fullsend-ai/experiments/pull/5) for validation). Together, providers and L7 policies replace the REST server for services with static API key/token auth, with no custom proxy code. For fine-grained operation control beyond what L7 path filtering can express, two mechanisms complement providers. First, custom wrapper binaries baked into the OpenShell sandbox image (e.g. a `safe-push` that wraps `git push` and rejects force pushes) — placed on a read-only path via Landlock so the agent cannot modify them, with L7 binary filtering ensuring only the wrapper can reach the upstream service. Second, the host-side REST server from ADR 0017, which can inspect request bodies, restrict GraphQL operations, or transform responses. Both mechanisms provide operation-level control that providers and L7 path matching alone cannot. @@ -34,7 +34,7 @@ Not all services fit the provider model. Providers cannot inject credentials int ## Decision -Adopt a four-tier credential delivery model, extending ADR 0017's two-tier model: +Adopt a four-tier credential delivery model, extending ADR 0017's two-tier credential delivery model: 1. **Prefetch + post-process** (unchanged from ADR 0017). Agent runs with zero credential access. Use for agents with fully enumerable inputs. This remains the default — the first question for any new agent is whether it can run without runtime credential access. @@ -44,14 +44,14 @@ Adopt a four-tier credential delivery model, extending ADR 0017's two-tier model 4. **Host files + L7 egress policies** (new explicit tier). Credential files are copied into the sandbox via the harness `host_files` mechanism. 
L7 policies restrict egress to only the necessary endpoints and binaries. Use when the provider placeholder model and REST server cannot work: services with file-based auth, multi-step OAuth2 flows, or in-sandbox cryptographic operations (e.g. GCP Vertex AI, where `google-auth-library` must read a service account JSON to sign JWTs locally). The security boundary is the network policy, not credential isolation — real credentials exist on the sandbox filesystem. This makes the single-responsibility agent model ([ADR 0020](0020-composable-single-responsibility-agents-with-individual-sandboxes.md)) especially important: the narrower the agent's responsibility and the fewer endpoints its policy permits, the smaller the attack surface if the agent is compromised. -Agent definitions should use the highest tier possible: prefer providers over REST servers, REST servers over host files. The decision tree for a new integration: can prefetch handle it, and is a deterministic input set sufficient (or does the agent need to explore dynamically at runtime)? → if prefetch suffices, use tier 1. Does the service use static token auth in headers? → use tier 2. Do credentials need to appear in request bodies or responses need transformation? → use tier 3. Does auth require credential files or in-sandbox cryptographic ops? → use tier 4. +Agent definitions should use the lowest-numbered credential delivery tier that works: prefer prefetch over providers, providers over REST servers, REST servers over host files. The decision tree for a new integration: can prefetch handle it, and is a deterministic input set sufficient (or does the agent need to explore dynamically at runtime)? → if prefetch suffices, use credential delivery tier 1. Does the service use static token auth in headers? → use credential delivery tier 2. Do credentials need to appear in request bodies or responses need transformation? → use credential delivery tier 3. Does auth require credential files or in-sandbox cryptographic ops? 
→ use credential delivery tier 4. ## Consequences - Services with static token auth (GitHub, OpenAI, Anthropic) no longer require custom proxy endpoints, reducing per-service maintenance to provider YAML and L7 policy definitions. -- The host-side REST server from ADR 0017 is retained for cases requiring request body credential injection or response transformation — its role narrows from "default fallback" to a specific tier. +- The host-side REST server from ADR 0017 is retained for cases requiring request body credential injection or response transformation — its role narrows from "default fallback" to a specific credential delivery tier. - The `host_files` mechanism is formalized as the explicit fallback for file-based auth flows (GCP Vertex AI), documenting a pattern already present in the scaffold harness files. -- L7 policy authoring remains security-critical across all tiers — providers reduce proxy code but do not reduce the need for correct path patterns, and a path pattern typo can over-permit access. -- Custom wrapper binaries used for operation-level control (tier 2) must be placed on read-only paths enforced by Landlock; if the agent can modify the wrapper, the restriction is bypassed. -- Agents using host files (tier 4) have real credentials on the sandbox filesystem; per-agent documentation must explicitly state that the security boundary is network-only. -- Fullsend should provide validation tooling that checks agent harness definitions for compliance with this model — auditing L7 policies for free-text endpoints that could carry placeholders, verifying wrapper binaries are on read-only paths, and flagging tier mismatches — for both internal development and for users crafting new agents. +- L7 policy authoring remains security-critical across all credential delivery tiers — providers reduce proxy code but do not reduce the need for correct path patterns, and a path pattern typo can over-permit access. 
+- Custom wrapper binaries used for operation-level control (credential delivery tier 2) must be placed on read-only paths enforced by Landlock; if the agent can modify the wrapper, the restriction is bypassed. +- Agents using host files (credential delivery tier 4) have real credentials on the sandbox filesystem; per-agent documentation must explicitly state that the security boundary is network-only. +- Fullsend should provide validation tooling that checks agent harness definitions for compliance with this model — auditing L7 policies for free-text endpoints that could carry placeholders, verifying wrapper binaries are on read-only paths, and flagging credential delivery tier mismatches — for both internal development and for users crafting new agents. diff --git a/docs/ADRs/0030-openshell-sandbox-interaction-model.md b/docs/ADRs/0030-openshell-sandbox-interaction-model.md index 49b88d263..00031188d 100644 --- a/docs/ADRs/0030-openshell-sandbox-interaction-model.md +++ b/docs/ADRs/0030-openshell-sandbox-interaction-model.md @@ -118,9 +118,9 @@ loader that sources `.env.d/*.env`. Application configuration is delivered via are available only to pre/post scripts and never enter the sandbox. **Credentials: providers reconciled on the gateway.** Credential delivery follows -the four-tier model in +the four-tier credential delivery model in [ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md). For -tiers that use OpenShell providers, the runner reconciles them before sandbox +credential delivery tiers that use OpenShell providers, the runner reconciles them before sandbox creation: it loads provider definitions from the harness's `providers/` directory and calls `openshell provider create --name --type --credential ` for each one. Credentials use the bare-key form — secret @@ -131,7 +131,7 @@ Providers are then attached to the sandbox via `--provider ` flags on real credentials at the HTTP proxy layer, so credentials never enter the sandbox. 
For auth flows incompatible with the provider placeholder model (e.g. GCP Vertex AI file-based auth), host files deliver credential files -directly (tier 4). +directly (credential delivery tier 4). **Files and binaries: SCP + images (Options A + B).** Agent definitions, skills, host files, and security hooks are SCP'd during bootstrap. Tool binaries and diff --git a/docs/ADRs/0032-safe-push-wrapper-for-sandboxed-agents.md b/docs/ADRs/0032-safe-push-wrapper-for-sandboxed-agents.md index 86efe8368..0d62f1bca 100644 --- a/docs/ADRs/0032-safe-push-wrapper-for-sandboxed-agents.md +++ b/docs/ADRs/0032-safe-push-wrapper-for-sandboxed-agents.md @@ -23,11 +23,11 @@ Accepted (extends [ADR 0025](0025-provider-credential-delivery-for-sandboxed-age ## Context -[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) introduced a four-tier credential delivery model and described custom wrapper binaries as a Tier 2 mechanism for operation-level control — citing `safe-push` as the canonical example of a binary that wraps `git push` and rejects force pushes. The [openshell-policy-bypass experiment](https://github.com/fullsend-ai/experiments/pull/5) validated that the three-layer defense (L7 binary matching + wrapper logic + Landlock read-only path) holds against an agent with 20 turns of unrestricted bypass attempts. This ADR specifies the design of `safe-push` and its integration with the harness and sandbox infrastructure. +[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) introduced a four-tier credential delivery model and described custom wrapper binaries as a credential delivery tier 2 mechanism for operation-level control — citing `safe-push` as the canonical example of a binary that wraps `git push` and rejects force pushes. 
The [openshell-policy-bypass experiment](https://github.com/fullsend-ai/experiments/pull/5) validated that the three-layer defense (L7 binary matching + wrapper logic + Landlock read-only path) holds against an agent with 20 turns of unrestricted bypass attempts. This ADR specifies the design of `safe-push` and its integration with the harness and sandbox infrastructure. ### The push robustness problem -The current code agent relies on a non-agentic post-script (`post-code.sh`) to push code after the sandbox is destroyed. This is the Tier 1 (prefetch + post-process) model: the agent never touches push credentials, and the post-script handles branch validation, secret scanning, pre-commit hooks, and the actual `git push`. This model is robust for security but has two limitations: +The current code agent relies on a non-agentic post-script (`post-code.sh`) to push code after the sandbox is destroyed. This is the credential delivery tier 1 (prefetch + post-process) model: the agent never touches push credentials, and the post-script handles branch validation, secret scanning, pre-commit hooks, and the actual `git push`. This model is robust for security but has two limitations: 1. **The agent has no control over the push flow.** The post-script is a fixed script — the agent cannot choose between force-push and regular push, retry on conflict, or adapt to diverged branches. Making the script more complex to handle edge cases increases fragility. @@ -78,9 +78,9 @@ Therefore, the only tamper-proof delivery path for the policy file is the contai ## Decision -Introduce `safe-push`, a Go binary that acts as a mandatory policy gate for all `git push` operations from inside the sandbox. `safe-push` is a Tier 2 mechanism ([ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md)) that coexists with Tier 1 post-script push — the harness configuration determines which model an agent uses. 
+Introduce `safe-push`, a Go binary that acts as a mandatory policy gate for all `git push` operations from inside the sandbox. `safe-push` is a credential delivery tier 2 mechanism ([ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md)) that coexists with credential delivery tier 1 post-script push — the harness configuration determines which model an agent uses. -Tier 2 is a scoped relaxation of the constraint established in the [security threat model](../problems/security-threat-model.md), which states that "agents cannot take forge actions directly — credentialed operations (push, label, comment) are applied by deterministic post-scripts outside the sandbox." Under Tier 2, agents *can* push directly, but only through `safe-push` with policy enforcement and only in environments where the risk profile justifies it (private repos, internal tooling). The threat model constraint remains the default — Tier 1 post-script push is the recommended path for public and high-risk repos. +Credential delivery tier 2 is a scoped relaxation of the constraint established in the [security threat model](../problems/security-threat-model.md), which states that "agents cannot take forge actions directly — credentialed operations (push, label, comment) are applied by deterministic post-scripts outside the sandbox." Under credential delivery tier 2, agents *can* push directly, but only through `safe-push` with policy enforcement and only in environments where the risk profile justifies it (private repos, internal tooling). The threat model constraint remains the default — credential delivery tier 1 post-script push is the recommended path for public and high-risk repos. ### Binary design @@ -107,7 +107,7 @@ OpenShell L7 policy ensures that `safe-push` is the only path to push: - **`safe-push` (`/usr/local/bin/safe-push`):** L7 process-tree tracking allows `git` processes descended from `safe-push` to reach `git-receive-pack` on `github.com`. 
- **All other binaries:** L7 denies all processes that are NOT descended from `safe-push` from reaching `git-receive-pack`. This includes direct `git push`, custom scripts using `curl` to speak the git smart HTTP protocol, compiled binaries, and custom git remote helpers (`git-remote-` scripts the agent could write to `/tmp/`). The L7 deny rule must match the endpoint, not the binary — any process reaching `git-receive-pack` without `safe-push` ancestry is blocked. - **SSH transport:** L7 policy must block outbound SSH (port 22 and `ssh.github.com:443`) to prevent the agent from bypassing HTTPS-based enforcement by configuring a remote with `git@github.com:...` syntax. The entire enforcement model assumes HTTPS transport — SSH is an opaque encrypted tunnel where L7 cannot inspect or gate operations. -- **Other git network operations** (`fetch`, `pull`, `clone`, `ls-remote`): Gated by L7 directly — allow or deny by binary + host + path as appropriate for the agent's tier. `safe-push` is not involved. +- **Other git network operations** (`fetch`, `pull`, `clone`, `ls-remote`): Gated by L7 directly — allow or deny by binary + host + path as appropriate for the agent's credential delivery tier. `safe-push` is not involved. ### Credential separation @@ -150,16 +150,16 @@ Denied operations are logged to `/tmp/workspace/.safe-push-audit.jsonl`: The audit log is on a writable path — the agent could tamper with it. It is extracted by the runner after sandbox teardown for observability. It is a best-effort record for operational visibility, not a security boundary. -### Coexistence with Tier 1 post-script push +### Coexistence with credential delivery tier 1 post-script push -`safe-push` (Tier 2) coexists with post-script push (Tier 1). The harness configuration determines which model an agent uses: +`safe-push` (credential delivery tier 2) coexists with post-script push (credential delivery tier 1). 
The harness configuration determines which model an agent uses: -- **Tier 1 (post-script push):** Preferred for public repos and high-risk environments. The post-script runs secret scanning (gitleaks), pre-commit hooks, and other content validation before code reaches the remote. Push happens outside the sandbox with the `PUSH_TOKEN`. The agent has no push capability inside the sandbox. -- **Tier 2 (safe-push):** Suitable for private repos and low-risk environments where content validation overhead is unnecessary. The agent pushes directly from inside the sandbox via `safe-push`, with policy enforcement and credential injection via providers. +- **Credential delivery tier 1 (post-script push):** Preferred for public repos and high-risk environments. The post-script runs secret scanning (gitleaks), pre-commit hooks, and other content validation before code reaches the remote. Push happens outside the sandbox with the `PUSH_TOKEN`. The agent has no push capability inside the sandbox. +- **Credential delivery tier 2 (safe-push):** Suitable for private repos and low-risk environments where content validation overhead is unnecessary. The agent pushes directly from inside the sandbox via `safe-push`, with policy enforcement and credential injection via providers. -The tier is implicit in the harness's image + policy combination. A Tier 1 harness uses an image without safe-push network access (or without safe-push entirely) and includes a `post_script` that handles push. A Tier 2 harness uses an image with safe-push and an L7 policy that routes push traffic through it. +The credential delivery tier is implicit in the harness's image + policy combination. A credential delivery tier 1 harness uses an image without safe-push network access (or without safe-push entirely) and includes a `post_script` that handles push. A credential delivery tier 2 harness uses an image with safe-push and an L7 policy that routes push traffic through it. 
-`safe-push` covers push policy even for agents where Tier 1 post-script push is preferred, because the same image may be used in both high-risk (Tier 1) and low-risk (Tier 2) environments. The L7 policy — not the image — determines whether the agent can actually reach the remote. +`safe-push` covers push policy even for agents where credential delivery tier 1 post-script push is preferred, because the same image may be used in both high-risk (credential delivery tier 1) and low-risk (credential delivery tier 2) environments. The L7 policy — not the image — determines whether the agent can actually reach the remote. ### Per-agent policy customization (future) @@ -180,12 +180,12 @@ Re-evaluate when the second distinct policy profile is needed. No changes to the harness YAML schema ([ADR 0024](0024-harness-definitions.md)) are required. The existing `image` and `policy` fields are sufficient: ```yaml -# harness/code.yaml (Tier 2 with safe-push) +# harness/code.yaml (credential delivery tier 2 with safe-push) description: Code agent with direct push capability for low-risk repos. agent: agents/code.md model: opus image: ghcr.io/fullsend-ai/fullsend-code:latest # includes safe-push + policy -policy: policies/code-write-tier2.yaml # includes L7 binary filtering for safe-push +policy: policies/code-write-credential-tier2.yaml # includes L7 binary filtering for safe-push pre_script: scripts/pre-code.sh # No post_script push — agent pushes via safe-push inside sandbox @@ -203,4 +203,4 @@ timeout_minutes: 120 - The `safe-push` binary, its policy file, and the real `git` binary must all reside on Landlock-protected read-only paths. If any of these can be modified by the agent, the enforcement is bypassed. - Audit logging for denied operations is best-effort (writable path, agent could tamper). The primary security boundary is the deny itself (the push never reaches the network), not the audit record. 
- Credential separation is maintained: `safe-push` never sees or handles credentials. OpenShell providers inject credentials at the HTTP layer after `safe-push` has already approved the operation and spawned the real `git` process. -Tier 2 is a scoped relaxation of the security threat model's constraint that "agents cannot take forge actions directly." The threat model constraint remains the default for public and high-risk repos (Tier 1). Tier 2 must be an explicit opt-in via harness configuration, not an automatic upgrade. +Credential delivery tier 2 is a scoped relaxation of the security threat model's constraint that "agents cannot take forge actions directly." The threat model constraint remains the default for public and high-risk repos (credential delivery tier 1). Credential delivery tier 2 must be an explicit opt-in via harness configuration, not an automatic upgrade. diff --git a/docs/ADRs/0033-per-repo-installation-mode.md b/docs/ADRs/0033-per-repo-installation-mode.md index 1a51cd782..b62437772 100644 --- a/docs/ADRs/0033-per-repo-installation-mode.md +++ b/docs/ADRs/0033-per-repo-installation-mode.md @@ -161,7 +161,7 @@ fullsend-ai/fullsend defaults < .fullsend/customized/ < AGENTS.md (base, sparse-checked) (overrides) (instructions) ``` -The org-level `.fullsend` config repo tier is skipped — the in-repo `.fullsend/` directory serves as the config workspace. The reusable workflows' "Prepare workspace" step is parameterized by root directory: `.` for per-org (the `.fullsend` repo checkout), `.fullsend/` for per-repo. In both modes, it sparse-checkouts upstream defaults into `{root}/agents/`, `{root}/skills/`, etc., then copies `{root}/customized/*` on top — identical code path, different root. +The org-level configuration tier (the `.fullsend` config repo) is skipped — the in-repo `.fullsend/` directory serves as the config workspace. 
The reusable workflows' "Prepare workspace" step is parameterized by root directory: `.` for per-org (the `.fullsend` repo checkout), `.fullsend/` for per-repo. In both modes, it sparse-checkouts upstream defaults into `{root}/agents/`, `{root}/skills/`, etc., then copies `{root}/customized/*` on top — identical code path, different root. **Git ref for config reads**: In per-repo mode, `.fullsend/`, `AGENTS.md`, and `.github/workflows/fullsend.yml` are always read from the **base branch** (the default branch of the repository), not the PR head branch. This is enforced by `pull_request_target`, which checks out the base branch by default. The reusable workflows do not check out the PR head ref for config or agent instructions — only the target repo's source code is checked out from the PR head for the agent to operate on. This prevents PR authors from injecting modified agent instructions, policies, or workflow files via their PR — the project's #1 threat category (external prompt injection). diff --git a/docs/ADRs/0035-layered-content-resolution.md b/docs/ADRs/0035-layered-content-resolution.md index 23f2fe0fb..40132836d 100644 --- a/docs/ADRs/0035-layered-content-resolution.md +++ b/docs/ADRs/0035-layered-content-resolution.md @@ -21,7 +21,7 @@ Superseded by [ADR 0064](0064-deprecate-customized-directory-overlay.md). ## Context -[ADR 0003](0003-org-config-repo-convention.md) designed a three-tier layering +[ADR 0003](0003-org-config-repo-convention.md) designed a three-tier configuration layering model — `fullsend defaults < org .fullsend config < per-repo overrides` — but the runtime never implemented it. 
The scaffold (`internal/scaffold/scaffold.go`) copies all ~82 files from `internal/scaffold/fullsend-repo/` into every diff --git a/docs/ADRs/0046-host-side-api-server-design.md b/docs/ADRs/0046-host-side-api-server-design.md index 37c30c112..2c99d6860 100644 --- a/docs/ADRs/0046-host-side-api-server-design.md +++ b/docs/ADRs/0046-host-side-api-server-design.md @@ -26,8 +26,8 @@ Accepted ## Context [ADR 0024](0024-harness-definitions.md) introduced the `api_servers` harness field as planned but not -implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as Tier 3 -of the credential delivery model — for cases where providers (Tier 2) cannot +implemented. [ADR 0017](0017-credential-isolation-for-sandboxed-agents.md)/[ADR 0025](0025-provider-credential-delivery-for-sandboxed-agents.md) established the host-side REST server as credential delivery tier 3 +of the credential delivery model — for cases that providers (credential delivery tier 2) cannot handle, originally scoped to credentials in request bodies and response transformation. Practice revealed additional cases beyond provider reach: long-running operations exceeding MCP timeouts, operations the sandbox @@ -166,7 +166,7 @@ messages, bounded in-memory state. - Servers must bind to `0.0.0.0` on shared hosts, widening the attack surface until [NVIDIA/OpenShell#1633](https://github.com/NVIDIA/OpenShell/issues/1633) ships. -- API servers (Tier 3) are now clearly scoped to cases providers cannot +- API servers (credential delivery tier 3) are now clearly scoped to cases providers cannot handle: long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations. 
- Fullsend-maintained servers follow a Go interface pattern (testable, diff --git a/docs/ADRs/0064-deprecate-customized-directory-overlay.md b/docs/ADRs/0064-deprecate-customized-directory-overlay.md index f1e3b7a0f..06a0bd9f9 100644 --- a/docs/ADRs/0064-deprecate-customized-directory-overlay.md +++ b/docs/ADRs/0064-deprecate-customized-directory-overlay.md @@ -24,7 +24,7 @@ resolution). ## Context [ADR 0035](0035-layered-content-resolution.md) introduced a three-tier -layering model for agent customization: upstream defaults are copied into the +configuration layering model for agent customization: upstream defaults are copied into the workspace at runtime, then files from `customized/` (per-org) or `.fullsend/customized/` (per-repo) are overlaid on top, replacing upstream files with matching names. The overlay is file-level replacement with no diff --git a/docs/architecture.md b/docs/architecture.md index 1ebc53de8..26c3d2874 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -145,7 +145,7 @@ Identity is not the same as trust. An agent's identity lets it authenticate to e **Decided:** - Credential delivery model: four tiers — (1) prefetch + post-process for agents with enumerable inputs (zero credential access), (2) OpenShell providers + L7 egress policies for static token auth (credentials never enter sandbox), (3) host-side REST server for operations providers cannot handle — long-running operations, sandbox capability gaps, credentials in request bodies, response transformation, and multi-step atomic operations (see [ADR 0046](ADRs/0046-host-side-api-server-design.md)), (4) host files + L7 policies for complex auth requiring in-sandbox credential files. L7 policies enforce both method + path and binary-level restrictions. Providers are preferred over REST servers when viable ([ADR 0017](ADRs/0017-credential-isolation-for-sandboxed-agents.md), extended by [ADR 0025](ADRs/0025-provider-credential-delivery-for-sandboxed-agents.md)). 
-- Host-side API server design: Tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)). +- Host-side API server design: Credential delivery tier 3 servers follow a uniform process contract (`--port`, `--token`, `--bind-address`, `/healthz`, `/tools.json`, `SIGTERM`). Network access is controlled via composable provider profiles — atomic capability profiles composed per-harness. Per-run UUID bearer tokens are delivered through OpenShell provider placeholders. File transfer uses `openshell sandbox upload/download` ([ADR 0046](ADRs/0046-host-side-api-server-design.md)). - Per-role GitHub Apps with manifest-based creation. Each agent role gets its own app with scoped permissions. PEMs stored in Secret Manager as `fullsend-{role}-app-pem` — one secret per role, shared across orgs on a mint. `ROLE_APP_IDS` uses the same shared-per-role model (`coder` → app ID). Org isolation is enforced via `ALLOWED_ORGS`, WIF conditions, and installation verification ([ADR 0007](ADRs/0007-per-role-github-apps.md), [ADR 0033](ADRs/0033-per-repo-installation-mode.md)). Public multi-tenant mint (`ALLOWED_ORGS=*`) with upstream-only workflow provenance is defined in [ADR 0059](ADRs/0059-public-mint-mode-with-wildcard-allowlists.md); upstream-only provenance limits which workflows can call the mint, complementing [ADR 0029](ADRs/0029-central-token-mint-secretless-fullsend.md) multi-tenant blast-radius concerns. - Cross-org mint authorization: workflows may request tokens for a different org via optional `target_org` when the target org installs the role App and sets `FULLSEND_FOREIGN__REPOS`. 
Empty `repos` yields installation-wide tokens on either path; cross-org adds FOREIGN gating, same-org relies on WIF/OIDC enrollment ([ADR 0060](ADRs/0060-cross-org-mint-authorization-via-org-variables.md)). - Standalone mint deployment: `cmd/mint/` provides a self-contained HTTP server that uses direct JWKS verification and filesystem PEM storage instead of GCP infrastructure. It shares the `internal/mintcore/` library with the GCF mint and adds support for custom role permissions and a fallback proxy to an upstream mint. Custom role permissions live in mintcore (not `cmd/mint/`) so that `RolePermissionsFor`, `HasRole`, and `CreateInstallationToken` return a unified view without callers needing to distinguish built-in from custom roles. The GCF mint never calls `RegisterCustomRolePermissions`, so the code is inert there. See the [standalone mint guide](guides/infrastructure/standalone-mint.md). @@ -196,7 +196,7 @@ The adopting organization's **`.fullsend`** repository is the natural home for p ## Intent Source -The system that provides authorized intent for agent work. Responsible for representing what changes are wanted, who authorized them, and at what tier of approval. +The system that provides authorized intent for agent work. Responsible for representing what changes are wanted, who authorized them, and at what intent authorization tier of approval. Intent answers the question "should this change exist?" before anyone asks "is this change correct?" Without authorized intent, an agent has no basis for deciding what to work on or whether its output matches what was asked for. @@ -206,7 +206,7 @@ The adopting organization's **`.fullsend`** repository holds the pointer to the - What is the right representation — forge issues, a dedicated intent repo, RFCs, or tiered combinations? (See [intent-representation.md](problems/intent-representation.md).) - How do agents verify that intent is authentic and hasn't been tampered with? 
-- How do different tiers of intent (standing rules, tactical issues, strategic features) map to different authorization requirements? +- How do different intent authorization tiers (standing rules, tactical issues, strategic features) map to different authorization requirements? - How does intent interact with the "try it" phase — agents building exploratory drafts before authorization? (See [intent-representation.md](problems/intent-representation.md).) ## Observability @@ -330,7 +330,7 @@ Retrospective analyst — examines completed or in-progress agent workflows, ide ## Configuration layering -Fullsend uses a three-tier inheritance model for all configuration: agent definitions, skills, policies, harness definitions, and guardrails. Each tier can extend or override the one below it. Guardrails can only be tightened, never weakened. +Fullsend uses a three-tier configuration inheritance model for all configuration: agent definitions, skills, policies, harness definitions, and guardrails. Each configuration tier can extend or override the one below it. Guardrails can only be tightened, never weakened. ``` diff --git a/docs/guides/user/customizing-agents.md b/docs/guides/user/customizing-agents.md index cab457eaa..ab9f60551 100644 --- a/docs/guides/user/customizing-agents.md +++ b/docs/guides/user/customizing-agents.md @@ -95,7 +95,7 @@ security: # Security is enabled by default with fail_mode ## Layered Configuration Resolution -Fullsend uses a three-tier inheritance model for all configuration: agent definitions, skills, policies, harness definitions, and guardrails. Each tier can extend or override the one below it. +Fullsend uses a three-tier configuration inheritance model for all configuration: agent definitions, skills, policies, harness definitions, and guardrails. Each configuration tier can extend or override the one below it. 
``` ┌──────────────────────────────────────────────────────────────┐ diff --git a/docs/landscape.md b/docs/landscape.md index 28fafc212..f1b9fe96f 100644 --- a/docs/landscape.md +++ b/docs/landscape.md @@ -134,7 +134,7 @@ The bug path is shorter: |---|---|---| | Coordination | Central checkpointed workflow | Repository as coordinator | | Authority | Jira labels/comments, GitHub reviews | CODEOWNERS, branch protection, required checks | -| Intent | Jira elaborated into PRD/spec/tasks | Tiered intent with stronger strategic authorization | +| Intent | Jira elaborated into PRD/spec/tasks | Tiered intent authorization with stronger strategic authorization | | Review | Local review, AI review, human gate | Independent zero-trust review sub-agents | | Sandbox | Productive Podman runner | Stricter credential isolation and egress policy | | Portability | GitHub/Jira-centric | Forge-neutral `forge.Client` abstraction | @@ -159,14 +159,14 @@ Fullsend has design commitments that Forge does not appear to cover: **Ideas to borrow:** -- *Staged intent artifacts.* Forge's PRD -> spec -> epics -> tasks sequence is a useful model for Tier 2+ work. Fullsend should borrow the artifact progression, not the Jira-label authority. +- *Staged intent artifacts.* Forge's PRD -> spec -> epics -> tasks sequence is a useful model for intent authorization tier 2+ work. Fullsend should borrow the artifact progression, not the Jira-label authority. - *Q&A without approval.* Humans can ask questions at a gate without approving or rejecting. This fits fullsend's ambiguous-intent and dual-interpretation escalation problems. - *Checkpointed pause/resume.* Forge waits for humans through durable workflow state, not idle implementation containers. Fullsend should keep this operational pattern while keeping authoritative state repo-visible. - *Skill override resolution.* `skills/default` plus `skills/{project}` is a simple precedent for fullsend's harness layering. 
- *Review feedback as a task type.* Forge's `implement_review` flow classifies review comments as actionable or contested before acting. That is relevant to fullsend's review loop and salvage/rewrite questions. - *Audited CI gate skips.* `/forge skip-gate` is dangerous unless governed, but the UX is useful: constrained command, named check, PR confirmation, audit comment, and re-evaluation. -**Cautions:** Jira label approval is too weak for high-tier intent. A workflow engine can dispatch work, but should not become merge authority. Forge's Podman runner is a productivity sandbox, not a full zero-trust boundary. A single AI review stage is not enough for autonomous merge confidence. CI skip mechanisms need permission checks, policy, and auditability from day one. +**Cautions:** Jira label approval is too weak for intent at the higher intent authorization tiers. A workflow engine can dispatch work, but should not become merge authority. Forge's Podman runner is a productivity sandbox, not a full zero-trust boundary. A single AI review stage is not enough for autonomous merge confidence. CI skip mechanisms need permission checks, policy, and auditability from day one. ### Stripe Minions @@ -348,7 +348,7 @@ None of these tools address: - **Formal intent verification** — checking whether a change is authorized against a structured intent system. CodeRabbit's "intent" context is about understanding the PR's purpose, not verifying it against an authorization system. - **Zero-trust inter-agent review** — agents treating each other's output as untrusted. Existing multi-agent systems implicitly trust the orchestrator and each other. - **Autonomous merge with security-focused confidence** — the judgment problem of "should this change exist?" as distinct from "is this change correct?" -- **Tier-based autonomy** — different levels of agent authority for different types of changes.
+- **Intent-authorization-tier-based autonomy** — different levels of agent authority for different types of changes. - **Agent governance** — who controls the agents' policies and permissions. - **Contribution volume management** — how maintainers handle the flood of AI-generated external contributions. See [contribution-volume.md](problems/contribution-volume.md). diff --git a/docs/problems/agent-architecture.md b/docs/problems/agent-architecture.md index 605338c8f..feaaba94f 100644 --- a/docs/problems/agent-architecture.md +++ b/docs/problems/agent-architecture.md @@ -88,7 +88,7 @@ The distinction matters because the failure modes are asymmetric. A narrow fix t When the triage agent identifies a bug that likely recurs but doesn't qualify for broad remediation, it should: -1. Fix the reported instance (normal Tier 1 flow) +1. Fix the reported instance (normal intent authorization tier 1 flow) 2. Search the codebase for structurally similar patterns 3. For each candidate location, create a **derivative issue** linked to the original, containing: - The location and the pattern match @@ -97,11 +97,11 @@ When the triage agent identifies a bug that likely recurs but doesn't qualify fo This keeps each fix scoped and individually reviewable while ensuring the broader problem doesn't get forgotten. The priority agent can then decide whether to batch derivative issues or address them individually based on severity and available capacity. -##### Interaction with the tier model +##### Interaction with the intent authorization tier model -Broad pattern remediation has a tier escalation risk. A single nil-check fix is Tier 1 (bug fix with a linked issue). But "apply nil-check discipline across the entire codebase and add a linter rule" may be Tier 2 — it's a codebase-wide convention change, not a surgical fix. 
The triage agent should flag this when recommending broad remediation, and the review agent must independently assess whether the scope warrants tier escalation (see [intent-representation.md](intent-representation.md#defense-independent-tier-classification-by-review-agents)). +Broad pattern remediation has an intent authorization tier escalation risk. A single nil-check fix is intent authorization tier 1 (bug fix with a linked issue). But "apply nil-check discipline across the entire codebase and add a linter rule" may be intent authorization tier 2 — it's a codebase-wide convention change, not a surgical fix. The triage agent should flag this when recommending broad remediation, and the review agent must independently assess whether the scope warrants intent authorization tier escalation (see [intent-representation.md](intent-representation.md#defense-independent-intent-authorization-tier-classification-by-review-agents)). -Conversely, if the triage agent creates 30 derivative issues for the same pattern, that's a signal that broad remediation would have been cheaper. The quality/drift detection agent (which monitors aggregate trends) should detect this accumulation and recommend consolidation into a single pattern-level issue — potentially escalating to Tier 2 if the scope warrants it. +Conversely, if the triage agent creates 30 derivative issues for the same pattern, that's a signal that broad remediation would have been cheaper. The quality/drift detection agent (which monitors aggregate trends) should detect this accumulation and recommend consolidation into a single pattern-level issue — potentially escalating to intent authorization tier 2 if the scope warrants it. 
##### When the boundary is unclear @@ -166,7 +166,7 @@ Agents interact through GitHub's existing mechanisms: - **Status checks** — review sub-agents post pass/fail results - **PR comments** — structured findings, change requests, suggestions -- **Labels** — classification signals (tier, priority, scope) +- **Labels** — classification signals (intent authorization tier, priority, scope) - **Commit status** — CI results, test outcomes There is no side channel. No agent-to-agent API. No shared state outside the repo. This means: diff --git a/docs/problems/agent-compatible-code.md b/docs/problems/agent-compatible-code.md index db86c0e52..3496d6af2 100644 --- a/docs/problems/agent-compatible-code.md +++ b/docs/problems/agent-compatible-code.md @@ -57,4 +57,4 @@ When a consumed dependency runs as a container image, the integration boundary i - Can gradual typing systems (TypeScript, Python with mypy) provide sufficient safety for agentic development, or do they retain too much of the dynamic language risk? - What's the migration path for existing services written in dynamically typed languages? Is there a threshold where rewriting becomes worthwhile? - How do we measure "agent compatibility" empirically? Can we compare agent error rates or refactoring success rates across languages? -- Should consumed dependencies have a "compatibility tier" based on how well-defined their boundaries are, affecting the autonomy level for changes that touch them? +- Should consumed dependencies have a "compatibility classification" based on how well-defined their boundaries are, affecting the autonomy level for changes that touch them? 
diff --git a/docs/problems/applied/konflux-ci/README.md b/docs/problems/applied/konflux-ci/README.md index c62b8becf..5bb35eb6c 100644 --- a/docs/problems/applied/konflux-ci/README.md +++ b/docs/problems/applied/konflux-ci/README.md @@ -151,7 +151,7 @@ The cryptographic attestation approach (Approach 3 in the general doc) is themat Open questions specific to Konflux: - How do we handle the migration from JIRA? Can the two systems coexist during transition? -- Changes that affect the public API contract between Konflux and its users warrant Tier 3 treatment. +- Changes that affect the public API contract between Konflux and its users warrant intent authorization tier 3 treatment. ### Performance verification diff --git a/docs/problems/architectural-invariants.md b/docs/problems/architectural-invariants.md index 51dd86ce2..1cc659ced 100644 --- a/docs/problems/architectural-invariants.md +++ b/docs/problems/architectural-invariants.md @@ -4,12 +4,12 @@ How do we represent and enforce the things that must always be true about the sy ## A third kind of intent -The [intent representation](intent-representation.md) doc focuses on feature-level intent: "is this change authorized?" But there's a different category of intent that doesn't map cleanly to the tier system: +The [intent representation](intent-representation.md) doc focuses on feature-level intent: "is this change authorized?" But there's a different category of intent that doesn't map cleanly to the intent authorization tier system: -- **Feature intent** (Tiers 0-3): "Build feature X" / "Fix bug Y" — time-bounded, specific to a change +- **Feature intent** (intent authorization tiers 0-3): "Build feature X" / "Fix bug Y" — time-bounded, specific to a change - **Architectural intent**: "These things must always be true" — persistent, constraining all changes -Architectural invariants are not features. They're constraints on *how* features get implemented and *what* features are acceptable. 
A feature might be authorized at Tier 2, but if its implementation violates an architectural invariant, it should still be rejected — or the invariant needs to be explicitly revised through a governed process. +Architectural invariants are not features. They're constraints on *how* features get implemented and *what* features are acceptable. A feature might be authorized at intent authorization tier 2, but if its implementation violates an architectural invariant, it should still be rejected — or the invariant needs to be explicitly revised through a governed process. ## Architecture documentation as invariant source @@ -46,11 +46,11 @@ Beyond per-PR checks, a drift detection agent can periodically scan the codebase - Has naming convention drift occurred? - Are deprecated patterns still present? (ADRs that supersede earlier ones) -When deviations are found, the drift agent can open cleanup PRs — similar to OpenAI's "garbage collection" concept, but grounded in declared architectural constraints rather than style preferences. These cleanup PRs would be Tier 0 (standing rules, pre-authorized) since they're enforcing already-agreed invariants. +When deviations are found, the drift agent can open cleanup PRs — similar to OpenAI's "garbage collection" concept, but grounded in declared architectural constraints rather than style preferences. These cleanup PRs would be intent authorization tier 0 (standing rules, pre-authorized) since they're enforcing already-agreed invariants. -### 3. Tier escalation detection +### 3. Intent authorization tier escalation detection -Architectural invariants help solve the [tier escalation problem](intent-representation.md#the-tier-escalation-problem). A change classified as Tier 1 (tactical bug fix) that violates or modifies an architectural invariant is not a bug fix — it's at minimum a Tier 2 change requiring explicit authorization. The architecture repo provides the baseline for this detection. 
+Architectural invariants help solve the [intent authorization tier escalation problem](intent-representation.md#the-intent-authorization-tier-escalation-problem). A change classified as intent authorization tier 1 (tactical bug fix) that violates or modifies an architectural invariant is not a bug fix — it's at minimum an intent authorization tier 2 change requiring explicit authorization. The architecture repo provides the baseline for this detection. Examples: - A "bug fix" that changes a naming convention defined by an ADR → architectural change, needs authorization @@ -116,7 +116,7 @@ This lifecycle is a [governance](governance.md) concern — who can create, modi ## Relationship to other problem areas -- **Intent representation** — architectural invariants are a form of persistent, cross-cutting intent (distinct from feature-level intent in Tiers 0-3) +- **Intent representation** — architectural invariants are a form of persistent, cross-cutting intent (distinct from feature-level intent in intent authorization tiers 0-3) - **Code review** — review sub-agents (correctness, intent alignment) consume invariants as review context - **Security threat model** — drift from security-relevant invariants (RBAC, trusted task model, build provenance) is a security concern, not just a quality concern - **Repo readiness** — repos with clear architectural boundaries and documented invariants are safer for agent autonomy diff --git a/docs/problems/code-review.md b/docs/problems/code-review.md index 7a214a778..81aa12f60 100644 --- a/docs/problems/code-review.md +++ b/docs/problems/code-review.md @@ -32,7 +32,7 @@ The review process is identical whether the PR author is an agent or a human. Th ### The context window argument -A single review agent asked to evaluate a PR must simultaneously consider: correctness, security (platform and content), intent alignment, test adequacy, style conformance, prompt injection defense, tier classification, and cross-repo impact. 
For a non-trivial diff in a complex codebase, this overwhelms the context window — not just in terms of token count, but in terms of attention quality. +A single review agent asked to evaluate a PR must simultaneously consider: correctness, security (platform and content), intent alignment, test adequacy, style conformance, prompt injection defense, intent authorization tier classification, and cross-repo impact. For a non-trivial diff in a complex codebase, this overwhelms the context window — not just in terms of token count, but in terms of attention quality. Even as context windows grow, the problem persists. Research and practice consistently show that LLM attention degrades with volume. A 200k-token context window doesn't mean 200k tokens of equally-weighted analysis. Asking one agent to hold the full diff, the relevant codebase context, the intent specification, the security threat model, and the repo's conventions all at once means it does all of them poorly. @@ -87,17 +87,17 @@ Reviews changes for threats to the platform and its users. Collapses the platfor ### Intent & Coherence (sonnet) -Evaluates whether the change matches an authorized intent and whether its scope matches its claimed tier. +Evaluates whether the change matches an authorized intent and whether its scope matches its claimed intent authorization tier. - Does this PR trace to a linked issue or authorized feature? - Does the implementation match what the issue/feature describes? -- Is the change scope consistent with its tier classification? (The [tier escalation problem](intent-representation.md#the-tier-escalation-problem) — a "bug fix" that's really a feature request.) +- Is the change scope consistent with its intent authorization tier classification? (The [intent authorization tier escalation problem](intent-representation.md#the-intent-authorization-tier-escalation-problem) — a "bug fix" that's really a feature request.) - Does the change go beyond what was authorized? 
- Does the change fit the overall design of the module/system? - Is the complexity proportional to the value delivered? - Are there simpler alternatives that achieve the same goal? -**Context needed:** The diff summary, the linked issue/feature file, surrounding module architecture, design docs, the intent repo state, the tier classification criteria. +**Context needed:** The diff summary, the linked issue/feature file, surrounding module architecture, design docs, the intent repo state, the intent authorization tier classification criteria. ### Style/conventions agent (sonnet) @@ -174,16 +174,16 @@ A human reviewer can say "I'm not sure about this, let me think" or "I need some When an agent escalates to a human, the quality of that escalation matters. A vague "I'm not confident" wastes the human's time. A more useful pattern: when the agent's uncertainty stems from a change being legitimately interpretable in multiple ways, it presents its best interpretations as structured alternatives — while explicitly inviting the human to reject all of them. -For example, a review agent uncertain about tier classification could escalate with: +For example, a review agent uncertain about intent authorization tier classification could escalate with: -- **Reading A:** "This is a bug fix (Tier 1) — the existing behavior doesn't match the documented intent, and the change is scoped to correcting that gap. Requires: linked issue." -- **Reading B:** "This is a new feature (Tier 2) — the system never intended to do this, and the change adds new capability. Requires: authorized feature file in `approved/`." +- **Reading A:** "This is a bug fix (intent authorization tier 1) — the existing behavior doesn't match the documented intent, and the change is scoped to correcting that gap. Requires: linked issue." +- **Reading B:** "This is a new feature (intent authorization tier 2) — the system never intended to do this, and the change adds new capability. 
Requires: authorized feature file in `approved/`." Critically, the escalation must always include an explicit "none of the above" option — the human may see a framing the agent missed entirely, or may decide the change should be rejected outright. The agent's interpretations are a starting point for the human's decision, not an exhaustive menu. This avoids presenting a false dichotomy that pressures the human into picking whichever option seems least wrong. The human sees coherent framings and can pick the one that matches their understanding, offer their own, or reject the change — rather than starting from scratch. This is faster and more structured than an open-ended "please review." -This pattern is most valuable at escalation boundaries — where the system has already decided it can't resolve something autonomously. It doesn't replace confidence scores or explicit uncertainty signals; it complements them by making the *nature* of the uncertainty actionable. It applies wherever agents interact with humans: tier classification (see [intent-representation.md](intent-representation.md#the-tier-escalation-problem)), the exploration phase for proposed features (see [intent-representation.md](intent-representation.md#the-try-it-phase)), and deadlock resolution between review sub-agents (see [agent-architecture.md](agent-architecture.md#how-deadlocks-are-resolved)). +This pattern is most valuable at escalation boundaries — where the system has already decided it can't resolve something autonomously. It doesn't replace confidence scores or explicit uncertainty signals; it complements them by making the *nature* of the uncertainty actionable. 
It applies wherever agents interact with humans: intent authorization tier classification (see [intent-representation.md](intent-representation.md#the-intent-authorization-tier-escalation-problem)), the exploration phase for proposed features (see [intent-representation.md](intent-representation.md#the-try-it-phase)), and deadlock resolution between review sub-agents (see [agent-architecture.md](agent-architecture.md#how-deadlocks-are-resolved)). [Forge-sdlc/forge](../landscape.md#forge-sdlcforge) has a concrete version of this idea in its `implement_review` flow: review comments are treated as their own task type, classified as actionable or contested before the agent acts, and contested comments trigger a structured response rather than silent compliance. That is a useful precedent for fullsend's review loops, especially when an agent should push back on incorrect feedback while still respecting the reviewer's blocking authority. diff --git a/docs/problems/contributor-guidance.md b/docs/problems/contributor-guidance.md index 4fed15a1c..af33170fb 100644 --- a/docs/problems/contributor-guidance.md +++ b/docs/problems/contributor-guidance.md @@ -64,20 +64,20 @@ AI agents can't do any of this easily. Most of today's agent "knowledge" is docu At a minimum, any contributor (human or agent) needs to understand: -### Tier classification +### Intent authorization tier classification -Is my change a bug fix, a small improvement, a new feature, or an architectural change? The [intent representation](intent-representation.md) tier system exists, but: +Is my change a bug fix, a small improvement, a new feature, or an architectural change? The [intent representation](intent-representation.md) intent authorization tier system exists, but: - Can a new contributor reliably classify their own change? - If they misclassify (intentionally or not), how does the system course-correct? -- Are the tier definitions publicly documented in a way that's discoverable? 
+- Are the intent authorization tier definitions publicly documented in a way that's discoverable? ### Authorization requirements -What approvals does my change need? Tier 0 (standing rules) needs none. Tier 1 (tactical) needs a linked issue. Tier 2+ (strategic) needs explicit authorization. But: +What approvals does my change need? Intent authorization tier 0 (standing rules) needs none. Intent authorization tier 1 (tactical) needs a linked issue. Intent authorization tier 2+ (strategic) needs explicit authorization. But: - How does a first-time contributor know this? -- If they're fixing what they perceive as a bug, do they need to understand that some "bugs" are actually feature requests requiring higher-tier authorization? +- If they're fixing what they perceive as a bug, do they need to understand that some "bugs" are actually feature requests requiring authorization at a higher intent authorization tier? - What happens if they submit a PR without the required authorization? Is there a helpful error message, or does it just languish? ### Repository-specific conventions @@ -155,7 +155,7 @@ The goal: make implicit knowledge explicit (which helps AI agents) **without** m ## Relationship to other problem areas -- **[Intent representation](intent-representation.md)** — the tier system must be explained to contributors +- **[Intent representation](intent-representation.md)** — the intent authorization tier system must be explained to contributors - **[Code review](code-review.md)** — review expectations and criteria must be transparent - **[Codebase context](codebase-context.md)** — what knowledge must agents know to succeed? - **[Governance](governance.md)** — who decides what rules contributors must follow? @@ -165,15 +165,15 @@ The goal: make implicit knowledge explicit (which helps AI agents) **without** m ## Open questions -- Should tier classification be self-reported by contributors or determined by reviewers (human or AI)? What if they disagree?
+- Should intent authorization tier classification be self-reported by contributors or determined by reviewers (human or AI)? What if they disagree? - How do we handle the learning curve for new human contributors who don't yet understand the intent system, while also providing enough written context for AI assistants? - What's the right balance between "helpful guidance" (from AI reviewer agents) and "intrusive gatekeeping"? How do we ensure AI feedback is constructive? - How do we measure whether contribution guidance is working? (time to first PR merge? contributor retention? reduction in misclassified changes? satisfaction of AI-assisted vs. unassisted contributors?) See also [human-factors.md](human-factors.md) for metrics around contributor engagement and meaningful participation. -- Should there be a "sandbox" repo where contributors can experiment without worrying about tier classification and authorization? +- Should there be a "sandbox" repo where contributors can experiment without worrying about intent authorization tier classification and authorization? - How do we handle contributions from organizations that have their own AI agents opening PRs? Do external AI assistants need special guidance beyond what's in CONTRIBUTING.md and CLAUDE.md? - What happens when a human contributor disagrees with an AI reviewer's classification or feedback? Is there an escalation path to human reviewers? - How do we keep contribution documentation up-to-date as the agent system evolves? Who is responsible for capturing new institutional knowledge as it emerges? -- Should contribution guidelines be versioned? If the tier definitions change, how do in-flight contributions handle the transition? +- Should contribution guidelines be versioned? If the intent authorization tier definitions change, how do in-flight contributions handle the transition? 
- How do we avoid creating a "two-class" system where AI-assisted contributions get faster processing than unassisted human contributions? - How verbose is too verbose? At what point does comprehensive documentation (helpful for AI) become overwhelming for human contributors? - Should we explicitly signal which documentation is "need to know" for humans vs. "supplementary context" primarily for AI assistants? @@ -237,7 +237,7 @@ This is a **conceptual model for organizing information**, not a prescription fo | Tier | Contributor persona | What they need | |---|---|---| | **First-time** | Opening first PR, fixing a typo or small bug | Minimal friction: where to open PR, how to run tests, basic code style | -| **Occasional** | Has contributed before, submitting bug fixes or small improvements | Tier classification, issue linkage, CODEOWNERS awareness | +| **Occasional** | Has contributed before, submitting bug fixes or small improvements | Intent authorization tier classification, issue linkage, CODEOWNERS awareness | | **Regular** | Frequent contributor, proposing features or architectural changes | Full intent system, authorization process, cross-repo impact analysis | | **Core maintainer** | Has commit access, reviewing others' work | Governance model, agent configuration, security threat model | @@ -252,7 +252,7 @@ AI agents acting on behalf of regular contributors should have access to all lay When a contributor opens a PR, an AI agent (acting as a reviewer/helper) provides contextual guidance: - "This change touches API surface, which requires human CODEOWNERS approval — see [link] for details" -- "This looks like a Tier 2 feature. You'll need to open an intent proposal at [repo] first." +- "This looks like an intent authorization tier 2 feature. You'll need to open an intent proposal at [repo] first." - "CI is failing because [specific test]. 
Here's how to run it locally: [command]" This applies to **all PRs equally** — whether opened by a human contributor working alone, a human with AI assistance, or an AI agent acting autonomously. Treating all contributions the same way aligns with the zero trust principle: no agent should assume another agent has done its job correctly. This also provides better auditability (showing agent-to-agent handoffs) and simpler implementation (no need to detect or differentiate PR sources). @@ -265,9 +265,9 @@ This applies to **all PRs equally** — whether opened by a human contributor wo Make the review criteria public and explicit. Before submitting a PR, contributors (or their AI assistants) can see exactly what will be checked: -- [ ] Tier classification: _____ -- [ ] Linked issue (if Tier 1+): _____ -- [ ] Authorization record (if Tier 2+): _____ +- [ ] Intent authorization tier classification: _____ +- [ ] Linked issue (if intent authorization tier 1+): _____ +- [ ] Authorization record (if intent authorization tier 2+): _____ - [ ] Tests added/updated: yes/no - [ ] CODEOWNERS approval needed: yes/no (auto-detected) - [ ] Architectural invariants verified: yes/no diff --git a/docs/problems/debugging.md b/docs/problems/debugging.md index 35ecd0eaf..b7b5769ca 100644 --- a/docs/problems/debugging.md +++ b/docs/problems/debugging.md @@ -98,7 +98,7 @@ The comparison itself — "did this agent act on all of its instructions, or sil - **[Operational Observability](operational-observability.md)** — Provides the data debugging consumes. This document focuses on the methodology — what questions to ask, what fault categories to distinguish — while observability focuses on the infrastructure. - **[Testing the Agents](testing-agents.md)** — Testing catches faults before deployment; debugging addresses faults that escape testing. -- **[Intent Representation](intent-representation.md)** — Spec bugs are intent representation failures. 
The [wrong-spec problem](intent-representation.md#the-wrong-spec-problem) is one of the fault categories here. The tiered intent model also affects debuggability — a Tier 0 change with no explicit intent is hard to debug because there is nothing to compare the outcome against. +- **[Intent Representation](intent-representation.md)** — Spec bugs are intent representation failures. The [wrong-spec problem](intent-representation.md#the-wrong-spec-problem) is one of the fault categories here. The tiered intent model also affects debuggability — an intent authorization tier 0 change with no explicit intent is hard to debug because there is nothing to compare the outcome against. - **[Human Factors](human-factors.md)** — Debugging traditionally *reduces* expertise atrophy — you learn the system by fixing its failures. In an agentic workflow where agents handle routine fault resolution, that learning opportunity may be lost. See [domain ownership and expertise](human-factors.md#domain-ownership-and-expertise). - **[Agent Architecture](agent-architecture.md)** — The [repo-as-coordinator](agent-architecture.md#interaction-model-the-repo-as-coordinator) model means cross-agent fault localization requires reconstructing event sequences from GitHub artifacts rather than from a centralized log. - **[Code Review](code-review.md)** — The multi-agent review pipeline (correctness, security, intent-coherence, and other sub-agents) is the most concrete instance of the multi-agent chain problem analyzed here. When a bad change gets through, fault localization must determine which sub-agent should have caught it — or whether the fault fell in a gap between sub-agent responsibilities. 
diff --git a/docs/problems/downstream-upstream.md b/docs/problems/downstream-upstream.md index 2aa257867..506e0690f 100644 --- a/docs/problems/downstream-upstream.md +++ b/docs/problems/downstream-upstream.md @@ -117,7 +117,7 @@ Downstream contributors consume the project without contributing proportionally. The priority intake problem exists independent of agents. But agents make it more urgent. -When agents can implement a proposed feature in hours rather than weeks, the volume of "proposed and implemented" contributions increases. The project needs a priority mechanism that operates at agent speed without becoming a rubber stamp. This connects to [intent-representation.md](intent-representation.md)'s tiered model — Tier 2+ features need explicit authorization before agents can merge them. But *who proposes and who authorizes* is the downstream contributor priority question this document addresses. +When agents can implement a proposed feature in hours rather than weeks, the volume of "proposed and implemented" contributions increases. The project needs a priority mechanism that operates at agent speed without becoming a rubber stamp. This connects to [intent-representation.md](intent-representation.md)'s tiered intent authorization model — intent authorization tier 2+ features need explicit authorization before agents can merge them. But *who proposes and who authorizes* is the downstream contributor priority question this document addresses. The intent system's `proposed/` to `approved/` workflow assumes someone is filtering proposals. The priority intake model determines who that someone is and what criteria they use. diff --git a/docs/problems/governance.md b/docs/problems/governance.md index 8f5487969..3b0ba2acd 100644 --- a/docs/problems/governance.md +++ b/docs/problems/governance.md @@ -10,7 +10,7 @@ Governance is distinct from [intent representation](intent-representation.md). 
I The decisions that shape how the agentic system behaves across the org: -- **Tier definitions** — what change types exist, what authorization each tier requires, and who can approve at each level. (The tiers themselves are defined in [intent-representation.md](intent-representation.md); governance decides who has the authority to create or modify those tier definitions.) +- **Intent authorization tier definitions** — what change types exist, what authorization each intent authorization tier requires, and who can approve at each level. (The intent authorization tiers themselves are defined in [intent-representation.md](intent-representation.md); governance decides who has the authority to create or modify those definitions.) - **Autonomy levels** — which repos are agent-autonomous, which are in shadow mode, which require full human review. What are the graduation criteria, and who evaluates them? - **Agent permissions** — what authority each agent role has (merge, approve, comment, label). What are the boundaries, and who draws them? - **Org-wide guardrails** — minimum standards that apply to all repos regardless of individual repo policy. Examples: all repos must have CODEOWNERS, all security-sensitive paths require human approval, all agent config changes require human approval. @@ -53,7 +53,7 @@ How are governance decisions made, and how does the community participate? ## Accountability - When an agent makes a bad decision, who is responsible? The person who configured the agent? The person who authored the policy? The person who approved the repo for autonomy? -- How do we trace an agent action back to the policy that authorized it? Every merge should be traceable: this PR was merged because the review sub-agents approved, operating under policy version X, with the change classified as Tier N, authorized by intent record Y. +- How do we trace an agent action back to the policy that authorized it? 
Every merge should be traceable: this PR was merged because the review sub-agents approved, operating under policy version X, with the change classified as intent authorization tier N, authorized by intent record Y. - What's the escalation path when something goes wrong? Who gets paged? Who has authority to revoke agent autonomy in an emergency? - Can autonomy be automatically revoked? If a bad merge is detected (e.g., production incident traced to an agent-merged PR), should the system automatically downgrade the repo to human-required review? @@ -74,7 +74,7 @@ The governance question isn't just "how much should we spend?" but "who decides ## Relationship to other problem areas -- **Intent representation** defines the tiers and authorization mechanisms. Governance defines who has authority to change those definitions. +- **Intent representation** defines the intent authorization tiers and authorization mechanisms. Governance defines who has authority to change those definitions. - **Security threat model** identifies the threats. Governance defines the policies that mitigate them and who can modify those policies. - **Autonomy spectrum** describes the graduation model. Governance defines who evaluates readiness and makes the graduation decision. - **Agent architecture** defines the agent roles and permissions. Governance defines who assigns those permissions and under what constraints. 
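Reviewer sketch (not part of the patch): the traceability requirement in the governance.md Accountability hunk above, "merged because the review sub-agents approved, operating under policy version X, with the change classified as intent authorization tier N, authorized by intent record Y", could be captured in a record like the one below. `MergeAuditRecord`, its field names, and `missing_authorization` are hypothetical names chosen for illustration; only the per-tier requirements (a linked issue for intent authorization tier 1+, an authorization record for tier 2+) come from the contributor-guidance.md text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MergeAuditRecord:
    """Hypothetical record tying an agent merge to what authorized it."""

    pr_number: int
    approving_sub_agents: tuple[str, ...]
    policy_version: str
    intent_authorization_tier: int  # 0-3, per intent-representation.md
    linked_issue: Optional[str] = None  # expected for tier 1 and above
    intent_record: Optional[str] = None  # expected for tier 2 and above


def missing_authorization(record: MergeAuditRecord) -> list[str]:
    """Return the artifacts this record lacks for its claimed tier."""
    missing = []
    if not record.approving_sub_agents:
        missing.append("review sub-agent approval")
    if record.intent_authorization_tier >= 1 and not record.linked_issue:
        missing.append("linked issue")
    if record.intent_authorization_tier >= 2 and not record.intent_record:
        missing.append("intent record")
    return missing
```

An empty result means the merge is traceable under these assumed rules; a non-empty one names what an auditor would need to ask for. The sketch trusts the recorded tier, so it does not replace the independent tier classification the docs require of review agents.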
diff --git a/docs/problems/intent-representation.md b/docs/problems/intent-representation.md index c8e5978a0..7c9483aa4 100644 --- a/docs/problems/intent-representation.md +++ b/docs/problems/intent-representation.md @@ -89,11 +89,11 @@ Moving from `explored/` to `approved/` requires signoff from architects and PM v - Migrating from JIRA (organizational inertia, existing integrations) - Feature files in `proposed/` could themselves contain prompt injection targeting agents that read them — CODEOWNERS on `approved/` prevents self-approval, but the content is still agent-consumed -## Approach 2: Tiered intent with different mechanisms per tier +## Approach 2: Tiered intent with different mechanisms per intent authorization tier -Not everything needs the same process. Explicitly tier changes by scope, with different intent mechanisms at each tier. +Not everything needs the same process. Explicitly tier changes by scope, with different intent mechanisms at each intent authorization tier. -### Tier 0: Standing rules (no per-change intent needed) +### Intent authorization tier 0: Standing rules (no per-change intent needed) Pre-authorized categories of changes that the organization always wants: @@ -104,26 +104,26 @@ Pre-authorized categories of changes that the organization always wants: The intent is "we always want these." An agent verifies the change actually falls in this category (static analysis: "this change only touches test files") and no further authorization is needed. -**Test changes require additive-only verification.** Not all test-only changes are Tier 0. Tests are part of the trust boundary — review agents rely on them to validate production code. A change that weakens assertions, broadens mocks, reduces coverage, or removes checks is modifying a guardrail, not adding coverage. 
The Tier 0 gate for test changes must verify that the change is additive: +**Test changes require additive-only verification.** Not all test-only changes are intent authorization tier 0. Tests are part of the trust boundary — review agents rely on them to validate production code. A change that weakens assertions, broadens mocks, reduces coverage, or removes checks is modifying a guardrail, not adding coverage. The intent authorization tier 0 gate for test changes must verify that the change is additive: -- New test files or new test functions: Tier 0 -- New assertions added to existing tests: Tier 0 -- Weakened assertions (e.g., `Equal` → `NotNil`, exact match → substring): **not Tier 0** — requires Tier 1 justification -- Removed or commented-out test cases: **not Tier 0** -- Mocks that replace real dependencies in security-sensitive paths: **not Tier 0** -- Test refactoring that restructures without weakening: Tier 0, but the review agent must verify no net reduction in assertion strength -- Binary or opaque files (`.xz`, `.bin`, encoded blobs, etc.) in test directories: **not Tier 0** — these cannot be meaningfully reviewed by agents and require human review regardless of location +- New test files or new test functions: intent authorization tier 0 +- New assertions added to existing tests: intent authorization tier 0 +- Weakened assertions (e.g., `Equal` → `NotNil`, exact match → substring): **not intent authorization tier 0** — requires intent authorization tier 1 justification +- Removed or commented-out test cases: **not intent authorization tier 0** +- Mocks that replace real dependencies in security-sensitive paths: **not intent authorization tier 0** +- Test refactoring that restructures without weakening: intent authorization tier 0, but the review agent must verify no net reduction in assertion strength +- Binary or opaque files (`.xz`, `.bin`, encoded blobs, etc.) 
in test directories: **not intent authorization tier 0** — these cannot be meaningfully reviewed by agents and require human review regardless of location -This distinction matters because of the [temporal split-payload attack](security-threat-model.md#cross-cutting-attack-pattern-temporal-split-payload-test-poisoning): an attacker can poison the test suite through Tier 0 test changes and later exploit the blind spot with a separate production change that passes the weakened tests. Static analysis for Tier 0 classification must go beyond "does this only touch test files?" to "does this make the test suite strictly stronger?" +This distinction matters because of the [temporal split-payload attack](security-threat-model.md#cross-cutting-attack-pattern-temporal-split-payload-test-poisoning): an attacker can poison the test suite through intent authorization tier 0 test changes and later exploit the blind spot with a separate production change that passes the weakened tests. Static analysis for intent authorization tier 0 classification must go beyond "does this only touch test files?" to "does this make the test suite strictly stronger?" -### Tier 1: Tactical (issue is sufficient) +### Intent authorization tier 1: Tactical (issue is sufficient) - Bug fixes with a linked issue and reproduction - Small improvements scoped to a single repo An agent can act on these if there's a corresponding GitHub issue. The issue itself is the intent signal. Normal code review applies, no additional approval needed. -### Tier 2: Strategic (requires explicit multi-party authorization) +### Intent authorization tier 2: Strategic (requires explicit multi-party authorization) - New features - API changes @@ -133,7 +133,7 @@ An agent can act on these if there's a corresponding GitHub issue. The issue its This is where the git-based authorization mechanism (Approach 1) kicks in. The feature must be explicitly authorized via the intent repo before agents can merge the implementation. 
-### Tier 3: Organizational (requires broader consensus) +### Intent authorization tier 3: Organizational (requires broader consensus) - Cross-org changes affecting multiple repos - Deprecations and removals @@ -141,16 +141,16 @@ This is where the git-based authorization mechanism (Approach 1) kicks in. The f Possibly requires an RFC-like process with a community review period, in addition to the git-based authorization. -Note: changes to the agentic system itself (agent policies, security policies, tier definitions) are a [governance](governance.md) concern, not an intent concern. Those changes are about modifying the rules of the system, not about authorizing work within the system. +Note: changes to the agentic system itself (agent policies, security policies, intent authorization tier definitions) are a [governance](governance.md) concern, not an intent concern. Those changes are about modifying the rules of the system, not about authorizing work within the system. -### The key question for each tier +### The key question for each intent authorization tier -How does a review agent verify which tier a change falls into? This is non-trivial: +How does a review agent verify which intent authorization tier a change falls into? This is non-trivial: -- **Tier 0** might be automatable via static analysis ("this change only touches test files") -- **Tier 1** needs issue linkage verification -- **Tier 2-3** need the formal authorization check against the intent repo -- **Tier gaming** is a threat — an attacker frames a strategic change as a tactical bug fix to avoid the higher approval bar. The review agent must independently assess change scope, not trust the author's classification. 
+- **Intent authorization tier 0** might be automatable via static analysis ("this change only touches test files") +- **Intent authorization tier 1** needs issue linkage verification +- **Intent authorization tiers 2-3** need the formal authorization check against the intent repo +- **Intent authorization tier gaming** is a threat — an attacker frames a strategic change as a tactical bug fix to avoid the higher approval bar. The review agent must independently assess change scope, not trust the author's classification. ## Approach 3: Intent as cryptographic attestation @@ -179,7 +179,7 @@ Repos contain descriptions of desired behavior — like ADRs, or an `INTENT.md` **Pros:** Intent travels with the code. Version-controlled. Agents can read it at review time. -**Cons:** Hard to keep current. Doesn't capture strategic/cross-repo intent well. Easy to game (an attacker could modify the intent document as part of a malicious PR). Better suited for cross-cutting standing rules (Tier 0) than for feature-level intent. +**Cons:** Hard to keep current. Doesn't capture strategic/cross-repo intent well. Easy to game (an attacker could modify the intent document as part of a malicious PR). Better suited for cross-cutting standing rules (intent authorization tier 0) than for feature-level intent. ## Approach 5: Issues/specs as intent source @@ -225,10 +225,10 @@ Any intent system needs to survive attack: - **JIRA manipulation** — attacker fast-tracks a feature through refinement states. In a system using JIRA as the intent source, this attack works because there are no real ACLs on state transitions. - **Git-based manipulation** — attacker submits a PR to the intent repo with a feature file containing prompt injection in the description. CODEOWNERS on `approved/` prevents self-approval, but `proposed/` is open and the content is agent-consumed. - **Attestation forgery** — attacker compromises a signing key. 
Mitigated by requiring multiple signatures (m-of-n), but key management is complex. -- **Tier gaming** — attacker frames a strategic change as a tactical bug fix to avoid the higher approval bar. The review agent must independently assess change scope. +- **Intent authorization tier gaming** — attacker frames a strategic change as a tactical bug fix to avoid the higher approval bar. The review agent must independently assess change scope. - **Intent composition** — three small "tactical" changes that individually look innocuous but together constitute an unauthorized feature. Detection requires cross-change awareness. -## The tier escalation problem +## The intent authorization tier escalation problem Tiering is necessary, but it introduces a specific weakness: low-tier changes have lightweight intent requirements, which creates an incentive to disguise high-impact changes as low-impact ones. @@ -240,26 +240,26 @@ Experienced maintainers catch this: "That's not a bug, that's a feature request ### How it gets worse with agents -An agent processing a "bug report" at Tier 1 (lightweight intent, just needs an issue) might: +An agent processing a "bug report" at intent authorization tier 1 (lightweight intent, just needs an issue) might: - Implement a significant behavioral change because the issue describes it as a fix - Add new API surface under the guise of "fixing" missing functionality - Change security-relevant behavior because the reporter framed a policy decision as a defect -The agent is technically responsive to the issue, but it's implementing something that should have gone through Tier 2 authorization. +The agent is technically responsive to the issue, but it's implementing something that should have gone through intent authorization tier 2 authorization. 
-### Defense: independent tier classification by review agents +### Defense: independent intent authorization tier classification by review agents -Review agents must independently assess what tier a change *actually* represents, regardless of how the author or issue classifies it. This means: +Review agents must independently assess what intent authorization tier a change *actually* represents, regardless of how the author or issue classifies it. This means: - **Scope analysis** — does this change add new behavior, or fix existing behavior? Adding new API endpoints is not a bug fix, even if the issue says "bug." -- **Impact analysis** — does this change affect security, UX, or API surface? If so, it's at least Tier 2 regardless of the issue label. -- **Intent verification** — does the linked issue actually describe what this PR does? And does the code do exactly what the intent file says, and nothing more? The vibe-to-spec workflow gives the agent a strict checklist. If someone tries to sneak a major new feature into a low-tier bug fix, the agent will automatically block it because the extra code won't match the generated spec. +- **Impact analysis** — does this change affect security, UX, or API surface? If so, it's at least intent authorization tier 2 regardless of the issue label. +- **Intent verification** — does the linked issue actually describe what this PR does? And does the code do exactly what the intent file says, and nothing more? The vibe-to-spec workflow gives the agent a strict checklist. If someone tries to sneak a major new feature into a low-intent-authorization-tier bug fix, the agent will automatically block it because the extra code won't match the generated spec. - **Pattern detection** — multiple "small" changes from the same source that collectively add up to a feature should trigger escalation. 
-When tier classification is genuinely ambiguous, rather than making a weak call or defaulting to escalation without context, the review agent can use [dual-interpretation escalation](code-review.md#dual-interpretation-escalation) — presenting the human with its tier readings and the evidence for each, while always leaving room for the human to see a different framing or reject the change entirely. +When intent authorization tier classification is genuinely ambiguous, rather than making a weak call or defaulting to escalation without context, the review agent can use [dual-interpretation escalation](code-review.md#dual-interpretation-escalation) — presenting the human with its intent authorization tier readings and the evidence for each, while always leaving room for the human to see a different framing or reject the change entirely. -This applies equally to review agents looking at code PRs *and* to agents evaluating intent changes in the intent repo itself. A low-tier intent statement that describes something high-impact should be flagged and escalated. +This applies equally to review agents looking at code PRs *and* to agents evaluating intent changes in the intent repo itself. A low-intent-authorization-tier intent statement that describes something high-impact should be flagged and escalated. 
### The philosophical question @@ -277,21 +277,21 @@ The system needs a mechanism for catching specs that are *internally valid but e The combination of **Approach 1 (git as intent ledger) + Approach 2 (tiered intent)** appears strongest: -- Low-tier changes have lightweight intent requirements (or none for Tier 0) -- High-tier changes require git-tracked, CODEOWNERS-enforced authorization +- Low-intent-authorization-tier changes have lightweight intent requirements (or none for intent authorization tier 0) +- High-intent-authorization-tier changes require git-tracked, CODEOWNERS-enforced authorization - The "try it before you buy it" pattern (agents build exploratory PRs before authorization) provides high-information, low-risk exploration This combination addresses the JIRA ACL weakness, provides audit trails, and scales across the change-type spectrum. But it needs experimentation to validate. ## Open questions -- How do agents classify a change's tier reliably? Can this be automated, or does a human need to label it? -- How do we handle emergent changes — a "small bug fix" that reveals a deeper architectural issue requiring Tier 2+ authorization? -- Can intent be composed? If three Tier 1 changes together constitute an unauthorized Tier 2 feature, who notices? +- How do agents classify a change's intent authorization tier reliably? Can this be automated, or does a human need to label it? +- How do we handle emergent changes — a "small bug fix" that reveals a deeper architectural issue requiring intent authorization tier 2+ authorization? +- Can intent be composed? If three intent authorization tier 1 changes together constitute an unauthorized intent authorization tier 2 feature, who notices? - How do we prevent the intent repo from becoming a bottleneck at agent speed? - What does the feature file format look like? 
How much structure is needed for agents to evaluate programmatically, and could an AI-driven "vibe-to-spec" workflow using tools like spec-kit reliably generate this required structure (functional requirements, acceptance scenarios, state machines etc) directly from rapid human prototyping? - How do organizations handle migration from existing issue tracking systems (e.g., JIRA)? Can the two systems coexist during transition? -- What's the relationship between intent tiers and CODEOWNERS in the target repos? Are guarded paths a proxy for "changes here are always Tier 2+"? +- What's the relationship between intent authorization tiers and CODEOWNERS in the target repos? Are guarded paths a proxy for "changes here are always intent authorization tier 2+"? - Cross-repo intent: when a feature spans multiple repos, is it one feature file referencing multiple repos, or multiple feature files? - How does the "try it" pattern work for changes that can't be meaningfully evaluated without merging? (e.g., infrastructure changes, deployment config) -- Who has authority to modify the tier definitions and authorization requirements? (See [governance.md](governance.md)) +- Who has authority to modify the intent authorization tier definitions and authorization requirements? (See [governance.md](governance.md)) diff --git a/docs/problems/platform-nativeness.md b/docs/problems/platform-nativeness.md index e7665967a..86299d657 100644 --- a/docs/problems/platform-nativeness.md +++ b/docs/problems/platform-nativeness.md @@ -69,7 +69,7 @@ gh-aw explicitly keeps humans in the loop. Its safe-outputs model produces artif Checking whether a change is authorized against a structured intent system — not just "is this change correct?" but "is this change one we actually want?" This is absent from every tool in the [landscape](../landscape.md), including gh-aw. A native system could implement it, but gh-aw's architecture does not. See [intent-representation.md](intent-representation.md). 
-### Tier-based autonomy +### Intent-authorization-tier-based autonomy Different agent authority for different types of changes: auto-merge a dependency bump, require human review for an API change, block agent-authored modifications to CODEOWNERS. gh-aw's [integrity filtering](https://github.github.com/gh-aw/reference/integrity/) implements a form of input trust tiering (`merged > approved > unapproved > none`) and its [supply chain protection](https://github.github.com/gh-aw/reference/threat-detection/#supply-chain-protection-protected-files) blocks modifications to sensitive files (dependency manifests, CI config, CODEOWNERS) by default — but these are applied to *what the agent can see and touch*, not to *whether the agent's output should be merged*. The output model remains flat: agent proposes, human decides, regardless of change type. Fullsend's autonomy spectrum applies to the merge decision itself. See [autonomy-spectrum.md](autonomy-spectrum.md). diff --git a/docs/problems/production-feedback.md b/docs/problems/production-feedback.md index ad03c38e0..b05c31ed6 100644 --- a/docs/problems/production-feedback.md +++ b/docs/problems/production-feedback.md @@ -47,11 +47,11 @@ Correlating user-reported problems with platform signals serves two purposes: ## Potential agent interactions with signals -**Triage agent** monitors signal distributions and creates issues when failure patterns exceed thresholds — without waiting for a human to notice and report. A sustained increase in failures across users for a given operation type is equivalent to dozens of individual bug reports. The agent files a single well-scoped issue with affected versions, sample logs, and time-of-onset. This is signal-driven rather than report-driven triage: the signal is the bug report. Broad, multi-user patterns suggesting architectural root causes should be escalated to Tier 2 at creation time. 
+**Triage agent** monitors signal distributions and creates issues when failure patterns exceed thresholds — without waiting for a human to notice and report. A sustained increase in failures across users for a given operation type is equivalent to dozens of individual bug reports. The agent files a single well-scoped issue with affected versions, sample logs, and time-of-onset. This is signal-driven rather than report-driven triage: the signal is the bug report. Broad, multi-user patterns suggesting architectural root causes should be escalated to intent authorization tier 2 at creation time. **Priority agent** weights open issues by breadth of impact (users affected), depth (fraction of operations failing), duration, and rate of change. Priority updates dynamically as the signal evolves — not only when a human re-triages. -**Review agent** uses platform reliability history to calibrate scrutiny on PRs. A code path responsible for a high fraction of recent scheduling timeouts or failure spikes warrants deeper edge-case analysis than a low-traffic utility. This also feeds tier classification — a "bug fix" touching a historically high-blast-radius path may warrant Tier 2 treatment regardless of how the issue was filed. +**Review agent** uses platform reliability history to calibrate scrutiny on PRs. A code path responsible for a high fraction of recent scheduling timeouts or failure spikes warrants deeper edge-case analysis than a low-traffic utility. This also feeds intent authorization tier classification — a "bug fix" touching a historically high-blast-radius path may warrant intent authorization tier 2 treatment regardless of how the issue was filed. **Code agent** uses failure logs, error distributions, and timing correlation as starting context — richer than a human-written issue. The agent can correlate log patterns to code paths and generate a root cause hypothesis before writing any code. 
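As a rough illustration of the Priority agent weighting described in the hunk above, the following Python sketch combines breadth, depth, duration, and rate of change into a single score. Everything in it is an assumption made for illustration: the `IssueSignal` type, the weights, the 72-hour duration cap, and the `priority_score` function are hypothetical and are not part of fullsend or of this patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueSignal:
    """Hypothetical snapshot of the platform signal behind one open issue."""

    users_affected: int           # breadth: users seeing the failure
    total_users: int              # population used to normalize breadth
    failure_fraction: float       # depth: fraction of operations failing (0.0-1.0)
    hours_active: float           # duration: how long the pattern has persisted
    failure_fraction_prev: float  # depth one window earlier, for rate of change


# Assumed weights and cap; a real deployment would tune these.
WEIGHTS = {"breadth": 0.4, "depth": 0.3, "duration": 0.1, "trend": 0.2}
DURATION_CAP_HOURS = 72.0


def _clamp(value: float) -> float:
    """Restrict a value to the range [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def priority_score(signal: IssueSignal) -> float:
    """Return a score in [0.0, 1.0]; higher means more urgent."""
    breadth = (
        _clamp(signal.users_affected / signal.total_users)
        if signal.total_users
        else 0.0
    )
    depth = _clamp(signal.failure_fraction)
    duration = _clamp(signal.hours_active / DURATION_CAP_HOURS)
    # Only a worsening signal raises priority; an improving one contributes 0.
    trend = _clamp(signal.failure_fraction - signal.failure_fraction_prev)
    return (
        WEIGHTS["breadth"] * breadth
        + WEIGHTS["depth"] * depth
        + WEIGHTS["duration"] * duration
        + WEIGHTS["trend"] * trend
    )


widespread = IssueSignal(800, 1000, 0.5, 36.0, 0.25)
narrow = IssueSignal(10, 1000, 0.05, 2.0, 0.05)
ranked = sorted([narrow, widespread], key=priority_score, reverse=True)
```

Because the score is a pure function of the current signal, it can be recomputed whenever the signal changes, which matches the idea that priority updates dynamically rather than only when a human re-triages.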
diff --git a/docs/problems/security-threat-model.md b/docs/problems/security-threat-model.md index bb52fb739..32415194f 100644 --- a/docs/problems/security-threat-model.md +++ b/docs/problems/security-threat-model.md @@ -222,7 +222,7 @@ This cross-cutting nature is why the model-as-toolchain risk deserves explicit t ### Open questions -- Should dependency updates be in a separate autonomy tier from code changes? +- Should dependency updates be in a separate intent authorization tier from code changes? - How do we handle the case where a dependency update is legitimate but introduces a vulnerability that hasn't been disclosed yet? - How do we verify model integrity? Unlike a compiler binary that can be hash-checked, model behavior is stochastic and opaque. - Is model diversity a practical defense, or do the costs (maintaining multiple model integrations, inconsistent output quality) outweigh the security benefit? @@ -279,13 +279,13 @@ The five threats above model attacks on agents, on credentials, on the model, an The attacker splits a malicious change across two PRs separated by time: -1. **PR 1 (setup — Tier 0).** A PR that only modifies test files. It appears to add coverage for an edge case, but subtly weakens an assertion, introduces a misleading mock, or removes a check that would catch a specific vulnerability. Because "test additions that don't change production behavior" are classified as Tier 0 standing rules (see [intent-representation.md](intent-representation.md)), this PR requires no per-change intent. It touches no CODEOWNERS-guarded paths. It merges autonomously. +1. **PR 1 (setup — intent authorization tier 0).** A PR that only modifies test files. It appears to add coverage for an edge case, but subtly weakens an assertion, introduces a misleading mock, or removes a check that would catch a specific vulnerability. 
Because "test additions that don't change production behavior" are classified as intent authorization tier 0 standing rules (see [intent-representation.md](intent-representation.md)), this PR requires no per-change intent. It touches no CODEOWNERS-guarded paths. It merges autonomously. 2. **PR 2 (payload — weeks or months later).** A separate PR introduces a vulnerability in production code. The Correctness sub-agent evaluates test adequacy and finds that the relevant code path is covered by tests that pass. The Security sub-agent evaluates the diff in isolation and sees nothing overtly malicious — the vulnerability relies entirely on the blind spot created by the poisoned test. It finds no prompt injection patterns because the attack uses syntactically valid code, not adversarial text. ### Why existing defenses don't catch it -- **Tier 0 classification** treats test-only changes as inherently low-risk, but tests are part of the trust boundary — they are what review agents rely on to validate production code. Poisoning tests is poisoning the oracle. +- **Intent authorization tier 0 classification** treats test-only changes as inherently low-risk, but tests are part of the trust boundary — they are what review agents rely on to validate production code. Poisoning tests is poisoning the oracle. - **Stateless per-PR review** means no review agent correlates the test weakening in PR 1 with the production change in PR 2. Each PR is evaluated independently. - **CODEOWNERS** typically guards production APIs, deployment manifests, and security policies — not test directories. - **The Security sub-agent** looks for prompt injection patterns. This attack uses standard, valid code. There is nothing to flag. @@ -295,14 +295,14 @@ The attacker splits a malicious change across two PRs separated by time: This is not a variant of an existing threat category. 
It combines elements of several: -- It exploits **Tier 0 intent rules** (an intent representation problem) +- It exploits **intent authorization tier 0 rules** (an intent representation problem) - It defeats **stateless review** (an agent architecture problem) - It subverts **testing as a guardrail** (a repo readiness problem) - It uses **temporal separation** to evade correlation Temporal separation of attack stages is not new — it is a well-established evasion technique. The [xz backdoor (CVE-2024-3094)](https://en.wikipedia.org/wiki/XZ_Utils_backdoor) is a direct real-world precedent: the attacker spent years building trust as a contributor, then introduced malicious binary test fixtures (`tests/files/*.xz`) that contained the actual backdoor payload, extracted at build time. The "test data" *was* the delivery mechanism. That attack demonstrated that test infrastructure is not a low-risk area — it is an attack surface, and one that reviewers (human and automated) systematically under-scrutinize. -What makes this pattern particularly dangerous in the agentic context is that the system *formally codifies* the assumption that test-only changes are low-risk (Tier 0), and review agents operate statelessly per-PR. The xz attack succeeded against human reviewers who could in principle have correlated the commits; here, the system architecture structurally prevents that correlation. +What makes this pattern particularly dangerous in the agentic context is that the system *formally codifies* the assumption that test-only changes are low-risk (intent authorization tier 0), and review agents operate statelessly per-PR. The xz attack succeeded against human reviewers who could in principle have correlated the commits; here, the system architecture structurally prevents that correlation. 
### The xz variant: test data as covert payload storage @@ -320,8 +320,8 @@ This means the attack surface includes: ### Defense considerations -- **Distinguish additive from reductive test changes.** A Tier 0 classification for test-only changes should require that the change is *additive* — new test cases, new assertions, increased coverage. Changes that weaken assertions, broaden mocks, reduce coverage, or remove checks should not qualify as Tier 0. See [intent-representation.md](intent-representation.md) for the refined Tier 0 definition. -- **Opaque files are never Tier 0.** Binary blobs, encoded data, and other files that cannot be meaningfully reviewed by agents should not qualify for autonomous merge regardless of where they appear. If an agent cannot reason about a file's content, the file requires human review. +- **Distinguish additive from reductive test changes.** An intent authorization tier 0 classification for test-only changes should require that the change is *additive* — new test cases, new assertions, increased coverage. Changes that weaken assertions, broaden mocks, reduce coverage, or remove checks should not qualify as intent authorization tier 0. See [intent-representation.md](intent-representation.md) for the refined intent authorization tier 0 definition. +- **Opaque files are never intent authorization tier 0.** Binary blobs, encoded data, and other files that cannot be meaningfully reviewed by agents should not qualify for autonomous merge regardless of where they appear. If an agent cannot reason about a file's content, the file requires human review. - **CODEOWNERS coverage for tests on guarded paths.** If production code at a given path is human-owned, its corresponding test files should be too. A test file is part of the security boundary for the code it tests. - **Scrutiny for build definitions.** Tekton pipeline and task definitions (`.tekton/`), Dockerfiles, and build scripts define what runs during the build. 
Agents may legitimately need to modify these files as part of feature implementation — adding a build step, changing a base image, updating a pipeline to support a new artifact type. Blanket CODEOWNERS on all build files would force human approval on every such change, which may be appropriate for some repos but too restrictive for others. The alternative is relying on review agents to apply heightened scrutiny to build definition changes without CODEOWNERS gating — treating them as security-sensitive context for the Security sub-agent rather than as a hard gate. - **Coverage regression as a merge gate.** Not just "do tests pass" but "does meaningful coverage decrease on security-sensitive paths." A PR that weakens assertions without reducing line coverage is harder to catch, but assertion-density metrics or mutation testing scores can help. diff --git a/docs/problems/testing-agents.md b/docs/problems/testing-agents.md index 9a055be8c..67c75954c 100644 --- a/docs/problems/testing-agents.md +++ b/docs/problems/testing-agents.md @@ -1,10 +1,10 @@ # Testing the Agents -We have CI for code, but no CI for prompts. If someone tweaks the Intent & Coherence sub-agent's instructions, how do we prove it didn't forget how to detect Tier Escalation? +We have CI for code, but no CI for prompts. If someone tweaks the Intent & Coherence sub-agent's instructions, how do we prove it didn't forget how to detect intent authorization tier escalation? ## Why this is a distinct problem -Testing application code is a solved problem with mature tooling: unit tests, integration tests, CI pipelines, coverage reports. But agent instructions — system prompts, CLAUDE.md files, review criteria, escalation rules — are a fundamentally different artifact. They're natural language, not code. Their behavior is probabilistic, not deterministic. 
And the consequences of a regression can be severe: an agent that silently stops catching tier escalation, or starts rubber-stamping security-sensitive changes, or loses its ability to detect prompt injection. +Testing application code is a solved problem with mature tooling: unit tests, integration tests, CI pipelines, coverage reports. But agent instructions — system prompts, CLAUDE.md files, review criteria, escalation rules — are a fundamentally different artifact. They're natural language, not code. Their behavior is probabilistic, not deterministic. And the consequences of a regression can be severe: an agent that silently stops catching intent authorization tier escalation, or starts rubber-stamping security-sensitive changes, or loses its ability to detect prompt injection. Today, if someone modifies a review agent's instructions, the only verification is human review of the prose change. There is no automated way to confirm the agent still behaves correctly after the modification. This is the equivalent of shipping code changes with no test suite — something we would never accept for application code. @@ -20,7 +20,7 @@ An agent's behavior is the product of its instructions, the model it runs on, th ### Absence detection -The hardest bugs to catch are capabilities that silently disappear. If someone simplifies the Intent & Coherence sub-agent's instructions and removes the paragraph about tier escalation detection, the agent won't error — it will simply stop checking for tier escalation. There's no compile error, no stack trace, no failing import. The capability quietly vanishes, and you only discover it when a tier-gaming attack succeeds. +The hardest bugs to catch are capabilities that silently disappear. If someone simplifies the Intent & Coherence sub-agent's instructions and removes the paragraph about intent authorization tier escalation detection, the agent won't error — it will simply stop checking for intent authorization tier escalation. 
There's no compile error, no stack trace, no failing import. The capability quietly vanishes, and you only discover it when an intent-authorization-tier-gaming attack succeeds. ### Interaction effects @@ -43,7 +43,7 @@ When someone modifies an agent's system prompt, CLAUDE.md, or configuration: ### Capability coverage -For each agent role described in [agent-architecture.md](agent-architecture.md) and [code-review.md](code-review.md), there's an implicit set of capabilities. The Intent & Coherence sub-agent should detect tier escalation. The Security sub-agent should catch known injection patterns and flag RBAC changes. These capabilities need explicit test coverage. +For each agent role described in [agent-architecture.md](agent-architecture.md) and [code-review.md](code-review.md), there's an implicit set of capabilities. The Intent & Coherence sub-agent should detect intent authorization tier escalation. The Security sub-agent should catch known injection patterns and flag RBAC changes. These capabilities need explicit test coverage. ### Cross-agent composition @@ -63,7 +63,7 @@ Maintain a curated set of test cases — inputs with known-correct outputs — f agent-tests/ intent-coherence/ golden-set/ - tier-escalation-detection.yaml + intent-tier-escalation-detection.yaml scope-mismatch.yaml cross-repo-intent.yaml ... 
@@ -81,7 +81,7 @@ Each test case specifies an input (a synthetic PR, issue, or diff) and the expec **Pros:** - Concrete, auditable, version-controlled -- Directly tests for known capabilities — if tier escalation detection breaks, the golden-set test for it fails +- Directly tests for known capabilities — if intent authorization tier escalation detection breaks, the golden-set test for it fails - Fast feedback loop compared to production monitoring - Can be run in CI on every instruction change @@ -103,7 +103,7 @@ Define contracts for each agent — formal statements about what the agent must - MUST flag any PR where the linked issue describes a bug fix but the diff adds new API surface - MUST flag any PR that modifies files in more than 3 directories when the linked issue is labeled "bug" -- MUST NOT approve a PR with no linked issue unless the change is classified as Tier 0 +- MUST NOT approve a PR with no linked issue unless the change is classified as intent authorization tier 0 - MUST escalate when the diff scope exceeds what the linked intent file authorizes ### Trade-offs @@ -198,7 +198,7 @@ Under the hood, promptfoo makes direct HTTP calls to model provider APIs. Each t - The pytest integration means eval suites look like normal test suites, which lowers the adoption barrier for teams already testing in Python - Has explicit "Agentic Metrics" (Task Completion, Tool Correctness, Goal Accuracy, Step Efficiency, Plan Adherence) — but these score agent *traces*, they don't run agents -The main gap: deepeval's built-in metrics are oriented toward conversational AI (relevancy, faithfulness to a source document). Evaluating whether a review agent correctly detected tier escalation requires custom metrics — the framework supports this, but the useful metrics would need to be written. +The main gap: deepeval's built-in metrics are oriented toward conversational AI (relevancy, faithfulness to a source document). 
Evaluating whether a review agent correctly detected intent authorization tier escalation requires custom metrics — the framework supports this, but the useful metrics would need to be written. #### lightspeed-evaluation diff --git a/internal/scaffold/fullsend-repo/skills/code-review/SKILL.md b/internal/scaffold/fullsend-repo/skills/code-review/SKILL.md index a7e05c12a..9a037ff60 100644 --- a/internal/scaffold/fullsend-repo/skills/code-review/SKILL.md +++ b/internal/scaffold/fullsend-repo/skills/code-review/SKILL.md @@ -166,9 +166,9 @@ already stripped the payload. - Does the change trace to a linked issue or authorized feature request? - Does the implementation match what the linked issue describes? -- Is the scope appropriate to the claimed tier (bug fix vs. new - feature)? A change that adds new capability is a feature, not a bug - fix, regardless of how it is labeled. +- Is the scope appropriate to the claimed intent authorization tier + (bug fix vs. new feature)? A change that adds new capability is a + feature, not a bug fix, regardless of how it is labeled. - Does the change go beyond what the linked issue authorized? - Does the change fit the overall design of the module/system? - Is the complexity proportional to the value delivered? diff --git a/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md b/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md index f4a37a30a..4eee2f929 100644 --- a/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md +++ b/internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md @@ -41,7 +41,7 @@ and `description`. 
|------------------------|--------|------------|--------------------------------------------------------------------------------| | `correctness` | opus | parallel | Logic errors, edge cases, nil handling, API contracts, test adequacy/integrity | | `security` | opus | parallel | Auth, data exposure, privilege escalation, injection defense, content security | -| `intent-coherence` | sonnet | parallel | Authorization, scope, tier matching, architectural fit, design coherence | +| `intent-coherence` | sonnet | parallel | Authorization, scope, intent authorization tier matching, architectural fit, design coherence | | `style-conventions` | sonnet | parallel | Naming, error handling idioms, API shape, code organization | | `docs-currency` | sonnet | parallel | Documentation staleness (follows docs-review skill inline) | | `cross-repo-contracts` | sonnet | parallel | API contract breakage affecting other repos (conditional) | diff --git a/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/intent-coherence.md b/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/intent-coherence.md index c80793fc2..26c10d201 100644 --- a/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/intent-coherence.md +++ b/internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/intent-coherence.md @@ -10,7 +10,7 @@ You are a staff engineer reviewing for intent alignment and architectural coherence. **Own:** Whether the change traces to authorized work (linked issue), -whether its scope matches the claimed tier (bug fix vs. feature), scope +whether its scope matches the claimed intent authorization tier (bug fix vs. feature), scope creep beyond the issue's authorization, whether the design fits the project's documented architecture (CLAUDE.md, ADRs, AGENTS.md), and whether naming/abstraction choices align with existing project trajectory.