aaif-goose
diff --git a/.agents/skills/code-review/SKILL.md b/.agents/skills/code-review/SKILL.md
Lines changed: 25 additions & 8 deletions
diff --git a/.github/workflows/bundle-goose2.yml b/.github/workflows/bundle-goose2.yml
Lines changed: 81 additions & 9 deletions
diff --git a/.github/workflows/pr-smoke-test.yml b/.github/workflows/pr-smoke-test.yml
Lines changed: 16 additions & 8 deletions
diff --git a/.github/workflows/pr-website-preview.yml b/.github/workflows/pr-website-preview.yml
Lines changed: 1 addition & 1 deletion
@@ -92,6 +92,16 @@ You are a senior engineer conducting a thorough code review. Review **only the l
 - **AnimatePresence**: Is it used properly with unique keys for dialog/modal transitions?
 - **Reduced Motion**: Is `useReducedMotion()` respected for accessibility?
 
+### Async State, Defaults & Persistence
+- **Async Source of Truth**: During async provider/model/session mutations, does UI/session/localStorage state update only after the backend accepts the change? If the UI updates optimistically, is there an explicit rollback path?
+- **UI/Backend Drift**: Could the UI show provider/model/project/persona X while the backend is still on Y after a failed mutation, delayed prepare, or pending-to-real session handoff?
+- **Requested vs Fallback Authority**: Do explicit user or caller selections stay authoritative over sticky defaults, saved preferences, aliases, or fallback resolution?
+- **Dependent State Invalidation**: When a parent selection changes (provider/project/persona/workspace/etc.), are dependent values like `modelId`, `modelName`, defaults, or cached labels cleared or recomputed so stale state does not linger?
+- **Persisted Preference Validation**: Are stored selections validated against current inventory/capabilities before reuse, and do stale values fail soft instead of breaking creation flows?
+- **Compatibility of Fallbacks**: Are default or sticky selections guaranteed to remain compatible with the active concrete provider/backend, instead of leaking across providers?
+- **Best-Effort Lookups**: Do inventory/config/default-resolution lookups degrade gracefully on transient failure, or can they incorrectly block a primary flow that should still work with a safe fallback?
+- **Draft/Home/Handoff Paths**: If the product has draft, Home, pending, or pre-created sessions, did you review those handoff paths separately from the already-active session path?
+
 ### General Code Quality
 - **Error Handling**: Are errors handled gracefully with user-friendly messages?
 - **Loading States**: Are loading states shown during async operations?
@@ -104,13 +114,18 @@ You are a senior engineer conducting a thorough code review. Review **only the l
 
 ### Step 0: Run Quality Checks
 
-Before reading any code, run the project's CI gate to establish a baseline:
+Before reading any code, run the project's CI gate to establish a baseline. Use **check-only** commands so the baseline never mutates the working tree — otherwise auto-formatters can introduce unstaged diffs and you'll end up reviewing formatter output instead of the author's actual changes.
+
+Avoid `just check-everything` as the baseline in this repo: that recipe runs `cargo fmt --all` in write mode and will modify the working tree. Run the non-mutating equivalents instead:
 
 ```bash
-just ci
+cargo fmt --all -- --check
+cargo clippy --all-targets -- -D warnings
+(cd ui/desktop && pnpm run lint:check)
+./scripts/check-openapi-schema.sh
 ```
 
-This runs: `pnpm check` (Biome lint/format + file sizes), `pnpm typecheck`, `just clippy` (Rust linting), `pnpm test`, `pnpm build`, and `just tauri-check` (Rust type checking).
+If the project has a stronger pre-push or CI gate than this helper set, run that fuller gate when the review is meant to be PR-ready, but only after confirming it is also non-mutating (or run it from a clean stash). In this repo, targeted tests for the changed area plus the pre-push checks are often the practical follow-up.
 
 Report the results as pass/fail. Any failures are automatically **P0** issues and should appear at the top of the findings list. Do not skip this step even if the user only wants a quick review.
 
@@ -120,7 +135,8 @@ For each file in the list:
 
 1. Run `git diff main...HEAD -- <file>` to get the exact lines that changed
 2. Review **only those changed lines** against the Review Checklist — do not flag issues in unchanged code
-3. Note the file path and line numbers from the diff output for each issue found
+3. For stateful UI or async flow changes, trace the full path end to end: user selection -> local/session state update -> persistence -> backend prepare/set/update call -> failure/rollback path
+4. Note the file path and line numbers from the diff output for each issue found
 
 ### Step 2: Categorize Issues
 
@@ -152,16 +168,17 @@ After reviewing all files, provide:
 
 ### Step 3b: Self-Check
 
-Before presenting findings to the user, silently review the issue list two more times:
+Before presenting findings to the user, silently review the issue list three times:
 
 1. **Pass 1**: For each issue, ask — is this genuinely a problem, or could it be intentional/acceptable? Remove false positives.
 2. **Pass 2**: For each remaining issue, ask — does the recommended fix actually improve the code, or is it a matter of preference?
+3. **Pass 3**: For async state/default-resolution issues, ask — can the UI, persisted state, and backend ever disagree after a failure, fallback, or session handoff?
 
-After both passes, tag each surviving issue as one of:
+After these passes, tag each surviving issue as one of:
 - **[Must Fix]** — clear violation, will likely get flagged in PR review
 - **[Your Call]** — valid concern but may be intentional or a reasonable tradeoff (e.g. stepping outside the design system for a specific reason). Present it but let the user decide.
 
-Only present issues that survived both passes.
+Only present issues that survived these passes.
 
 ### Step 4: Fix Issues
 
@@ -189,7 +206,7 @@ Once all issues are fixed, display:
 
 **✅ Code review complete! All issues have been addressed.**
 
-Your code is ready to commit and push. Lefthook will run the full CI gate (`just ci`) automatically when you push.
+Your code is ready to commit and push. Lefthook and CI will run the repo's configured gates when you push.
 
 Next steps: generate a PR summary that explains the intent of this change, what files were modified and why, and how to verify the changes work.
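
Reviewer note (illustrative, not part of the patch): the "Requested vs Fallback Authority", "Persisted Preference Validation", and "Compatibility of Fallbacks" checklist items in SKILL.md can be pictured with a small resolution function. This is a minimal sketch under stated assumptions. The names `Selection`, `Inventory`, and `resolveSelection` are hypothetical and do not come from the repo, and returning `undefined` for an unusable explicit request is one possible policy, not necessarily the project's.

```typescript
// Hypothetical types for illustration only; none of these names exist in the repo.
interface Selection {
  providerId: string;
  modelId: string;
}

// Maps a provider id to the model ids it currently offers.
type Inventory = Record<string, string[]>;

// Resolve the model to use for the active provider.
// Assumed priority: explicit request > valid stored preference > provider default.
function resolveSelection(
  activeProviderId: string,
  inventory: Inventory,
  requested?: Selection,
  stored?: Selection,
): Selection | undefined {
  const models = inventory[activeProviderId] ?? [];

  // A selection is usable only if it targets the active provider
  // and names a model that the current inventory still lists.
  const isUsable = (s: Selection | undefined): s is Selection =>
    s !== undefined &&
    s.providerId === activeProviderId &&
    models.includes(s.modelId);

  if (requested !== undefined) {
    // An explicit request is never silently replaced by a sticky default.
    // If it is unusable, return undefined so the caller can surface the error.
    return isUsable(requested) ? requested : undefined;
  }

  if (isUsable(stored)) {
    return stored;
  }

  // Stale or cross-provider preference: fail soft to the first available model.
  return models.length > 0
    ? { providerId: activeProviderId, modelId: models[0] }
    : undefined;
}
```

The design choice worth checking in review is the asymmetry: a stale stored preference degrades to a safe default, while an invalid explicit request is reported rather than quietly swapped for something the user did not ask for.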
 
 
@@ -38,6 +38,11 @@ on:
         required: false
         default: ""
         type: string
+      windows-signing:
+        description: "Whether to perform Windows signing via Azure Trusted Signing"
+        required: false
+        default: false
+        type: boolean
       cli-run-id:
         description: >
           Run ID of a prior build-cli.yml workflow run to download the goose
@@ -125,7 +130,7 @@ jobs:
 
       - name: Cache Rust dependencies
         if: inputs.cli-run-id == ''
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
         with:
           key: goose2-macos-arm64
 
@@ -175,13 +180,11 @@ jobs:
           certificate-password: ${{ secrets.APPLE_CERTIFICATE_PASSWORD }}
 
       # ── Tauri bundle ──
-      - name: Check disk space before bundle
-        run: df -h
-
       - name: Bundle Goose 2 (pnpm tauri build)
         env:
+          APPLE_SIGNING_IDENTITY: ${{ inputs.signing && 'Developer ID Application' || '' }}
           APPLE_ID: ${{ inputs.signing && secrets.APPLE_ID || '' }}
-          APPLE_ID_PASSWORD: ${{ inputs.signing && secrets.APPLE_ID_PASSWORD || '' }}
+          APPLE_PASSWORD: ${{ inputs.signing && secrets.APPLE_ID_PASSWORD || '' }}
           APPLE_TEAM_ID: ${{ inputs.signing && secrets.APPLE_TEAM_ID || '' }}
         working-directory: ui/goose2
         run: |
@@ -291,7 +294,7 @@ jobs:
 
       - name: Cache Rust dependencies
         if: inputs.cli-run-id == ''
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
         with:
           key: goose2-macos-x86_64
 
@@ -360,8 +363,9 @@ jobs:
       # ── Tauri bundle (cross-compile for Intel) ──
       - name: Bundle Goose 2 for Intel
         env:
+          APPLE_SIGNING_IDENTITY: ${{ inputs.signing && 'Developer ID Application' || '' }}
           APPLE_ID: ${{ inputs.signing && secrets.APPLE_ID || '' }}
-          APPLE_ID_PASSWORD: ${{ inputs.signing && secrets.APPLE_ID_PASSWORD || '' }}
+          APPLE_PASSWORD: ${{ inputs.signing && secrets.APPLE_ID_PASSWORD || '' }}
           APPLE_TEAM_ID: ${{ inputs.signing && secrets.APPLE_TEAM_ID || '' }}
         working-directory: ui/goose2
         run: |
@@ -477,7 +481,7 @@ jobs:
 
       - name: Cache Rust dependencies
         if: inputs.cli-run-id == ''
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
         with:
           key: goose2-linux-x86_64
 
@@ -564,6 +568,7 @@ jobs:
     runs-on: windows-latest
     timeout-minutes: 60
     permissions:
+      id-token: write
       contents: read
       actions: read
     steps:
@@ -621,7 +626,7 @@ jobs:
 
       - name: Cache Rust dependencies
         if: inputs.cli-run-id == ''
-        uses: Swatinem/rust-cache@v2
+        uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
         with:
           key: goose2-windows-x86_64
 
@@ -697,3 +702,70 @@ jobs:
           name: Goose2-windows-x64-msi
           path: ui/goose2/src-tauri/target/x86_64-pc-windows-msvc/release/bundle/msi/*.msi
           if-no-files-found: warn
+
+  sign-windows:
+    name: "Sign Windows installers"
+    needs: bundle-windows
+    if: inputs.windows-signing
+    runs-on: windows-latest
+    environment: signing
+    permissions:
+      id-token: write
+      contents: read
+      actions: read
+    steps:
+      - name: Download NSIS installer
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7
+        with:
+          name: Goose2-windows-x64-nsis
+          path: unsigned/nsis
+
+      - name: Download MSI installer
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7
+        with:
+          name: Goose2-windows-x64-msi
+          path: unsigned/msi
+
+      - name: Azure login
+        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2
+        with:
+          client-id: ${{ secrets.AZURE_CLIENT_ID }}
+          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
+          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
+
+      - name: Sign Windows installers with Azure Trusted Signing
+        uses: azure/trusted-signing-action@db7a3a6bd3912025c705162fb7475389f5b69ec6 # v1
+        with:
+          endpoint: ${{ secrets.AZURE_SIGNING_ENDPOINT }}
+          trusted-signing-account-name: ${{ secrets.AZURE_SIGNING_ACCOUNT_NAME }}
+          certificate-profile-name: ${{ secrets.AZURE_CERTIFICATE_PROFILE_NAME }}
+          files-folder: ${{ github.workspace }}/unsigned
+          files-folder-filter: exe,msi
+          files-folder-recurse: true
+
+      - name: Verify signed installers
+        shell: pwsh
+        run: |
+          $files = Get-ChildItem -Path unsigned -Recurse -Include *.exe,*.msi
+          foreach ($file in $files) {
+            Write-Output "Verifying signature: $($file.FullName)"
+            $sig = Get-AuthenticodeSignature $file.FullName
+            if ($sig.Status -ne "Valid") {
+              throw "Signature invalid for $($file.Name): $($sig.Status)"
+            }
+            Write-Output "✅ Signature valid: $($file.Name)"
+          }
+
+      - name: Upload signed NSIS installer
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6
+        with:
+          name: Goose2-windows-x64-nsis-signed
+          path: unsigned/nsis/*.exe
+          if-no-files-found: error
+
+      - name: Upload signed MSI installer
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6
+        with:
+          name: Goose2-windows-x64-msi-signed
+          path: unsigned/msi/*.msi
+          if-no-files-found: error
@@ -108,9 +108,13 @@ jobs:
           node-version: '22'
 
       - name: Install agentic providers
-        run: npm install -g @anthropic-ai/claude-code @openai/codex @google/gemini-cli @zed-industries/claude-agent-acp @zed-industries/codex-acp
+        run: npm install -g @anthropic-ai/claude-code @zed-industries/claude-agent-acp @zed-industries/codex-acp
 
-      - name: Run Smoke Tests with Provider Script
+      - name: Install Node.js Dependencies
+        run: source ../../bin/activate-hermit && pnpm install --frozen-lockfile
+        working-directory: ui/desktop
+
+      - name: Run Smoke Tests (Normal Mode)
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
@@ -127,12 +131,10 @@ jobs:
           SKIP_BUILD: 1
           SKIP_PROVIDERS: ${{ vars.SKIP_PROVIDERS || '' }}
         run: |
-          # Ensure the HOME directory structure exists
           mkdir -p $HOME/.local/share/goose/sessions
           mkdir -p $HOME/.config/goose
-
-          # Run the provider test script (binary already built and downloaded)
-          bash scripts/test_providers.sh
+          source ../../bin/activate-hermit && pnpm run test:integration:providers
+        working-directory: ui/desktop
 
       - name: Set up Python
         uses: actions/setup-python@v5
@@ -188,6 +190,10 @@ jobs:
       - name: Make Binary Executable
         run: chmod +x target/debug/goose
 
+      - name: Install Node.js Dependencies
+        run: source ../../bin/activate-hermit && pnpm install --frozen-lockfile
+        working-directory: ui/desktop
+
       - name: Run Provider Tests (Code Execution Mode)
         env:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
@@ -205,7 +211,8 @@ jobs:
         run: |
           mkdir -p $HOME/.local/share/goose/sessions
           mkdir -p $HOME/.config/goose
-          bash scripts/test_providers_code_exec.sh
+          source ../../bin/activate-hermit && pnpm run test:integration:providers-code-exec
+        working-directory: ui/desktop
 
   compaction-tests:
     name: Compaction Tests
@@ -277,7 +284,8 @@ jobs:
           GOOSE_PROVIDER: anthropic
           GOOSE_MODEL: claude-sonnet-4-5-20250929
           SHELL: /bin/bash
+          SKIP_BUILD: 1
         run: |
             echo 'export PATH=/some/fake/path:$PATH' >> $HOME/.bash_profile
-            source ../../bin/activate-hermit && pnpm run test:integration:debug
+            source ../../bin/activate-hermit && pnpm run test:integration:goosed
         working-directory: ui/desktop
@@ -51,7 +51,7 @@ jobs:
   cleanup:
     runs-on: ubuntu-latest
     needs: deploy
-    if: github.event.action == 'closed'
+    if: github.event.action == 'closed' && github.event.pull_request.head.repo.full_name == 'aaif-goose/goose'
     permissions:
       contents: write
     steps: