Commit ce784b4

Merge pull request #2736 from ggallen/worktree-git-based-directory-fetch
feat(#2735): replace GitHub Contents API with git sparse checkout
2 parents 7086a61 + fc0d479 commit ce784b4

22 files changed

Lines changed: 1372 additions & 424 deletions

action.yml

Lines changed: 1 addition & 1 deletion
@@ -362,10 +362,10 @@ runs:
 if: inputs.agent != '__install_only__'
 shell: bash
 env:
-GH_TOKEN: ${{ inputs.github_token }}
 AGENT: ${{ inputs.agent }}
 FULLSEND_DIR: ${{ inputs.fullsend-dir }}
 TARGET_REPO: ${{ inputs.target-repo }}
+GH_TOKEN: ${{ inputs.github_token }}
 STATUS_RUN_URL: ${{ inputs.run-url }}
 STATUS_REPO: ${{ inputs.status-repo }}
 STATUS_NUMBER: ${{ inputs.status-number }}

docs/ADRs/0005-forge-abstraction-layer.md

Lines changed: 1 addition & 0 deletions
@@ -36,3 +36,4 @@ No code outside `internal/forge/` imports forge-specific packages directly.
 - The `FakeClient` enables deterministic testing of every layer without network calls.
 - Sentinel errors (`ErrNotFound`, `ErrBranchProtected`, `ErrAlreadyExists`) with `errors.Is()` helpers provide forge-agnostic error classification. `ErrNotFound` and `ErrAlreadyExists` are mapped in `APIError.Unwrap()` for automatic propagation. `ErrBranchProtected` is wrapped contextually at the call site (e.g., `commitFilesTo`) where the operation context disambiguates branch-protection 422s from other validation failures.
 - `CommitFilesToBranch` complements `CommitFiles` (default branch) by targeting a specific branch, enabling the protected-branch fallback path where scaffold files are committed to a feature branch and delivered via PR.
+- Git-protocol operations (e.g., `gitfetch.FetchTree` for skill directory fetching) are intentionally outside `forge.Client` scope. These use forge-agnostic git commands (sparse checkout, shallow clone) that work identically across GitHub, GitLab, and Forgejo without per-forge implementation.

docs/ADRs/0038-universal-harness-access.md

Lines changed: 18 additions & 0 deletions
@@ -353,6 +353,14 @@ harnesses:

 **Current recommendation:** Use commit-pinned URLs for GitHub-hosted resources. For single-file resources (agents, policies), use `raw.githubusercontent.com` URLs (e.g., `https://raw.githubusercontent.com/fullsend-ai/library/8cd3799.../agents/code.md#sha256=...`). For directory resources (skills), use `github.com/.../tree/...` URLs (e.g., `https://github.com/fullsend-ai/library/tree/8cd3799.../skills/rust#sha256=<tree-hash>...`). The commit SHA in the URL path provides immutability at the URL level, and the `#sha256=...` fragment provides content integrity.

+#### 8. Git subprocess vs Go git library for directory fetching
+
+**Decision:** Use the `git` CLI as a subprocess (`os/exec`) rather than a Go git library (e.g., go-git) for skill directory fetching.
+
+**Rationale:** The key optimization is `--filter=blob:none` partial clone combined with `--depth 1` shallow fetch and sparse checkout — this avoids downloading blobs outside the target path. go-git's sparse checkout and partial clone support is limited. Additionally, `git` is already required in the runtime environment (sandbox bootstrap uses it), so this adds zero new dependencies. Auth via `GIT_CONFIG_COUNT` env vars works identically across all forges without per-forge setup code.
+
+**Trade-off:** Subprocess execution requires temp-dir I/O and is slightly slower than an in-memory implementation. For the current use case (single-digit files per skill directory, <1s typical), this is acceptable. An in-memory library could be revisited if performance becomes a bottleneck.
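
Reviewer sketch (not part of the commit): the rationale names three git features and an env-var auth mechanism, so the following Go fragment shows one plausible way they fit together. The helper names `sparseFetchPlan` and `gitAuthEnv`, the command ordering, the `http.extraHeader` key, and the `x-access-token` username are all assumptions; the actual `gitfetch.FetchTree` may differ on every point.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// sparseFetchPlan returns the git invocations (arguments only, without the
// leading "git") for fetching one directory at a pinned ref.
// Hypothetical helper: the real implementation may order these differently.
func sparseFetchPlan(cloneURL, dir, ref string) [][]string {
	return [][]string{
		{"init", "--quiet"},
		{"remote", "add", "origin", cloneURL},
		// Restrict the working tree to the target directory.
		{"sparse-checkout", "set", dir},
		// Shallow, blobless fetch: blobs are downloaded lazily at checkout.
		{"fetch", "--depth", "1", "--filter=blob:none", "origin", ref},
		{"checkout", "--quiet", "FETCH_HEAD"},
	}
}

// gitAuthEnv builds GIT_CONFIG_COUNT / GIT_CONFIG_KEY_<n> / GIT_CONFIG_VALUE_<n>
// entries that inject an HTTP Authorization header. An empty token yields no
// entries, so public repositories are fetched anonymously.
// Assumption: basic auth with the "x-access-token" username, a GitHub convention.
func gitAuthEnv(token string) []string {
	if token == "" {
		return nil
	}
	cred := base64.StdEncoding.EncodeToString([]byte("x-access-token:" + token))
	return []string{
		"GIT_CONFIG_COUNT=1",
		"GIT_CONFIG_KEY_0=http.extraHeader",
		"GIT_CONFIG_VALUE_0=Authorization: Basic " + cred,
	}
}

func main() {
	// Placeholder URL and ref, for illustration only.
	for _, args := range sparseFetchPlan("https://example.com/org/skills.git", "skills/rust", "abc123") {
		fmt.Println("git " + strings.Join(args, " "))
	}
	fmt.Println(len(gitAuthEnv("")), "auth entries without a token")
}
```

Passing the header through the environment keeps the token out of the clone URL and out of any on-disk git config in the temp directory.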
+
 ## Related Work

 This pattern is well-established in other ecosystems:
@@ -366,3 +374,13 @@ The proposed model follows the GitHub Actions approach: URL-based references wit
 ## Implementation Plan

 See `docs/plans/universal-harness-access.md` for full implementation details, security analysis, and migration path. See `docs/plans/universal-harness-access-phase1.md` for the phased PR breakdown (Phase 1 MVP), `docs/plans/universal-harness-access-phase2.md` for Phase 2 (transitive dependency resolution), `docs/plans/universal-harness-access-phase3.md` for Phase 3 (lock files), and `docs/plans/universal-harness-access-phase4.md` for Phase 4 (runtime dependency loading).
+
+## Amendments
+
+### 2026-06-30: Git sparse checkout replaces forge APIs for skill directory fetching (#2735)
+
+Skill directory fetching now uses git sparse checkout (`internal/gitfetch/gitfetch.go`) instead of forge-specific REST APIs (`ListDirectoryContents` / `GetFileContentAtRef`). This change affects Decision sections that reference "forge APIs" for skill resolution — the implementation now uses `gitfetch.FetchTree` with `--filter=blob:none --depth 1` partial clone and sparse checkout, which is forge-agnostic (works identically across GitHub, GitLab, and Forgejo without per-forge implementation). See resolved design question 8 above for the git subprocess vs go-git rationale.
+
+**Stale cache fallback:** When a cached skill directory exists but re-fetch fails due to a transient network error (connection refused, DNS failure, timeout, context deadline exceeded), the runner falls back to the stale cached content and attaches a warning to the dependency record. Non-transient errors (authentication failures, integrity mismatches) still propagate as hard errors.
+
+**Token model change:** Token resolution failure is no longer a hard error. When no git token is available, the runner warns and proceeds — public repos fetch without authentication, private repos fail at the git layer with an actionable hint message. This eliminates the chicken-and-egg token problem described in issue #2722.
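
Reviewer sketch (not part of the commit): the stale cache fallback in this amendment depends on telling transient failures from hard ones. The Go fragment below is a minimal illustration, assuming the fetch error arrives as a wrapped Go error; a git subprocess would more likely require inspecting stderr, which this sketch does not attempt. `isTransient`, `fetchWithFallback`, `timeoutFetch`, and `authFetch` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"syscall"
)

// isTransient reports whether err looks like a temporary network failure:
// deadline exceeded, connection refused, DNS failure, or a timeout.
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, syscall.ECONNREFUSED) {
		return true
	}
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return true
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// fetchWithFallback returns fresh content, or the stale cached copy plus a
// warning when the fetch fails transiently and a cached copy exists.
// Any other failure is returned as a hard error.
func fetchWithFallback(fetch func() (string, error), cached string, haveCache bool) (string, string, error) {
	fresh, err := fetch()
	if err == nil {
		return fresh, "", nil
	}
	if haveCache && isTransient(err) {
		return cached, "using stale cache: " + err.Error(), nil
	}
	return "", "", err
}

var errAuth = errors.New("authentication failed")

// timeoutFetch simulates a fetch that hits a context deadline.
func timeoutFetch() (string, error) {
	return "", fmt.Errorf("fetch skill: %w", context.DeadlineExceeded)
}

// authFetch simulates a non-transient authentication failure.
func authFetch() (string, error) { return "", errAuth }

func main() {
	content, warning, err := fetchWithFallback(timeoutFetch, "cached tree", true)
	fmt.Println(content, "|", warning, "|", err)
	_, _, err = fetchWithFallback(authFetch, "cached tree", true)
	fmt.Println("auth failure propagates:", err)
}
```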

docs/plans/adr-0045-forge-portable-harness-phase3.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Phase 3 completes the "Deprecate" milestone from the ADR migration path. Specifi

 1. **`Lint()` diagnostic method warns on missing `role`** — today `Validate()` returns hard errors only. Phase 3 adds a separate `Lint()` method that returns non-fatal diagnostics (warnings), starting with "role is not set; it will be required in a future version." This keeps `Validate()` callers (which treat all errors as hard stops) unaffected.

-2. **Consumers migrate to harness-first discovery** — today `loadKnownSlugs()`, `runUninstall`, and `runGitHubUninstall` read agent identity exclusively from `config.yaml`'s `agents:` block. Phase 3 adds remote harness discovery via `forge.Client.ListDirectoryContents` + `GetFileContentAtRef`, and migrates these consumers to check harness files first, falling back to the `agents:` block.
+2. **Consumers migrate to harness-first discovery** — today `loadKnownSlugs()`, `runUninstall`, and `runGitHubUninstall` read agent identity exclusively from `config.yaml`'s `agents:` block. Phase 3 adds remote harness discovery via `forge.Client.ListDirectoryContents` + `GetFileContentAtRef` (used for harness wrapper discovery in config repos, distinct from skill directory fetching which uses `gitfetch.FetchTree`), and migrates these consumers to check harness files first, falling back to the `agents:` block.

 3. **`OrgConfig.Agents` becomes optional** — the `Agents` field gains `omitempty` so config.yaml can omit the `agents:` block. When present during load, a deprecation notice is logged. The dual-write during install continues (Phase 4 stops it).

docs/plans/universal-harness-access-phase1.md

Lines changed: 2 additions & 2 deletions
@@ -143,15 +143,15 @@ PRs 1, 2, 4, and 6 have no dependencies and can be developed/merged in parallel.

 **Create `internal/resolve/resolve.go`:**
 - `Dependency` struct: URL, LocalPath (cache path), SHA256, FetchedAt, CacheHit, Type (`"file"` or `"directory"`)
-- `ResolveOpts` struct: WorkspaceRoot, FetchPolicy, TraceID, AuditLogPath, ForgeClient (`forge.Client` for skill directory resolution)
+- `ResolveOpts` struct: WorkspaceRoot, FetchPolicy, TraceID, AuditLogPath, TreeFetcher (`gitfetch.TreeFetchFunc` for skill directory resolution), GitToken
 - `ResolveHarness(ctx, h *harness.Harness, opts) ([]Dependency, error)`:
 - Modifies the harness in place, replacing URL fields with local cache paths
 - For each declarative field (Agent, Policy):
 - Local path: return as-is
 - URL: extract/require integrity hash → validate against `AllowedRemoteResources` → check cache (with re-verification) → if miss and not offline: `fetch.FetchURL` → verify hash → `CachePut``AppendFetchAudit` → return cache `content` path
 - For Skills (directory resources):
 - Local path: return as-is
-- URL: extract/require integrity hash → validate against `AllowedRemoteResources` → use `ParseForgeURL` to extract forge components (owner, repo, path, ref) → check directory cache via `CacheGetDir` (with re-verification) → if miss and not offline: call `ForgeClient.ListDirectoryContents` to discover files, fetch each file with `ForgeClient.GetFileContentAtRef`, reconstruct directory tree, verify tree hash, store via `CachePutDir``AppendFetchAudit` → return cache `tree/` path
+- URL: extract/require integrity hash → validate against `AllowedRemoteResources` → use `ParseForgeURL` to extract forge components (owner, repo, path, ref) → check directory cache via `CacheGetDir` (with re-verification) → if miss and not offline: call `TreeFetcher` (git sparse checkout via `gitfetch.FetchTree`) to fetch all files, verify tree hash, store via `CachePutDir``AppendFetchAudit` → return cache `tree/` path
 - Non-forge HTTPS URLs for skills are rejected with error: "skill URLs must use a supported forge (GitHub, GitLab)"
 - Single-level resolution; transitive deps added in Phase 2, security scanning deferred

docs/plans/universal-harness-access-phase2.md

Lines changed: 3 additions & 3 deletions
@@ -244,7 +244,7 @@ New test cases (in addition to existing Phase 1 tests, which remain unchanged):
 - **Transitive dependency not in allowlist:** Skill A depends on Skill B at a URL outside `allowed_remote_resources`. Verify error contains "not in allowed_remote_resources".
 - **Transitive dependency hash mismatch:** Skill A depends on Skill B; Skill B's content doesn't match its declared hash. Verify error contains "integrity check failed".
 - **Mixed local and transitive:** Harness with local skills and one URL skill that has transitive deps. Verify local skills are untouched, URL skill and its transitive deps are all resolved.
-- **Relative URL in dependency:** Skill directory at `https://github.com/org/skills/tree/abc123/rust` declares dependency `../common/formatting#sha256=<tree-hash>...`. Verify resolved to `https://github.com/org/skills/tree/abc123/common/formatting` and fetched as a directory via forge API.
+- **Relative URL in dependency:** Skill directory at `https://github.com/org/skills/tree/abc123/rust` declares dependency `../common/formatting#sha256=<tree-hash>...`. Verify resolved to `https://github.com/org/skills/tree/abc123/common/formatting` and fetched as a directory via git sparse checkout (`gitfetch.FetchTree`).
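
Reviewer sketch (not part of the commit): the relative-URL test case has a subtlety worth spelling out. Standard RFC 3986 resolution (`url.ResolveReference` in Go) treats the final segment `rust` as a file name unless the base ends in `/`, so `../common/formatting` would resolve to `.../tree/common/formatting` and lose the ref. The Go fragment below treats the parent as a directory instead; `resolveRelative` is a hypothetical helper, not the resolver's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// resolveRelative resolves a relative dependency reference against the URL of
// the skill directory that declares it. The "#sha256=..." fragment of the
// reference replaces any fragment on the parent URL.
func resolveRelative(parentDirURL, ref string) (string, error) {
	base, err := url.Parse(parentDirURL)
	if err != nil {
		return "", err
	}
	rel, fragment, _ := strings.Cut(ref, "#")
	if rel == "" || strings.Contains(rel, "://") {
		return "", fmt.Errorf("not a relative reference: %q", ref)
	}
	// Treat the parent as a directory: join, then clean "." and ".." segments.
	base.Path = path.Join(base.Path, rel)
	base.RawPath = ""
	base.Fragment = fragment
	base.RawFragment = ""
	return base.String(), nil
}

func main() {
	got, err := resolveRelative("https://github.com/org/skills/tree/abc123/rust", "../common/formatting")
	fmt.Println(got, err)
}
```

Because `..` segments can climb above the intended subtree, a resolver built this way would still need to run the resolved URL through the `allowed_remote_resources` check, as the allowlist test case in this list requires.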

 **Depends on:** PR 1 (imports `internal/skill`)

@@ -350,10 +350,10 @@ dependencies:
 ```

 The resolver will:
-1. Fetch `rust-conventions` skill directory via forge API (list files, fetch each), verify tree hash, cache under `tree/`.
+1. Fetch `rust-conventions` skill directory via git sparse checkout (`gitfetch.FetchTree`), verify tree hash, cache under `tree/`.
 2. Read `SKILL.md` from the cached `tree/` subdirectory, parse its frontmatter, discover 2 transitive skill dependencies.
 3. Resolve `../cargo-integration` relative to the parent URL (sibling directory).
-4. Fetch and cache both transitive skill directories (each via forge API with tree hash verification and allowlist checks).
+4. Fetch and cache both transitive skill directories (each via git sparse checkout with tree hash verification and allowlist checks).
 5. Append all resolved cache `tree/` paths to `h.Skills`.
 6. The sandbox upload loop uploads all skill directory trees.

docs/plans/universal-harness-access-phase4.md

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ The `fullsend fetch-skill` subcommand reuses the existing fullsend binary alread
 1. Agent calls: `fullsend fetch-skill https://github.com/fullsend-ai/library/tree/abc123/skills/python-linting#sha256=<tree-hash>...`
 2. Subcommand sends HTTP request to runner via `FULLSEND_FETCH_URL` with bearer token from `FULLSEND_FETCH_TOKEN`
 3. Runner validates URL against `allowed_remote_resources`
-4. Runner uses forge API to list and fetch the skill directory (skills are directories, requiring `ListDirectoryContents` and `GetFileContentAtRef`)
+4. Runner uses `gitfetch.FetchTree` (git sparse checkout) to fetch the skill directory
 5. Runner verifies tree hash (hash covers entire directory tree)
 6. Runner stores in cache via `CachePutDir` and uploads directory tree to sandbox
 7. Subcommand returns the sandbox-local skill directory path
@@ -37,7 +37,7 @@ The `fullsend fetch-skill` subcommand reuses the existing fullsend binary alread
 - All URLs must match `allowed_remote_resources` prefixes
 - Integrity hash required on all URLs (tree hash for skill directories)
 - Rate limited: `max_runtime_fetches` (default 10) per agent run
-- Skills are directories -- requires forge API access (same as static resolution)
+- Skills are directories -- fetched via git sparse checkout (same as static resolution)
 - Non-forge HTTPS URLs are rejected for skills (no HTTP directory listing standard)
 - All fetched skills pass security scanning pipeline
 - Audit log records all runtime fetches with `fetch_type: "runtime"`
@@ -48,7 +48,7 @@ The `fullsend fetch-skill` subcommand reuses the existing fullsend binary alread
 - HTTP service implementing `fetchsvc.Service` with `ServeHTTP` handler
 - Request/response protocol: URL -> local path or error
 - Rate limiting enforcement via atomic counter
-- Forge API integration for skill directory fetching (reuses Phase 1 forge client)
+- Git sparse checkout integration for skill directory fetching (reuses `gitfetch.FetchTree`)
 - Audit logging with `fetch_type: "runtime"`
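
Reviewer sketch (not part of the commit): the list above mentions rate limiting via an atomic counter with a `max_runtime_fetches` default of 10. A minimal Go illustration follows; `fetchBudget` and `tryAcquire` are hypothetical names, and the real `fetchsvc.Service` may structure this differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fetchBudget caps the number of runtime fetches for one agent run.
type fetchBudget struct {
	max  int64
	used atomic.Int64
}

// tryAcquire reserves one fetch slot and reports whether the caller may
// proceed. The counter is incremented first, so concurrent callers can never
// be granted more than max slots in total.
func (b *fetchBudget) tryAcquire() bool {
	if b.used.Add(1) > b.max {
		b.used.Add(-1) // roll back the failed reservation
		return false
	}
	return true
}

func main() {
	budget := &fetchBudget{max: 10} // mirrors the documented default
	allowed := 0
	for i := 0; i < 12; i++ {
		if budget.tryAcquire() {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed)
}
```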

 #### PR 2: In-sandbox fetch subcommand (#2223)

docs/plans/universal-harness-access.md

Lines changed: 15 additions & 24 deletions
@@ -44,7 +44,7 @@ Resolution logic (`internal/harness/harness.go`):
 - All paths must resolve within the `.fullsend` directory tree
 - No network fetches; all resources must exist locally

-Skills are directories containing `SKILL.md` plus optional companion files (`scripts/`, `sub-agents/`, `assets/`). The entire directory tree is uploaded to the sandbox. When referenced via URL, skills require forge API access (e.g., GitHub Contents API) to discover and fetch all files in the directory. Policies are OpenShell YAML files. Agent definitions are Markdown files with YAML frontmatter.
+Skills are directories containing `SKILL.md` plus optional companion files (`scripts/`, `sub-agents/`, `assets/`). The entire directory tree is uploaded to the sandbox. When referenced via URL, skills are fetched via git sparse checkout (`gitfetch.FetchTree`) to retrieve all files in the directory. Policies are OpenShell YAML files. Agent definitions are Markdown files with YAML frontmatter.

 ## Proposed Design

@@ -74,7 +74,7 @@ pre_script: scripts/pre-code.sh # scripts must be local (security)
 |---------------|----------------|-----------|
 | Agent definition (`.md`) | ✅ Yes | Declarative; validated by schema |
 | Policy (`.yaml`) | ✅ Yes | Declarative; validated by schema |
-| Skill (directory) | ✅ Yes (forge only) | Directory uploaded as tree; requires forge API |
+| Skill (directory) | ✅ Yes (forge only) | Directory uploaded as tree; fetched via git sparse checkout |
 | Schema (`.json`) | ✅ Yes | Declarative; validated before use |
 | Pre/post scripts (`.sh`) | ❌ No | Executable on host; must be local |
 | Host files (certs, env) | ❌ No | Configuration; must be local |
@@ -306,7 +306,7 @@ Resolution algorithm:
 2. For each reference:
 - If local path, validate it exists
 - If URL for a single-file resource (agent, policy): fetch via HTTPS, cache as `content` file
-- If URL for a directory resource (skill): use forge API (`ListDirectoryContents`, `GetFileContentAtRef`) to discover and fetch all files, cache as `tree/` directory, verify tree hash
+- If URL for a directory resource (skill): use `gitfetch.FetchTree` (git sparse checkout) to fetch all files, cache as `tree/` directory, verify tree hash
 - Non-forge HTTPS URLs for skills are rejected (HTTP has no standard directory listing)
 3. Parse fetched resources to extract their references
 4. Repeat step 2 for new references (depth-first traversal)
@@ -380,7 +380,7 @@ allow_runtime_fetch: true
 max_runtime_fetches: 10
 ```

-During execution, the agent can fetch `https://github.com/fullsend-ai/library/tree/8cd3799.../skills/python-linting#sha256=<tree-hash>...` because it matches an allowed prefix. The runner uses the forge API to list and fetch the skill directory, validates the tree hash, and caches it.
+During execution, the agent can fetch `https://github.com/fullsend-ai/library/tree/8cd3799.../skills/python-linting#sha256=<tree-hash>...` because it matches an allowed prefix. The runner uses git sparse checkout to fetch the skill directory, validates the tree hash, and caches it.

 **Audit:** All fetches (static and runtime) are logged:

@@ -998,33 +998,24 @@ func ComputeTreeHash(files map[string][]byte) string {
 }
 ```

-### 4a. Forge Interface Extension for Skill Directories
+### 4a. Git-Based Skill Directory Fetching

-**File:** `internal/forge/forge.go` (additions to the forge.Client interface)
+**File:** `internal/gitfetch/gitfetch.go`

-Skills are directories, not single files. To fetch a skill from a forge URL, the resolver must list the directory contents and fetch each file. This requires forge API support.
+Skills are directories, not single files. To fetch a skill from a forge URL, the resolver uses git sparse checkout via `gitfetch.FetchTree`, which is forge-agnostic and works with any git hosting platform.

 ```go
-// ListDirectoryContents returns the list of files in a directory at a given ref.
-// For GitHub, this uses the Trees API or Contents API.
-ListDirectoryContents(ctx context.Context, owner, repo, path, ref string) ([]FileEntry, error)
+// TreeFetchFunc fetches all files under path in a repository at ref.
+// Returns map[relativePath]content. Token is optional.
+type TreeFetchFunc func(ctx context.Context, cloneURL, path, ref, token string) (map[string][]byte, error)

-// GetFileContentAtRef fetches a single file's content at a specific ref.
-// For GitHub, this uses the Contents API with a ref parameter.
-GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) ([]byte, error)
+// FetchTree is the default TreeFetchFunc implementation using git sparse checkout.
+func FetchTree(ctx context.Context, cloneURL, path, ref, token string) (map[string][]byte, error)
 ```

-```go
-type FileEntry struct {
-Path string // relative path within the directory
-SHA string // git blob SHA
-Size int64
-}
-```
-
-**File:** `internal/forge/github/directory.go` (new)
+**File:** `internal/gitfetch/gitfetch.go`

-GitHub implementation using the Contents API (`GET /repos/{owner}/{repo}/contents/{path}?ref={ref}`) for directory listing and file retrieval.
+Skill directory fetching uses git sparse checkout (`gitfetch.FetchTree`) rather than forge-specific REST APIs, providing a forge-agnostic implementation.

 **File:** `internal/fetch/forgeurl.go` (new)

@@ -1034,7 +1025,7 @@ GitHub implementation using the Contents API (`GET /repos/{owner}/{repo}/content
 func ParseForgeURL(rawURL string) (host, owner, repo, path, ref string, err error)
 ```

-**Why non-forge HTTPS URLs are rejected for skills:** HTTP has no standard mechanism for listing directory contents. A URL like `https://example.com/skills/rust/` might serve an HTML index page, but there is no reliable way to discover all files in the directory. Forge APIs (GitHub Contents API, GitLab Repository Files API) provide structured directory listings. Skills from non-forge URLs are rejected at validation time with a clear error message.
+**Why non-forge HTTPS URLs are rejected for skills:** HTTP has no standard mechanism for listing directory contents. A URL like `https://example.com/skills/rust/` might serve an HTML index page, but there is no reliable way to discover all files in the directory. Forge-hosted repositories support git sparse checkout for efficient directory fetching. Skills from non-forge URLs are rejected at validation time with a clear error message.
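
Reviewer sketch (not part of the commit): only the `ParseForgeURL` signature is visible in this hunk, so the Go fragment below is a guess at the shape of such a parser for GitHub-style `/tree/<ref>/<path>` URLs, including the rejection of non-forge URLs described above. `parseGitHubTreeURL` is a hypothetical name, and the single-segment ref is an assumption that holds for commit SHAs but not for branch names containing `/`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseGitHubTreeURL splits "https://<host>/<owner>/<repo>/tree/<ref>/<path>"
// into its parts. Any "#sha256=..." fragment is ignored here.
// Assumption: ref is exactly one path segment (e.g. a commit SHA).
func parseGitHubTreeURL(rawURL string) (host, owner, repo, dir, ref string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", "", "", err
	}
	if u.Scheme != "https" {
		return "", "", "", "", "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	// Expect owner/repo/tree/ref/path...
	if len(parts) < 5 || parts[2] != "tree" {
		return "", "", "", "", "", fmt.Errorf("not a forge tree URL: %s", rawURL)
	}
	return u.Host, parts[0], parts[1], strings.Join(parts[4:], "/"), parts[3], nil
}

func main() {
	host, owner, repo, dir, ref, err := parseGitHubTreeURL(
		"https://github.com/fullsend-ai/library/tree/abc123/skills/python-linting")
	fmt.Println(host, owner, repo, dir, ref, err)

	// A plain HTTPS directory URL has no tree/ref structure and is rejected.
	_, _, _, _, _, err = parseGitHubTreeURL("https://example.com/skills/rust/")
	fmt.Println(err)
}
```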

 ### 5. Dependency Resolver
