Commit 7312f00

chore(pi): minor
1 parent 80bfcf3 commit 7312f00

1 file changed

Lines changed: 23 additions & 16 deletions

pi/.pi/agent/extensions/web-fetch/README.md

@@ -5,8 +5,10 @@ A `web_fetch` tool fetches content from a URL and converts it into a clean, LLM-
 What makes it more than a plain HTTP client:

 - **HTML → Markdown** conversion by default (with `text` and `html` alternatives).
-- **GitHub-aware extraction** — `github.com` URLs return structured repository content (file trees, README, file text) instead of raw HTML, with a local shallow clone the agent can explore further via `read`/`bash`.
-- **Security hardening** — HTTPS upgrade, SSRF protection (blocks private hosts), cross-host redirect detection, size guards, and timeouts.
+- **GitHub-aware extraction** — `github.com` URLs return structured repository content (file trees, README, file text)
+instead of raw HTML, with a local shallow clone the agent can explore further via `read`/`bash`.
+- **Security hardening** — HTTPS upgrade, SSRF protection (blocks private hosts), cross-host redirect detection, size
+guards, and timeouts.
 - **Actionable errors** — failure messages include hints so the model can retry intelligently.

 ## Architecture
@@ -58,19 +60,20 @@ flowchart TD

 ## Components

-| File | Role |
-|------|------|
-| `index.ts` | Tool definition, registration, and request dispatch. Routes GitHub URLs to the GitHub extractor and everything else to the HTTP fetcher; renders results in the TUI. |
-| `types.ts` | Shared types (`FetchResult`, `FetchParams`, `FetchError`, `GitHubUrlInfo`, `GitHubCloneConfig`) and constants (timeouts, size limits, GitHub defaults). |
-| `fetcher.ts` | Pure HTTP transport: `fetchUrl()` handles URL normalization, SSRF protection, redirects, size guards, timeouts, and Cloudflare UA fallback. Returns a normalized `FetchResult`. |
-| `github-extract.ts` | GitHub URL parser and the clone-or-API decision engine. Shallow-clones small repos (with session-local caching), falls back to the `gh` API for large repos or commit-SHA URLs, and assembles structured Markdown content from the result. |
-| `github-api.ts` | Thin, non-throwing wrappers around the `gh` CLI: auth detection, repo size, default branch, file tree, README, and single-file fetch. |
-| `format.ts` | `formatResultForLLM()` — converts the raw response to the requested format, prepends a redirect banner, and truncates large outputs to protect the context window. |
-| `html-to-markdown.ts` | Turndown-backed HTML → Markdown converter that strips scripts/styles/navigation while preserving semantic structure (headings, lists, code blocks). |
+| File | Role |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `index.ts` | Tool definition, registration, and request dispatch. Routes GitHub URLs to the GitHub extractor and everything else to the HTTP fetcher; renders results in the TUI. |
+| `types.ts` | Shared types (`FetchResult`, `FetchParams`, `FetchError`, `GitHubUrlInfo`, `GitHubCloneConfig`) and constants (timeouts, size limits, GitHub defaults). |
+| `fetcher.ts` | Pure HTTP transport: `fetchUrl()` handles URL normalization, SSRF protection, redirects, size guards, timeouts, and Cloudflare UA fallback. Returns a normalized `FetchResult`. |
+| `github-extract.ts` | GitHub URL parser and the clone-or-API decision engine. Shallow-clones small repos (with session-local caching), falls back to the `gh` API for large repos or commit-SHA URLs, and assembles structured Markdown content from the result. |
+| `github-api.ts` | Thin, non-throwing wrappers around the `gh` CLI: auth detection, repo size, default branch, file tree, README, and single-file fetch. |
+| `format.ts` | `formatResultForLLM()` — converts the raw response to the requested format, prepends a redirect banner, and truncates large outputs to protect the context window. |
+| `html-to-markdown.ts` | Turndown-backed HTML → Markdown converter that strips scripts/styles/navigation while preserving semantic structure (headings, lists, code blocks). |

 ## GitHub extraction in detail

-When the agent fetches a `github.com` URL, the tool recognizes the URL shape and extracts structured content instead of fetching rendered HTML:
+When the agent fetches a `github.com` URL, the tool recognizes the URL shape and extracts structured content instead of
+fetching rendered HTML:

 - **Repo root** (`/owner/repo`) → file tree + README.
 - **Directory** (`/owner/repo/tree/<ref>/<path>`) → directory listing with file sizes.
@@ -81,14 +84,18 @@ The decision between cloning and using the API:
 1. **Cached clone?** → reuse the session-local clone.
 2. **Full commit-SHA URL?** → use the `gh` API (can't shallow-clone a SHA).
 3. **Repo larger than `maxRepoSizeMB`?** → use the `gh` API (tree + README). The `forceClone` parameter overrides this.
-4. **Otherwise** → shallow clone (`gh repo clone` when authenticated, `git clone` for public repos as fallback). If cloning fails, fall back to the API.
+4. **Otherwise** → shallow clone (`gh repo clone` when authenticated, `git clone` for public repos as fallback). If
+cloning fails, fall back to the API.

-Non-code GitHub paths (`/issues`, `/pull`, `/discussions`, etc.) are intentionally **not** intercepted — they fall through to the normal HTTP fetcher, since they serve HTML pages rather than repository content.
+Non-code GitHub paths (`/issues`, `/pull`, `/discussions`, etc.) are intentionally **not** intercepted — they fall
+through to the normal HTTP fetcher, since they serve HTML pages rather than repository content.

-> **Note:** The `gh` CLI is required for API calls, private repos, and the size-check preflight. Without `gh` authentication, public repos still work via `git clone`.
+> **Note:** The `gh` CLI is required for API calls, private repos, and the size-check preflight. Without `gh`
+> authentication, public repos still work via `git clone`.

 ## Configuration

-All configuration is defined in code — there is no external config file. Edit `DEFAULT_GITHUB_CONFIG` in [`types.ts`](./types.ts) to change GitHub behaviour.
+All configuration is defined in code — there is no external config file. Edit `DEFAULT_GITHUB_CONFIG` in
+[`types.ts`](./types.ts) to change GitHub behaviour.

 HTTP-fetch defaults (timeout, max bytes, User-Agents) are also constants in `types.ts`.
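
The security bullet in the first hunk says the fetcher provides SSRF protection by blocking private hosts, but the diff does not show how `fetcher.ts` decides what counts as private. As a rough illustration only, the TypeScript sketch below classifies a URL hostname against common loopback, private, and link-local ranges. The name `isPrivateHost` is hypothetical, and the sketch only inspects IP literals: a real guard would also have to resolve DNS names (and re-check after redirects) before trusting a host.

```typescript
import { isIP } from "node:net";

// Hypothetical helper, not the extension's actual code. It checks only the
// hostname's literal form; DNS names are not resolved here.
function isPrivateHost(hostname: string): boolean {
  // WHATWG URL keeps brackets around IPv6 literals, e.g. "[::1]".
  const host = hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const family = isIP(host); // 4, 6, or 0 for "not an IP literal"
  if (family === 4) {
    const [a, b] = host.split(".").map(Number);
    return (
      a === 0 || // 0.0.0.0/8 ("this network")
      a === 10 || // 10.0.0.0/8 private
      a === 127 || // 127.0.0.0/8 loopback
      (a === 169 && b === 254) || // 169.254.0.0/16 link-local
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
      (a === 192 && b === 168) // 192.168.0.0/16 private
    );
  }
  if (family === 6) {
    if (host === "::" || host === "::1") return true; // unspecified / loopback
    // First 16-bit group; compressed forms like "::2" start with an empty group.
    // Not covered: IPv4-mapped forms such as ::ffff:7f00:1.
    const first = parseInt(host.split(":")[0] || "0", 16);
    return (
      (first & 0xfe00) === 0xfc00 || // fc00::/7 unique local
      (first & 0xffc0) === 0xfe80 // fe80::/10 link-local
    );
  }
  return false; // a DNS name: would need resolution before it can be trusted
}
```

For example, `isPrivateHost(new URL("http://[::1]:8080/").hostname)` returns `true`: `URL.hostname` yields `"[::1]"`, and the helper strips the brackets before checking.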

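The extraction section routes `github.com` URLs by shape: a repo root yields a file tree and README, a `/tree/<ref>/<path>` URL yields a directory listing, and non-code paths such as `/issues` fall through to the HTTP fetcher. A minimal parser in that spirit might look like the sketch below. The `GitHubTarget` type and `parseGitHubUrl` name are hypothetical (the real `GitHubUrlInfo` in `types.ts` may look different), and treating `/blob/<ref>/<path>` as a single-file URL is an assumption based on GitHub's usual URL layout, since the hunk is cut off before the remaining shapes.

```typescript
// Hypothetical result type; the real `GitHubUrlInfo` in types.ts may differ.
type GitHubTarget =
  | { kind: "repo"; owner: string; repo: string }
  | { kind: "tree" | "blob"; owner: string; repo: string; ref: string; path: string };

// Returns null when the URL is not repository content, so the caller can
// hand it to the plain HTTP fetcher instead.
function parseGitHubUrl(raw: string): GitHubTarget | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }
  if (url.hostname !== "github.com" && url.hostname !== "www.github.com") return null;

  const [owner, repo, kind, ref, ...rest] = url.pathname.split("/").filter(Boolean);
  if (!owner || !repo) return null; // "/" or "/owner" is not a repository

  // Repo root: /owner/repo -> file tree + README.
  if (!kind) return { kind: "repo", owner, repo };

  // /tree/<ref>/<path> is a directory; /blob/<ref>/<path> is assumed to be a single file.
  // Assumption: <ref> is one path segment; branch names containing "/" are ambiguous here.
  if ((kind === "tree" || kind === "blob") && ref) {
    return { kind, owner, repo, ref, path: rest.join("/") };
  }

  // /issues, /pull, /discussions, etc. serve HTML pages, not repository content.
  return null;
}
```

Returning `null` instead of throwing keeps routing simple in this sketch: a dispatcher like the one described for `index.ts` could send anything the parser does not recognize to the normal fetcher.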
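
The four-step clone-or-API decision can be read as a small pure function followed by a fallback wrapper. The sketch below is one way to express it, not the code in `github-extract.ts`: only `maxRepoSizeMB` and `forceClone` come from the README, while `DecisionInput`, `chooseStrategy`, and `cloneOrApi` are hypothetical names. It also assumes that when the `gh` size preflight cannot run (for example, without authentication), the size check is skipped and the tool proceeds to clone, which is consistent with the note that public repos still work via `git clone`.

```typescript
// Sketch of the README's clone-or-API decision. Names are hypothetical;
// only `maxRepoSizeMB` and `forceClone` appear in the README.
type Strategy = "cached-clone" | "api" | "clone";

interface DecisionInput {
  hasCachedClone: boolean; // a session-local clone already exists
  ref?: string; // branch, tag, or commit taken from the URL
  repoSizeMB?: number; // from the `gh` size preflight; undefined if it could not run
  maxRepoSizeMB: number;
  forceClone?: boolean;
}

const FULL_COMMIT_SHA = /^[0-9a-f]{40}$/i;

function chooseStrategy(input: DecisionInput): Strategy {
  // 1. Cached clone? Reuse it.
  if (input.hasCachedClone) return "cached-clone";
  // 2. Full commit-SHA URL? Use the API (per the README, a SHA can't be shallow-cloned).
  if (input.ref !== undefined && FULL_COMMIT_SHA.test(input.ref)) return "api";
  // 3. Too large? Use the API, unless forceClone overrides the size check.
  //    Assumption: an unknown size (no `gh`) does not block cloning.
  const tooLarge = input.repoSizeMB !== undefined && input.repoSizeMB > input.maxRepoSizeMB;
  if (tooLarge && !input.forceClone) return "api";
  // 4. Otherwise, shallow clone.
  return "clone";
}

// Step 4's fallback: if cloning fails, use the API instead.
// The clone and API calls are injected so this sketch performs no I/O itself.
async function cloneOrApi<T>(clone: () => Promise<T>, viaApi: () => Promise<T>): Promise<T> {
  try {
    return await clone();
  } catch {
    return await viaApi();
  }
}
```

Keeping the decision free of I/O makes each branch easy to test in isolation; the real clone and `gh` calls would be passed into `cloneOrApi`, so the fallback in step 4 reduces to a `try`/`catch`.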