pi/.pi/agent/extensions/web-fetch/README.md
23 additions & 16 deletions
@@ -5,8 +5,10 @@ A `web_fetch` tool fetches content from a URL and converts it into a clean, LLM-
What makes it more than a plain HTTP client:

- **HTML → Markdown** conversion by default (with `text` and `html` alternatives).
- **GitHub-aware extraction**: `github.com` URLs return structured repository content (file trees, README, file text) instead of raw HTML, with a local shallow clone the agent can explore further via `read`/`bash`.
- **Actionable errors**: failure messages include hints so the model can retry intelligently.

## Architecture
@@ -58,19 +60,20 @@ flowchart TD
## Components

| File | Role |
|------|------|
| `index.ts` | Tool definition, registration, and request dispatch. Routes GitHub URLs to the GitHub extractor and everything else to the HTTP fetcher; renders results in the TUI. |
| `fetcher.ts` | Pure HTTP transport: `fetchUrl()` handles URL normalization, SSRF protection, redirects, size guards, timeouts, and Cloudflare UA fallback. Returns a normalized `FetchResult`. |
| `github-extract.ts` | GitHub URL parser and the clone-or-API decision engine. Shallow-clones small repos (with session-local caching), falls back to the `gh` API for large repos or commit-SHA URLs, and assembles structured Markdown content from the result. |
| `github-api.ts` | Thin, non-throwing wrappers around the `gh` CLI: auth detection, repo size, default branch, file tree, README, and single-file fetch. |
| `format.ts` | `formatResultForLLM()`: converts the raw response to the requested format, prepends a redirect banner, and truncates large outputs to protect the context window. |
| `html-to-markdown.ts` | Turndown-backed HTML → Markdown converter that strips scripts/styles/navigation while preserving semantic structure (headings, lists, code blocks). |

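
To make the `format.ts` row more concrete, here is a minimal sketch of the "redirect banner plus truncation" idea. The `FetchResultSketch` shape, the `maxChars` parameter, and the banner and truncation wording are all assumptions made for illustration; the real `FetchResult` and `formatResultForLLM()` may look different.

```typescript
// Hypothetical result shape; the real `FetchResult` in fetcher.ts may differ.
interface FetchResultSketch {
  requestedUrl: string;
  finalUrl: string; // differs from requestedUrl after a redirect
  body: string; // content already converted to the requested format
}

// Prepend a redirect banner (if the URL changed) and cap the output length.
// `maxChars` is an assumed knob, not a documented setting.
function formatSketch(result: FetchResultSketch, maxChars: number): string {
  const banner =
    result.finalUrl !== result.requestedUrl
      ? `> Redirected: ${result.requestedUrl} -> ${result.finalUrl}\n\n`
      : "";

  if (result.body.length <= maxChars) {
    return banner + result.body;
  }

  // Keep the head of the document and tell the model how much was dropped.
  const omitted = result.body.length - maxChars;
  const head = result.body.slice(0, maxChars);
  return `${banner}${head}\n\n[truncated: ${omitted} characters omitted]`;
}
```

Stating how much was cut gives the model a cue that the page is incomplete, which fits the "actionable" spirit of the tool's error messages.
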
## GitHub extraction in detail

When the agent fetches a `github.com` URL, the tool recognizes the URL shape and extracts structured content instead of
fetching rendered HTML:

- **Repo root** (`/owner/repo`) → file tree + README.
- **Directory** (`/owner/repo/tree/<ref>/<path>`) → directory listing with file sizes.
@@ -81,14 +84,18 @@ The decision between cloning and using the API:

1. **Cached clone?** → reuse the session-local clone.
2. **Full commit-SHA URL?** → use the `gh` API (can't shallow-clone a SHA).
3. **Repo larger than `maxRepoSizeMB`?** → use the `gh` API (tree + README). The `forceClone` parameter overrides this.
4. **Otherwise** → shallow clone (`gh repo clone` when authenticated, `git clone` for public repos as fallback). If
   cloning fails, fall back to the API.
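
The four steps above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `github-extract.ts`: only `maxRepoSizeMB` and `forceClone` come from this README, while `DecisionInput`, `chooseStrategy`, and the other field names are hypothetical. It also assumes SHA-1 object names, so a "full" commit SHA is 40 hex characters.

```typescript
type Strategy = "cached-clone" | "api" | "clone";

// Hypothetical inputs; only `maxRepoSizeMB` and `forceClone` are named in the README.
interface DecisionInput {
  hasCachedClone: boolean;
  ref?: string; // branch, tag, or commit SHA taken from the URL
  repoSizeMB?: number; // undefined when the size preflight is unavailable
  maxRepoSizeMB: number;
  forceClone?: boolean;
}

// Assumption: SHA-1 object names, i.e. exactly 40 hex characters.
const FULL_SHA = /^[0-9a-f]{40}$/i;

function chooseStrategy(input: DecisionInput): Strategy {
  // 1. A session-local clone already exists: reuse it.
  if (input.hasCachedClone) return "cached-clone";

  // 2. Full commit-SHA URLs go to the API.
  if (input.ref !== undefined && FULL_SHA.test(input.ref)) return "api";

  // 3. Oversized repos go to the API unless the caller forces a clone.
  const tooLarge =
    input.repoSizeMB !== undefined && input.repoSizeMB > input.maxRepoSizeMB;
  if (tooLarge && !input.forceClone) return "api";

  // 4. Otherwise shallow-clone; the caller falls back to the API if cloning fails.
  return "clone";
}
```

Keeping the decision free of I/O makes the ordering of the checks easy to test; the actual cloning and `gh` calls would live in the caller.
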

Non-code GitHub paths (`/issues`, `/pull`, `/discussions`, etc.) are intentionally **not** intercepted; they fall
through to the normal HTTP fetcher, since they serve HTML pages rather than repository content.
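
As a rough sketch of how URL-shape recognition and the pass-through rule could fit together, the function below classifies a URL as a repo root, a directory, or something to leave to the HTTP fetcher. `GitHubTarget` and `classifyGitHubUrl` are hypothetical names, only the two URL shapes shown in this README are handled, and the sketch naively treats the first path segment after `tree` as the ref, so refs containing slashes are not resolved correctly.

```typescript
type GitHubTarget =
  | { kind: "repo-root"; owner: string; repo: string }
  | { kind: "directory"; owner: string; repo: string; ref: string; path: string }
  | { kind: "passthrough" }; // left to the normal HTTP fetcher

function classifyGitHubUrl(raw: string): GitHubTarget {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { kind: "passthrough" }; // not a parseable absolute URL
  }
  if (url.hostname !== "github.com") return { kind: "passthrough" };

  // Drop empty segments produced by leading or trailing slashes.
  const [owner, repo, section, ref, ...rest] = url.pathname
    .split("/")
    .filter(Boolean);
  if (owner === undefined || repo === undefined) return { kind: "passthrough" };

  // `/owner/repo`
  if (section === undefined) return { kind: "repo-root", owner, repo };

  // `/owner/repo/tree/<ref>/<path>` (assumes the ref is a single segment)
  if (section === "tree" && ref !== undefined) {
    return { kind: "directory", owner, repo, ref, path: rest.join("/") };
  }

  // `/issues`, `/pull`, `/discussions`, and anything else unrecognized.
  return { kind: "passthrough" };
}
```

Defaulting to `passthrough` means an unrecognized GitHub path degrades to a plain HTML fetch rather than an error.
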

> **Note:** The `gh` CLI is required for API calls, private repos, and the size-check preflight. Without `gh`
> authentication, public repos still work via `git clone`.

## Configuration

All configuration is defined in code; there is no external config file. Edit `DEFAULT_GITHUB_CONFIG` in
[`types.ts`](./types.ts) to change GitHub behaviour.

HTTP-fetch defaults (timeout, max bytes, User-Agents) are also constants in `types.ts`.