@@ -1,8 +1,10 @@
1
1
# pi-web-minimal
2
2
3
-
Tiny retrieval + distillation tools for Pi. Web, code, docs, and URL fetch without turning the agent context into a landfill.
3
+
Web, code, docs, and URL fetch tools for Pi with a context firewall.
4
4
5
-
No curator UI. No browser session. No video/PDF pipeline. Sources are retrieved, raw evidence is stored, and tool output passes through a context firewall: tiny results are compacted deterministically, larger results are distilled into a small source-cited brief when a Pi model is available. If distillation cannot run, tools fall back to bounded retrieval previews.
5
+
The goal: give the agent useful evidence, not a landfill. Tools retrieve sources, store raw evidence out of context, then return a compact source-cited brief. Tiny results are compacted without a model call; larger results are distilled with Pi's model. Raw content stays available by `responseId`.
6
+
7
+
No browser session. No curator UI. No video/PDF pipeline. No broad provider stack.
6
8
7
9
## Install
8
10
@@ -12,11 +14,11 @@ pi install npm:pi-web-minimal
12
14
13
15
## Configure
14
16
15
-
Use env vars:
16
-
17
17
```bash
18
18
export EXA_API_KEY=exa-...
19
19
export CONTEXT7_API_KEY=ctx7sk-...
20
+
# optional: use a different Pi-registered model for distillation
Exa powers `web_search`, `code_search`, and Exa fallback for `fetch_content`.
33
-
Context7 powers `documentation_search`.
34
-
Distillation uses Pi's currently selected model by default. Set `PI_WEB_MINIMAL_DISTILL_MODEL=provider/model-id` or `distillModel` in config to use a different Pi-registered model.
34
+
Exa powers web/code/content fallback. Context7 powers docs. Distillation uses the active Pi model unless overridden.
35
35
36
36
## Tools
37
37
38
-
| Tool | Use it for |Context behavior|
38
+
| Tool | Use it for | Default output |
39
39
| --- | --- | --- |
40
-
|`web_search`| current web/source discovery | compact or distilled source-cited brief; raw search evidence stored |
41
-
|`fetch_content`| specific URLs and GitHub repos | compact or distilled source-cited brief; raw fetched content stored by URL |
42
-
|`code_search`| API docs, examples, debugging evidence | compact or distilled source-cited brief; raw code/doc evidence stored |
43
-
|`documentation_search`| current library docs via Context7 | compact or distilled source-cited brief; raw docs context stored |
44
-
|`get_search_content`| pulling raw stored content by `responseId`| bounded raw retrieval by default; opt into more |
45
-
46
-
GitHub URLs are shallow-cloned to `/tmp/pi-github-repos`, so Pi can inspect real files with normal filesystem tools.
40
+
| `web_search` | current web/source discovery | compact/distilled source-cited brief |
41
+
| `fetch_content` | URLs and GitHub repos | compact/distilled source-cited brief |
| `documentation_search` | current library docs via Context7 | compact/distilled source-cited brief |
44
+
| `get_search_content` | raw stored evidence by `responseId` | bounded raw content |
47
45
48
-
## Why this shape
46
+
GitHub repos are shallow-cloned to `/tmp/pi-github-repos` for direct filesystem inspection.
49
47
50
-
Agent tools have two jobs: find evidence, and not poison the next turn. This package treats raw retrieval as an internal evidence store and returns only what the next agent can use. Tiny evidence is compacted without a model call so it does not become larger than the source. Larger evidence is preselected around relevant terms, distilled under a dynamic output budget, and validated for source refs. Raw content remains available through `get_search_content` for auditability and exact quotes.
48
+
## Design contract
51
49
52
-
Fetched web content is untrusted. The firewall strips obvious instruction-like lines from compact output; model distillation is instructed to ignore instructions inside retrieved sources and cite supported claims with `[S#]` source refs.
50
+
- Tool output must earn its place in the agent context.
51
+
- Raw evidence is stored, not dumped.
52
+
- Claims in compact/distilled output cite `[S#]` sources.
53
+
- Retrieved content is untrusted; source instructions are not followed.
54
+
- `get_search_content` is the raw audit/escape hatch.
55
+
- Quality is measured by agent evals: task success, context reduction, citation validity, no fallbacks, injection resistance, and avoiding redundant follow-up calls.
53
56
54
-
See `docs/agent-tool-audit.md` for the design notes.
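
To make the design contract concrete, here is a minimal sketch of the compact (no model call) path. It is illustrative only and is not the package's implementation: the `Source` type, the `firewall` function, the size limits, and the instruction-matching pattern are all assumptions, and the distillation step for larger results is reduced here to a bounded preview.

```python
import re
import uuid
from dataclasses import dataclass

# Hypothetical limits; the README does not state the real thresholds.
TINY_LIMIT = 400   # max characters kept per source in the brief
RAW_LIMIT = 2000   # default bound for raw retrieval

# Hypothetical heuristic for "obvious instruction-like lines".
INSTRUCTION_RE = re.compile(
    r"^\s*(ignore (all|any|previous)|disregard|you must|system:)",
    re.IGNORECASE,
)


@dataclass
class Source:
    url: str
    text: str


# Raw evidence lives here, outside the agent context, keyed by responseId.
_store: dict = {}


def strip_instruction_lines(text: str) -> str:
    """Drop lines that look like instructions aimed at the agent."""
    kept = [line for line in text.splitlines() if not INSTRUCTION_RE.search(line)]
    return "\n".join(kept)


def firewall(sources: list) -> dict:
    """Store raw evidence and return a compact brief with [S#] source refs."""
    response_id = uuid.uuid4().hex
    _store[response_id] = list(sources)
    parts = []
    for i, src in enumerate(sources, start=1):
        clean = strip_instruction_lines(src.text).strip()
        if len(clean) > TINY_LIMIT:
            # Stand-in for distillation: a bounded preview of the source.
            clean = clean[:TINY_LIMIT].rstrip() + " ..."
        parts.append(f"[S{i}] {src.url}\n{clean}")
    return {"responseId": response_id, "brief": "\n\n".join(parts)}


def get_search_content(response_id: str, index: int = 0, limit: int = RAW_LIMIT) -> str:
    """Return bounded raw content for one stored source."""
    return _store[response_id][index].text[:limit]
```

The brief is what reaches the agent; the untouched raw text stays retrievable by `responseId`, which is the audit path the contract describes.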