Commit f231a2a

j-mendez and claude committed
docs: expand SKILL.md crawl section — library API + RemoteFetcher hook
Adds the subscriber-sugar (`crawl_builder().on_page().run()`) and raw stream library snippets and surfaces spider 2.51.198's new `RemoteFetcher` hook as a documented integration option. Updates the route-count line to 27 (the local.crawl + spider.cloud.crawl additions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dc2f77d commit f231a2a

1 file changed

Lines changed: 36 additions & 1 deletion

SKILL.md

@@ -87,10 +87,45 @@ Engines:
 `{{param:k}}` (numbers and JSON literals parse correctly; everything else is a
 string). Use this for vendor-specific knobs without editing TOML.

+### Library use — subscriber sugar over a Stream
+
+```rust
+use std::sync::Arc;
+use gottem_core::{CancelToken, ControlFlow, CrawlRequest, Orchestrator};
+use url::Url;
+
+let orch: Arc<Orchestrator> = /* built with crawl adapters installed */;
+orch.crawl_builder(
+    CrawlRequest::new(Url::parse("https://example.com")?)
+        .with_limit(50)
+        .with_depth(2),
+)
+.on_page(|page| async move {
+    save(page).await;
+    ControlFlow::Continue
+})
+.run(CancelToken::new())
+.await?;
+```
+
+Or the raw stream: `orch.crawl(req, cancel).await?` returns
+`Stream<Item = Result<PageEntry>>`.
+
+### Custom transport via `spider::RemoteFetcher`
+
+Spider 2.51.198 exposes `Website::with_remote_fetcher` — implement
+`spider::fetcher::RemoteFetcher` and spider drives the full crawl engine
+(visited / depth / allow-deny / robots / link extraction / subscription
+channel) using your transport for the per-URL fetch. Useful when you want
+spider's engine but a non-default transport (an internal API, a custom
+proxy mesh, etc.). gottem's own local engine doesn't currently route
+through this hook — its scrape ladder needs hop-depth gating which spider
+will add in a future patch.
+
 ## routes — inspect the vendor catalog

 ```sh
-gottem routes list # tabular catalog (22 builtin routes, 13 vendors)
+gottem routes list # tabular catalog (27 builtin routes, 13 vendors + local crawl)
 gottem routes show <route-id> # full detail for one route
 gottem routes validate # check env vars are set for each route's auth
 gottem --config routes.toml fetch URL # layer custom vendor routes on top of builtin
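
As a rough illustration of the subscriber pattern behind the `crawl_builder().on_page().run()` snippet this commit adds, the sketch below pushes a sequence of page results through a callback that can stop the crawl early. It is a std-only stand-in and not gottem's implementation: a synchronous `Iterator` replaces the async `Stream`, `std::ops::ControlFlow` replaces `gottem_core::ControlFlow`, and both the fields of `PageEntry` and the `run_with_subscriber` helper are hypothetical.

```rust
use std::ops::ControlFlow;

/// Hypothetical page record; the real `PageEntry` fields are not shown in the diff.
#[derive(Debug, Clone, PartialEq)]
struct PageEntry {
    url: String,
    status: u16,
}

/// Feeds each successfully crawled page to `on_page`.
/// Stops when the callback returns `Break`, when the input ends,
/// or when the first error is encountered (the error is returned).
/// Returns how many pages were delivered to the callback.
fn run_with_subscriber<I, F>(pages: I, mut on_page: F) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<PageEntry, String>>,
    F: FnMut(PageEntry) -> ControlFlow<()>,
{
    let mut delivered = 0;
    for item in pages {
        let page = item?; // propagate the first crawl error
        delivered += 1;
        if on_page(page).is_break() {
            break; // the subscriber asked to stop
        }
    }
    Ok(delivered)
}
```

A callback that always returns `ControlFlow::Continue(())` consumes the whole sequence; returning `ControlFlow::Break(())` after the N-th page leaves the remaining items untouched, which is the behaviour one would expect from an early-exit subscriber.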

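The `RemoteFetcher` section describes a split in which the crawl engine keeps the visited set, depth, and limits while a pluggable transport answers each per-URL fetch. A minimal sketch of that division of labour follows. The `Fetcher` trait, `MapFetcher`, and `crawl` are hypothetical names invented for illustration; they do not reproduce spider's actual `RemoteFetcher` signature, and the sketch leaves out robots handling, allow/deny rules, and async I/O.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical per-URL transport hook (NOT spider's real trait).
trait Fetcher {
    /// Returns the links found at `url`, or an error message.
    fn fetch(&self, url: &str) -> Result<Vec<String>, String>;
}

/// In-memory transport used by the sketch: a fixed link graph.
struct MapFetcher(HashMap<String, Vec<String>>);

impl Fetcher for MapFetcher {
    fn fetch(&self, url: &str) -> Result<Vec<String>, String> {
        self.0
            .get(url)
            .cloned()
            .ok_or_else(|| format!("no route to {}", url))
    }
}

/// Breadth-first crawl. The engine owns visited/depth/limit bookkeeping;
/// the fetcher only answers "what is at this URL?".
/// Returns the URLs that were fetched successfully, in visit order.
fn crawl(fetcher: &dyn Fetcher, start: &str, max_depth: usize, limit: usize) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    let mut fetched = Vec::new();
    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0usize));
    while let Some((url, depth)) = queue.pop_front() {
        if fetched.len() >= limit {
            break; // page budget exhausted
        }
        let links = match fetcher.fetch(&url) {
            Ok(links) => links,
            Err(_) => continue, // this sketch silently skips failed fetches
        };
        fetched.push(url);
        if depth < max_depth {
            for link in links {
                // `insert` returns false for URLs already seen
                if visited.insert(link.clone()) {
                    queue.push_back((link, depth + 1));
                }
            }
        }
    }
    fetched
}
```

Swapping `MapFetcher` for another `Fetcher` implementation (for example, one that calls an internal API) changes how pages are retrieved without touching the traversal logic, which is the property the section attributes to the hook.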