Operator runbook for advanced and maintenance tasks.
Use SKILL.md for the concise day-to-day workflow.
- Run the collector:

  ```
  python3 -m collector.pipeline.scrape

  # one profile only:
  python3 -m collector.pipeline.scrape --profile <profile>
  ```

- Read the generated artifacts in `data/runs/<profile>/latest.json`.
- If needed, create report output:

  ```
  python3 -m collector.pipeline.report --profile <profile> --target <slug> --days 3
  ```

- Run `scrape` once daily at roughly the same time.
- Generate `--days 1` reports for urgent daily review.
- Generate `--days 7` reports for deeper weekly review.
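The run artifact above can also be inspected programmatically. A minimal sketch in Python: the `errors` key is documented in the troubleshooting notes of this runbook, but the `items` key and the helper name are assumptions and may differ from the real manifest layout.

```python
import json
from pathlib import Path


def summarize_run(profile: str, root: str = "data/runs") -> dict:
    """Load the latest scrape manifest for a profile and return a tiny summary."""
    path = Path(root) / profile / "latest.json"
    run = json.loads(path.read_text(encoding="utf-8"))
    # "errors" appears in the troubleshooting notes; "items" is a guess at the
    # payload key and may not match the real manifest.
    return {
        "errors": len(run.get("errors", [])),
        "items": len(run.get("items", [])),
    }
```

For example, `summarize_run("default")` reads `data/runs/default/latest.json` and reports how many errors and items the last run produced.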
Registry files:

- `data/config/targets.json`
- `data/config/targets.history.jsonl`
Add target:

```
python3 -m collector.pipeline.manage_targets add \
    --profile <profile> \
    --slug <slug> \
    --name "<display name>" \
    --source webpage=https://example.com \
    --source linkedin_company=https://www.linkedin.com/company/example/ \
    --source x_profile=https://x.com/example
```

Add target without auto-discovery:
```
python3 -m collector.pipeline.manage_targets add \
    --profile <profile> \
    --slug <slug> \
    --name "<display name>" \
    --source webpage=https://example.com \
    --skip-discovery
```

List targets:
```
python3 -m collector.pipeline.manage_targets list --profile <profile>
python3 -m collector.pipeline.manage_targets list --profile <profile> --active-only
```

Deactivate target:
```
python3 -m collector.pipeline.manage_targets remove --profile <profile> --slug <slug>
```

Hard-delete target:
```
python3 -m collector.pipeline.manage_targets remove --profile <profile> --slug <slug> --hard-delete
```

Discover source suggestions:
```
python3 -m collector.pipeline.discover_sources --profile <profile>
```

If `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY` is set, discovery uses Brave Search to find more feed/blog/news/changelog candidates.
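The key lookup described above can be sketched as a small helper. This is an assumption about how the collector resolves the two variables (the precedence order and the helper name are illustrative, not taken from the collector's source):

```python
import os
from typing import Optional


def brave_api_key() -> Optional[str]:
    """Return the Brave Search key if either documented variable is set.

    Assumes BRAVE_SEARCH_API_KEY takes precedence over BRAVE_API_KEY;
    the collector's actual resolution order may differ.
    """
    return os.environ.get("BRAVE_SEARCH_API_KEY") or os.environ.get("BRAVE_API_KEY")
```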
Rediscover and append sources for one target:

```
python3 -m collector.pipeline.manage_targets rediscover --profile <profile> --slug <slug>
```

Cleanup LinkedIn source entries:
```
python3 -m collector.pipeline.manage_targets cleanup-linkedin --profile <profile>
python3 -m collector.pipeline.manage_targets cleanup-linkedin --profile <profile> --slug <slug>
```

- Open `data/config/targets.json`.
- Choose a profile under `profiles` (for example `default`).
- Locate the target by `slug`.
- Append source entries to the `sources` list.
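The manual steps above can also be scripted. A minimal sketch, assuming `targets.json` follows a `profiles → targets → sources` layout in which each target is a dict keyed by `slug`; the real registry schema is defined by `config/sources.schema.json` and may differ, and the helper name is illustrative:

```python
import json
from pathlib import Path


def append_source(path: str, profile: str, slug: str, entry: dict) -> bool:
    """Append a source entry to one target, skipping duplicate (type, url) pairs.

    Assumes a profiles -> <profile> -> targets layout where each target dict
    has "slug" and "sources" keys; the actual registry shape may differ.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    targets = config["profiles"][profile]["targets"]
    target = next(t for t in targets if t["slug"] == slug)
    seen = {(s["type"], s["url"]) for s in target["sources"]}
    if (entry["type"], entry["url"]) in seen:
        return False  # duplicate (type, url); leave the file untouched
    target["sources"].append(entry)
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Returning `False` on a duplicate keeps the helper idempotent, which matches the manual rule below about avoiding duplicate `(type, url)` pairs.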
Example:
```json
{
  "type": "rss",
  "url": "https://example.com/rss.xml",
  "label": "News RSS"
}
```

Common source types: `rss`, `blog`, `changelog`, `webpage`, `linkedin_company`, `sitemap`.
Manual edit rules:
- Keep URLs public and unauthenticated.
- Avoid duplicate `(type, url)` pairs.
- Prefer section/index pages over one-off article URLs.
- Run a new scrape after edits:

  ```
  python3 -m collector.pipeline.scrape
  ```

- JSON: stable machine-readable summary/manifests.
- JSONL: append-only item streams for audit and downstream filtering.
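The two formats differ in how they are read and extended; a minimal sketch of the distinction (file names and helper names are illustrative):

```python
import json
from pathlib import Path


def read_manifest(path: str) -> dict:
    """JSON manifest: one document, rewritten whole on each run."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def append_item(path: str, item: dict) -> None:
    """JSONL stream: one JSON object per line, appended, never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(item) + "\n")


def read_items(path: str):
    """Yield each item from a JSONL stream, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Because JSONL is append-only, downstream filters can tail the stream without re-parsing the whole history, while the JSON manifest always reflects only the latest run.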
- If a source fails, inspect `errors` in `data/runs/<profile>/latest.json`.
- LinkedIn blocking is expected in some environments; the collector can fall back via Brave when `BRAVE_SEARCH_API_KEY` is set.
- If schema validation fails, verify `data/config/targets.json` against `config/sources.schema.json`.
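A quick first triage step for a schema failure is to confirm that both files parse as JSON at all; a stdlib-only sketch (full validation against `config/sources.schema.json` would need a validator such as the `jsonschema` package, which this sketch deliberately does not attempt):

```python
import json
from pathlib import Path


def parse_report(*paths: str) -> dict:
    """Map each path to "ok" or its JSON parse/read error, as a triage step."""
    report = {}
    for p in paths:
        try:
            json.loads(Path(p).read_text(encoding="utf-8"))
            report[p] = "ok"
        except (OSError, json.JSONDecodeError) as exc:
            report[p] = str(exc)
    return report
```

For example, `parse_report("data/config/targets.json", "config/sources.schema.json")` separates "the file is malformed JSON" from "the file violates the schema" before you reach for a real validator.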