What happened
The current workflow relies heavily on CLI scripts and is scattered. For crawler/collector data (recipe.run + collector scripts), raw outputs (raw/...) and sync state (state.json/heartbeat files) are written across multiple recipe-specific folders. In practice there is no in-project interface that shows which data sources are active, what was crawled recently, what is still pending, and whether the raw data is complete.
Because there is no unified management surface, using the same project on another device is risky. I can only move the vault via Baidu Netdisk, Quark (夸克), or a NAS, and I still have no clear "same dataset" verification loop for raw crawler inputs and collector state.
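To make the missing "same dataset" check concrete, here is a minimal sketch of what such a verification loop could look like. It is only an illustration, not something the repo provides: `raw_manifest` and `manifest_diff` are hypothetical names, and it assumes parity can be judged by comparing SHA-256 digests of every file under a raw/ tree.

```python
import hashlib
from pathlib import Path


def raw_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large raw payloads are not loaded at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def manifest_diff(local, remote):
    """Return paths missing on either side and paths whose contents differ."""
    shared = local.keys() & remote.keys()
    return {
        "only_local": sorted(set(local) - set(remote)),
        "only_remote": sorted(set(remote) - set(local)),
        "changed": sorted(k for k in shared if local[k] != remote[k]),
    }
```

Each device would build a manifest of its own raw/ tree, and the two manifests would be compared after the vault has been copied. Three empty lists would mean the raw inputs match byte for byte.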
What I expected
A dedicated management interface (an ordinary TUI that is part of the repo workflow) for crawler/raw-data and recipe sync operations that can:
- list all installed collectors and their last sync status
- show raw material health (latest raw payloads, size/age, and state pointers)
- trigger/re-run collector jobs from one place
- show whether local raw data is ready for cross-device parity before syncing
- let me mark a collector workflow as "safe to mirror" across devices
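As a rough sketch of the data such an interface would need for the first two items above, the snippet below walks a project tree and summarises each collector. The layout is an assumption on my part: it treats any folder containing a state.json as one collector and looks for a raw/ subfolder next to it. `collector_inventory` is a hypothetical helper, not an existing command.

```python
import json
import time
from pathlib import Path


def collector_inventory(root):
    """Summarise every folder under `root` that contains a state.json file."""
    root = Path(root)
    rows = []
    for state_file in sorted(root.rglob("state.json")):
        recipe_dir = state_file.parent
        try:
            state = json.loads(state_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            state = None  # unreadable or corrupt state is reported, not hidden
        raw_dir = recipe_dir / "raw"  # assumed location of raw payloads
        raw_files = (
            [p for p in raw_dir.rglob("*") if p.is_file()] if raw_dir.is_dir() else []
        )
        newest = max((p.stat().st_mtime for p in raw_files), default=None)
        rows.append({
            "recipe": recipe_dir.relative_to(root).as_posix(),
            "state_ok": state is not None,
            "raw_files": len(raw_files),
            "raw_bytes": sum(p.stat().st_size for p in raw_files),
            "raw_age_hours": None if newest is None else (time.time() - newest) / 3600,
        })
    return rows
```

A TUI could render these rows as a table and attach the re-run and "safe to mirror" actions to each one. How last sync status is actually encoded inside state.json is not assumed here.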
Steps to reproduce
- Configure and run one or more recipe collectors (recipe.run, e.g. for source-to-vault recipes).
- Confirm new artifacts are written under the recipe outputs and raw/ trees.
- Open a second machine or a fresh shell session and try to do the same management operations from one place.
- You still have to remember and run per-recipe scripts/commands manually, with no single interface for job status, raw-data inventory, or parity checks.
Additional context
The repo already has recipe-based collectors, and those docs describe each source, but orchestration/state visibility is per-recipe and disconnected. This request is not about a single missing command; it is about a unified raw-data management UX and cross-device sync confidence layer.