
feat: add read-only knowledge processing dashboard #1742

Open: FFFFFFpy wants to merge 5 commits into Tencent:main from FFFFFFpy:codex/knowledge-processing-dashboard

@FFFFFFpy (Contributor)

Description

Add a read-only knowledge processing dashboard for observing the current execution state of the knowledge ingestion pipeline.

The dashboard aggregates existing processing spans, Asynq task states, and Wiki pending operations into nine logical stages:

  • Document parsing
  • Document chunking
  • Index construction
  • Multimodal processing
  • Post-processing orchestration
  • Summary generation
  • Question generation
  • Knowledge graph extraction
  • Wiki synthesis

Progress is aggregated per knowledge item, processing attempt, and logical stage. Internal fan-out tasks (individual images, question batches, graph chunks, Wiki pages) and their retries are shown as a single progress entry on the parent stage rather than as separate tasks.
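The grouping rule can be sketched as a minimal in-memory Go function. This is an illustration under assumptions, not the PR's code (the PR performs this aggregation in the repository layer); the names `ChildSpan`, `StageKey`, `StageProgress`, and `AggregateFanOut`, the status strings, and the convention that a higher `Retry` number supersedes earlier retries of the same child are all hypothetical.

```go
package main

import "sort"

// SpanStatus is a hypothetical status value for a processing span.
type SpanStatus string

const (
	StatusRunning   SpanStatus = "running"
	StatusCompleted SpanStatus = "completed"
	StatusFailed    SpanStatus = "failed"
)

// ChildSpan is a hypothetical fan-out child span: one image, question batch,
// graph chunk, or Wiki page. Each retry of a child is recorded with a higher Retry.
type ChildSpan struct {
	KnowledgeID string
	AttemptID   int
	Stage       string
	ChildID     string
	Retry       int
	Status      SpanStatus
}

// StageKey identifies one dashboard entry: knowledge item, processing attempt, logical stage.
type StageKey struct {
	KnowledgeID string
	AttemptID   int
	Stage       string
}

// StageProgress is the single parent-stage progress entry shown for a StageKey.
type StageProgress struct {
	StageKey
	Total, Done, Failed, Active int
}

// AggregateFanOut collapses child spans (and their retries) into one entry per StageKey.
// Spans from different attempts get different keys, so an older attempt never
// inflates the counts of the current one.
func AggregateFanOut(spans []ChildSpan) []StageProgress {
	type childKey struct {
		StageKey
		ChildID string
	}
	// Keep only the latest retry of each child.
	latest := make(map[childKey]ChildSpan)
	for _, s := range spans {
		k := childKey{StageKey{s.KnowledgeID, s.AttemptID, s.Stage}, s.ChildID}
		if prev, ok := latest[k]; !ok || s.Retry > prev.Retry {
			latest[k] = s
		}
	}
	// Count each child's latest state into its parent-stage entry.
	byKey := make(map[StageKey]*StageProgress)
	for k, s := range latest {
		p, ok := byKey[k.StageKey]
		if !ok {
			p = &StageProgress{StageKey: k.StageKey}
			byKey[k.StageKey] = p
		}
		p.Total++
		switch s.Status {
		case StatusCompleted:
			p.Done++
		case StatusFailed:
			p.Failed++
		default: // pending, running, retrying, ...
			p.Active++
		}
	}
	out := make([]StageProgress, 0, len(byKey))
	for _, p := range byKey {
		out = append(out, *p)
	}
	// Sort for deterministic output.
	sort.Slice(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.KnowledgeID != b.KnowledgeID {
			return a.KnowledgeID < b.KnowledgeID
		}
		if a.AttemptID != b.AttemptID {
			return a.AttemptID < b.AttemptID
		}
		return a.Stage < b.Stage
	})
	return out
}
```

Because the attempt ID is part of the key, spans left over from an earlier attempt form their own entry instead of being merged into the current one; a caller would then display only the latest attempt per knowledge item.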

The dashboard shows:

  • Knowledge items currently running in each stage
  • The number of knowledge items waiting or retrying
  • Aggregated progress for multimodal, question, graph, and Wiki processing
  • Per-knowledge processing pipeline details
  • Knowledge base filtering and title search
  • Automatic and manual refresh
  • Read-only queue and processing-state inspection
  • Graceful degradation when the Asynq queue snapshot is unavailable
  • Support for both Asynq and Lite execution modes
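One way to get the degraded behavior when the queue snapshot is unavailable is to treat the snapshot as optional input and report its absence explicitly. The sketch below assumes a hypothetical `QueueSnapshotFunc` standing in for the read-only inspector, plus hypothetical `StageSummary`, `Dashboard`, and `ErrQueueUnavailable` names; span-derived counts are always returned, while waiting counts are `nil` when the snapshot fails.

```go
package main

import "errors"

// ErrQueueUnavailable is a hypothetical sentinel for a failed queue snapshot.
var ErrQueueUnavailable = errors.New("queue snapshot unavailable")

// QueueSnapshotFunc is a hypothetical read-only source of waiting-task counts per stage.
type QueueSnapshotFunc func() (map[string]int, error)

// StageSummary combines span-derived and queue-derived numbers for one stage.
type StageSummary struct {
	Stage   string
	Running int  // derived from processing spans; always available
	Waiting *int // derived from the queue snapshot; nil when unknown
}

// Dashboard is the response; QueueAvailable tells the UI to show a degradation notice.
type Dashboard struct {
	Stages         []StageSummary
	QueueAvailable bool
	QueueError     string
}

// BuildDashboard never fails because of the queue: if the snapshot errors (or no
// snapshot source is configured), span data is still returned and Waiting stays nil.
func BuildDashboard(stages []string, running map[string]int, snapshot QueueSnapshotFunc) Dashboard {
	var counts map[string]int
	err := ErrQueueUnavailable
	if snapshot != nil {
		counts, err = snapshot()
	}
	d := Dashboard{QueueAvailable: err == nil}
	if err != nil {
		d.QueueError = err.Error()
	}
	for _, st := range stages {
		s := StageSummary{Stage: st, Running: running[st]}
		if d.QueueAvailable {
			n := counts[st] // a missing stage means zero waiting tasks
			s.Waiting = &n
		}
		d.Stages = append(d.Stages, s)
	}
	return d
}
```

Using a pointer for `Waiting` keeps "unknown" distinct from "zero waiting", so the frontend can show the degradation message instead of a misleading 0.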

This feature is observability-only. It does not modify task scheduling, workers, retries, queue priorities, processing state transitions, or knowledge-processing execution paths.

Implementation notes

  • Reuses the existing knowledge processing span data.
  • Reads Asynq queue state through a read-only inspector.
  • Reads Wiki pending operations without changing their state.
  • Separates processing attempts so tasks from stale attempts are not merged into the current attempt.
  • Aggregates fan-out spans in the repository layer to avoid loading large numbers of graph, image, and question child spans into memory.
  • Marks truncated queue counts as unreliable instead of presenting incomplete counts as exact values.
  • Adds database indexes for dashboard queries.
  • Adds frontend request guards to prevent stale responses from replacing newer dashboard data.
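The truncation rule for queue counts can be illustrated with a small Go sketch. It assumes, hypothetically, that queue tasks are listed with a fixed cap (`limit`) and then counted per stage; `QueueCount` and `CountByStage` are illustrative names, not the PR's API. A listing that reaches the cap is treated as possibly truncated, because a listing of exactly `limit` tasks cannot be distinguished from a cut-off one.

```go
package main

import "strconv"

// QueueCount is a hypothetical per-stage count that records whether it is exact.
type QueueCount struct {
	Value int
	Exact bool
}

// String renders an inexact count as a lower bound, for example "200+".
func (c QueueCount) String() string {
	if c.Exact {
		return strconv.Itoa(c.Value)
	}
	return strconv.Itoa(c.Value) + "+"
}

// CountByStage counts listed tasks per stage. taskStages is assumed to come from a
// listing capped at limit items; if the cap was reached, more tasks may exist
// beyond it, so every count (including zero counts) is marked inexact.
func CountByStage(stages, taskStages []string, limit int) map[string]QueueCount {
	truncated := len(taskStages) >= limit
	out := make(map[string]QueueCount, len(stages))
	for _, st := range stages {
		out[st] = QueueCount{Exact: !truncated}
	}
	for _, st := range taskStages {
		// Tasks for stages outside the known list are ignored.
		if c, ok := out[st]; ok {
			c.Value++
			out[st] = c
		}
	}
	return out
}
```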

Known limitation

Stage drawer pagination currently builds the state of all active knowledge items and then paginates the filtered result in memory. Its cost is proportional to the number of active knowledge items, not the number of historical completed items or fan-out child spans.

This is expected to be sufficient for the current workload. Repository-level stage filtering and database cursor pagination can be added later if deployments regularly reach roughly 1,000 or more concurrently active knowledge items.
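For reference, the in-memory approach described in the known limitation amounts to roughly the following. The `ActiveKnowledge` type, the `PaginateStage` function, and the default page size of 20 are hypothetical; the point is that the filter loop visits every active item, so cost grows with the active set, while historical completed items never enter it.

```go
package main

import "strings"

// ActiveKnowledge is a hypothetical, already-built state for one active knowledge item.
type ActiveKnowledge struct {
	ID, KnowledgeBaseID, Title, Stage string
}

// Page is one page of a stage drawer plus the total number of matching items.
type Page struct {
	Items []ActiveKnowledge
	Total int
}

// PaginateStage filters all active items in memory, then slices out one page.
// Cost is O(len(active)) regardless of page size.
func PaginateStage(active []ActiveKnowledge, stage, kbID, keyword string, page, pageSize int) Page {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = 20 // hypothetical default
	}
	kw := strings.ToLower(keyword)
	var filtered []ActiveKnowledge
	for _, k := range active {
		if k.Stage != stage {
			continue
		}
		if kbID != "" && k.KnowledgeBaseID != kbID {
			continue
		}
		if kw != "" && !strings.Contains(strings.ToLower(k.Title), kw) {
			continue
		}
		filtered = append(filtered, k)
	}
	// Clamp the page window to the filtered result.
	start := (page - 1) * pageSize
	if start > len(filtered) {
		start = len(filtered)
	}
	end := start + pageSize
	if end > len(filtered) {
		end = len(filtered)
	}
	return Page{Items: filtered[start:end], Total: len(filtered)}
}
```

Moving the stage, knowledge base, and keyword filters into the repository query with cursor pagination would replace this loop and slice with a bounded database read.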

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📚 Documentation update
  • 🎨 Refactor
  • ⚡ Performance improvement
  • 🧪 Test
  • 🔧 Configuration / Build / CI

Related Issue

N/A

Testing

Run the following backend checks:

go test ./internal/...
go test -race ./internal/router/... -run ProcessingQueueSnapshot

Run the following frontend checks:

cd frontend
npm run test
npm run type-check
npm run build

Manual verification:

  • Opened the knowledge processing dashboard.
  • Verified the nine logical processing stages.
  • Verified filtering by knowledge base and keyword.
  • Verified automatic and manual refresh.
  • Verified stage queue drawers and knowledge processing details.
  • Verified multimodal, question, graph, and Wiki progress aggregation.
  • Verified that completed historical knowledge items are not loaded into the active dashboard.
  • Verified that the dashboard does not expose task control actions.
  • Verified the empty state and queue-data degradation message.

Checklist

  • make fmt && make lint && make test pass locally
  • Self-reviewed the code
  • Added/updated tests covering the change
  • Added Swagger annotations and localized frontend text
  • No breaking changes
  • Confirmed that the dashboard is read-only and does not modify workers or task execution

Screenshots / Recordings

[Screenshot]
