feat(rag): generation/pointer model for crash-safe re-chunking and embedding lifecycle #438

### Context

PR https://github.com/xorbitsai/xagent/pull/202 addresses stale chunk rows on re-chunk (#199) with a short-term **delete/insert + scoped locking** approach (`replace_chunks`, per-document locks, visible embedding cleanup failures).

Review feedback (2026-05-15) notes that this pattern remains fundamentally limited:

- **Crash safety** depends on insert-before-delete ordering; incomplete runs can leave duplicate or partially updated generations searchable until cleanup completes.
- **Concurrency** is hard to make correct with process-local locks alone: destructive updates are keyed by `(collection, doc_id, parse_hash, user scope)`, while ingestion locks may be keyed differently (e.g. `source_path` vs `file_id`-derived `doc_id`). Two concurrent runs for the same logical document can still race across chunk replacement and embedding writes.
- **Embedding cleanup** failures are high-impact: retrieval reads `embeddings_*` directly, so stale embedding rows remain searchable even after `chunks` has been replaced.

The durable fix is an explicit **generation / active-pointer** model rather than relying on delete/insert timing.

### Proposed design (high level)

1. Each re-chunk / re-embed run writes under a new immutable **`generation_id`** (or equivalent).
2. Maintain a small **active-generation pointer** table keyed by document scope, e.g. `(collection, doc_id, parse_hash, user scope)` → `active_generation_id`.
3. **Retrieval** only reads chunks and embeddings belonging to the **active** generation.
4. **Publish** flow:
   - Write all chunks + embeddings for the new generation completely.
   - Atomically update the active pointer to the new generation (this is the only step requiring strict atomicity).
5. **Cleanup** old generations asynchronously (best-effort is acceptable once they are no longer active/searchable).
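
The five steps above can be sketched end to end. This is a minimal illustration, not the project's implementation: it uses the standard-library `sqlite3` module as a stand-in for the real store, flattens the document scope into a single hypothetical `scope` string, and only models chunks (embedding rows would carry the same `generation_id` column). The table names and the functions `open_store`, `write_generation`, `publish`, `search`, and `cleanup` are all assumptions made for the example.

```python
import sqlite3
import uuid
from typing import List


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE chunks (
            scope TEXT NOT NULL,
            generation_id TEXT NOT NULL,
            chunk_id INTEGER NOT NULL,
            text TEXT NOT NULL,
            PRIMARY KEY (scope, generation_id, chunk_id)
        );
        CREATE TABLE active_generation (
            scope TEXT PRIMARY KEY,
            active_generation_id TEXT NOT NULL
        );
        """
    )
    return conn


def write_generation(conn: sqlite3.Connection, scope: str, texts: List[str]) -> str:
    """Write all rows for a run under a fresh generation_id (not yet searchable)."""
    generation_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO chunks VALUES (?, ?, ?, ?)",
            [(scope, generation_id, i, t) for i, t in enumerate(texts)],
        )
    return generation_id


def publish(conn: sqlite3.Connection, scope: str, generation_id: str) -> None:
    """Move the active pointer in one transaction; the only atomic step."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO active_generation VALUES (?, ?)",
            (scope, generation_id),
        )


def search(conn: sqlite3.Connection, scope: str) -> List[str]:
    """Read only rows that belong to the active generation for this scope."""
    rows = conn.execute(
        """
        SELECT c.text FROM chunks AS c
        JOIN active_generation AS a
          ON a.scope = c.scope AND a.active_generation_id = c.generation_id
        WHERE c.scope = ?
        ORDER BY c.chunk_id
        """,
        (scope,),
    ).fetchall()
    return [row[0] for row in rows]


def cleanup(conn: sqlite3.Connection, scope: str) -> int:
    """Delete rows of every non-active generation; returns rows removed."""
    with conn:
        cur = conn.execute(
            """
            DELETE FROM chunks
            WHERE scope = ?
              AND generation_id NOT IN (
                  SELECT active_generation_id FROM active_generation WHERE scope = ?
              )
            """,
            (scope, scope),
        )
    return cur.rowcount
```

In this sketch a crash between `write_generation` and `publish` leaves orphan rows that `search` never sees, which is the property the design relies on. Note that this naive `cleanup` would also remove a generation that is still being written; a real collector would presumably need to skip in-flight generations, for example by age.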

### Benefits

| Concern | Current delete/insert model | Generation/pointer model |
|--------|-----------------------------|---------------------------|
| Crash mid-run | Duplicates or partial state may be searchable | Incomplete generations never published |
| Concurrent re-chunk | Race between chunk replace and embedding write-back | Old runs cannot publish; pointer move is atomic |
| Embedding cleanup failure | Stale embeddings may remain searchable | Inactive generations ignored by retrieval |
| Multi-worker | Requires cross-process locks | Atomic pointer update; reads scoped to the active generation |
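
The table states that old runs cannot publish but does not say how that would be enforced. One possible mechanism, shown here purely as an assumption, is a compare-and-set on the pointer: each run records the active generation it observed at start and may publish only if the pointer still holds that value. The class `ActivePointers` and the method `publish_if_unchanged` are hypothetical names, and the process-local `threading.Lock` stands in for whatever conditional update the real store would provide.

```python
import threading
from typing import Dict, Optional, Tuple

# (collection, doc_id, parse_hash, user scope)
Scope = Tuple[str, str, str, str]


class ActivePointers:
    """In-memory stand-in for the active-generation pointer table."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[Scope, str] = {}

    def get(self, scope: Scope) -> Optional[str]:
        """Return the active generation for a scope, or None if unpublished."""
        with self._lock:
            return self._active.get(scope)

    def publish_if_unchanged(
        self, scope: Scope, expected: Optional[str], new_generation: str
    ) -> bool:
        """Move the pointer only if it still holds the value the run started from."""
        with self._lock:
            if self._active.get(scope) != expected:
                return False  # another run published first; this run is stale
            self._active[scope] = new_generation
            return True
```

With two runs that both start from the same observed pointer, the first `publish_if_unchanged` call succeeds and the second returns `False`, so the losing run's rows stay inactive and can be left to cleanup.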

### Short-term (tracked in PR #202)

Until this issue is implemented, PR #202 uses:

- Scoped locking keyed by the actual replace scope `(collection, doc_id, parse_hash, user scope)`, held from chunk replace through embedding write
- A cross-process lock (`filelock`) when ingestion runs in multiple workers
- **Visible** embedding cascade-delete failures (raise / surface partial failure), not silent best-effort success
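
A rough sketch of the first and third bullets, under stated assumptions: locks are looked up by the replace-scope tuple so that every caller for the same scope shares one lock, and cleanup collects per-table failures and raises instead of returning success. The names `lock_for`, `replace_scope_lock`, `EmbeddingCleanupError`, and `delete_stale_embeddings` are hypothetical and not taken from PR #202. The locks here are process-local only; the cross-process `filelock` layer is not shown.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple

# (collection, doc_id, parse_hash, user scope)
ReplaceScope = Tuple[str, str, str, str]

_registry_guard = threading.Lock()
_scope_locks: Dict[ReplaceScope, threading.Lock] = {}


def lock_for(scope: ReplaceScope) -> threading.Lock:
    """Return the single process-local lock associated with a replace scope."""
    with _registry_guard:
        if scope not in _scope_locks:
            _scope_locks[scope] = threading.Lock()
        return _scope_locks[scope]


@contextmanager
def replace_scope_lock(scope: ReplaceScope) -> Iterator[None]:
    """Hold the scope lock across chunk replace and embedding write."""
    with lock_for(scope):
        yield


class EmbeddingCleanupError(RuntimeError):
    """Raised when one or more embedding tables could not be cleaned."""

    def __init__(self, failed_tables: List[str]) -> None:
        super().__init__("embedding cleanup failed for: " + ", ".join(failed_tables))
        self.failed_tables = list(failed_tables)


def delete_stale_embeddings(deleters: Dict[str, Callable[[], None]]) -> None:
    """Run every per-table delete, then surface partial failure to the caller."""
    failed: List[str] = []
    for table, delete in deleters.items():
        try:
            delete()
        except Exception:
            failed.append(table)  # keep going so other tables are still cleaned
    if failed:
        raise EmbeddingCleanupError(failed)
```

Keying the registry by the same tuple that the destructive update uses is the point of the sketch: a lock keyed by `source_path` and another keyed by `doc_id` would not exclude each other.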

### Scope / likely touch points

- Storage abstraction: `VectorIndexStore` — generation-aware write/read APIs
- `chunk_document` / `replace_chunks` — write under new generation instead of in-place delete
- `vector_manager` / embedding upsert — tag rows with `generation_id`
- Retrieval (dense/sparse/hybrid) — filter by active generation from pointer table
- Migration: backfill pointer for existing data (single implicit generation per scope)
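
The migration bullet could look roughly like the following, assuming existing rows are available as dictionaries and lack a `generation_id`. The field name `user_id`, the constant `LEGACY_GENERATION`, and the function `backfill` are hypothetical; the actual row shape in LanceDB deployments may differ.

```python
from typing import Dict, Iterable, List, Tuple

# (collection, doc_id, parse_hash, user scope)
Scope = Tuple[str, str, str, str]

# Hypothetical id for the single implicit generation of pre-migration data.
LEGACY_GENERATION = "gen-0"


def backfill(rows: Iterable[dict]) -> Tuple[List[dict], Dict[Scope, str]]:
    """Tag untagged rows with the implicit generation and build one pointer per scope."""
    tagged: List[dict] = []
    pointers: Dict[Scope, str] = {}
    for row in rows:
        if "generation_id" in row:
            tagged.append(row)  # already migrated; leave untouched
            continue
        scope = (row["collection"], row["doc_id"], row["parse_hash"], row["user_id"])
        tagged.append({**row, "generation_id": LEGACY_GENERATION})  # copy, do not mutate
        pointers.setdefault(scope, LEGACY_GENERATION)
    return tagged, pointers
```

Because every pre-existing row in a scope maps to the same implicit generation, retrieval results should be unchanged immediately after the backfill; only later re-chunk runs introduce new generations.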

### Acceptance criteria (draft)

- [ ] After a re-chunk with a different `config_hash`, retrieval never returns chunks/embeddings from a non-active generation
- [ ] A crash after a partial write of a new generation does not change searchable results; results change only at pointer publish
- [ ] Concurrent re-chunks for the same `(collection, doc_id, parse_hash)` cannot resurrect stale embeddings
- [ ] Old generations can be garbage-collected without affecting active retrieval
- [ ] Documented migration path for existing LanceDB deployments

### References

- #199 — stale chunk rows / chunk_size regression
- PR #202 — short-term `replace_chunks` + locking (review: generation model as follow-up)
