You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Today push targets the live chunks of a document, not a version. The relationship between History (the version timeline) and the Ingest tab (the push timeline) is implicit — they share a chunkset_hash but nothing else.
The Stale badge on the Ingest tab does not say which version drifted from what. The user has to deduce mentally.
chunk_pushes.chunk_ids is a list of IDs; a hard-deleted chunk leaves the audit row pointing at a ghost. The frozen chunks_snapshot in document_versions does not suffer from that, but pushes do not reference it today.
"Which version of doc X is in store Y right now?" requires joining hashes, not entities. Painful in queries, harder in conversations.
When prod RAG starts hitting the store, the question "which version of this doc is the RAG using?" has no clean answer.
The mental model we want is the git release pattern: you do not deploy a moving target, you tag, then deploy the tag. Push = deploy. The version is the tag.
Goal
Make the relationship explicit at the data + UX level:
The unit of ingestion is a document_version, not the live chunks.
Every chunk_pushes row references a document_versions.id.
The History drawer surfaces per-store ingestion state per version.
The Ingest tab states "v3 is in Neo4j Local" rather than "hash A93B…".
Diff endpoint becomes version A vs version B with dates and summaries.
Scope
In scope
Backend
domain/models.py — ChunkPush gains document_version_id: str (mandatory after migration window).
domain/models.py — DocumentStoreLink switches its source-of-truth from chunkset_hash to document_version_id (or carries both during the transition).
persistence/database.py — chunk_pushes.document_version_id column + FK to document_versions(id)ON DELETE SET NULL (audit survives version purge).
services/chunk_service.py::push_to_store — resolution policy: if no fresh version exists, auto-create a CHUNKS version snapshot before pushing (analogous to git tag before git push). Configurable: implicit (default) vs explicit (require user to "+ Generate chunks" first).
services/store_service.py — diff endpoint takes two version_ids instead of (doc, store) → (doc, store).
Frontend
History drawer — each version row carries per-store badges: [Neo4j Local ✓ · OpenSearch ✗].
Ingest tab — replaces hash-based language with version language: "v3 is in Neo4j Local since 14:32".
Restore-from-version — show "this version is not ingested anywhere yet" affordance.
Out of scope (follow-ups)
Push of a version other than the current one (i.e. push v2 while live is v4). Useful for rollback but adds UX complexity around what "live" means after restore. Track as a separate ticket.
Recommended sequencing: this issue → re-spec the three above → implement them.
ADR required
The semantic change — "push targets a version, not live chunks" — is load-bearing enough to warrant an ADR under docs/architecture/adrs/. Topics to cover:
Why not keep the live-chunks model and just expose the hash better in the UI? (Answer: the data-integrity argument around hard-deleted chunks + the audit story).
Implicit version-on-push vs explicit version-required-before-push? (Trade-off: UX friction vs version-table growth.)
Severity: P0 for 0.8.0 (Production ready). The "which version is in prod?" question is foundational for any production RAG deployment.
Why not 0.6.1: 0.6.1 is a stabilisation milestone; this is a model change with cross-cutting impact on at least three other open issues. Slipping it in would derail the v0.6.1 tag.
Why not 0.7.0: there is no 0.7.0 milestone yet (UI redesign milestone is named after the maquette work, not numbered). 0.8.0 (Production ready) is the natural home: production semantics need this.
Context
Today push targets the live chunks of a document, not a version. The relationship between History (the version timeline) and the Ingest tab (the push timeline) is implicit — they share a
chunkset_hashbut nothing else.Symptoms of the gap:
Stalebadge on the Ingest tab does not say which version drifted from what. The user has to deduce mentally.chunk_pushes.chunk_idsis a list of IDs; a hard-deleted chunk leaves the audit row pointing at a ghost. The frozenchunks_snapshotindocument_versionsdoes not suffer from that, but pushes do not reference it today.The mental model we want is the git release pattern: you do not deploy a moving target, you tag, then deploy the tag. Push = deploy. The version is the tag.
Goal
Make the relationship explicit at the data + UX level:
document_version, not the live chunks.chunk_pushesrow references adocument_versions.id.Scope
In scope
domain/models.py—ChunkPushgainsdocument_version_id: str(mandatory after migration window).domain/models.py—DocumentStoreLinkswitches its source-of-truth fromchunkset_hashtodocument_version_id(or carries both during the transition).persistence/database.py—chunk_pushes.document_version_idcolumn + FK todocument_versions(id)ON DELETE SET NULL(audit survives version purge).services/chunk_service.py::push_to_store— resolution policy: if no fresh version exists, auto-create a CHUNKS version snapshot before pushing (analogous togit tagbeforegit push). Configurable: implicit (default) vs explicit (require user to "+ Generate chunks" first).services/store_service.py— diff endpoint takes twoversion_ids instead of (doc, store) → (doc, store).[Neo4j Local ✓ · OpenSearch ✗].Out of scope (follow-ups)
Dependent / blocked issues
These all assume the live chunks model and need either re-spec or to wait on this:
Recommended sequencing: this issue → re-spec the three above → implement them.
ADR required
The semantic change — "push targets a version, not live chunks" — is load-bearing enough to warrant an ADR under
docs/architecture/adrs/. Topics to cover:document_version_idreservation? No — out of scope for 0.6.1.).Acceptance criteria
chunk_pushesrows carry adocument_version_id. Repository + API tests assert it.document_store_linksstate semantics rewritten in terms of versions:Ingested= the latest version of the doc matches the version recorded on the link.Stale= the doc has a newer version than the one in the link.Failed= unchanged.?from=<version_id>&to=<version_id>and returns a chunk-by-chunk diff between two frozen snapshots.docs/architecture/adrs/documents the decision (implicit vs explicit, why versions over hash, migration plan).Impact / priority
References
chunk_pushes/document_store_links).docs/design/279-store-connection-credentials.md— the precedent for design-doc-driven changes.