Vector Index — Build, Validate, and Atomic Swap

A zero-downtime pattern to rebuild your vector index offline, prove it’s correct, then cut traffic over with a single reversible pointer. Works with FAISS, Milvus, Weaviate, Qdrant, pgvector, Redis, Chroma, Vespa, Typesense, and more.

Open these first

RAG map and recovery: rag-architecture-and-recovery.md
Retrieval knobs end-to-end: retrieval-playbook.md
Why this snippet (schema): retrieval-traceability.md, data-contracts.md
Embedding vs meaning: embedding-vs-semantic.md
Fragmentation failure mode: pattern_vectorstore_fragmentation.md
Metric mismatch & normalization: metric_mismatch.md, normalization_and_scaling.md

When to use

You change embedding model, dimension, normalization, or distance metric.
You modify chunk schema, overlaps, or anchor fields.
You add a reranker or change k, cutoff, weights.
You suspect index skew or store fragmentation.
You need a region-by-region cutover with instant rollback.

Acceptance targets

ΔS(question, retrieved) ≤ 0.45 on three paraphrases (gold set).
Coverage to the correct section ≥ 0.70.
λ remains convergent across two seeds.
No schema drift in required fields: {snippet_id, section_id, source_url, offsets, tokens}.
p95 latency change within ±15% of baseline at k used in prod.
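
The targets above can be turned into a mechanical gate. The following is a minimal sketch, assuming ΔS, coverage, and λ have already been computed by your evaluation harness; `EvalResult` and `failed_targets` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = {"snippet_id", "section_id", "source_url", "offsets", "tokens"}


@dataclass
class EvalResult:
    # Hypothetical container; the values come from your own eval harness.
    delta_s: list[float]        # ΔS(question, retrieved), one value per paraphrase
    coverage: float             # coverage to the correct section
    lambda_states: list[str]    # observed λ state per seed, e.g. "convergent"
    snippet_fields: set[str]    # fields present on returned snippets
    p95_latency_ms: float       # candidate p95 at the k used in prod
    baseline_p95_ms: float      # baseline p95 at the same k


def failed_targets(r: EvalResult) -> list[str]:
    """Return the names of the acceptance targets that the result misses."""
    failures = []
    if len(r.delta_s) < 3 or max(r.delta_s) > 0.45:
        failures.append("delta_s")
    if r.coverage < 0.70:
        failures.append("coverage")
    if len(r.lambda_states) < 2 or any(s != "convergent" for s in r.lambda_states):
        failures.append("lambda")
    if not REQUIRED_FIELDS <= r.snippet_fields:
        failures.append("schema")
    if abs(r.p95_latency_ms - r.baseline_p95_ms) > 0.15 * r.baseline_p95_ms:
        failures.append("latency")
    return failures
```

An empty list means the candidate meets every target; any other result names what to fix before moving on.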

Build pipeline (offline)

Freeze specs: pin EMBED_MODEL_VER, EMBED_DIM, NORM, metric, TOK_VER, ANALYZER_CONF, and CHUNK_SCHEMA_VER, and record them as a manifest next to the index.
Re-chunk & re-embed: apply your chunking checklist; write vectors with doc ids and anchor metadata.
Construct index docs_vB: use store-appropriate build parameters. Keep docs_vA live.
Attach reranker (if any): persist RERANK_CONF and deterministic ordering parameters.
Write integrity probes: store INDEX_HASH = hash(all vectors + manifest) and have the retriever emit it with its results so it can be compared against the manifest.
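
One way to compute INDEX_HASH is sketched below. It assumes SHA-256, float32 vectors, and a JSON manifest; the page does not prescribe a hash function or serialization, so treat those choices (and the `index_hash` name) as illustrative.

```python
import hashlib
import json
import struct


def index_hash(vectors, manifest):
    """SHA-256 over every (doc_id, vector) pair plus the manifest.

    vectors:  dict mapping doc_id (str) -> list of floats
    manifest: dict of pinned spec values (EMBED_MODEL_VER, EMBED_DIM, ...)
    """
    h = hashlib.sha256()
    # Sort by doc id so the digest does not depend on insertion order.
    for doc_id in sorted(vectors):
        vec = vectors[doc_id]
        h.update(doc_id.encode("utf-8"))
        h.update(b"\x00")
        # Pack as little-endian float32 so the bytes are platform independent.
        h.update(struct.pack(f"<{len(vec)}f", *vec))
    # Canonical JSON: sorted keys and fixed separators.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    h.update(canonical.encode("utf-8"))
    return h.hexdigest()
```

Because the manifest is part of the digest, changing only NORM or the metric yields a different INDEX_HASH even when the vectors are byte-identical.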

Validation (before any traffic)

Gold set eval (20–40 questions): run baseline docs_vA vs candidate docs_vB. Log ΔS, coverage, λ, latency.
Anchor triangulation: compare ΔS(retrieved, anchor) vs ΔS(retrieved, decoy). If the two are close, fix chunking.
Metric sanity: if cosine neighbors look right but meaning is off, re-check metric/norm rules.
Fragmentation scan: if top-k distributions differ wildly across partitions, de-frag or rebuild.
Contracts: verify snippet fields and cite-then-explain are intact.
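
Anchor triangulation can be sketched as follows. This assumes ΔS is computed as 1 minus cosine similarity between embeddings and that inputs are non-zero vectors; the `min_gap` threshold of 0.10 is a hypothetical value, not one taken from this page.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def delta_s(a, b):
    # Assumption: ΔS is taken as 1 - cosine similarity of the two embeddings.
    return 1.0 - cosine(a, b)


def triangulate(retrieved, anchor, decoy, min_gap=0.10):
    """Return (gap, ok); ok is False when anchor and decoy are too close to tell apart."""
    gap = delta_s(retrieved, decoy) - delta_s(retrieved, anchor)
    return gap, gap >= min_gap
```

A small or negative gap means the retrieved chunk sits about as close to the decoy as to the anchor, which is the signal to revisit chunking.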

Atomic swap patterns

Alias swap (preferred)

Keep a stable read name (e.g., docs_live) and flip the alias from docs_vA to docs_vB in a single operation.
Rollback = flip back to docs_vA.

Config pointer (KV)

Keep a single INDEX_PTR=docs_vX in a single-writer key-value store. Every reader dereferences it once at request start and uses that value for the whole request.
Rollback = set back to previous pointer.
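
The pointer pattern can be sketched in memory as below. `IndexPointer` and `handle_request` are hypothetical names, and the class is a stand-in for whatever KV store you use; a real deployment would read and write INDEX_PTR in that store instead.

```python
import threading


class IndexPointer:
    """Single pointer to the live index, with a one-step rollback."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = initial
        self._previous = None

    def get(self):
        # Readers call this once at request start and reuse the value.
        with self._lock:
            return self._current

    def swap(self, new_index):
        # Remember the old target so rollback is a single operation.
        with self._lock:
            self._previous, self._current = self._current, new_index

    def rollback(self):
        with self._lock:
            if self._previous is None:
                raise RuntimeError("no previous index to roll back to")
            self._current, self._previous = self._previous, self._current


def handle_request(ptr, query, search):
    index_name = ptr.get()              # dereference once
    return search(index_name, query)    # the whole request uses index_name
```

Dereferencing once per request keeps a single request from mixing results from docs_vA and docs_vB if the pointer moves mid-flight.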

Per-region staged swap

Swap one region at a time; watch ΔS, coverage, and λ for 15–30 minutes before moving to the next region.
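
A staged rollout loop might look like the sketch below. The `flip` and `is_healthy` hooks are supplied by the caller and are hypothetical, as is the 60-second polling interval; the 15-minute default watch window is the lower bound given above.

```python
import time


def staged_swap(regions, flip, is_healthy,
                watch_seconds=15 * 60, poll_seconds=60, sleep=time.sleep):
    """Flip regions one at a time; undo every flip on the first unhealthy check.

    flip(region, target) repoints one region; is_healthy(region) returns a bool.
    Returns the list of regions left on the candidate index.
    """
    flipped = []
    for region in regions:
        flip(region, "docs_vB")
        flipped.append(region)
        waited = 0
        while waited < watch_seconds:
            if not is_healthy(region):
                # Revert in reverse order so the newest flip is undone first.
                for r in reversed(flipped):
                    flip(r, "docs_vA")
                return []
            sleep(poll_seconds)
            waited += poll_seconds
    return flipped
```

Passing `sleep` as a parameter keeps the loop testable without waiting out the real watch window.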

60-second cutover checklist

Candidate docs_vB passes gold eval targets.
INDEX_HASH, EMBED_SCHEMA, RERANK_CONF emitted by retriever match the manifest.
Caches are primed or partitioned by INDEX_HASH.
Canary at 5% is green.
Alias flip or pointer update is one operation and reversible.
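
The checklist can be reduced to a single boolean gate. This is a minimal sketch, assuming the retriever's emitted values and the manifest are both available as dicts; the function names are hypothetical.

```python
CHECKED_KEYS = ("INDEX_HASH", "EMBED_SCHEMA", "RERANK_CONF")


def manifest_mismatches(emitted, manifest):
    """Return the checked keys that are missing or differ between retriever and manifest."""
    return [
        k for k in CHECKED_KEYS
        if k not in emitted or k not in manifest or emitted[k] != manifest[k]
    ]


def ready_to_cut_over(emitted, manifest, gold_eval_passed, canary_green, caches_ready):
    """True only when every checklist item holds."""
    return (
        gold_eval_passed
        and canary_green
        and caches_ready
        and not manifest_mismatches(emitted, manifest)
    )
```

A key missing on either side counts as a mismatch, so a retriever that emits nothing cannot pass by accident.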

Stop & rollback

Hard stop if ΔS p95 drift > 0.15, coverage < 0.60, or λ flip rate > 0.20.
Hard stop if tool loops appear or the 5xx rate exceeds 1%.
Rollback = alias flip back to docs_vA (or pointer revert).
After rollback, open: debug_playbook.md.
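
The hard-stop thresholds above can be encoded directly. In this sketch the inputs are assumed to come from your monitoring stack, and the function name is hypothetical.

```python
def hard_stop_reasons(delta_s_p95_drift, coverage, lambda_flip_rate,
                      error_5xx_rate, tool_loop_detected):
    """Return the list of tripped hard-stop conditions; empty means keep going."""
    reasons = []
    if delta_s_p95_drift > 0.15:
        reasons.append("delta_s_p95_drift")
    if coverage < 0.60:
        reasons.append("coverage")
    if lambda_flip_rate > 0.20:
        reasons.append("lambda_flip_rate")
    if error_5xx_rate > 0.01:       # rates are fractions, so 1% is 0.01
        reasons.append("5xx")
    if tool_loop_detected:
        reasons.append("tool_loop")
    return reasons
```

Any non-empty result should trigger the alias flip back to docs_vA or the pointer revert.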

Store-specific notes

FAISS / Chroma: ensure build-time metric matches query-time; L2-normalize vectors when switching from dot product to cosine.
Qdrant / Weaviate / Milvus: pin HNSW/IVF params; rebuild rather than mutate when dimension changes.
pgvector: match vector_l2_ops/vector_cosine_ops with embedding norm; verify ANALYZER_CONF if paired with text search.
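Redis: create the alias with FT.ALIASADD and repoint it with FT.ALIASUPDATE, which moves an existing alias to the new index in one command; avoid multi-writer FT.CREATE races.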
Vespa / Typesense: schema and field types versioned; perform shadow feed before activation.
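
The normalization point in the FAISS / Chroma note rests on a simple identity: after L2 normalization, the dot product of two vectors equals their cosine similarity. A small pure-Python illustration (helper names are illustrative only):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
raw_dot = dot(a, b)                                   # depends on vector length
unit_dot = dot(l2_normalize(a), l2_normalize(b))      # length-independent
assert math.isclose(unit_dot, cosine(a, b))
```

If vectors are stored unnormalized in an inner-product index, ranking follows `raw_dot` and is biased toward long vectors, which is one way a metric mismatch shows up.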

Pseudo commands

Build

# generate chunks and vectors
wfgy_chunk --schema s128/o32 corpus.jsonl > chunks.jsonl
wfgy_embed --model text-embedding-3-large chunks.jsonl > vectors.fbin

# build candidate index
wfgy_index build --metric cosine --norm l2 --in vectors.fbin --out docs_vB

# compute and store manifest + hash
wfgy_index manifest --index docs_vB > manifest.json
wfgy_index hash --index docs_vB > INDEX_HASH.txt

Validate vs baseline

wfgy_eval rag --gold gold_40.json \
  --indexes docs_vA,docs_vB \
  --targets "ds<=0.45,cov>=0.70,lambda=convergent"

Swap and rollback (alias style)

vec alias update docs_live --to docs_vB
# rollback
vec alias update docs_live --to docs_vA

Common pitfalls

Mixed analyzers/tokenizers across ingest vs query.
“Minor” embedding model update that changes dimension or norm assumptions.
Reranker cutoff mismatch between staging and prod.
Cache keys without INDEX_HASH causing stale blends.
Two writers touching the same live alias.
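
To avoid the stale-blend pitfall, the cache key can include INDEX_HASH so entries from docs_vA are never served after the swap to docs_vB. A minimal sketch follows; the key layout and the `cache_key` name are assumptions for illustration.

```python
import hashlib


def cache_key(index_hash, query, k):
    """Build a retrieval cache key that is partitioned by index version."""
    # The unit separator keeps fields from running together ambiguously.
    payload = f"{index_hash}\x1f{k}\x1f{query}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

The same query against two index versions maps to two different keys, so rollback also finds the old cache entries still valid under the old hash.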

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS)	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

Explore More

Layer	Page	What it’s for
⭐ Proof	WFGY Recognition Map	External citations, integrations, and ecosystem proof
⚙️ Engine	WFGY 1.0	Original PDF tension engine and early logic sketch (legacy reference)
⚙️ Engine	WFGY 2.0	Production tension kernel for RAG and agent systems
⚙️ Engine	WFGY 3.0	TXT based Singularity tension engine (131 S class set)
🗺️ Map	Problem Map 1.0	Flagship 16 problem RAG failure taxonomy and fix map
🗺️ Map	Problem Map 2.0	Global Debug Card for RAG and agent pipeline diagnosis
🗺️ Map	Problem Map 3.0	Global AI troubleshooting atlas and failure pattern map
🧰 App	TXT OS	.txt semantic OS with fast bootstrap
🧰 App	Blah Blah Blah	Abstract and paradox Q&A built on TXT OS
🧰 App	Blur Blur Blur	Text to image generation with semantic control
🏡 Onboarding	Starter Village	Guided entry point for new users
