Reverse Wiktionary

Reverse Wiktionary is a semantic lexical search app. This repository contains the online serving layer: FastAPI, Qdrant query integration, Redis-backed UI state, Docker/Nginx deployment files, and Azure beta deployment scripts.

Offline artifact production lives in Reverse-Wiktionary-Offline.

Test coverage and validation harnesses live in Reverse-Wiktionary-Test-Suite.

Execution Boundary

Reverse-Wiktionary-Offline
  Wiktionary/Kaikki dump
    -> normalized lexical rows
    -> sentence-transformer embeddings
    -> indexed and quantized Qdrant collection
    -> deployable artifacts

artifact handoff
  Qdrant snapshot
  taxonomy files
  manifest metadata

Reverse-Wiktionary
  deploy artifacts to serving VM
    -> Qdrant restores the search index
    -> FastAPI serves search and templates
    -> Redis stores lightweight UI state
    -> Nginx fronts the beta web service
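
As a rough illustration of the "Qdrant restores the search index" step, the sketch below builds a request for Qdrant's snapshot recovery REST endpoint using only the standard library. The base URL, the snapshot path, and the helper name `build_recover_request` are assumptions for illustration; the deployment scripts in this repository may perform the restore differently.

```python
import json
import urllib.request


def build_recover_request(
    base_url: str, collection: str, snapshot_location: str
) -> urllib.request.Request:
    """Build (but do not send) a snapshot recovery request for Qdrant.

    Hypothetical helper: it targets the
    PUT /collections/{collection_name}/snapshots/recover endpoint.
    """
    url = f"{base_url.rstrip('/')}/collections/{collection}/snapshots/recover"
    body = json.dumps({"location": snapshot_location}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values: Qdrant's default HTTP port and a made-up snapshot path.
request = build_recover_request(
    "http://localhost:6333",
    "reverse_wiktionary_v3",
    "file:///qdrant/snapshots/reverse_wiktionary_v3.snapshot",
)
print(request.get_method(), request.full_url)

# Sending is left out on purpose; on the serving VM it would look like:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the restore step easy to inspect in a dry run before it touches a live Qdrant instance.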

Current Serving Baseline

collection_name: reverse_wiktionary_v3
current model: sentence-transformers/all-mpnet-base-v2
current vector_size: 768
indexed points: 3,869,247
filtered retrieval: Qdrant ACORN
compression: scalar int8 quantization, original vectors on disk

The v3 artifact returns to the 768-dimensional embedding model because the 512-dimensional beta artifact did not meet filtered retrieval quality requirements.
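
To give a sense of what the quantization setting means for VM sizing, the following back-of-the-envelope sketch estimates raw vector storage for the baseline above. It assumes the original vectors are float32 (4 bytes per component) and counts vector components only; the HNSW graph, payloads, and Qdrant's own overhead are excluded, so real usage will be higher.

```python
POINTS = 3_869_247  # indexed points from the serving baseline
DIMS = 768          # vector_size from the serving baseline


def raw_vector_bytes(points: int, dims: int, bytes_per_component: int) -> int:
    """Bytes needed to store the vector components alone."""
    return points * dims * bytes_per_component


# Assumption: original vectors are float32; int8 quantization uses 1 byte each.
float32_bytes = raw_vector_bytes(POINTS, DIMS, 4)
int8_bytes = raw_vector_bytes(POINTS, DIMS, 1)

print(f"float32 originals (on disk): {float32_bytes / 2**30:.2f} GiB")
print(f"int8 quantized copies:       {int8_bytes / 2**30:.2f} GiB")
```

Under these assumptions the quantized copies take a quarter of the space of the originals, which is why keeping only the int8 vectors in memory and the originals on disk lowers the RAM the serving VM needs.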

Design Documents

Principles

  • Treat Qdrant snapshots as the deployable contract.
  • Keep serving independent from offline indexing.
  • Keep Qdrant and Redis private to the VM/container network.
  • Prefer explicit manifests and smoke tests over implicit state.
  • Benchmark before changing VM size or retrieval settings.
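
The "explicit manifests and smoke tests" principle could be exercised with a check along the following lines. This is a minimal sketch: the manifest field names (`collection_name`, `model`, `vector_size`, `points_count`) and the function `manifest_mismatches` are hypothetical, and the actual manifest schema produced by Reverse-Wiktionary-Offline may differ.

```python
from typing import Any, Dict, List

# Expected values taken from the Current Serving Baseline section.
# The key names are illustrative, not the real manifest schema.
EXPECTED_BASELINE: Dict[str, Any] = {
    "collection_name": "reverse_wiktionary_v3",
    "model": "sentence-transformers/all-mpnet-base-v2",
    "vector_size": 768,
    "points_count": 3_869_247,
}


def manifest_mismatches(manifest: Dict[str, Any]) -> List[str]:
    """Return one human-readable problem per missing or mismatched field."""
    problems: List[str] = []
    for key, expected in EXPECTED_BASELINE.items():
        if key not in manifest:
            problems.append(f"missing field: {key}")
        elif manifest[key] != expected:
            problems.append(
                f"{key}: expected {expected!r}, found {manifest[key]!r}"
            )
    return problems


# Example: a manifest left over from the 512-dimensional beta would be flagged.
stale_manifest = dict(EXPECTED_BASELINE, vector_size=512)
for problem in manifest_mismatches(stale_manifest):
    print(problem)
```

Running a check like this before restoring a snapshot makes a mismatch between the artifact and the serving configuration fail loudly instead of surfacing later as degraded search results.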