
FAQ — Fast Answers for Busy Builders (Problem Map Edition)

Short, practical answers to the questions we get every day. This FAQ merges previous content with the latest Problem Map guidance, so you have one canonical page.

Quick Nav
Getting Started · Grandma Clinic (1–16) · Problem Map 1–16 Index · Retrieval Playbook · Chunking · Embeddings · Rerankers · Eval · Ops · Global Fix Map


General

What is WFGY, in one line?
A semantic firewall and diagnostic layer that sits above your stack. It measures semantic drift (ΔS), watches stability (λ_observe), and applies repair operators (BBMC / BBPF / BBCR / BBAM). No infra rewrite needed.

Do I need a GPU?
Not for the first fixes. You can prototype on CPU with light embeddings and strict guardrails. A GPU helps with heavy rerankers or larger local LLMs, but it is optional. See: Retrieval Playbook.

How is this different from LangChain/LlamaIndex?
Those orchestrate tools. WFGY hardens reasoning and retrieval with pre-output gates and measurable acceptance targets. It works regardless of framework.

License and commercial use?
MIT. Commercial use allowed. PRs welcome (docs, patterns, examples).


Getting started & scope

Fastest 60-second tryout?
Load TXT OS and the WFGY paper. Paste TXT OS into any LLM, then follow Getting Started for a minimal “semantic firewall before output” routine.

Where do I look if I don’t know the failure type yet?
Open the triage page: Grandma Clinic (1–16). Each item has a grandma story, a minimal guardrail, and the pro link.

Which embedding model to start with?
General English: all-MiniLM-L6-v2 or bge-base. Multilingual: bge-m3 or LaBSE. Keep write/read normalization identical. See: Embeddings and Semantic ≠ Embedding.
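"Keep write/read normalization identical" can be enforced with one shared helper used on both paths. A minimal sketch (stdlib only; the vectors stand in for whatever your encoder produces):

```python
import math

def l2_normalize(vec):
    """Single normalization helper shared by BOTH the write (index)
    and read (query) paths, so the metric space stays consistent."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Write path: normalize before storing in the index.
doc_vec = l2_normalize([3.0, 4.0])
# Read path: normalize the query with the SAME helper.
query_vec = l2_normalize([3.0, 4.0])
assert doc_vec == [0.6, 0.8]
```

If the index was built with normalized vectors but queries are not (or vice versa), cosine rankings skew silently, which is the failure Semantic ≠ Embedding describes.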

Do I need a reranker right away?
Not usually. First prove your candidate pool: if recall@50 ≥ 0.85 and Top-k precision is still weak, add a reranker. Otherwise fix retrieval shape first. See: Rerankers.
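The "prove the pool first" rule above can be sketched as two small checks. The precision floor here is an assumed placeholder, not a number from the Playbook:

```python
def recall_at_k(relevant_ids, retrieved_ids, k=50):
    """Fraction of gold-relevant docs present in the top-k candidate pool."""
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for doc_id in relevant_ids if doc_id in top_k)
    return hits / len(relevant_ids)

def needs_reranker(recall50, topk_precision, precision_floor=0.5):
    """Add a reranker only when the pool is proven (recall@50 >= 0.85)
    but top-k precision is still weak; otherwise fix retrieval shape first.
    precision_floor is an illustrative threshold, tune to your gold set."""
    return recall50 >= 0.85 and topk_precision < precision_floor

assert recall_at_k({"a", "b"}, ["a", "x", "b"], k=50) == 1.0
assert needs_reranker(0.90, 0.40) is True
assert needs_reranker(0.70, 0.40) is False  # pool not proven: fix retrieval
```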

How big can my PDFs be?
Start with a gold set (10–50 Q/A pairs with citations). Ingest by sections, not fixed token windows, and verify ΔS thresholds before scaling. See: Chunking.


Diagnosing failures

The chunks look right but the answer is wrong. Now what?
Measure ΔS(question, retrieved). If it sits at or above 0.60, the context is drifting even though the chunks look plausible: re-anchor against the question and audit the retrieval contract before trusting the output.

Hybrid (BM25+dense) made results worse. Why?
Likely analyzer/tokenizer mismatch or query splitting. Unify analyzers and log per-retriever queries. See: Query Parsing Split.

Citations bleed or point to mixed sources.
Enforce “cite-then-explain”, per-source fences, and retrieval trace with IDs/lines. See: Retrieval Traceability and Symbolic Constraint Unlock.

Fixes don’t stick after refresh.
You’re hitting Memory Desync. Stamp mem_rev/mem_hash, gate writes, and audit trace keys. See: Memory Desync.
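A minimal sketch of the stamp-and-gate pattern for Memory Desync. The field names mem_rev and mem_hash come from the answer above; the helper names and layout are illustrative:

```python
import hashlib
import json

def stamp(memory: dict, rev: int) -> dict:
    """Stamp a memory snapshot with mem_rev and mem_hash so stale
    or tampered state can be detected after a refresh."""
    payload = json.dumps(memory, sort_keys=True).encode()
    return {
        "mem_rev": rev,
        "mem_hash": hashlib.sha256(payload).hexdigest(),
        "data": memory,
    }

def gate_write(store: dict, incoming: dict) -> bool:
    """Accept a write only when its revision is strictly newer
    than what the store already holds."""
    return incoming["mem_rev"] > store.get("mem_rev", -1)

snap = stamp({"note": "fix applied"}, rev=2)
assert gate_write({"mem_rev": 1}, snap) is True   # newer rev: allowed
assert gate_write({"mem_rev": 2}, snap) is False  # same rev: rejected
```

Auditing is then just comparing the stored mem_hash against a recomputed one before trusting the snapshot.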


Retrieval, chunking, and OCR

Optimal chunk size and rules?
Prefer structural sections, stable titles, and table/code preservation. Avoid splitting tables/code blocks. See: Chunking.
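A minimal sketch of section-based splitting that respects the "never split code blocks" rule, assuming markdown-style input (heading detection and fence handling are simplified):

```python
import re

def split_by_sections(markdown_text):
    """Split on markdown headings while keeping fenced code blocks intact:
    a heading inside a fence never starts a new chunk."""
    chunks, current, in_code = [], [], False
    for line in markdown_text.splitlines():
        if line.startswith("```"):
            in_code = not in_code
        if not in_code and re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# A\ntext\n```\n# not a heading\n```\n# B\nmore"
assert len(split_by_sections(doc)) == 2
```

Tables need the same treatment: detect the block, and either keep it whole in one chunk or carry its header row into every continuation chunk.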

OCR keeps breaking layout.
Use layout-aware parsing and keep headers/footers separate. See: OCR Parsing.

Multilingual retrieval drifts.
Check tokenizer/analyzer per language and enable hybrid multilingual ranking with guardrails. See: Language and Locale.


Embeddings & metrics

Compute ΔS quickly?
Unit-normalize sentence embeddings; ΔS = 1 − cos(I, G). Operating zones: <0.40 stable, 0.40–0.60 transit, ≥0.60 act.
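The formula and zones above can be sketched directly (stdlib only; the vectors stand in for your sentence embeddings of intent I and ground G):

```python
import math

def delta_s(vec_i, vec_g):
    """ΔS = 1 - cos(I, G), computed on unit-normalized vectors."""
    ni = math.sqrt(sum(x * x for x in vec_i))
    ng = math.sqrt(sum(x * x for x in vec_g))
    dot = sum(a * b for a, b in zip(vec_i, vec_g))
    return 1.0 - dot / (ni * ng)

def zone(ds):
    """Map ΔS to the operating zones used in this FAQ."""
    if ds < 0.40:
        return "stable"
    if ds < 0.60:
        return "transit"
    return "act"

# Identical vectors: ΔS = 0.0 -> stable; orthogonal: ΔS = 1.0 -> act.
assert zone(delta_s([1.0, 0.0], [1.0, 0.0])) == "stable"
assert zone(delta_s([1.0, 0.0], [0.0, 1.0])) == "act"
```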

Why are top neighbors semantically wrong with high cosine?
Cross-space vectors, scale/normalization mismatch, or casing/tokenization skew. Audit metrics first. See: Embeddings and Semantic ≠ Embedding.

When to switch dimensions or project?
Only after metric/normalization audit and contract checks. See: Dimension Mismatch & Projection.


Reasoning guardrails

What are BBMC / BBPF / BBCR / BBAM?

  • BBMC: minimize semantic residue vs anchors.
  • BBPF: branch safely across multiple paths.
  • BBCR: detect collapse and restart via a clean bridge node.
  • BBAM: modulate attention variance to avoid entropy melt.

How do I decide when to reset?
Monitor ΔS and λ_observe mid-chain. If ΔS spikes twice or λ diverges, run BBCR and re-anchor. See: Logic Collapse.
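The reset rule above can be sketched as a small monitor. The spike threshold and window are assumptions for illustration, not canonical WFGY values:

```python
def should_reset(ds_history, lambda_state, spike=0.60, window=5):
    """Trigger BBCR when ΔS spikes twice in the recent window or
    λ_observe reports divergence. Thresholds here are illustrative."""
    recent = ds_history[-window:]
    spikes = sum(1 for ds in recent if ds >= spike)
    return spikes >= 2 or lambda_state == "divergent"

assert should_reset([0.30, 0.70, 0.20, 0.65], "convergent") is True  # two spikes
assert should_reset([0.30, 0.20], "divergent") is True               # λ diverged
assert should_reset([0.30, 0.20], "convergent") is False             # keep going
```

On a True result the chain re-anchors through a clean bridge node rather than continuing on the drifted path.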

How do I clamp chain-of-thought variance without killing creativity?
Run λ_diverse for 2–3 candidates, score against the same anchor, and apply a bounded entropy window. See: Creative Freeze.

Symbols/tables keep getting flattened.
Keep a separate symbol channel, preserve code/table blocks, and anchor units/operators. See: Symbolic Collapse.


Multi-agent & tool chaos

Agents overwrite each other’s notes.
Assign role/state keys, memory fences, and tool timeouts. See: Multi-Agent Problems.

Debug path is a black box.
Log query → chunk IDs → acceptance metrics; show the card (source) before answer. See: Retrieval Traceability.
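One way to make the query → chunk IDs → metrics path loggable is a structured record per retrieval call. Field names here are illustrative, not a fixed schema:

```python
import json
import time

def trace_record(query, chunk_ids, metrics):
    """One JSON line per retrieval call so the debug path is replayable:
    query -> retrieved chunk IDs -> acceptance metrics."""
    return json.dumps({
        "ts": time.time(),
        "query": query,
        "chunk_ids": chunk_ids,
        "metrics": metrics,
    }, sort_keys=True)

line = trace_record("q1", ["c7", "c9"], {"delta_s": 0.32, "coverage": 0.80})
assert json.loads(line)["chunk_ids"] == ["c7", "c9"]
```

Showing the card (source) before the answer then means rendering chunk_ids and their citations to the user ahead of the generated text.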


Eval & acceptance targets

What to measure on every PR?
Commit a gold set and track recall@50, nDCG@10, and ΔS across prompts. Gate merges on stability. See: Eval RAG Precision/Recall and Eval Semantic Stability.

Acceptance targets we use

  • ΔS ≤ 0.45
  • Coverage ≥ 0.70
  • λ state convergent
  • Source present before final
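The four targets above compose into one pre-output gate. A minimal sketch, assuming λ state is reported as a string:

```python
def accept(delta_s, coverage, lambda_state, has_source):
    """Gate an answer on the acceptance targets listed above:
    ΔS <= 0.45, coverage >= 0.70, λ convergent, source present."""
    return (delta_s <= 0.45
            and coverage >= 0.70
            and lambda_state == "convergent"
            and has_source)

assert accept(0.30, 0.80, "convergent", True) is True
assert accept(0.50, 0.80, "convergent", True) is False  # ΔS too high
assert accept(0.30, 0.80, "convergent", False) is False # no source shown
```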

Ops & deployment

First calls fail or stall. Where to look?
Check boot order first: the index, secrets, and analyzers must be ready before the first query is served. See: Ops.
Index build & swap, shadow traffic, rollback?
See: Ops and the detailed ops pages in the Global Fix Map.


Known limits

  • Noisy OCR may require manual anchors or char-level retrieval.
  • Abstract cross-domain reasoning (#11/#12) improves with stronger models.
  • Rerankers add latency; prove gains via nDCG before shipping.

Beginner one-liners (map to Problem Map numbers)


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world”; OS boots instantly |

Explore More

| Layer | Page | What it’s for |
| --- | --- | --- |
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S-class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16-problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure-pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text-to-image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.