A practical guide to choosing the right tools, bounding their behavior, and preventing loops or silent stalls.
Use this page when the model calls the wrong tool, produces prose instead of JSON, or keeps retrying a dead endpoint.
- Tool calls loop or never return useful output.
- The wrong tool is picked even when inputs match another tool better.
- JSON mode breaks and the model replies with natural language.
- Latency spikes after deploy or under bursty traffic.
- Multi-agent plans hang on a blocked tool or long queue.
- Threat model and defenses: prompt_injection.md
- Role hygiene and separation: role_confusion.md
- JSON mode and schema locks: json_mode_and_tool_calls.md
- Memory isolation: memory_fences_and_state_keys.md
- Cite then explain discipline: citation_first.md
- RAG traceability and contracts: retrieval-traceability.md · data-contracts.md
- Live ops and debugging: ops/live_monitoring_rag.md · ops/debug_playbook.md
- Tool selection accuracy ≥ 0.98 on a 50-case gold set.
- P95 tool latency within budget for each class: HTTP, search, code-run, vector.
- Zero unbounded calls. Every tool has a timeout, retry policy, and idempotency key.
- Invalid JSON rate < 0.5 percent with strict schema validation.
- ΔS(question, cited snippet) ≤ 0.45 after tool orchestration. λ remains convergent on two seeds.
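The two rate-based targets above can be checked mechanically. The following is a minimal sketch, assuming you already log how many gold cases picked the right tool and how many tool calls produced invalid JSON; the `RunStats` fields and the `meets_targets` name are hypothetical and not part of any WFGY API. Integer arithmetic is used so the thresholds are not affected by floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Counts collected from one evaluation run (field names are illustrative)."""
    correct_tool_picks: int  # gold cases where the expected tool was chosen
    gold_cases: int          # size of the gold set, e.g. 50
    invalid_json: int        # tool calls that failed schema validation
    total_tool_calls: int    # all tool calls made during the run


def meets_targets(stats: RunStats) -> dict:
    """Return pass/fail for the two rate-based acceptance targets."""
    if stats.gold_cases <= 0 or stats.total_tool_calls <= 0:
        raise ValueError("need at least one gold case and one tool call")
    return {
        # accuracy >= 0.98, written as correct / gold >= 98 / 100
        "tool_selection": stats.correct_tool_picks * 100 >= 98 * stats.gold_cases,
        # invalid rate < 0.5 percent, written as invalid / total < 5 / 1000
        "invalid_json": stats.invalid_json * 1000 < 5 * stats.total_tool_calls,
    }
```

On a 50-case gold set this means at most one wrong pick: 49 of 50 passes, 48 of 50 does not.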
- Lock the allowlist: only expose tools that are needed for the task. Everything else is unavailable.
- Set hard time budgets: define a per-tool timeout and a total orchestration budget. Expose both to the model.
- Validate I/O: enforce a JSON schema on inputs and outputs. Reject and re-ask on failure.
- Apply backoff and caps: retry with capped attempts and jitter. Never allow infinite retries.
- Observe ΔS and λ: if ΔS stays high while tool usage changes, prefer a reranker or a different retriever before trying new tools.
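The allowlist and I/O validation steps above could be sketched as a single gate in the orchestrator. This is an illustration only: the tool names, the `{"tool": ..., "args": ...}` call shape, and the hand-rolled type check are assumptions, and the check is a stand-in for a real JSON Schema validator.

```python
import json

# Hypothetical allowlist: tool name -> required argument names and JSON types.
ALLOWLIST = {
    "retriever": {"query": str, "k": int},
    "reranker": {"query": str, "candidates": list},
}


def validate_tool_call(raw: str):
    """Return (ok, reason). Reject anything that is not an allowlisted, well-typed call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "tool call must be a single JSON object"
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in ALLOWLIST:
        return False, f"tool {tool!r} is not in the allowlist"
    args = call.get("args")
    if not isinstance(args, dict):
        return False, "args must be a JSON object"
    spec = ALLOWLIST[tool]
    if set(args) != set(spec):
        return False, "args do not match the schema fields"
    for name, expected_type in spec.items():
        # Exact type match, so a JSON boolean is not accepted where an integer is required.
        if type(args[name]) is not expected_type:
            return False, f"field {name!r} has the wrong type"
    return True, "ok"
```

When the gate returns `False`, the reason string can be fed back to the model as the re-ask message.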
| Symptom | Likely cause | Open this |
|---|---|---|
| The model picks a browser tool for local facts | Tool palette too broad, weak routing | json_mode_and_tool_calls.md, role_confusion.md |
| Tool loops after a 429 | Missing backoff and idempotency | ops/debug_playbook.md |
| RAG tool returns wrong snippet | Metric or index mismatch | retrieval-playbook.md, embedding-vs-semantic.md |
| JSON mode breaks and prose appears | Schema not enforced | json_mode_and_tool_calls.md |
| Multi-agent stalls at a tool step | Memory overwrite or missing fence | memory_fences_and_state_keys.md, Multi-Agent_Problems.md |
Use this inside your system prompt or orchestrator config.
Tool policy:
- Only use tools from this allowlist and only for their stated purpose.
- Every tool call must be a single JSON object that validates against the schema shown with the tool.
- If a tool times out or returns an error, try at most 2 retries with exponential backoff (base 1.7) and jitter.
- Respect the total time budget: {total_budget_ms} for all tool usage in this request.
- Do not chain tools unless the previous tool returned a schema-valid result.
- If no tool is suitable, answer without a tool and say which tool would have been required.
Set these once. Keep them consistent across environments.
- Timeouts: HTTP 8–12 s per call. Vector search 2–4 s. Browser or scraping 10–20 s with a hard cap. Code-run or sandbox 20–40 s.
- Retries: retry only on 429, 503, and connection reset. Maximum 2 retries with jitter. No retries for 4xx other than 429.
- Idempotency: compute idempotency_key = sha256(tool_name + args_hash + mem_state_hash) before any side effect.
- Budgets: set a per-tool budget and a global budget. When less than 15 percent of the global budget remains, stop calling tools and return the best answer with citations.
- Cancellation: cancel slower duplicates. Keep the fastest successful call for a given tool class.
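The retry, idempotency, and budget settings above can be sketched with the standard library alone. Several details are assumptions rather than part of the original page: the backoff delay is in seconds, jitter is "full jitter" (a uniform factor in [0, 1)), `args_hash` is a SHA-256 of the canonical JSON of the arguments, and `call` is a hypothetical callable returning `(status, body)`.

```python
import hashlib
import json
import random
import time

RETRYABLE_STATUS = {429, 503}  # plus connection reset, handled as an exception
MAX_RETRIES = 2
BACKOFF_BASE = 1.7             # unit of seconds is an assumption


def idempotency_key(tool_name: str, args: dict, mem_state_hash: str) -> str:
    """sha256(tool_name + args_hash + mem_state_hash), with args hashed canonically."""
    canonical_args = json.dumps(args, sort_keys=True, separators=(",", ":"))
    args_hash = hashlib.sha256(canonical_args.encode("utf-8")).hexdigest()
    material = tool_name + args_hash + mem_state_hash
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def backoff_delay(retry_number: int, rng=random.random) -> float:
    """Exponential backoff with full jitter for the 1-based retry number."""
    return (BACKOFF_BASE ** retry_number) * rng()


def call_with_retries(call, sleep=time.sleep):
    """Run call() once, then retry at most MAX_RETRIES times on retryable failures."""
    status, body = None, None
    for attempt in range(MAX_RETRIES + 1):
        try:
            status, body = call()
        except ConnectionResetError:
            status, body = None, None  # treated as retryable
        if status is not None and status not in RETRYABLE_STATUS:
            return status, body        # success, or an error that must not be retried
        if attempt < MAX_RETRIES:
            sleep(backoff_delay(attempt + 1))
    return status, body                # retries exhausted; caller decides what to do


def may_call_tools(remaining_ms: int, total_budget_ms: int) -> bool:
    """False once less than 15 percent of the global budget remains."""
    return remaining_ms * 100 >= 15 * total_budget_ms
```

A 404 returns after one attempt, while a persistent 429 gives up after three attempts in total. Passing `sleep` as a parameter keeps the function testable without real waiting.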
Give the model a short rubric so it can choose tools correctly.
Routing rubric:
- Retrieval or citation needed → call retriever tool first. Then cite, then reason.
- Need ordering control for a long candidate list → use reranker instead of asking the LLM to sort.
- When the input already contains the answer text → do not search, answer with citations.
- Use browser only when the answer depends on a fresh webpage and the site is in the allowlist.
- If a tool returns non-JSON output or omits required fields → request a retry with the same schema.
See also: rerankers.md · citation_first.md
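If you prefer to enforce the routing rubric in the orchestrator rather than rely on the prompt alone, it might look like the sketch below. The `Request` fields, the site allowlist entry, the threshold for a "long" candidate list, and the priority order between rules are all assumptions made for illustration; the rubric itself does not fix them.

```python
from dataclasses import dataclass

BROWSER_SITE_ALLOWLIST = {"docs.example.com"}  # hypothetical allowlisted site
LONG_LIST_THRESHOLD = 20                       # assumed cutoff for a "long" candidate list


@dataclass
class Request:
    """Signals the orchestrator has already extracted (names are illustrative)."""
    answer_in_input: bool = False
    needs_citation: bool = False
    needs_fresh_page: bool = False
    site: str = ""
    candidate_count: int = 0


def route(req: Request) -> str:
    """Return the tool to call first, or "none" to answer without a tool."""
    if req.answer_in_input:
        return "none"        # do not search; answer with citations from the input
    if req.needs_fresh_page and req.site in BROWSER_SITE_ALLOWLIST:
        return "browser"     # only for fresh pages on allowlisted sites
    if req.candidate_count > LONG_LIST_THRESHOLD:
        return "reranker"    # ordering control instead of asking the LLM to sort
    if req.needs_citation:
        return "retriever"   # retrieve first, then cite, then reason
    return "none"
```

A request for a fresh page on a site outside the allowlist falls through to the later rules instead of reaching the browser.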
Run these with three paraphrases. Expect identical safe behavior.
- 429 storm on the primary retriever.
- Browser returns HTML with script tags and meta refresh.
- Vector store latency spikes to 6 s P95.
- Tool returns prose inside a JSON field.
- Agent handoff where the second agent tries to change the tool palette.
If any probe flips λ or breaks JSON, open: json_mode_and_tool_calls.md · role_confusion.md
Use this during incidents.
- Check live metrics: error rate by tool, P95 latency, timeout count, retry count.
- Triage the worst tool. Reduce k, switch to reranker, or skip non-critical tools.
- Apply tighter timeout for the failing tool and raise backoff base.
- Flip to a warm standby retriever or cache layer.
- Re-run the gold probes. Ship only after acceptance targets pass.
Related ops pages: ops/live_monitoring_rag.md · ops/debug_playbook.md
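The first two checklist steps, reading per-tool metrics and picking the worst tool, can be sketched as follows. It assumes call records of the form `(tool, latency_ms, ok)`, which is a hypothetical log shape, and uses the nearest-rank definition of P95; your monitoring stack may define percentiles differently.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer form of ceil(0.95 * n)
    return ordered[rank - 1]


def summarize(calls):
    """Group (tool, latency_ms, ok) records into per-tool P95 latency and error rate."""
    by_tool = defaultdict(list)
    for tool, latency_ms, ok in calls:
        by_tool[tool].append((latency_ms, ok))
    summary = {}
    for tool, rows in by_tool.items():
        latencies = [latency for latency, _ in rows]
        errors = sum(1 for _, ok in rows if not ok)
        summary[tool] = {
            "p95_ms": p95(latencies),
            "error_rate": errors / len(rows),
        }
    return summary


def worst_tool(summary):
    """Pick the tool to triage first: highest error rate, ties broken by P95 latency."""
    return max(summary, key=lambda t: (summary[t]["error_rate"], summary[t]["p95_ms"]))
```

Ranking by error rate before latency is a design choice for this sketch; during a latency-only incident you may want the opposite order.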