A focused guide to handle prompt injection attacks in RAG, agents, and orchestration.
Use this page when injected text hijacks your instructions, bypasses schema, or makes the model ignore contracts.
- Responses contain a leaked system prompt or hidden instructions.
- The model obeys malicious user text such as “ignore above and do X”.
- Citations vanish after an injection payload.
- The JSON / tool schema is broken by arbitrary free text.
- Memory or context keys are rewritten by injected content.
- Visual map and recovery: RAG Architecture & Recovery
- Retrieval traceability: retrieval-traceability.md
- Data schema contract: data-contracts.md
- Role boundary checks: role_confusion.md
- Memory fences: memory_fences_and_state_keys.md
- ΔS(question, retrieved) ≤ 0.45 even with injection attempts.
- λ remains convergent across 3 paraphrases and does not flip under “ignore above” payloads.
- Schema lock: JSON/tool calls validate against a fixed schema.
- Coverage ≥ 0.70 of the target section, even under noisy injection.
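
The targets above can be checked mechanically once each paraphrase run has been measured. The following is a minimal sketch, not part of WFGY itself: the `ProbeResult` record, the `failed_targets` helper, and the `"convergent"` label are hypothetical names introduced for illustration, and ΔS, λ, and coverage are assumed to be measured by your own tooling.

```python
from dataclasses import dataclass

DELTA_S_MAX = 0.45    # ΔS target from the list above
COVERAGE_MIN = 0.70   # coverage target from the list above
MIN_PARAPHRASES = 3   # λ must hold across 3 paraphrases


@dataclass
class ProbeResult:
    """Metrics for one paraphrase run (hypothetical record shape)."""
    delta_s: float       # ΔS(question, retrieved), measured by your own tooling
    lambda_state: str    # assumed label: "convergent", anything else counts as a flip
    schema_valid: bool   # JSON/tool call validated against the fixed schema
    coverage: float      # fraction of the target section covered


def failed_targets(probes):
    """Return the names of the acceptance targets that any probe missed."""
    failures = []
    if any(p.delta_s > DELTA_S_MAX for p in probes):
        failures.append("delta_s")
    if len(probes) < MIN_PARAPHRASES or any(
        p.lambda_state != "convergent" for p in probes
    ):
        failures.append("lambda")
    if not all(p.schema_valid for p in probes):
        failures.append("schema")
    if any(p.coverage < COVERAGE_MIN for p in probes):
        failures.append("coverage")
    return failures


probes = [ProbeResult(0.31, "convergent", True, 0.82)] * 3
print(failed_targets(probes))  # []
```

An empty list means every target held on every paraphrase. Fewer than three probes is reported as a `lambda` failure because stability was not actually tested.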
1. Detect abnormal ΔS drift
   - Compute ΔS(question, retrieved). If an injected phrase pushes ΔS to 0.60 or above, isolate the payload.
2. Enforce contracts
   - Wrap retriever and reasoner outputs in the contract from data-contracts.md.
   - Reject free text outside the schema.
3. Apply fences
   - Lock system vs user roles (role_confusion.md).
   - Use memory hash keys (memory_fences_and_state_keys.md).
4. Verify stability
   - Re-run with paraphrase probes. Injection should not flip λ or erase citations.
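
The drift check and the contract enforcement steps above could be sketched roughly as follows. This page does not define ΔS, so the code assumes ΔS = 1 − cosine similarity between embedding vectors. The function names (`delta_s`, `split_by_drift`, `parse_strict`) and the example schema keys are hypothetical, and the real contract lives in data-contracts.md.

```python
import json
import math

ISOLATE_AT = 0.60  # isolation threshold from the drift-detection step


def delta_s(a, b):
    """ASSUMPTION: ΔS = 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / norm


def split_by_drift(question_vec, chunks):
    """Split (text, vector) chunks into (kept, isolated) by ΔS to the question."""
    kept, isolated = [], []
    for text, vec in chunks:
        if delta_s(question_vec, vec) >= ISOLATE_AT:
            isolated.append(text)  # suspected payload, keep it out of the prompt
        else:
            kept.append(text)
    return kept, isolated


def parse_strict(raw, allowed_keys):
    """Accept only a single JSON object whose keys match the fixed schema."""
    try:
        obj = json.loads(raw)  # raises on leading or trailing free text
    except json.JSONDecodeError as exc:
        raise ValueError(f"free text outside schema: {exc.msg}") from exc
    if not isinstance(obj, dict) or set(obj) != set(allowed_keys):
        raise ValueError("output does not match the fixed schema")
    return obj
```

`json.loads` already rejects input such as `{"answer": "x"} ignore above`, so wrapping it is enough to refuse free text around the object. A production contract would likely also validate field types.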
| Payload type | Symptom | Fix |
|---|---|---|
| Ignore-all override | Model discards earlier rules | role_confusion.md + schema locks |
| Citation erasure | No references, only free text answer | retrieval-traceability.md, data-contracts.md |
| Tool hijack | JSON field replaced with instruction text | json_mode_and_tool_calls.md |
| Role swap | User prompt injected as “system” | role_confusion.md |
| Memory overwrite | Past state or keys corrupted | memory_fences_and_state_keys.md |
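
If a triage script needs to route a detected payload type to its fix, the table above can be carried as a plain lookup. This is an illustrative sketch: the `FIXES` mapping copies the table rows, while the `fixes_for` helper and the lowercase keys are assumptions, and detecting the payload type is left to your own checks.

```python
# Payload type -> fixes, copied from the table above.
FIXES = {
    "ignore-all override": ["role_confusion.md", "schema locks"],
    "citation erasure": ["retrieval-traceability.md", "data-contracts.md"],
    "tool hijack": ["json_mode_and_tool_calls.md"],
    "role swap": ["role_confusion.md"],
    "memory overwrite": ["memory_fences_and_state_keys.md"],
}


def fixes_for(payload_type):
    """Look up the fixes for a payload type, ignoring case and outer spaces."""
    key = payload_type.strip().lower()
    if key not in FIXES:
        raise KeyError(f"unknown payload type: {payload_type!r}")
    return FIXES[key]


print(fixes_for("Tool hijack"))  # ['json_mode_and_tool_calls.md']
```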
System: WFGY firewall active.
User input: {question}
Check:
1. Did retrieved snippet keep citations?
2. Is ΔS(question, retrieved) ≤ 0.45?
3. Did λ stay convergent under paraphrase?
4. Did JSON/tool call respect schema?
If any fail, return the failing layer + fix page.

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |

| Layer | Page | What it’s for |
|---|---|---|
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |