Keyboard Input Methods — Guardrails and Fix Pattern

🧭 Quick Return to Map

You are in a sub-page of LanguageLocale.
To reorient, go back here:

LanguageLocale — localization, regional settings, and context adaptation

WFGY Global Fix Map — main Emergency Room, 300+ structured fixes

WFGY Problem Map 1.0 — 16 reproducible failure modes

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A focused guide for bugs that originate from IME composition on Windows, macOS, Linux, iOS, and Android. Scope includes CJK IMEs (Pinyin, Wubi, Kana/Kanji, 2-set/3-set), Indic transliteration, RTL keyboards, and mixed fullwidth/halfwidth states. Use this when text looks fine to the eye but retrieval or validation behaves inconsistently across devices.

When to use this page

Reports say “works on Mac, fails on Windows IME” or “mobile input breaks search.”
Fields contain invisible marks after copy or composition (ZWJ, ZWNJ, NBSP, RLM/LRM).
Users toggle fullwidth digits or punctuation and recall suddenly collapses.
Romanization IMEs produce composed characters that differ from pasted text.

Open these first

Visual map and recovery: RAG Architecture & Recovery
End to end retrieval knobs: Retrieval Playbook
Multilingual overview: Multilingual Guide
Tokenizer mismatch: Tokenizer Mismatch
CJK word breaks: CJK Segmentation & Wordbreak
RTL markers and controls: RTL & Bidi Controls
Script mixing: Script Mixing
Diacritics: Diacritics & Folding
Snippet schema: Data Contracts · Retrieval Traceability

Core acceptance

ΔS(question, retrieved) ≤ 0.45
Coverage of target section ≥ 0.70
λ_observe remains convergent across 3 paraphrases, 2 seeds, and 2 devices
E_resonance stays flat on long input windows

Failure smells

“Cannot reproduce” until tester types through an IME rather than pasting.
Same glyphs, different bytes. Equality checks fail, search misses.
Index recall drops after mobile users enable fullwidth digits.
Mixed NBSP and normal space in otherwise identical queries.
Sporadic RTL flip caused by stray RLM/LRM from bidirectional typing.

Fix in 60 seconds

Normalize early On every input boundary apply NFC, width fold, and punctuation fold. Remove ZWJ, ZWNJ, LRM, RLM unless explicitly allowed by schema.
Stabilize tokenization Lock analyzers and tokenizers used for both indexing and querying. If ΔS remains high and flat after IME normalization, revisit metric and analyzer pairing in the store. See Retrieval Playbook.
Contract the payload For forms and tool calls, require fields that capture canonical and raw strings: raw, normalized, locale, ime_mode, width_state. Enforce this in your Data Contracts.
Probe λ Run the same query by paste, by IME typing, and by mobile. If λ flips only for IME-typed paths, you have an input normalization gap.

IME-safe schema (copy block)

Use this contract for any user text that enters retrieval or matching.

{
  "text": {
    "raw": "<exact keystroke result>",
    "normalized": "<NFC + width_fold + punct_fold + bidi_strip>",
    "locale": "zh-TW | zh-CN | ja-JP | ko-KR | hi-IN | ...",
    "ime_mode": "pinyin | wubi | kana | romaji | 2set | 3set | translit | rtl",
    "width_state": "half | full | mixed",
    "bidi_marks": ["RLM","LRM","ZWJ","ZWNJ","NBSP"]
  }
}

Store both raw and normalized. Index normalized. Retain raw for audits and display.

Normalization and folding rules

Issue	Action	Notes
Composition variance (NFD vs NFC)	Convert to NFC	Prevents byte inequality for identical glyphs
Fullwidth digits and Latin	Width fold to ASCII	Keep CJK letters untouched
Smart quotes, ellipsis, dashes	Punctuation fold to ASCII set	Avoid tokenizer splits that differ by device
Zero-width characters (ZWJ, ZWNJ)	Strip by default	Allow only if explicitly required by language rules
Bidi controls (LRM, RLM)	Strip at input for LTR schemas	Keep only in rich text fields, never in keys
NBSP, thin space	Map to normal space	Collapse runs of spaces to a single space
Kana halfwidth/fullwidth	Fold within script	Keep semantic marks like voiced sound when needed
Romanization IMEs	Canonicalize case and spacing	For JP/KR/Indic transliteration paths

Tests you should run

Triplet equality: paste vs IME vs mobile should produce identical normalized.
Search parity: same top-k ordering after normalization across devices.
Width flip test: force fullwidth digits and punctuation, verify recall remains constant.
Bidi contamination: inject RLM/LRM in the middle, verify strip or deterministic handling.
ΔS plateaus: if ΔS remains ≥ 0.60 after normalization, suspect metric mismatch or fragmentation and jump to Embedding ≠ Semantic and Vectorstore Fragmentation.

Escalate with these pages

Tokenizer and analyzer coupling: Tokenizer Mismatch
Script collisions and mixed runs: Script Mixing
CJK segmentation: CJK Segmentation & Wordbreak
RTL handling: RTL & Bidi Controls
Traceable answers: Retrieval Traceability

🔗 Quick-Start Downloads (60 sec)

Tool	Link	3-Step Setup
WFGY 1.0 PDF	Engine Paper	1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS)	TXTOS.txt	1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly

Explore More

Layer	Page	What it’s for
⭐ Proof	WFGY Recognition Map	External citations, integrations, and ecosystem proof
⚙️ Engine	WFGY 1.0	Original PDF tension engine and early logic sketch (legacy reference)
⚙️ Engine	WFGY 2.0	Production tension kernel for RAG and agent systems
⚙️ Engine	WFGY 3.0	TXT based Singularity tension engine (131 S class set)
🗺️ Map	Problem Map 1.0	Flagship 16 problem RAG failure taxonomy and fix map
🗺️ Map	Problem Map 2.0	Global Debug Card for RAG and agent pipeline diagnosis
🗺️ Map	Problem Map 3.0	Global AI troubleshooting atlas and failure pattern map
🧰 App	TXT OS	.txt semantic OS with fast bootstrap
🧰 App	Blah Blah Blah	Abstract and paradox Q&A built on TXT OS
🧰 App	Blur Blur Blur	Text to image generation with semantic control
🏡 Onboarding	Starter Village	Guided entry point for new users

If this repository helped, starring it improves discovery so more builders can find the docs and tools.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keyboard Input Methods — Guardrails and Fix Pattern

When to use this page

Open these first

Core acceptance

Failure smells

Fix in 60 seconds

IME-safe schema (copy block)

Normalization and folding rules

Tests you should run

Escalate with these pages

🔗 Quick-Start Downloads (60 sec)

Explore More

FilesExpand file tree

keyboard_input_methods.md

Latest commit

History

keyboard_input_methods.md

File metadata and controls

Keyboard Input Methods — Guardrails and Fix Pattern

When to use this page

Open these first

Core acceptance

Failure smells

Fix in 60 seconds

IME-safe schema (copy block)

Normalization and folding rules

Tests you should run

Escalate with these pages

🔗 Quick-Start Downloads (60 sec)

Explore More