Ops & Deploy — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk. Think of this page as a sub-room.
For full triage, consultation, and prescriptions from the doctors on duty, go back to the Emergency Room lobby.

A compact hub to ship safely and keep RAG/LLM systems stable after release.
Use this folder to pick the right guardrail, verify with measurable targets, and recover fast when things wobble. No infra change required.


Open these first


When to use this folder

  • The first calls after a deploy crash or return stale content.
  • ΔS and citations looked fine yesterday but flip today.
  • Rate limits cascade, queues spike, latency climbs.
  • Canary looks good then full rollout breaks retrieval.
  • Index swap succeeds but answers cite old snippets.
  • Retries cause duplicate side effects or charges.
  • Feature flags bleed traffic into unfinished paths.
  • Maintenance windows corrupt embeddings or anchors.

Acceptance targets for a safe rollout

  • ΔS(question, retrieved) ≤ 0.45 across three paraphrases.
  • Coverage ≥ 0.70 on the expected new section.
  • λ remains convergent on 2 seeds during rollout.
  • Idempotency ≥ 99.9% on retry storms.
  • Zero silent index mismatches (hash + counts match).
  • P95 latency stays in budget with backpressure active.

Quick routes — per-page guides

| Scenario | Fix Page |
|---|---|
| Rollout readiness | rollout_readiness_gate.md |
| Canary strategy | staged_rollout_canary.md |
| Blue/green cutover | blue_green_switchovers.md |
| Version pin & freeze | version_pinning_and_model_lock.md |
| Vector index swap | vector_index_build_and_swap.md |
| Cache warmup | cache_warmup_invalidation.md |
| Rate limits | rate_limit_backpressure.md |
| Feature flags | feature_flags_safe_launch.md |
| Idempotency | idempotency_dedupe.md |
| Retry logic | retry_backoff.md |
| Rollback plan | rollback_and_fast_recovery.md |
| Postmortems | postmortem_and_regression_tests.md |
| Change freeze | release_calendar_and_change_freeze.md |
| Incident comms | incident_comms_and_statuspage.md |
| Shadow traffic | shadow_traffic_mirroring.md |
| Maintenance window | read_only_mode_and_maintenance_window.md |
| DB migrations | db_migration_guardrails.md |

60-second ship checklist

  1. Freeze the world → Pin model IDs, prompt revs, index hashes.
  2. Warm up safely → Build index off-path, preload caches with canary.
  3. Shadow then canary → Mirror prod queries, step rollout 5% → 25% → 100%.
  4. Guard the edge → Enable backpressure, retries with jitter, idempotency keys.
  5. Know your exit → Keep rollback switch and comms draft ready.
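
Step 4 can be sketched on the client side as exponential backoff with full jitter, with one idempotency key reused across all attempts. This is a minimal illustration, not the method from retry_backoff.md: `backoff_delay`, `call_with_retries`, the `send` callback, and the default `base`/`cap` values are all assumptions, and only `ConnectionError` is treated as retryable here.

```python
import random
import time
import uuid


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send(key) until it succeeds or attempts run out.

    The same idempotency key is sent on every attempt so the server
    can recognise a retry of the same logical request.
    """
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return send(key)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt))
```

Jitter spreads retries out in time so that many clients failing together do not retry in lockstep; the `sleep` parameter is injectable so the loop can be exercised without real waiting.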

Symptoms → exact fix

| What you see | Open this |
|---|---|
| Deploy points to old snippets | vector_index_build_and_swap.md · cache_warmup_invalidation.md |
| Canary fine, full rollout breaks | staged_rollout_canary.md · feature_flags_safe_launch.md |
| Wrong model after failover | version_pinning_and_model_lock.md |
| Retries duplicate charges | idempotency_dedupe.md · retry_backoff.md |
| Rate-limit storms, timeouts | rate_limit_backpressure.md |
| Need rollback now | rollback_and_fast_recovery.md · blue_green_switchovers.md |
| Maintenance corrupts anchors | read_only_mode_and_maintenance_window.md · db_migration_guardrails.md |
| Unsure if safe to ship | rollout_readiness_gate.md |

FAQ

Q: What does ΔS mean here?
A: ΔS is a stability score. It measures how much the retrieved content drifts from the expected anchor when you change the query slightly. Lower is better (≤ 0.45 is safe).

Q: What is λ convergence?
A: λ tracks whether retrieval order flips unpredictably. If λ is stable across seeds, your rollout is consistent.

Q: Why do I need idempotency keys?
A: Without them, retries can double-charge a user or run the same side-effect twice. Keys make every request “safe to retry.”
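
A minimal server-side sketch of the idea, assuming an in-memory store: the first request with a given key runs the side effect, and any retry with the same key gets the stored result back. `IdempotentHandler` and its methods are hypothetical names; a real service would likely need a durable store with an atomic "insert if absent" and key expiry, which this sketch omits.

```python
class IdempotentHandler:
    """Run a side effect at most once per idempotency key (in-memory, single-threaded)."""

    def __init__(self, side_effect):
        self._side_effect = side_effect
        self._results = {}  # idempotency key -> result of the first execution

    def handle(self, key, payload):
        if key in self._results:
            # Retry of a request already processed: replay the stored result.
            return self._results[key]
        result = self._side_effect(payload)
        self._results[key] = result
        return result


# Usage: a retried charge with the same key is applied only once.
charges = []
handler = IdempotentHandler(lambda amount: charges.append(amount) or len(charges))
first = handler.handle("order-123", 40)
retry = handler.handle("order-123", 40)
```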

Q: How do I know if my index swap worked?
A: Check doc counts and hashes before cutover. If they mismatch, you’re pointing at an incomplete index.
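
One way to sketch that check, assuming both the source corpus and the newly built index can be exported as a mapping of document ID to text: compute a count and an order-independent hash for each side and compare them before cutover. `index_manifest` and `safe_to_cut_over` are illustrative names, not functions from vector_index_build_and_swap.md.

```python
import hashlib


def index_manifest(docs: dict[str, str]) -> tuple[int, str]:
    """Return (document count, SHA-256 over IDs and contents in sorted ID order)."""
    h = hashlib.sha256()
    for doc_id in sorted(docs):
        h.update(doc_id.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
        h.update(docs[doc_id].encode("utf-8"))
        h.update(b"\x00")
    return len(docs), h.hexdigest()


def safe_to_cut_over(expected: dict[str, str], built: dict[str, str]) -> bool:
    """True only if both the count and the content hash match."""
    return index_manifest(expected) == index_manifest(built)
```

Sorting by ID makes the hash independent of build order, so only missing, extra, or changed documents cause a mismatch.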

Q: Canary looked fine but production broke — why?
A: A small canary often hides tail latency, cache misses, and load-based rate limits, because it never sees full production load. Always test at increasing percentages of live traffic.
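
Stepping through increasing percentages can be sketched with deterministic hash bucketing, so a user admitted at one stage stays admitted at every later stage. The `STAGES` values follow the 5% → 25% → 100% steps from the ship checklist; `in_rollout` and the use of a user ID as the bucketing key are assumptions for illustration.

```python
import hashlib

STAGES = (5, 25, 100)  # rollout percentages, taken from the ship checklist


def in_rollout(user_id: str, percent: int) -> bool:
    """Map the user to a stable bucket in [0, 100) and admit buckets below percent."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the ID, raising the percentage only adds users and never removes ones who were already on the new path, which keeps comparisons between stages meaningful.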

Q: Why do you mention rollback comms?
A: Technical rollback is only half the job. Users and stakeholders need fast updates, so keep pre-drafted Statuspage or Slack messages ready.


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” and the OS boots instantly |

Explore More

| Layer | Page | What it’s for |
|---|---|---|
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.