anhmtk
diff --git a/‎README.md‎
Lines changed: 4 additions & 0 deletions b/‎README.md‎
Lines changed: 4 additions & 0 deletions
diff --git a/‎docs/CASE_STUDY_TEMPLATE.md‎
Lines changed: 175 additions & 0 deletions b/‎docs/CASE_STUDY_TEMPLATE.md‎
Lines changed: 175 additions & 0 deletions
@@ -27,6 +27,8 @@ It continuously learns, cites sources, and refuses to fabricate when evidence is
 - ▶️ **Start here**: [`docs/SUMMARY.md`](docs/SUMMARY.md)
 - 🧭 **Constitution**: [`docs/CONSTITUTION.md`](docs/CONSTITUTION.md)
 - 🧱 **Architecture**: [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
+- 🧩 **Lite Integration Spec**: [`docs/STILLME_LITE_SPEC.md`](docs/STILLME_LITE_SPEC.md)
+- 📈 **Case Study Template**: [`docs/CASE_STUDY_TEMPLATE.md`](docs/CASE_STUDY_TEMPLATE.md)
 - 🧪 **API Docs**: `http://localhost:8000/docs`
 - 🤝 **Contribute**: [`CONTRIBUTING.md`](CONTRIBUTING.md)
 
@@ -668,6 +670,8 @@ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed se
 - [`docs/VALIDATION_CHAIN_SPEC.md`](docs/VALIDATION_CHAIN_SPEC.md) - Must-pass vs warning validators
 - [`docs/NO_SOURCE_POLICY.md`](docs/NO_SOURCE_POLICY.md) - Refusal rules when sources are missing
 - [`docs/AUDIT_GUIDE.md`](docs/AUDIT_GUIDE.md) - Contributor audit checklist
+- [`docs/STILLME_LITE_SPEC.md`](docs/STILLME_LITE_SPEC.md) - Lightweight integration contract (monitor/warn/enforce)
+- [`docs/CASE_STUDY_TEMPLATE.md`](docs/CASE_STUDY_TEMPLATE.md) - Before/after validation report template and metrics
 - [`docs/API_DOCUMENTATION.md`](docs/API_DOCUMENTATION.md) - Complete API reference
 - [`docs/DEPLOYMENT_GUIDE.md`](docs/DEPLOYMENT_GUIDE.md) - Deployment instructions
 - [`docs/PAPER_TABLES_FIGURES.md`](docs/PAPER_TABLES_FIGURES.md) - Tables and figures for the paper
 
@@ -0,0 +1,175 @@
+# StillMe Case Study Template (Before vs After Validation)
+
+Use this template to produce audit-grade evidence for StillMe Lite performance.
+
+Goal: show measurable improvement after enabling verification behavior.
+
+---
+
+## 1) Case Study Metadata
+
+- Case ID:
+- Date:
+- Owner:
+- Domain (research/support/policy/other):
+- Language:
+- LLM model:
+- Retrieval setup:
+- StillMe Lite mode (`monitor|warn|enforce`):
+- Policy file version:
+
+---
+
+## 2) Test Scope
+
+- Total prompts:
+- Prompt set source:
+- Prompt categories:
+  - Factual query
+  - Source-required query
+  - Ambiguous query
+  - Adversarial/prompt-injection
+
+Rules:
+- Keep prompt set fixed between before/after runs.
+- Use same model and retrieval settings for fair comparison.
+- Store raw runs in JSONL for reproducibility.
+
+---
+
+## 3) Experimental Setup
+
+### 3.1 Before (Baseline)
+- Validation disabled or bypassed.
+- Record model outputs and available context.
+
+### 3.2 After (StillMe Enabled)
+- Validation enabled with selected mode.
+- Record decision, reason codes, and safe response.
+
+### 3.3 Data Files
+- `data/before_<case_id>.jsonl`
+- `data/after_<case_id>.jsonl`
+- `reports/<case_id>_summary.md`
+
+---
+
+## 4) Mandatory Metrics
+
+### 4.1 Hallucination Escape Rate
+
+Definition:
+`unsupported_factual_answers_passed / total_high_risk_factual_prompts`
+
+Interpretation:
+- Lower is better.
+- This is the primary risk metric.
+
+### 4.2 Refusal Precision
+
+Definition:
+`correct_refusals / total_refusals`
+
+Interpretation:
+- Higher is better.
+- Measures whether refusals are justified instead of over-blocking.
+
+### 4.3 Source Coverage
+
+Definition:
+`factual_answers_with_valid_citation / total_factual_answers`
+
+Interpretation:
+- Higher is better.
+- Measures evidence grounding quality.
+
+---
+
+## 5) Optional Supporting Metrics
+
+- False refusal rate
+- Clarification usefulness rate
+- Mean validator confidence band distribution
+- Decision latency delta (before vs after)
+
+---
+
+## 6) Output Tables
+
+### 6.1 Core Metrics Table
+
+| Metric | Before | After | Delta | Target |
+|---|---:|---:|---:|---:|
+| Hallucination escape rate |  |  |  | lower |
+| Refusal precision |  |  |  | >= 0.85 |
+| Source coverage |  |  |  | >= 0.80 |
+
+### 6.2 Decision Distribution
+
+| Decision | Before | After |
+|---|---:|---:|
+| answer |  |  |
+| refuse |  |  |
+| ask_clarify |  |  |
+
+---
+
+## 7) Error Analysis (Required)
+
+Provide 5-10 representative examples:
+
+1. **Escaped hallucination (before)**  
+   - Prompt:
+   - Baseline answer:
+   - Why unsafe:
+
+2. **Correct refusal (after)**  
+   - Prompt:
+   - StillMe decision/reason:
+   - Why correct:
+
+3. **False refusal (after, if any)**  
+   - Prompt:
+   - Reason code:
+   - Fix candidate:
+
+4. **Citation quality improvement example**
+   - Prompt:
+   - Before citation state:
+   - After citation state:
+
+---
+
+## 8) Go / No-Go Rule
+
+Recommended internal gate for moving from `monitor` to `warn`:
+- Hallucination escape rate reduced by at least 50% vs before
+- Refusal precision >= 0.85
+- Source coverage >= 0.80 on factual subset
+
+Recommended gate for moving from `warn` to `enforce`:
+- Metrics stable for 2 consecutive runs
+- No critical incident in sampled production traffic
+
+---
+
+## 9) Risk and Limitation Notes
+
+- Known blind spots:
+- Data quality issues:
+- Retrieval mismatch observations:
+- Policy thresholds that need tuning:
+
+Always include limitations. Do not claim universal safety guarantees.
+
+---
+
+## 10) Short Public Summary (Optional)
+
+Use this 4-line format for release notes:
+
+1. What was tested (scope and sample size)
+2. What improved (with numbers)
+3. What did not improve yet
+4. Next action for the next iteration
+