Commit 5ed36cc

feat: add architectural improvements from review

- Add HallucinationExplanationValidator: ensures technical depth in hallucination explanations (next-token prediction, decoding strategies, training vs inference mismatch, attention mechanisms)
- Add KnowledgeTypeClassifier: formal taxonomy for citation policy (Factual Claim, General Knowledge, Reasoning, StillMe Self-Knowledge)
- Add VerbosityValidator: detects overly verbose or defensive responses
- Add formal citation policy documentation (docs/CITATION_POLICY.md)
- Integrate new validators into ValidationEngine with parallel execution support
- Update chat_router to include new validators in validation chain

Addresses architectural review findings:
- Hallucination explanation depth (was too shallow for expert scrutiny)
- Citation policy ambiguity (now formally defined)
- Verbosity regulation (was not measured/regulated)
1 parent 96634db commit 5ed36cc

7 files changed

Lines changed: 873 additions & 0 deletions

backend/api/routers/chat_router.py

Lines changed: 14 additions & 0 deletions
@@ -1962,6 +1962,20 @@ async def _handle_validation_with_fallback(
 )
 logger.debug("Phase 2: Added PhilosophicalDepthValidator (philosophical question detected)")

+# Add HallucinationExplanationValidator to ensure technical depth in explanations
+from stillme_core.validation.hallucination_explanation import HallucinationExplanationValidator
+validators.append(
+    HallucinationExplanationValidator(strict_mode=False, auto_patch=True)
+)
+logger.debug("Phase 2: Added HallucinationExplanationValidator")
+
+# Add VerbosityValidator to detect overly verbose or defensive responses
+from stillme_core.validation.verbosity import VerbosityValidator
+validators.append(
+    VerbosityValidator(max_length_ratio=3.0, strict_mode=False)
+)
+logger.debug("Phase 2: Added VerbosityValidator")
+
 # Add EthicsAdapter last (most critical - blocks harmful content)
 validators.append(
     EthicsAdapter(guard_callable=check_content_ethics)  # Real ethics guard implementation
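
The hunk appends the two new validators ahead of `EthicsAdapter`, so the list order is the execution order. The following is a minimal sketch of how such an ordered chain might run. Everything in it is an assumption made for illustration: `ValidationResult`, `ToyVerbosityValidator`, `run_chain`, and the `run(question, answer)` signature are hypothetical, and `max_length_ratio` is interpreted here, purely as a guess, as answer length divided by question length in words. It is not the `stillme_core` API.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class ValidationResult:
    """Hypothetical result object: pass/fail plus human-readable reasons."""
    passed: bool
    reasons: List[str] = field(default_factory=list)


class Validator(Protocol):
    """Hypothetical interface that every validator in the chain satisfies."""
    def run(self, question: str, answer: str) -> ValidationResult: ...


class ToyVerbosityValidator:
    """Illustrative stand-in: flags answers much longer than the question."""

    def __init__(self, max_length_ratio: float = 3.0, strict_mode: bool = False):
        self.max_length_ratio = max_length_ratio
        self.strict_mode = strict_mode

    def run(self, question: str, answer: str) -> ValidationResult:
        # Assumed metric: word count of answer over word count of question.
        ratio = len(answer.split()) / max(len(question.split()), 1)
        if ratio <= self.max_length_ratio:
            return ValidationResult(passed=True)
        # In non-strict mode the finding is recorded but does not fail the chain.
        return ValidationResult(
            passed=not self.strict_mode,
            reasons=[f"verbose: ratio {ratio:.1f}"],
        )


def run_chain(validators: List[Validator], question: str, answer: str) -> ValidationResult:
    """Run validators in list order and merge their results."""
    passed, reasons = True, []
    for validator in validators:
        result = validator.run(question, answer)
        passed = passed and result.passed
        reasons.extend(result.reasons)
    return ValidationResult(passed=passed, reasons=reasons)
```

With `strict_mode=False`, as in the hunk, a verbose answer in this sketch produces a warning in `reasons` while the chain still passes.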

docs/CITATION_POLICY.md

Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
# StillMe Citation Policy - Formal Rules

**Date**: 2025-01-27
**Status**: Official Policy
**Version**: 1.0

---

## Overview

StillMe's citation policy ensures **transparency** about knowledge sources while maintaining **clarity** and **usability**. This document provides **formal rules** for when citations are required, optional, or not needed.

---

## Core Principle

**"Every factual claim is cited, but the citation format depends on the knowledge type."**

This means:
- **Factual claims** → Require citation `[1]`, `[2]` from RAG context
- **General knowledge** → Optional citation `[general knowledge]` (well-established, pre-2023)
- **Reasoning** → No citation needed (StillMe's logical inference)
- **StillMe self-knowledge** → Uses `[foundational knowledge]` (StillMe's architecture)

---

## 1. Factual Claims (REQUIRES CITATION)

### Definition

Any statement about the external world that can be verified or falsified.

### Examples

- **Dates**: "Bretton Woods Conference 1944"
- **Events**: "World War II ended in 1945"
- **People**: "Keynes proposed the Bretton Woods system"
- **Places**: "Paris is the capital of France"
- **Scientific facts**: "Photosynthesis converts CO2 to glucose"
- **Historical facts**: "The Vietnam War ended in 1975"

### Rule

**MUST cite `[1]`, `[2]` from RAG context.**

If no RAG context is available:
- Use `[general knowledge]` with explanation: "This is general knowledge from base LLM training data, not verified against StillMe's RAG knowledge base."
- StillMe should express uncertainty: "I don't have this information in the RAG knowledge base, but according to general knowledge..."

### Implementation

- `CitationRequired` validator enforces this
- `KnowledgeTypeClassifier` classifies claims as `FACTUAL_CLAIM`
- Auto-patching adds citation if missing

---

## 2. General Knowledge (CITATION OPTIONAL)

### Definition

Well-established facts that are:
- In base LLM training data (pre-2023 cutoff)
- Not disputed in academic literature
- Not time-sensitive

### Examples

- **Scientific facts**: "Water is H2O"
- **Mathematical facts**: "2+2=4"
- **Historical facts**: "Shakespeare wrote Hamlet"
- **Astronomical facts**: "Earth orbits the sun"
- **Physical laws**: "Gravity exists"

### Rule

**Can use `[general knowledge]` without RAG citation**, but must acknowledge:

"This is general knowledge from base LLM training data, not verified against StillMe's RAG knowledge base."

### When to Use

- No RAG context available
- Claim is well-established (not disputed)
- Claim is not time-sensitive
- Claim is common knowledge (not specialized)

### Implementation

- `KnowledgeTypeClassifier` classifies as `GENERAL_KNOWLEDGE`
- `CitationRequired` validator allows `[general knowledge]` for this type
- StillMe should still express uncertainty if no RAG verification

---

## 3. Reasoning (NO CITATION NEEDED)

### Definition

Logical inference, philosophical analysis, mathematical proofs, or StillMe's own reasoning.

### Examples

- **Logical inference**: "If A then B, therefore C"
- **Philosophical analysis**: "From a utilitarian perspective, the action is justified because..."
- **Mathematical proof**: "By induction, we can prove that..."
- **StillMe's reasoning**: "Based on the evidence provided, StillMe concludes that..."

### Rule

**No citation needed** - this is StillMe's reasoning, not factual claims.

### When to Use

- Answer involves logical inference
- Answer involves philosophical analysis
- Answer involves mathematical reasoning
- Answer is StillMe's own conclusion based on provided evidence

### Implementation

- `KnowledgeTypeClassifier` classifies as `REASONING`
- `CitationRequired` validator skips citation requirement for this type
- StillMe can reason without citations

---

## 4. StillMe Self-Knowledge (FOUNDATIONAL KNOWLEDGE)

### Definition

Information about StillMe itself (architecture, capabilities, limitations, learning process).

### Examples

- **Architecture**: "StillMe uses RAG with ChromaDB"
- **Capabilities**: "StillMe learns every 4 hours"
- **Limitations**: "StillMe cannot answer questions about events < 4 hours old"
- **Learning process**: "StillMe fetches content from RSS feeds, arXiv, CrossRef, Wikipedia"

### Rule

**Uses `[foundational knowledge]`** - StillMe's self-knowledge, not external sources.

### When to Use

- Question is about StillMe itself
- Answer describes StillMe's architecture, capabilities, or limitations
- Answer explains StillMe's learning process or validation chain

### Implementation

- `KnowledgeTypeClassifier` classifies as `STILLME_SELF_KNOWLEDGE`
- `CitationRequired` validator uses `[foundational knowledge]` for this type
- StillMe should prioritize foundational knowledge from RAG context

---

## Classification Algorithm

The `KnowledgeTypeClassifier` uses this decision tree:

```
1. Is claim about StillMe?
   → YES: STILLME_SELF_KNOWLEDGE
   → NO: Continue

2. Does claim have RAG context?
   → YES: FACTUAL_CLAIM (requires citation)
   → NO: Continue

3. Is claim logical inference/reasoning?
   → YES: REASONING (no citation)
   → NO: Continue

4. Is claim well-established fact (common knowledge, pre-2023)?
   → YES: GENERAL_KNOWLEDGE (citation optional)
   → NO: Continue

5. Does claim have factual indicators (dates, events, people, places)?
   → YES: FACTUAL_CLAIM (requires citation)
   → NO: FACTUAL_CLAIM (default, requires citation)
```
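
The decision tree can be sketched as a short chain of early returns. This is only an illustration under stated assumptions: the enum member names come from this policy, but the `KnowledgeType` enum itself, the `ClaimSignals` container, and the `classify` function are hypothetical, and the sketch takes the yes/no answers as precomputed inputs instead of showing how `KnowledgeTypeClassifier` derives them. Read in the order written, a reasoning claim that has RAG context is classified as `FACTUAL_CLAIM`, because step 2 is checked before step 3.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    # Member names follow the policy; the enum itself is an assumption.
    FACTUAL_CLAIM = "factual_claim"
    GENERAL_KNOWLEDGE = "general_knowledge"
    REASONING = "reasoning"
    STILLME_SELF_KNOWLEDGE = "stillme_self_knowledge"


@dataclass
class ClaimSignals:
    """Hypothetical precomputed answers to the questions in the tree."""
    about_stillme: bool = False
    has_rag_context: bool = False
    is_reasoning: bool = False
    is_well_established: bool = False


def classify(signals: ClaimSignals) -> KnowledgeType:
    """Walk the decision tree in the order given by the policy."""
    if signals.about_stillme:          # step 1
        return KnowledgeType.STILLME_SELF_KNOWLEDGE
    if signals.has_rag_context:        # step 2
        return KnowledgeType.FACTUAL_CLAIM
    if signals.is_reasoning:           # step 3
        return KnowledgeType.REASONING
    if signals.is_well_established:    # step 4
        return KnowledgeType.GENERAL_KNOWLEDGE
    # Step 5: both branches yield FACTUAL_CLAIM, so it acts as the default.
    return KnowledgeType.FACTUAL_CLAIM
```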

---

## Citation Formats

### RAG-Grounded Citations

- **Format**: `[1]`, `[2]`, `[3]`
- **Meaning**: Information from StillMe's RAG knowledge base
- **Verification**: Validated against retrieved context documents

### General Knowledge Citations

- **Format**: `[general knowledge]`
- **Meaning**: Well-established fact from base LLM training data (pre-2023)
- **Verification**: Not verified against StillMe's RAG knowledge base

### Foundational Knowledge Citations

- **Format**: `[foundational knowledge]`
- **Meaning**: Information about StillMe itself
- **Verification**: From StillMe's foundational knowledge documents

### No Citation

- **Format**: (no citation)
- **Meaning**: StillMe's reasoning, logical inference, or philosophical analysis
- **Verification**: Not applicable (reasoning, not factual claim)
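
One way to detect which of these formats appear in an answer is a small set of regular expressions. The sketch below is illustrative only: `find_citations` and the pattern names are hypothetical, and matching the textual markers case-insensitively is an assumption that the policy does not state.

```python
import re
from typing import Dict

# One pattern per marker format listed in this section.
RAG_MARKER = re.compile(r"\[(\d+)\]")
GENERAL_MARKER = re.compile(r"\[general knowledge\]", re.IGNORECASE)
FOUNDATIONAL_MARKER = re.compile(r"\[foundational knowledge\]", re.IGNORECASE)


def find_citations(answer: str) -> Dict[str, object]:
    """Report which citation formats occur in an answer (hypothetical helper)."""
    return {
        # Numeric markers such as [1] or [2], returned as integers in order.
        "rag_indices": [int(n) for n in RAG_MARKER.findall(answer)],
        "general_knowledge": bool(GENERAL_MARKER.search(answer)),
        "foundational_knowledge": bool(FOUNDATIONAL_MARKER.search(answer)),
    }
```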

---

## Edge Cases

### 1. Mixed Claims

**Scenario**: Answer contains both factual claims and reasoning.

**Rule**: Cite factual claims, but reasoning doesn't need citation.

**Example**:
> "Bretton Woods Conference 1944 [1] established the IMF. From an economic perspective, this was significant because..."

- "Bretton Woods Conference 1944" → Factual claim → `[1]`
- "From an economic perspective..." → Reasoning → No citation

### 2. Factual Claims Without RAG Context

**Scenario**: User asks about a factual topic, but StillMe has no RAG context.

**Rule**: Use `[general knowledge]` with uncertainty expression.

**Example**:
> "I don't have information about [topic] in the RAG knowledge base, but according to general knowledge, [answer] [general knowledge]"

### 3. StillMe Questions with RAG Context

**Scenario**: User asks about StillMe, and RAG context contains foundational knowledge.

**Rule**: Use `[foundational knowledge]` and prioritize RAG context over base LLM knowledge.

**Example**:
> "StillMe uses RAG with ChromaDB [foundational knowledge]. According to StillMe's foundational knowledge documents [1], StillMe learns every 4 hours."

---

## Validation

### Validators

1. **`CitationRequired`**: Enforces citation requirement based on knowledge type
2. **`KnowledgeTypeClassifier`**: Classifies claims into knowledge types
3. **`CitationRelevance`**: Validates that citations are actually relevant

### Auto-Patching

- If citation is missing for `FACTUAL_CLAIM`, `CitationRequired` auto-adds citation
- If citation format is wrong, validators can patch it
- If knowledge type is misclassified, `KnowledgeTypeClassifier` can correct it
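
The first auto-patching rule might look roughly like the sketch below. It is a hedged illustration, not the `CitationRequired` implementation: `patch_missing_citation` is a hypothetical name, passing the knowledge type as a plain string is a simplification, and appending `[1]` whenever any context document exists is an assumption, since the policy does not say which retrieved document gets cited. The fallback text is the disclaimer quoted in this policy.

```python
import re

# Matches any of the three citation marker formats defined in this policy.
CITATION = re.compile(r"\[(\d+|general knowledge|foundational knowledge)\]")

DISCLAIMER = (
    "This is general knowledge from base LLM training data, "
    "not verified against StillMe's RAG knowledge base."
)


def patch_missing_citation(answer: str, knowledge_type: str, num_context_docs: int) -> str:
    """Append a citation marker to a factual answer that has none (illustrative)."""
    if knowledge_type != "FACTUAL_CLAIM" or CITATION.search(answer):
        return answer  # not a factual claim, or already cited: nothing to patch
    if num_context_docs > 0:
        # Assumption: cite the first retrieved document.
        return f"{answer} [1]"
    # No RAG context: fall back to the unverified marker plus the disclaimer.
    return f"{answer} [general knowledge] ({DISCLAIMER})"
```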

---

## Transparency

StillMe is transparent about:
- **Knowledge source**: RAG-grounded vs general knowledge vs foundational knowledge
- **Verification status**: Verified against RAG vs unverified (general knowledge)
- **Reasoning**: When StillMe is reasoning vs stating facts

---

## Revision History

- **2025-01-27**: Initial formal policy document
  - Created to address ambiguity in citation policy
  - Based on architectural review findings

---

## References

- `stillme_core/knowledge/type_classifier.py`: Implementation
- `stillme_core/validation/citation.py`: Citation enforcement
- `docs/ANALYSIS_GENERAL_KNOWLEDGE_CITATION.md`: Analysis of general knowledge citations