Hi OLMo team,
I'm sharing a production deployment of OLMo in a highly regulated domain that might be of interest to the community — particularly around hallucination prevention and auditability.
Context
Brazilian tax law is one of the most complex regulatory environments in the world. In January 2025, Brazil enacted LC 214/2025 — a comprehensive tax reform that restructured the entire indirect tax system. Professionals in this domain need AI systems that are 100% reliable: a wrong answer doesn't just look bad — it creates legal and financial liability.
Most LLM-based legal/tax tools rely on prompting or RLHF to reduce hallucinations. The best published results still show significant hallucination rates (industry benchmarks range from 17% to 58%+ depending on the system). We took a fundamentally different approach.
Architecture: "Take the pen away from the LLM"
Our core design principle: the LLM never generates factual content. Instead, it serves two narrow roles:
- Intent classification — understanding what the user is asking
- Natural language formatting — presenting retrieved facts in conversational Portuguese
All factual content comes from a deterministic retrieval pipeline that combines:
- Hybrid search (dense + sparse retrieval)
- A repository of human-approved solutions (case-based reasoning, documented since 1992)
- Coverage gating: if the system can't cover ≥70% of the query from verified sources, it refuses to answer rather than hallucinate
The LLM is deliberately constrained to a "classifier + formatter" role. It never writes the answer — it only decides which pre-approved answer to surface and how to phrase it.
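To make the "classifier + formatter" constraint concrete, here is a minimal sketch of the control flow. The intent labels, the lookup table, and both stubs are hypothetical stand-ins; in the real system the LLM fills the two stubbed roles, but the factual payload always comes from the repository.

```python
# Illustrative sketch: the LLM never writes the answer, it only
# selects a pre-approved one and phrases it. All names are hypothetical.

APPROVED_ANSWERS = {
    "cbs_rate": "Per approved solution #1234, the CBS rate is ...",
    "ibs_scope": "Per approved solution #5678, IBS applies to ...",
}

def classify_intent(query: str) -> str:
    """Role 1 (LLM): map the query to a known intent label.
    Stubbed with keyword matching for illustration."""
    return "cbs_rate" if "rate" in query.lower() else "ibs_scope"

def format_answer(fact: str) -> str:
    """Role 2 (LLM): rephrase the retrieved fact conversationally.
    Stubbed as identity; the factual content is never generated."""
    return fact

def answer(query: str) -> str:
    intent = classify_intent(query)   # LLM decides *which* answer
    fact = APPROVED_ANSWERS[intent]   # facts come only from the repository
    return format_answer(fact)        # LLM decides *how* to phrase it
```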
Results
| Metric | Value |
| --- | --- |
| Hallucination rate | 0% (across all test cases) |
| Entity resolution accuracy | 100% (31/31 cases) |
| Latency | 60–200 ms |
| Audit trail | Every response includes: source, hash, confidence score, timestamp |
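For illustration, the audit metadata attached to each response could be shaped like this. The field names, the SHA-256 choice, and the helper are assumptions for the sketch, not our exact production schema.

```python
# Hypothetical audit-record builder; fields mirror the table above.
import hashlib
from datetime import datetime, timezone

def audit_record(source_id: str, source_text: str, confidence: float) -> dict:
    """Attach provenance metadata to a response: which verified source
    it came from, a content hash of that source, the retrieval
    confidence, and a UTC timestamp."""
    return {
        "source": source_id,
        "hash": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "confidence": confidence,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```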
Published artifacts
Why I'm sharing this
I believe this approach — using OLMo as a classifier rather than a generator in high-stakes domains — is an underexplored pattern that could be valuable for the OLMo research community. It's also directly relevant to the work on OLMoTrace, since our architecture achieves full traceability by design (every response maps to a verified source, not to training data).
I'd welcome feedback from the team, particularly:
- Has the OLMo team seen similar "LLM-as-classifier" architectures in other regulated domains?
- Would this be a useful case study for the OLMo ecosystem documentation?
- Are there OLMo-specific optimizations for classification-only workloads (vs. generation)?
Happy to share more technical details or do a walkthrough.
Jony Wolff
Kesshet AI Systems — Tel Aviv · São Paulo
jony@kesshet.com.br
[LinkedIn](https://www.linkedin.com/in/jony-wolff-25510839b)
Suggested labels: discussion, community, use-case