* Add batch processing LLM, cn prompts
* Restructure predictors
* lettuce multi
* first version of the eurobert blogpost
* Change blog post and README
* Added Openai API key
* A bit more stuff to README
* Remove Redundancy
* Bump version
* Bump version in README
* Remove emojis
* Mini changes in README and EUROBERT
* Final changes in blog
* Typo
* Predictions
* Different image
* Changed pytest
* Fixed tests
README.md: 60 additions & 23 deletions
@@ -6,7 +6,7 @@
 <br><em>Because even AI needs a reality check! 🥬</em>
 </p>

-LettuceDetect is a lightweight and efficient tool for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems. It identifies unsupported parts of an answer by comparing it to the provided context. The tool is trained and evaluated on the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset and leverages [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) for long-context processing, making it ideal for tasks requiring extensive context windows.
+LettuceDetect is a lightweight and efficient tool for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems. It identifies unsupported parts of an answer by comparing it to the provided context. The tool is trained and evaluated on the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset and leverages [ModernBERT](https://github.com/AnswerDotAI/ModernBERT) for English and [EuroBERT](https://huggingface.co/blog/EuroBERT/release) for multilingual support, making it ideal for tasks requiring extensive context windows.

 Our models are inspired by the [Luna](https://aclanthology.org/2025.coling-industry.34/) paper, which presents an encoder-based model and uses a similar token-level approach.

@@ -21,17 +21,24 @@ Our models are inspired from the [Luna](https://aclanthology.org/2025.coling-ind
 - LettuceDetect addresses two critical limitations of existing hallucination detection models:
 - Context window constraints of traditional encoder-based methods
 - Computational inefficiency of LLM-based approaches
-- Our models currently **outperforms** all other encoder-based and prompt-based models on the RAGTruth dataset and are significantly faster and smaller
+- Our models currently **outperform** all other encoder-based and prompt-based models on the RAGTruth dataset and are significantly faster and smaller
 - Achieves higher score than some fine-tuned LLMs e.g. LLAMA-2-13B presented in [RAGTruth](https://aclanthology.org/2024.acl-long.585/), coming up just short of the LLM fine-tuned in the [RAG-HAT paper](https://aclanthology.org/2024.emnlp-industry.113.pdf)
-- We release the code, the model and the tool under the **MIT license**
+
+## 🚀 Latest Updates
+
+- **May 18, 2025** - Released version **0.1.7**: Multilingual support (thanks to EuroBERT) for 7 languages: English, German, French, Spanish, Italian, Polish, and Chinese!
+- Up to **17 F1 points improvement** over baseline LLM judges like GPT-4.1-mini across different languages
 We've trained 210m and 610m variants of EuroBERT, see our HuggingFace collection: [HF models](https://huggingface.co/collections/KRLabsOrg/multilingual-hallucination-detection-682a2549c18ecd32689231ce)
+
+
+*See the full list of models and smaller variants in our [HuggingFace page](https://huggingface.co/KRLabsOrg).*

 You can get started right away with just a few lines of code.

 ```python
 from lettucedetect.models.inference import HallucinationDetector
 # Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}]
 ```
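
The `Predictions` comment in the example above shows that each detected span carries character offsets (`start`, `end`), a `confidence`, and the span `text`. As a minimal sketch, assuming only that output shape, the spans could be marked in the answer like this. The helper `mark_spans`, the `[[...]]` markers, the 0.5 threshold, and the sample answer string are hypothetical illustrations, not part of the lettucedetect API.

```python
from typing import Dict, List


def mark_spans(answer: str, predictions: List[Dict], threshold: float = 0.5) -> str:
    """Wrap every predicted span at or above `threshold` in [[...]] markers.

    Hypothetical helper: it assumes each prediction is a dict with
    character offsets 'start' and 'end' and a float 'confidence'.
    """
    # Keep confident spans only and process them left to right.
    kept = sorted(
        (p for p in predictions if p["confidence"] >= threshold),
        key=lambda p: p["start"],
    )
    pieces = []
    cursor = 0
    for p in kept:
        start = max(p["start"], cursor)  # clip spans that overlap an earlier one
        end = p["end"]
        if end <= cursor:
            continue  # span is fully covered by a previous span
        pieces.append(answer[cursor:start])
        pieces.append("[[" + answer[start:end] + "]]")
        cursor = end
    pieces.append(answer[cursor:])
    return "".join(pieces)


# Illustrative answer string (an assumption) chosen so that offsets 31..71
# match the span text shown in the Predictions comment.
answer = "The capital of France is Paris. The population of France is 69 million."
predictions = [
    {
        "start": 31,
        "end": 71,
        "confidence": 0.9944414496421814,
        "text": " The population of France is 69 million.",
    }
]

marked = mark_spans(answer, predictions)
print(marked)
```

Raising `threshold` trades recall for precision: with a threshold above the span's confidence, the answer is returned unmarked.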

-## Performance
+Check out our [HF collection](https://huggingface.co/collections/KRLabsOrg/multilingual-hallucination-detection-682a2549c18ecd32689231ce) for more examples.

-**Example level results**
+We also implemented LLM-based baselines; to use them, add your OpenAI API key:

-We evaluate our model on the test set of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. Our large model, **lettucedetect-large-v1**, achieves an overall F1 score of 79.22%, outperforming prompt-based methods like GPT-4 (63.4%) and encoder-based models like [Luna](https://aclanthology.org/2025.coling-industry.34.pdf) (65.4%). It also surpasses fine-tuned LLAMA-2-13B (78.7%) (presented in [RAGTruth](https://aclanthology.org/2024.acl-long.585/)) and is competitive with the SOTA fine-tuned LLAMA-3-8B (83.9%) (presented in the [RAG-HAT paper](https://aclanthology.org/2024.emnlp-industry.113.pdf)). Overall, **lettucedetect-large-v1** and **lettucedect-base-v1** are very performant models, while being very effective in inference settings.
+```bash
+export OPENAI_API_KEY=your_api_key
+```

-The results on the example-level can be seen in the table below.
 At the span level, our model achieves the best scores across all data types, significantly outperforming previous models. The results can be seen in the table below. Note that here we don't compare to models, like [RAG-HAT](https://aclanthology.org/2024.emnlp-industry.113.pdf), since they have no span-level evaluation presented.
 We've evaluated our models against both encoder-based and LLM-based approaches. The key findings include:

+- In English, our models **outperform** all other encoder-based and prompt-based models on the RAGTruth dataset and are significantly faster and smaller
+- Our multilingual models are better than baseline LLM judges like GPT-4.1-mini
+- Our models are also significantly faster and smaller than the LLM-based judges
+
+For detailed performance metrics and evaluations of our models:
+- [English model documentation](docs/README.md)
+- [Multilingual model documentation](docs/EUROBERT.md)
+- [Paper](https://arxiv.org/abs/2502.17125)
+- [Model cards](https://huggingface.co/KRLabsOrg)

 ## How does it work?

@@ -229,11 +266,11 @@ positional arguments:
 options:
   -h, --help            show this help message and exit
   --model MODEL         Path or huggingface URL to the model. The default value is
-                        "KRLabsOrg/lettucedect-base-modernbert-en-v1".
+                        "KRLabsOrg/lettucedetect-base-modernbert-en-v1".
   --method {transformer}
                         Hallucination detection method. The default value is