README.md (40 additions & 5 deletions)
@@ -7,15 +7,50 @@ This project tests Local Large Language Models (LLMs) using **syllogisms** to de
**You can test yourself by clicking [here](https://longocris.github.io/Belief-Bias-Questionnaire/) or you can check out the [quiz repository](https://github.com/LongoCris/Belief-Bias-Questionnaire)**
1. [llama3.2:1b](https://ollama.com/library/llama3.2): a lightweight baseline model from the Llama 3.2 family
2. [Mistral](https://ollama.com/library/mistral): a 7B model designed for long-context reasoning
3. [qwen3:8b](https://ollama.com/library/qwen3): a recent model optimized for instruction following and logical tasks

Each LLM is tested at **temperature** 0 and at 0.7, and its accuracy is compared across **conflictual** and **non-conflictual** items.
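A minimal sketch of how the model-by-temperature grid could be laid out, assuming the models are served locally through Ollama's REST endpoint `/api/generate`. The prompt wording, the `build_requests` helper, the model tags, and the example syllogism are illustrative assumptions, not the project's actual code.

```python
from itertools import product

# Assumed Ollama tags for the three models under test (hypothetical choice).
MODELS = ["llama3.2:1b", "mistral", "qwen3:8b"]
TEMPERATURES = [0.0, 0.7]

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_requests(syllogism: str) -> list[dict]:
    """Return one JSON payload per (model, temperature) pair."""
    prompt = (
        "Decide whether the conclusion follows logically from the premises. "
        "Answer only 'valid' or 'invalid'.\n\n" + syllogism
    )
    return [
        {
            "model": model,
            "prompt": prompt,
            "stream": False,  # ask for a single complete response
            "options": {"temperature": temperature},
        }
        for model, temperature in product(MODELS, TEMPERATURES)
    ]


# Hypothetical invalid-believable item, used only to show the payload shape.
payloads = build_requests(
    "All flowers need water. Roses need water. Therefore, roses are flowers."
)
```

Each payload could then be sent with an HTTP POST to `OLLAMA_URL`; the sketch stops short of the network call so that it runs without a local server.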
Conflictual items are those in which logic and belief disagree: the valid-unbelievable and the invalid-believable items. Non-conflictual items are the valid-believable and the invalid-unbelievable ones. If errors concentrate on the conflictual items rather than being spread evenly across both groups, this is a signal of BB.
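The definition above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function names and the four sample responses are hypothetical and do not come from the project's data.

```python
from collections import defaultdict


def is_conflictual(valid: bool, believable: bool) -> bool:
    """An item is conflictual when logic and belief disagree (VU or IB)."""
    return valid != believable


def accuracy_by_conflict(responses):
    """Compute accuracy per group.

    responses: iterable of (valid, believable, answered_valid) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for valid, believable, answered_valid in responses:
        key = "conflictual" if is_conflictual(valid, believable) else "non-conflictual"
        total[key] += 1
        correct[key] += int(answered_valid == valid)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical responder that always answers according to believability.
responses = [
    (True, True, True),     # VB: answers "valid", correct
    (False, False, False),  # IU: answers "invalid", correct
    (True, False, False),   # VU: answers "invalid", wrong
    (False, True, True),    # IB: answers "valid", wrong
]

acc = accuracy_by_conflict(responses)
# A positive gap means errors concentrate on conflictual items.
gap = acc["non-conflictual"] - acc["conflictual"]
```

For this fully belief-driven responder the gap is 1.0, the largest possible value; a responder whose errors are unrelated to believability would have a gap near 0.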
For comparison, the same test was also administered to a group of human participants, as shown in the presentation and in the report.

### Belief Bias Questionnaire
The questionnaire consists of **16 items** balanced across the four categories:
* Valid-Believable (VB): 4 items
* Invalid-Believable (IB): 4 items
* Valid-Unbelievable (VU): 4 items
* Invalid-Unbelievable (IU): 4 items

In each category, three items are of **regular** difficulty (two premises and one conclusion), while one item is **hard** (three premises and one conclusion). For the human participants, this variation prevents adaptation to a constant level of difficulty.

For the human participants, one neutral item was added to the questionnaire to sustain **attention**. Each item had to be judged valid or invalid within **20 seconds**, to encourage intuitive processing. One item also asked for an open-ended explanation, to examine the reasoning behind the decision and compare it with that of the LLMs.
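The balanced design described above can be sketched as a small data structure. The field names and the `build_layout` helper are hypothetical; only the counts, the premise numbers, and the 20-second limit come from the description.

```python
# Category code -> (logically valid, believable conclusion)
CATEGORIES = {
    "VB": (True, True),
    "IB": (False, True),
    "VU": (True, False),
    "IU": (False, False),
}

# Three regular items and one hard item per category.
DIFFICULTIES = ["regular", "regular", "regular", "hard"]

# Response window for human participants, in seconds.
TIME_LIMIT_SECONDS = 20


def build_layout() -> list[dict]:
    """Return the 16 item slots of the questionnaire (without their text)."""
    items = []
    for code, (valid, believable) in CATEGORIES.items():
        for difficulty in DIFFICULTIES:
            items.append(
                {
                    "category": code,
                    "valid": valid,
                    "believable": believable,
                    "difficulty": difficulty,
                    # Hard items have three premises, regular items two.
                    "n_premises": 3 if difficulty == "hard" else 2,
                }
            )
    return items


layout = build_layout()
```

The neutral attention item given to human participants is left out of this sketch, since it does not belong to any of the four categories.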
### Results

| Model | Temperature | Average Accuracy | Belief Bias Signal |

The precise distribution of the errors can be found in the presentation.

The human control group (gender-balanced, with at least B2 English and a Bachelor's degree) also showed BB, especially because of the time pressure.
### Conclusions

BB is not unique to humans: some LLMs mimic it under certain configurations. Model complexity and architecture, not temperature alone, determine whether BB is expressed.