This repo contains our work from the Andon Labs Hackathon at Linköping University, where we evaluated LLMs on SweSAT (Högskoleprovet) question types.
Models evaluated:
- openai/gpt-4o-mini
- openai/gpt-4o
- anthropic/claude-3-5-haiku-latest
- anthropic/claude-3-5-sonnet-latest
- o1-mini
Questions sourced from: 👉 github.com/ViktorAlm/HP
Covers:
- Reading Comprehension (RC) – SV & EN
- Sentence Completion (MEK) – SV & EN
- Vocabulary (Words) – SV only
Key findings:
- GPT-4o and Claude 3.5 Sonnet consistently outperformed the other models.
- The strongest models approach human-level accuracy on these standardized-test question types.
- Prompt quality had a notable impact on accuracy.
Can an LLM pass the SweSAT? For some question types, yes, especially with the right prompt and a top-tier model.