Testing Open Source LLMs small enough to run on an old laptop

What I did

Using a 2020 MacBook Pro with an M1 chip and 8 GB of RAM, I downloaded 6 popular open-source LLMs from the major labs via Ollama. I also used one larger model via OpenRouter as a control.

First testing: I prompted each model by hand, judging tone (verbosity vs. conciseness), consistency, refusal patterns, and speed. Similar prompts were used for each model, but because the models answered differently, the follow-up questions usually differed.

Second testing: I wrote a Python script that loops 4 prompts testing factual knowledge, logic, consistency, creativity, and tone (via token output) through Ollama and records the responses. Later I adjusted the script to call llama-3.1-70b via OpenRouter as a control.
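The scripted loop can be sketched roughly as follows, using Ollama's local HTTP API (`/api/generate` on port 11434). The prompt texts here are illustrative placeholders, not the ones in main.py:

```python
import json
import urllib.request

# Illustrative placeholder prompts -- the real ones live in main.py.
PROMPTS = {
    "factual": "What is the capital of Australia?",
    "logic": "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?",
    "consistency": "Answer all three: 1) Is 7 prime? 2) Is 7 odd? 3) Is 7 divisible by 2?",
    "creativity": "Write a two-line poem about rain.",
}

# The six local models tested in this repo.
MODELS = ["llama3.2:3b", "phi4-mini", "gemma3:4b", "qwen3:4b", "smollm2:1.7b", "mistral:7b"]

def build_request(model: str, prompt: str) -> bytes:
    """Encode one non-streaming request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send a prompt to a locally running Ollama server and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def run_all() -> dict:
    """Loop every prompt through every model and collect the responses."""
    results = {}
    for model in MODELS:
        for category, prompt in PROMPTS.items():
            results[(model, category)] = ask(model, prompt)
    return results

# With Ollama running and the models pulled: results = run_all()
```

This only requires the standard library; swapping in the official `ollama` Python package would also work.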

Key Findings

Models that held firm on consistency in conversation, where one prompt followed another in a social-pressure style (a human pushing back), then contradicted themselves when tested with the scripted prompts (three questions delivered at once).

All six small models failed the factual test. On the logic test, 2 passed, 2 were wrong (rejected the premise), and 2 were partial (hedged answers). 4/6 failed the written consistency test (three questions at once). The smallest model was the least verbose; the middle models (3b to 4b) and the 7b model were all similarly verbose; the reasoning model was by far the most verbose.

How to run it

Download Ollama, then run `ollama pull llama3.2:3b`; repeat for the other 5 local models.

Design some prompts for what you want to test. Run them, get a feel for the responses, and record your judgments.

Download uv and Python. Run `uv init name-of-directory-you-want`. Review main.py and edit the commented-out sections so that either only the llama models run or only OpenRouter runs. Open .env and add your OPENROUTER_API_KEY.
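For the OpenRouter control run, the call is a standard OpenAI-style chat completion against OpenRouter's endpoint, authorized with the key from .env. A minimal sketch, where the model slug and function names are assumed examples rather than what main.py necessarily uses:

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(prompt: str, model: str = "meta-llama/llama-3.1-70b-instruct") -> dict:
    """Build an OpenAI-style chat-completion payload (model slug is an assumed example)."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_openrouter(prompt: str) -> str:
    """POST one prompt to OpenRouter, reading OPENROUTER_API_KEY from the environment."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same request shape works for any model slug OpenRouter hosts.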

Design prompts to run programmatically. Insert them in main.py or use the existing ones there. Run `uv run main.py`.

Models tested

- llama3.2:3b - Meta
- phi4-mini - Microsoft
- gemma3:4b - Google
- qwen3:4b - Alibaba
- smollm2:1.7b - Hugging Face
- mistral:7b - Mistral

Also tested a larger model, llama-3.1-70b, via OpenRouter as a control point.
