On-Device AI Hackathon co-hosted by Qualcomm Technologies, Microsoft, and Northeastern University. Won Second Place out of 28 teams and the People's Choice Award
A narrative-driven Connect Four game powered by local LLMs that react to your moves with dynamic storytelling.
ConquestFour transforms the classic Connect Four game into an immersive narrative experience using local large language models. Instead of reinventing the wheel, we've enhanced a familiar game with AI-powered storytelling that runs entirely on your device - no internet connection required!
The game features an intelligent AI opponent that not only challenges your strategic thinking but also narrates the game with contextual commentary, adapting its tone based on your play style and move quality.
- Intelligent AI Opponent: Three difficulty levels (Easy, Medium, Hard) with different play styles
- Narrative Generation: LLM provides contextual commentary on moves using Mistral-7B
- Narrative Director: Story premise + opening/mid/end arc control driven by move quality signals
- Apple Silicon Optimized Path: Stable inference via Ollama + llama.cpp + Metal GPU backend (tested on Apple M3 Pro)
- Optimized Inference: INT8 quantization for reduced latency and memory usage
- Thermal Management: Automatically detects system temperature and scales AI operations (see the sketch after this list)
- Themed Experience: Narratives adapt to different contextual themes (Western, Fantasy, Sci-Fi)
- Interactive Chat: Communicate directly with the AI during gameplay
- Local Processing: All AI processing runs locally - no data leaves your device
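On the thermal side, the idea is to poll a temperature sensor and shrink the LLM's generation budget as the machine heats up. A minimal sketch using `psutil` (which exposes sensors only on some platforms), with illustrative thresholds; `num_predict` is Ollama's output-token limit, and the exact knobs here are assumptions, not the repo's actual code:

```python
from typing import Optional

import psutil


def current_cpu_temp() -> Optional[float]:
    """Return a CPU temperature in °C, or None when no sensor is exposed."""
    # sensors_temperatures() only exists on some platforms (mostly Linux);
    # macOS builds of psutil usually expose nothing, hence the fallbacks.
    read = getattr(psutil, "sensors_temperatures", None)
    sensors = read() if read else {}
    for entries in sensors.values():
        if entries:
            return entries[0].current
    return None


def narration_settings(temp: Optional[float]) -> dict:
    """Scale LLM generation effort down as the system heats up (illustrative thresholds)."""
    if temp is None or temp < 70:
        return {"num_predict": 256}  # full-length narrative
    if temp < 85:
        return {"num_predict": 128}  # shorter commentary under moderate heat
    return {"num_predict": 64}       # minimal output when running hot
```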
[Gameplay Screenshot] [Narrative Example]
- Python 3.9-3.12 (versions newer than 3.12 may cause compatibility issues)
- Ollama - for running local LLMs
- FFmpeg (for speech features)
1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/llm-conquestfour.git
   cd llm-conquestfour
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate.bat
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install and start Ollama:

   ```bash
   # Install from https://ollama.com/
   # Then run:
   ollama serve
   ```

5. Download the Mistral model:

   ```bash
   # In a separate terminal
   ollama pull mistral
   ```
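With the model pulled, you can smoke-test the local endpoint before launching the game; Ollama serves a REST API on `localhost:11434`:

```python
import requests

# Ask the local Ollama server for one non-streamed completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

If this prints a sentence, the game's narrator backend is ready to go.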
For Windows users, we provide automated setup scripts:
```bash
# Using Git Bash
./demo/setup_windows_gitbash.sh

# Or using Command Prompt
demo\run_game_windows.bat
```

See the Windows Setup Guide for detailed instructions.
To start the game, execute the following command:

```bash
python main.py
```

- Click on a column to drop your piece
- Use the chat box to communicate with the AI narrator
- Select different themes and difficulty levels in the settings menu
ConquestFour is built with a modular architecture:
- Game Logic Core: Python implementation with minimax algorithm and alpha-beta pruning (see the sketch after this list)
- UI Layer: PyQt6-based responsive user interface
- AI Engine: Ollama-based LLM integration using llama.cpp + Metal GPU acceleration
- Narrative System: Context-aware prompt generation with themed templates
- System Monitoring: Thermal-aware performance adjustment
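The game logic core's search follows the textbook minimax-with-alpha-beta pattern. Below is a self-contained sketch with a toy win-only evaluation; it illustrates the technique, not the repo's actual implementation:

```python
import math

ROWS, COLS = 6, 7  # standard Connect Four board


def valid_moves(board):
    """Columns with room left (board[r][c]: 0 empty, 1 AI, -1 human)."""
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board, col, player):
    """Drop a piece into col; return the row it lands in."""
    for r in range(ROWS - 1, -1, -1):
        if board[r][col] == 0:
            board[r][col] = player
            return r


def score(board):
    """Toy evaluation: ±1000 for a four-in-a-row, else 0 (real evals are richer)."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == p
                       for rr, cc in cells):
                    return 1000 * p
    return 0


def minimax(board, depth, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over Connect Four moves."""
    s = score(board)
    if depth == 0 or s != 0 or not valid_moves(board):
        return s
    best = -math.inf if maximizing else math.inf
    for col in valid_moves(board):
        row = drop(board, col, 1 if maximizing else -1)
        val = minimax(board, depth - 1, alpha, beta, not maximizing)
        board[row][col] = 0  # undo the move before trying the next column
        if maximizing:
            best, alpha = max(best, val), max(alpha, val)
        else:
            best, beta = min(best, val), min(beta, val)
        if beta <= alpha:  # prune: the opponent will avoid this branch
            break
    return best
```

Calling `minimax([[0] * COLS for _ in range(ROWS)], 4, -math.inf, math.inf, True)` scores the empty board to depth 4; pruning is what keeps move times in the sub-150ms range reported in the benchmarks below.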
```
├── ai/
│   ├── ollama/          # Ollama LLM integration (llama.cpp + Metal)
│   └── onnx_runtime/    # ONNX POC tests (archived - CoreML incompatible with LLMs)
├── game/                # Core game logic and minimax implementation
├── ui/                  # PyQt6 user interface
├── speech_to_text/      # Speech recognition (optional)
├── text_to_speech/      # Text-to-speech synthesis (optional)
├── docs/                # Technical documentation and benchmarks
└── assets/              # Game assets and images
```
Hardware: Apple M3 Pro (18GB RAM)
Methodology: Interactive timing from player move to AI response.
| Model | Mean Latency | vs Baseline | Method | Status |
|---|---|---|---|---|
| Mistral-7B | 5.55s ± 0.38s | Baseline (100%) | llama.cpp + Metal GPU | ✅ Validated |
| Phi-3-mini (3.8B), run #1 (n=15) | 4.25s ± 0.49s | -23.4% | llama.cpp + Metal GPU | ✅ Measured |
| Phi-3-mini (3.8B), run #2 after latency tuning (n=15) | 4.45s ± 0.38s | -19.8% | llama.cpp + Metal GPU | ✅ Measured |
| Phi-3-mini combined (n=30) | 4.35s ± 0.31s | -21.6% | llama.cpp + Metal GPU | ✅ Measured |
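The combined row simply pools the two n=15 runs into one n=30 sample before recomputing the statistics. A standard-library sketch of that aggregation; the sample values below are placeholders, not the recorded data:

```python
import statistics

# Illustrative placeholder samples, NOT the recorded benchmark data.
run1 = [4.1, 4.3, 4.4, 3.9, 4.5]   # stands in for run #1
run2 = [4.6, 4.2, 4.5, 4.4, 4.3]   # stands in for run #2

combined = run1 + run2              # pool raw samples, then recompute stats
print(f"mean {statistics.mean(combined):.2f}s "
      f"± {statistics.stdev(combined):.2f}s (n={len(combined)})")
```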
Key Findings:
- ONNX + CoreML approach failed: Only 3.8% operator coverage on transformers (99/2,616 nodes supported)
- CoreML designed for vision models, not LLMs (max embedding dim 16,384 vs 32,000+ needed)
- Current stack already optimal: Ollama uses llama.cpp with Metal GPU (correct approach for Apple Silicon)
- Measured improvement: Phi-3-mini + runtime tuning reduced mean latency from 5.55s to 4.35s across two runs (~21.6% faster, n=30)
- Consistency gain: latest run stayed below 6s (max 5.35s), removing prior long-tail spikes
- Where time goes: minimax move/eval is typically sub-150ms; LLM generation dominates end-to-end latency
From [perf] logs:
```
llm_chat_ms:        mean 4011.8ms, min 2965.8ms, max 4933.8ms, stdev 759.0ms
minimax_ai_move_ms: mostly 0.5ms to 133.7ms
minimax_eval_ms:    mostly 2.6ms to 40.9ms
```
Interpretation:
- The game logic path is fast and stable.
- Most variance comes from LLM output length/style and prompt complexity, not minimax.
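The `[perf]` figures above come from wall-clock timing around each stage. A minimal sketch of how such labels can be captured with a decorator; the decorator itself is illustrative, not the repo's actual logger:

```python
import time
from functools import wraps


def perf_timed(label):
    """Print a [perf]-style wall-clock duration for each call, in ms."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                print(f"[perf] {label}: {elapsed_ms:.1f}ms")
        return wrapper
    return decorator


@perf_timed("llm_chat_ms")
def ask_narrator(prompt):
    ...  # call into the LLM backend here (placeholder body)
```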
- The app uses one LLM narrator voice (`Gemma`) for move-by-move storytelling.
- If player reactions are added, prefer rule-based templates for `good`/`mediocre`/`bad` events (see the sketch after this list).
- Avoid a second per-move LLM call for "Andrew replies" by default:
  - doubles token generation load,
  - increases latency variance,
  - grows history faster and can cause style drift.
- Reserve LLM player text for explicit user input (typed or spoken chat).
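The rule-based layer recommended above can be a simple lookup table with zero extra LLM calls. A minimal sketch; the categories mirror the `good`/`mediocre`/`bad` events, and the lines themselves are illustrative:

```python
import random

# Canned player-reaction lines keyed by move quality; no LLM call needed.
REACTIONS = {
    "good":     ["A strong reply!", "That square was begging to be taken."],
    "mediocre": ["A quiet move.", "The board barely shifts."],
    "bad":      ["A risky choice...", "That may come back to haunt you."],
}


def react(move_quality: str) -> str:
    """Pick a canned reaction; fall back to a neutral line for unknown labels."""
    return random.choice(REACTIONS.get(move_quality, ["The game goes on."]))
```

This keeps per-move latency flat and leaves the LLM's token budget for the narrator voice.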
- macOS (Apple Silicon): Prefer `Ollama + llama.cpp + Metal` (the default in this repo).
- Windows/Linux: ONNX can be a good deployment path when paired with a strong execution provider (for example TensorRT/CUDA/DirectML), but performance depends on provider coverage for transformer ops (see the sketch after this list).
- This repo's ONNX/CoreML path remains experimental/research-only for LLMs on macOS.
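To illustrate the Windows/Linux point: ONNX Runtime lets you state a provider preference order and fall back to CPU when an accelerator is missing. A minimal sketch, assuming a placeholder `model.onnx`:

```python
import onnxruntime as ort

# Ask the installed build which execution providers it actually has,
# then keep our preference order among those.
preferred = [
    "TensorrtExecutionProvider",  # NVIDIA TensorRT
    "CUDAExecutionProvider",      # plain CUDA
    "DmlExecutionProvider",       # DirectML on Windows
    "CPUExecutionProvider",       # always-available fallback
]
available = set(ort.get_available_providers())
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
print("Resolved providers:", session.get_providers())
```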
See docs/baseline_performance.md and docs/lessons_learned_onnx_coreml.md for details.
What We Learned: ONNX + CoreML approach was tested but found incompatible with LLMs.
POC Test Results (TinyLlama-1.1B):
```
✅ PyTorch baseline: 2.90s
❌ ONNX + CoreML: Failed (only 99/2,616 nodes supported = 3.8%)
   Error: "CoreML does not support input dim > 16384" (LLMs need 32K+ vocab)
```
Why CoreML Failed:
- CoreML designed for vision models (ResNet, MobileNet), not transformers
- Missing operators: `Split`, dynamic shapes, KV-cache patterns
- Max tensor dimension: 16,384 (LLMs need 32,000+ for embeddings)
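The coverage figure above (99/2,616 nodes) is the kind of number you can reproduce by tallying op types in an exported graph. A minimal sketch with the `onnx` package; the model path and the supported-op set are placeholders:

```python
from collections import Counter

import onnx

model = onnx.load("model.onnx")  # placeholder path to an exported graph
ops = Counter(node.op_type for node in model.graph.node)

# Hypothetical set of ops a target backend supports.
SUPPORTED = {"MatMul", "Add", "Relu"}
covered = sum(n for op, n in ops.items() if op in SUPPORTED)
total = sum(ops.values())
print(f"{covered}/{total} nodes covered ({covered / total:.1%})")
```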
Correct Approach (already implemented):
- The baseline uses llama.cpp + Metal GPU via Ollama
- This is the optimal stack for LLMs on Apple Silicon
- Neural Engine (CoreML) is for vision/audio models only
The game now supports backend selection in the startup UI:
- `Auto (Recommended)`: Uses the stable Ollama + llama.cpp + Metal backend
- `Ollama + Metal (Stable)`: Forces the stable backend
- `Apple NPU / CoreML (Experimental)`: Attempts ONNX/CoreML only when Apple M3 is detected, otherwise falls back to the stable backend
Notes:
- Experimental NPU mode is for research and may be slower or less stable for LLM inference.
- If ONNX/CoreML initialization fails, the app automatically falls back to stable Ollama/Metal.
- Optional env vars for experimental mode (see the sketch after this list):
  - `CONQUEST4_ONNX_MODEL_PATH` (default: `models/mistral-onnx`)
  - `CONQUEST4_USE_NEURAL_ENGINE` (`1` by default)
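A minimal sketch of how these variables could feed a resolve-then-fallback flow; `init_onnx_coreml` and `init_ollama_metal` are hypothetical initializers, not the repo's actual API:

```python
import os


def resolve_backend():
    """Resolve the LLM backend from env vars, with a safe fallback."""
    model_path = os.environ.get("CONQUEST4_ONNX_MODEL_PATH", "models/mistral-onnx")
    use_npu = os.environ.get("CONQUEST4_USE_NEURAL_ENGINE", "1") == "1"
    if use_npu:
        try:
            return init_onnx_coreml(model_path)  # hypothetical experimental initializer
        except Exception as exc:
            print(f"Experimental backend failed ({exc}); falling back to Ollama/Metal.")
    return init_ollama_metal()  # hypothetical stable-backend initializer
```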
To Improve Performance:
1. Use a smaller model (2-3x speedup):

   ```bash
   ollama pull phi3:mini  # Already configured in main.py
   ```

2. Run the POC to see CoreML limitations (educational):

   ```bash
   python ai/onnx_runtime/poc_neural_engine.py
   ```
See docs/lessons_learned_onnx_coreml.md for full technical analysis. For a deeper practical explanation, see docs/onnx_coreml_deep_dive.md. For narrative quality controls, see docs/narrative_coherence.md.
- ONNX is a model exchange format, not a guaranteed performance layer by itself.
- For this app's LLM inference on Apple Silicon, GPU via Metal is the stable path.
- The stable runtime here is Ollama + llama.cpp + Metal (not CoreML/NPU).
- This path is GPU-accelerated (not CPU-only "bare metal").
- eBPF is unrelated to this inference stack.
- Voice Integration: Speech-to-text and text-to-speech for natural conversation
- Dynamic Difficulty: AI that learns from your play style
- Extended Narrative Memory: AI remembers previous games
- Expanded Themes: Additional themes with unique narrative styles
- "Connection refused" error: Make sure Ollama is running (
ollama serve) - "Model 'mistral' not found": Download the model (
ollama pull mistral) - Performance issues: Check system temperature, the game will automatically reduce computational load
- macOS crash (
Abort trap: 6) while clicking board/chat:- This is typically a PyQt callback exception path, not an ONNX/CoreML conversion problem.
- A fix was applied for PyQt6 mouse event handling and unhandled bot-call exceptions in UI slots.
- See docs/pyqt_crash_fix_2026-03-31.md for root cause and patch details.
- Experimental NPU mode selected but not used:
- This is expected unless Apple M3 is detected and ONNX/CoreML initializes successfully.
- The app falls back automatically to stable Ollama + llama.cpp + Metal.
- Check startup notice dialog and terminal logs for resolved backend.
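For the first two issues above, a quick reachability check can confirm that the Ollama server is up and the `mistral` model is pulled. A minimal sketch using Ollama's `/api/tags` model-listing endpoint on the default port:

```python
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
except requests.ConnectionError:
    raise SystemExit("Ollama is not running - start it with: ollama serve")

models = [m["name"] for m in resp.json().get("models", [])]
if not any(name.startswith("mistral") for name in models):
    raise SystemExit("Model missing - download it with: ollama pull mistral")
print("Ollama is up; models:", models)
```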
If you previously saw ONNX/CoreML errors, those are a separate class of issue:
- ONNX/CoreML limitations in this repo relate to operator coverage and tensor constraints for LLMs.
- The `SIGABRT` crash fixed above was caused by UI event/exception handling in PyQt, not by ONNX conversion.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for a 24-hour Hackathon
- Inspired by classic Connect Four gameplay
- Powered by Ollama and Mistral AI
- UI built with PyQt6
