Solving GeoGuessr games using VLMs to 90%+ accuracy.
❓ What is GeoGuessr: the game puts you into Google Street View and challenges you to guess the location. Play for free here.
🤖 Uses Playwright to play the game and move around.
🎯 Automatically spots and zooms in on interesting stuff (signs, landmarks, etc.) to get a better view.
🔍 Location detection using GPT-4o and O1 vision models
Played 5 games (25 rounds) of the Classic Map with no-move (but allow pan and zoom).
o1
performs insanely well with 91%+ accuracy! 4o
is significantly cheaper, faster, and performs almost as well.
Model | Score % | Avg Score/Game (/25,000) | Best Game (/25,000) | Worst Game (/25,000) | Median Guess (km) | Best Guess (km) | Worst Guess (km) |
---|---|---|---|---|---|---|---|
GPT-4o | 89.9% | 22,464.2 | 24,530 | 20,561 | 160.4 | 0.2 | 626.7 |
o1 | 91.8% | 22,943.6 | 24,777 | 21,494 | 76.6 | 0.2 | 523.3 |
Copy .env.example
and set the keys.
The Geoguessr token is found in the browser's cookie.
uv run main.py
gpt-4o
required prompt tweaking to work well.o1
was smart enough to work with a very basic prompt.- Performance drops on the
A Community World (ACW)
map to ~80% foro1
. Why? - How much does the zoom-in feature improve results?