Find the shortest path between any two Wikipedia articles. The Wikipedia game, solved.
wiki-route-demo.mp4
Note: The demo uses Simple English Wikipedia rather than full English Wikipedia for faster loading times.
Wikipedia articles form a directed graph: each article is a node, and each hyperlink to another article is an edge. Finding the shortest path between two articles is a classic graph search problem.
wiki-route uses bidirectional BFS (breadth-first search). It searches forward from the source article and backward from the target article simultaneously, expanding one level at a time. When the two search frontiers meet, the shortest path is guaranteed. This is significantly faster than a one-directional BFS, since the search space grows exponentially with depth.
The main bottleneck is loading the graph into memory. Parsing the raw SQL dumps for English Wikipedia takes significant time and RAM. To avoid repeating this cost on every query, wiki-route uses a client/server architecture over a Unix domain socket:
- Server (
wiki-route-server): Parses MediaWiki SQL dump files (or loads precomputed binary data), builds the in-memory graph, and listens for queries on a Unix socket. - Client (
wiki-route): A lightweight CLI that connects to the server, sends a query, and prints the result. Each query is near-instant since the graph is already loaded.
The server can precompute and save the graph to .bin files, making subsequent startups much faster by loading binary instead of re-parsing SQL.
wiki-route parses three SQL dump files from the Wikimedia database dumps:
| File | Description |
|---|---|
page.sql |
Page metadata (titles, namespaces) |
pagelinks.sql |
Links between pages |
linktarget.sql |
Link target resolution table |
English Wikipedia (~7M articles):
- https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
- https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
- https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-linktarget.sql.gz
Simple English Wikipedia (~280K articles):
- https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-page.sql.gz
- https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pagelinks.sql.gz
- https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-linktarget.sql.gz
Download and decompress the .gz files into a directory (e.g. sql/). Rename them to page.sql, pagelinks.sql, and linktarget.sql.
From crates.io:
cargo install wiki-route wiki-route-serverOr build from source:
git clone https://github.com/michal-pielka/wiki-route.git
cd wiki-route
cargo build --releaseFrom SQL dumps (first time):
wiki-route-server -v --sql-dir sql/ --save-dir data/From precomputed binary data (fast startup):
wiki-route-server -v --data-dir data/wiki-route United_States FranceUnited_States -> NATO -> France
Page titles use underscores in place of spaces, matching the format used in Wikipedia URLs (e.g. United_States, not United States).
wiki-route-server [OPTIONS] <--sql-dir <SQL_DIR>|--data-dir <DATA_DIR>>
Options:
--sql-dir <SQL_DIR> Directory containing page.sql, pagelinks.sql, linktarget.sql
--data-dir <DATA_DIR> Directory containing precomputed .bin files
--save-dir <SAVE_DIR> Save computed data to this directory (only valid with --sql-dir)
--socket <SOCKET> Unix socket path [default: /tmp/wiki-route.sock]
-v, --verbose... Increase verbosity (-v info, -vv debug, -vvv trace)
wiki-route [OPTIONS] <FROM> <TO>
Options:
--socket <SOCKET> Unix socket path [default: /tmp/wiki-route.sock]