A TypeScript crawler that extracts information about Swiss municipalities from Wikipedia using Google's Gemini AI via Vertex AI.
- Crawls all Swiss municipalities from the Wikipedia list
- Extracts:
  - Municipality name
  - BFS number (the official municipality identifier assigned by the Swiss Federal Statistical Office)
  - Image from the Wikipedia infobox (if available)
- Uses Gemini via Vertex AI to extract the data from each municipality's page
- Authenticates with Google Cloud Application Default Credentials (no API key needed)
- Writes the results to a JSON file
- Install dependencies:

```bash
yarn install
```

- Authenticate with Google Cloud:

```bash
gcloud auth application-default login
```

- Create a `.env` file with your Google Cloud project settings:

```bash
cp .env.example .env
```

Then edit `.env` and set your project:

```
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_CLOUD_LOCATION=us-central1
```
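The crawler needs these two settings at runtime. Below is a minimal sketch of how they could be validated once the `.env` values have been loaded into an environment map such as `process.env`. The `loadVertexConfig` name and the fallback to `us-central1` are illustrative assumptions, not the crawler's actual code.

```typescript
// Hypothetical helper: the crawler's real configuration handling may differ.
interface VertexConfig {
  project: string;
  location: string;
}

// Reads the Vertex AI settings from an environment map (e.g. process.env).
// Assumption: the location falls back to "us-central1", the value used in the example above.
function loadVertexConfig(env: Record<string, string | undefined>): VertexConfig {
  const project = env.GOOGLE_CLOUD_PROJECT?.trim();
  if (!project) {
    throw new Error("GOOGLE_CLOUD_PROJECT is not set; add it to your .env file");
  }
  const location = env.GOOGLE_CLOUD_LOCATION?.trim() || "us-central1";
  return { project, location };
}
```

Calling `loadVertexConfig(process.env)` at startup fails fast with a clear message when the project is missing, instead of failing later on the first Vertex AI request.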
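Make sure you have:
- The Vertex AI API (`aiplatform.googleapis.com`) enabled in your Google Cloud project
- Permission to call Vertex AI in that project, for example through the Vertex AI User role (`roles/aiplatform.user`)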
Run the crawler in development mode:

```bash
yarn dev
```

Or build and run:

```bash
yarn build
yarn start
```

The crawler will:
- Fetch the list of all Swiss municipalities from Wikipedia
- Visit each municipality's Wikipedia page
- Use Gemini to extract the BFS number and image from the infobox
- Save all results to `municipalities.json`
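Gemini returns text, so the extraction step has to turn each reply into a structured record and reject unusable answers. The sketch below assumes the prompt asks the model to reply with a JSON object containing `bfsId` and `image`; that prompt format, the `parseExtraction` helper, and the 1-to-4-digit check on BFS numbers are assumptions for illustration, not the crawler's actual implementation.

```typescript
// Shape of one entry in municipalities.json, as shown in the output example.
interface Municipality {
  name: string;
  bfsId: string;
  image?: string; // omitted when the infobox has no usable image
}

// Hypothetical parser for the model's reply. Assumes a JSON object such as
// {"bfsId": "261", "image": "https://..."}; returns null when the reply is
// unusable so the caller can skip and log the municipality.
function parseExtraction(name: string, reply: string): Municipality | null {
  // Models sometimes wrap JSON in a Markdown code fence; strip it if present.
  const body = reply.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { bfsId, image } = data as { bfsId?: unknown; image?: unknown };

  // Accept the BFS number as a string or a number, but only if it is 1 to 4 digits.
  const id = typeof bfsId === "number" ? String(bfsId) : bfsId;
  if (typeof id !== "string" || !/^\d{1,4}$/.test(id.trim())) return null;

  const result: Municipality = { name, bfsId: id.trim() };
  // Keep the image only if it looks like an absolute http(s) URL.
  if (typeof image === "string" && /^https?:\/\//.test(image)) {
    result.image = image;
  }
  return result;
}
```

Returning `null` rather than throwing lets the caller skip the municipality and log it without aborting the crawl.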
The output file `municipalities.json` contains an array of municipality objects:

```json
[
  {
    "name": "Zürich",
    "bfsId": "0261",
    "image": "https://upload.wikimedia.org/..."
  },
  ...
]
```

Notes:
- The crawler processes municipalities in batches of 5 to avoid hitting rate limits
- It includes a 1-second delay between batches
- Invalid or failed extractions are skipped and logged
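The batching behavior described in these notes can be sketched as a small generic helper. This is a hypothetical illustration, not the crawler's source: `processInBatches`, its parameters, and the choice to run each batch concurrently with `Promise.all` are assumptions. Only the batch size of 5 and the 1-second delay come from the notes above.

```typescript
// Hypothetical batch runner: items are processed in batches (default 5) with a
// pause (default 1000 ms) between batches. A worker that rejects or resolves to
// null (an invalid extraction) is skipped and logged.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function processInBatches<T, R>(
  items: T[],
  worker: (item: T) => Promise<R | null>,
  batchSize = 5,
  delayMs = 1000,
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // One slot per item so results keep the input order within each batch.
    const slots: (R | null)[] = batch.map(() => null);
    // Run the batch concurrently; each promise handles its own failure,
    // so one bad page cannot abort the rest of the batch.
    await Promise.all(
      batch.map((item, j) =>
        worker(item).then(
          (value) => {
            slots[j] = value;
            if (value === null) console.warn(`Skipping item ${start + j}: invalid extraction`);
          },
          (err: unknown) => {
            console.warn(`Skipping item ${start + j}:`, err);
          },
        ),
      ),
    );
    for (const value of slots) {
      if (value !== null) results.push(value);
    }
    // Pause between batches only, not after the last one.
    if (start + batchSize < items.length) await sleep(delayMs);
  }
  return results;
}
```

For example, `processInBatches(pages, (page) => extract(page))` would return only the successfully extracted records, in input order (`pages` and `extract` are placeholders). Running each batch concurrently caps the number of in-flight requests at the batch size, and the pause between batches adds a simple throttle on top of that.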