panter/municipality-wiki-crawler

Swiss Municipality Crawler

A TypeScript crawler that extracts information about Swiss municipalities from Wikipedia using Google's Gemini AI via Vertex AI.

Features

  • Crawls all Swiss municipalities from the Wikipedia list
  • Extracts:
    • Municipality name
    • BFS number (official municipality identifier)
    • Image from the Wikipedia infobox (if available)
  • Uses Gemini AI via Vertex AI for intelligent data extraction
  • Uses Google Cloud Application Default Credentials (no API key needed)
  • Outputs results to JSON

Setup

  1. Install dependencies:
     yarn install
  2. Authenticate with Google Cloud:
     gcloud auth application-default login
  3. Create a .env file with your Google Cloud project settings:
     cp .env.example .env

Then edit .env and set your project:

GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_CLOUD_LOCATION=us-central1

Make sure you have:

  • Vertex AI API enabled in your Google Cloud project
  • Appropriate permissions to use Vertex AI
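
As an illustration of how the two .env settings might be read and checked at startup, here is a minimal sketch. The `loadConfig` function, the `VertexConfig` type, and the fallback to `us-central1` are assumptions made for this example; they are not taken from the crawler's source.

```typescript
// Hypothetical config shape; the names are illustrative only.
interface VertexConfig {
  project: string;
  location: string;
}

// Reads the two settings from an environment-like object and fails early
// if the project ID is missing.
function loadConfig(env: Record<string, string | undefined>): VertexConfig {
  const project = env.GOOGLE_CLOUD_PROJECT?.trim();
  if (!project) {
    throw new Error("GOOGLE_CLOUD_PROJECT is not set; check your .env file");
  }
  // Assumption: fall back to the location shown in the example .env.
  const location = env.GOOGLE_CLOUD_LOCATION?.trim() || "us-central1";
  return { project, location };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` once the .env file has been loaded. Failing fast on a missing project ID tends to give a clearer error than a rejected Vertex AI request later on.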

Usage

Run the crawler in development mode:

yarn dev

Or build and run:

yarn build
yarn start

The crawler will:

  1. Fetch the list of all Swiss municipalities from Wikipedia
  2. Visit each municipality's Wikipedia page
  3. Use Gemini to extract the BFS number and image from the infobox
  4. Save all results to municipalities.json
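
The four steps above can be sketched as a small pipeline. This is not the crawler's actual code: `CrawlDeps`, `crawl`, and the four injected functions are hypothetical stand-ins for the Wikipedia requests, the Gemini call, and the file write, and batching is left out to keep the flow visible.

```typescript
interface Municipality {
  name: string;
  bfsId: string;
  image?: string;
}

// Hypothetical dependencies, injected so the flow can be run without
// network access or Google Cloud credentials.
interface CrawlDeps {
  listPages(): Promise<string[]>;                        // step 1
  fetchPage(url: string): Promise<string>;               // step 2
  extract(html: string): Promise<Municipality | null>;   // step 3
  save(results: Municipality[]): Promise<void>;          // step 4
}

async function crawl(deps: CrawlDeps): Promise<Municipality[]> {
  const results: Municipality[] = [];
  for (const url of await deps.listPages()) {
    try {
      const html = await deps.fetchPage(url);
      const record = await deps.extract(html);
      if (record) {
        results.push(record);
      } else {
        console.warn(`No data extracted from ${url}`);
      }
    } catch (error) {
      // A failure on one page is logged and does not stop the run.
      console.warn(`Skipping ${url}:`, error);
    }
  }
  await deps.save(results);
  return results;
}
```

Passing the steps in as functions is a design choice for this sketch: it lets the same loop be exercised with stubs in a test and with real implementations in production.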

Output

The output file municipalities.json contains an array of municipality objects:

[
  {
    "name": "Zürich",
    "bfsId": "0261",
    "image": "https://upload.wikimedia.org/..."
  },
  ...
]
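
For consumers of the file, the shape above could be captured in a type and checked when loading. The sketch below is an assumption-laden example: the `Municipality` interface, `isMunicipality`, and `parseMunicipalities` are names invented here, and treating `image` as optional or null is a guess at how a missing infobox image is represented.

```typescript
interface Municipality {
  name: string;
  bfsId: string;
  // Assumption: omitted or null when the infobox has no image.
  image?: string | null;
}

// Type guard that checks the fields shown in the example output.
function isMunicipality(value: unknown): value is Municipality {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.name === "string" &&
    typeof record.bfsId === "string" &&
    (record.image === undefined ||
      record.image === null ||
      typeof record.image === "string")
  );
}

// Parses the file contents and keeps only well-formed entries.
function parseMunicipalities(json: string): Municipality[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of municipalities");
  }
  return data.filter(isMunicipality);
}
```

Note that `bfsId` is a string in the example output, which preserves the leading zero in values such as "0261".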

Notes

  • The crawler processes municipalities in batches of 5 to avoid rate limits
  • It includes a 1-second delay between batches
  • Invalid or failed extractions are skipped and logged
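
The batching behaviour described in these notes might look roughly like the sketch below. `runInBatches` and `sleep` are hypothetical helpers written for this example; the defaults mirror the batch size of 5 and the 1-second delay mentioned above, and running the items of a batch concurrently is an assumption.

```typescript
// Resolves after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `worker` over `items` in fixed-size batches, pausing between batches.
// Failures are logged and counted instead of aborting the run.
async function runInBatches<T>(
  items: T[],
  worker: (item: T) => Promise<void>,
  batchSize = 5,
  delayMs = 1000,
): Promise<{ succeeded: number; failed: number }> {
  let succeeded = 0;
  let failed = 0;
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Assumption: items within a batch run concurrently.
    await Promise.all(
      batch.map(async (item) => {
        try {
          await worker(item);
          succeeded += 1;
        } catch (error) {
          failed += 1;
          console.warn("Skipping item after failed extraction:", error);
        }
      }),
    );
    // Wait before the next batch, but not after the final one.
    if (start + batchSize < items.length) {
      await sleep(delayMs);
    }
  }
  return { succeeded, failed };
}
```

The worker is expected to store its own result, for example by pushing to an array in the enclosing scope; the returned counts make it easy to report how many municipalities were skipped.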
