
DirckM/school-contact-info-scraper


School Contact Info Scraper

A Python pipeline that extracts contact information from school websites in Noord-Holland (North Holland) and sends standardized emails to those schools.

Features

  • Filters CSV data for active schools in Noord-Holland
  • Scrapes school websites for email addresses and phone numbers
  • Composes personalized emails with school-specific information
  • Sends emails via SMTP
  • Saves all data for review and tracking
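
A minimal sketch of the CSV filtering step, assuming the school list is a CSV file with one column for the province and one for the school's status. The column names PROVINCIE and STATUS and the value "actief" are hypothetical placeholders, not taken from the repository; check the headers of your actual CSV before relying on them.

```python
import csv
from typing import TextIO

# Hypothetical column names and status value; adjust to match the real CSV headers.
PROVINCE_COLUMN = "PROVINCIE"
STATUS_COLUMN = "STATUS"
ACTIVE_VALUE = "actief"


def filter_schools(csv_file: TextIO, province: str = "Noord-Holland", delimiter: str = ",") -> list[dict]:
    """Return the rows for active schools in `province` (case-insensitive match)."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    selected = []
    for row in reader:
        # DictReader yields None for missing fields, so fall back to "".
        in_province = (row.get(PROVINCE_COLUMN) or "").strip().lower() == province.lower()
        is_active = (row.get(STATUS_COLUMN) or "").strip().lower() == ACTIVE_VALUE
        if in_province and is_active:
            selected.append(row)
    return selected
```

Open the CSV with `newline=""`, as the csv module documentation recommends, and pass `delimiter=";"` if the export is semicolon-separated.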

Installation

  1. Install dependencies:
     pip install -r requirements.txt
  2. Configure email settings:
    • Copy .env.example to .env
    • Fill in your SMTP credentials

Usage

Step 1: Collect data and compose emails

Run the main pipeline:

python main.py

The script will:

  1. Filter schools from the CSV file (Noord-Holland)
  2. Scrape their websites for contact information
  3. Compose standardized emails using email_template.txt
  4. Save all email data to emails_to_send.json

Note: Email composition requires email_template.txt. If the file is missing, the composition step is skipped.
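
A minimal sketch of how composition with variables and the skip-if-missing behaviour could work, using Python's string.Template. The placeholder syntax ($school_name, $phone) and the school dictionary keys (name, email, phone) are hypothetical; the real email_template.txt may use different variables.

```python
from pathlib import Path
from string import Template


def compose_emails(schools: list[dict], template_path: str = "email_template.txt") -> list[dict]:
    """Fill the template once per school; return [] if the template file is missing."""
    path = Path(template_path)
    if not path.exists():
        print(f"{template_path} not found; skipping email composition")
        return []
    template = Template(path.read_text(encoding="utf-8"))
    emails = []
    for school in schools:
        if not school.get("email"):
            continue  # no address was scraped, so there is nobody to send to
        # safe_substitute leaves unknown $placeholders in place instead of raising.
        body = template.safe_substitute(
            school_name=school.get("name", ""),
            phone=school.get("phone", ""),
        )
        emails.append({"to": school["email"], "school": school.get("name", ""), "body": body, "sent": False})
    return emails
```

Using `safe_substitute` rather than `substitute` means a typo in the template shows up as a literal `$placeholder` in the output files you review, rather than crashing the run.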

Step 2: Send emails (separate step)

After running the main script, send emails separately:

python send-email.py

This script will:

  1. Load emails from emails_to_send.json
  2. Skip emails that have already been sent
  3. Send remaining emails via SMTP
  4. Track which emails have been sent in sent_emails_tracking.json
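
The four steps above can be sketched as follows. This is an illustration, not the repository's send-email.py: it assumes each entry in emails_to_send.json has "to" and "body" keys and that sent_emails_tracking.json stores a list of recipient addresses. The sending function is passed in, so the tracking logic can be tested without an SMTP server; `smtp_sender` is a hypothetical helper built on the standard smtplib module.

```python
import json
import smtplib
from email.message import EmailMessage
from pathlib import Path


def load_json(path, default):
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else default


def send_pending(send, emails_path="emails_to_send.json", tracking_path="sent_emails_tracking.json"):
    """Send every email whose recipient is not yet tracked as sent; return how many were sent."""
    emails = load_json(emails_path, [])
    sent = set(load_json(tracking_path, []))
    count = 0
    for email in emails:
        if email["to"] in sent:
            continue  # already sent in an earlier run
        send(email)
        sent.add(email["to"])
        email["sent"] = True
        count += 1
        # Persist after each send so a crash mid-run does not lead to duplicates.
        Path(tracking_path).write_text(json.dumps(sorted(sent), indent=2), encoding="utf-8")
    if count:
        Path(emails_path).write_text(json.dumps(emails, indent=2, ensure_ascii=False), encoding="utf-8")
    return count


def smtp_sender(server, port, username, password, from_email):
    """Hypothetical helper: returns a send(email) function that delivers via SMTP with STARTTLS."""
    def send(email):
        msg = EmailMessage()
        msg["From"] = from_email
        msg["To"] = email["to"]
        msg["Subject"] = email.get("subject", "")
        msg.set_content(email["body"])
        with smtplib.SMTP(server, port) as smtp:
            smtp.starttls()
            smtp.login(username, password)
            smtp.send_message(msg)
    return send
```

This sketch opens one SMTP connection per email for simplicity; for larger batches, reusing a single connection is cheaper.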

Scripts

  • get-schools-region.py - Filters CSV for schools in a specific province
  • scraper.py - Scrapes websites for email and phone numbers
  • email-composer.py - Composes standardized emails with variables
  • send-email.py - Sends emails via SMTP
  • main.py - Runs the entire pipeline
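
A minimal sketch of what a scraper like scraper.py might do, assuming plain regex extraction over the page HTML with a fixed delay between requests. The regular expressions are rough illustrations (the phone pattern targets Dutch numbers starting with 0 or +31 and having nine further digits) and the 2-second delay is a hypothetical default; the repository's actual approach is not documented here.

```python
import re
import time
import urllib.request

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Rough Dutch pattern: "0" or "+31", then 9 digits with optional single spaces/hyphens between them.
PHONE_RE = re.compile(r"(?:\+31[\s-]?|\b0)(?:\d[\s-]?){8}\d\b")
# Image names like logo@2x.png look like email addresses; drop them.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_contacts(html: str) -> dict:
    emails = sorted({m for m in EMAIL_RE.findall(html) if not m.lower().endswith(IMAGE_SUFFIXES)})
    phones = sorted({m.strip() for m in PHONE_RE.findall(html)})
    return {"emails": emails, "phones": phones}


def scrape(urls: list[str], delay: float = 2.0) -> dict:
    results = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # pause between requests so school servers are not hammered
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except (OSError, ValueError) as exc:  # URLError subclasses OSError; bad URLs raise ValueError
            print(f"Failed to fetch {url}: {exc}")
            continue
        results[url] = extract_contacts(html)
    return results
```

Regex extraction misses obfuscated addresses and numbers written in unusual formats, so the saved results are worth reviewing before emails go out.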

Output Files

  • schools_data.json - All scraped school data
  • emails_to_send.json - All composed emails ready to send (includes sent status)
  • sent_emails_tracking.json - Tracking of which schools have received emails
  • composed_emails/ - Directory with individual composed email files (for review)
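
A minimal sketch of writing the review outputs, assuming each composed email is a dictionary with "to", "school" and "body" keys and that each file in composed_emails/ is named after a slug of the school name. Both the record shape and the file naming are hypothetical; note that two schools with the same name would overwrite each other's file in this sketch.

```python
import json
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Turn a school name into a safe file name, e.g. 'De Regenboog' -> 'de-regenboog'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "school"


def save_outputs(emails: list[dict], out_dir: str = ".") -> Path:
    out = Path(out_dir)
    review_dir = out / "composed_emails"
    review_dir.mkdir(parents=True, exist_ok=True)
    # One JSON file with every email, including its sent status.
    (out / "emails_to_send.json").write_text(json.dumps(emails, indent=2, ensure_ascii=False), encoding="utf-8")
    # One plain-text file per email for manual review.
    for email in emails:
        path = review_dir / f"{slugify(email['school'])}.txt"
        path.write_text(f"To: {email['to']}\n\n{email['body']}", encoding="utf-8")
    return review_dir
```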

Configuration

Set environment variables in .env:

  • SMTP_SERVER - SMTP server address
  • SMTP_PORT - SMTP server port
  • SMTP_USERNAME - Your email username
  • SMTP_PASSWORD - Your email password (or app password)
  • FROM_EMAIL - Sender email address
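
A minimal sketch of reading these settings and failing early when one is missing. It assumes the values from .env have already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`; whether the project uses that library is an assumption. The `SmtpConfig` class is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SmtpConfig:
    server: str
    port: int
    username: str
    password: str
    from_email: str


REQUIRED = ("SMTP_SERVER", "SMTP_PORT", "SMTP_USERNAME", "SMTP_PASSWORD", "FROM_EMAIL")


def load_config(env: Mapping[str, str] = os.environ) -> SmtpConfig:
    """Build the SMTP settings from environment variables, listing every missing one."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings in .env: {', '.join(missing)}")
    return SmtpConfig(
        server=env["SMTP_SERVER"],
        port=int(env["SMTP_PORT"]),  # environment values are strings
        username=env["SMTP_USERNAME"],
        password=env["SMTP_PASSWORD"],
        from_email=env["FROM_EMAIL"],
    )
```

Checking all variables up front reports every missing setting at once, instead of failing on the first SMTP call.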

Notes

  • The scraper waits between requests to avoid overloading school web servers
  • Scraped data is saved to avoid re-scraping
  • Email template (email_template.txt) is required for email composition
  • Email sending is a separate step; sent emails are tracked, so re-running it does not send duplicates
  • All composed emails are saved for review before sending
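
A minimal sketch of the "avoid re-scraping" behaviour, assuming the cache is schools_data.json and that any existing file is reused as-is. The `load_or_scrape` helper and its `scrape_fn` parameter are hypothetical names for illustration; delete the cache file to force a fresh scrape.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_scrape(scrape_fn: Callable[[], Any], cache_path: str = "schools_data.json") -> Any:
    """Return cached results if the cache file exists; otherwise scrape and save them."""
    cache = Path(cache_path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    data = scrape_fn()
    cache.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data
```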
