
Testing Guide for Stats Scraper

This guide explains how to test the Stats Scraper after the refactoring that made it open source and production-ready.

Prerequisites

Before testing, make sure you have:

  1. Installed all dependencies:

    pip install -r requirements.txt
    
  2. Set up your configuration:

    cp config.yaml.example config.yaml
    

    Edit config.yaml to configure your target repository and database settings.

  3. Set up your environment variables:

    cp .env.example .env
    

    Edit .env with your API tokens.
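For reference, a minimal .env might look like the following. The variable names match those used elsewhere in this guide; the values are placeholders:

    # .env — placeholder values, replace with your real tokens
    GITHUB_TOKEN=ghp_your_token_here
    MOTHERDUCK_TOKEN=your_motherduck_token_here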

Testing with MotherDuck (Default)

To test with MotherDuck (the default database):

  1. Make sure your MOTHERDUCK_TOKEN is set in your .env file or environment variables.
  2. Run the refactored repository visitors script:
    python scripts/repo_visitors.py
    
  3. Check the logs to see if the script ran successfully.
  4. Verify the data in your MotherDuck database:
    SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10;
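If you prefer to verify from Python instead of the MotherDuck UI, here is a minimal sketch using the duckdb package (assumed to be installed, since it is MotherDuck's standard client):

    import os
    import duckdb

    # Connect to MotherDuck using the token from the environment.
    token = os.environ["MOTHERDUCK_TOKEN"]
    con = duckdb.connect(f"md:?motherduck_token={token}")

    # Same verification query as above.
    rows = con.execute(
        "SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10"
    ).fetchall()
    for row in rows:
        print(row)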

Testing with SQLite

To test with SQLite:

  1. Edit your config.yaml to use SQLite:

    database:
      type: "sqlite"
      connection:
        sqlite:
          path: "superset_stats.db"

    Or set environment variables:

    export DATABASE_TYPE=sqlite
    export SQLITE_PATH=superset_stats.db
    
  2. Run the refactored repository visitors script:

    python scripts/repo_visitors.py
    
  3. Verify the data in your SQLite database:

    sqlite3 superset_stats.db "SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10;"
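The same check can be done from Python with the standard-library sqlite3 module, with no extra dependencies:

    import sqlite3

    # Open the database file the scraper writes to (path from config.yaml above).
    con = sqlite3.connect("superset_stats.db")
    rows = con.execute(
        "SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10"
    ).fetchall()
    con.close()
    for row in rows:
        print(row)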
    

Testing with PostgreSQL

To test with PostgreSQL:

  1. Make sure you have a PostgreSQL server running.

  2. Edit your config.yaml to use PostgreSQL:

    database:
      type: "postgresql"
      connection:
        postgresql:
          host: "localhost"
          port: 5432
          database: "superset_stats"
          username: "postgres"
          password: "your_password"

    Or set environment variables:

    export DATABASE_TYPE=postgresql
    export POSTGRESQL_HOST=localhost
    export POSTGRESQL_PORT=5432
    export POSTGRESQL_DATABASE=superset_stats
    export POSTGRESQL_USERNAME=postgres
    export POSTGRESQL_PASSWORD=your_password
    
  3. Run the refactored repository visitors script:

    python scripts/repo_visitors.py
    
  4. Verify the data in your PostgreSQL database:

    psql -U postgres -d superset_stats -c "SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10;"
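To verify from Python instead of psql, here is a sketch using psycopg2 (an assumption — any PostgreSQL driver works; adjust the connection parameters to match your config.yaml):

    import psycopg2  # assumed driver; not necessarily in requirements.txt

    con = psycopg2.connect(
        host="localhost",
        port=5432,
        dbname="superset_stats",
        user="postgres",
        password="your_password",
    )
    # The connection context manager wraps the query in a transaction.
    with con, con.cursor() as cur:
        cur.execute("SELECT * FROM github_visitors ORDER BY date DESC LIMIT 10")
        for row in cur.fetchall():
            print(row)
    con.close()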
    

Testing with Different GitHub Repositories

To test with a different GitHub repository:

  1. Edit your config.yaml to target a different repository:

    github:
      owner: "different-org"
      repo: "different-repo"

    Or set environment variables:

    export GITHUB_OWNER=different-org
    export GITHUB_REPO=different-repo
    
  2. Run the refactored repository visitors script:

    python scripts/repo_visitors.py
    
  3. Check the logs to see if the script fetched data from the correct repository.
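Before running the scraper against a new repository, you can sanity-check that the owner/repo pair resolves with a quick standard-library call to the GitHub API (a hypothetical helper, not part of the project):

    import json
    import os
    import urllib.request

    owner = os.environ.get("GITHUB_OWNER", "different-org")
    repo = os.environ.get("GITHUB_REPO", "different-repo")

    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    print(data["full_name"])  # should print the repository you configured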

Testing the GitHub Actions Workflow

To test the GitHub Actions workflow locally:

  1. Install act if you haven't already.

  2. Run the test workflow script:

    ./test_workflow.sh
    
  3. The script will check for tokens in your environment or .env file, create a .secrets file for act, and run the workflow.

  4. Check the output to see if all steps ran successfully.
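If you want to run act by hand instead of through the script, point it at the same secrets file (the file uses the same KEY=value format as .env):

    act --secret-file .secrets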

Troubleshooting

Database Connection Issues

If you encounter database connection issues:

  1. Check that your database credentials are correct.
  2. Verify that your database is running and accessible.
  3. Look at the logs for specific error messages.

API Rate Limiting

If you encounter GitHub API rate limiting:

  1. Make sure your GITHUB_TOKEN has sufficient permissions.
  2. Consider using a token with higher rate limits.
  3. Add rate-limit handling in the GitHub client, as sketched below.
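For the last point, here is a minimal sketch of what that handling could look like, assuming the client uses the requests library (the GitHub API reports its limits in the X-RateLimit-* response headers):

    import time
    import requests  # assumed HTTP client; adapt to whatever the GitHub client uses

    def github_get(url: str, token: str) -> requests.Response:
        # GET a GitHub API URL, waiting out the rate-limit window if it is exhausted.
        headers = {"Authorization": f"Bearer {token}"}
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 403 and resp.headers.get("X-RateLimit-Remaining") == "0":
            reset = int(resp.headers.get("X-RateLimit-Reset", "0"))
            time.sleep(max(reset - time.time(), 0) + 1)  # sleep until the window resets
            resp = requests.get(url, headers=headers, timeout=30)
        return resp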

Configuration Issues

If you encounter configuration issues:

  1. Check that your config.yaml file is properly formatted.
  2. Verify that your environment variables are set correctly.
  3. Look at the logs for specific error messages.
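For a quick formatting check, config.yaml can be parsed directly (a sketch assuming PyYAML, which YAML-based configs like this typically use):

    import yaml  # PyYAML; presumably already a project dependency

    with open("config.yaml") as f:
        config = yaml.safe_load(f)  # raises yaml.YAMLError if the file is malformed

    print(config.get("database", {}).get("type"))  # e.g. "sqlite", per the examples above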

Next Steps

After testing the refactored code, you can:

  1. Refactor the remaining scripts to use the new architecture.
  2. Add unit tests for the core functionality.
  3. Consider adding a simple CLI interface for running all scrapers at once.