Contributing to Scraper MCP

Thank you for your interest in contributing to Scraper MCP! This guide explains how to set up a development environment, follow the project's code standards, and submit changes.

Code of Conduct
Getting Started
Development Setup
Development Workflow
Code Standards
Testing
Submitting Changes
Areas for Contribution

Code of Conduct

We are committed to providing a welcoming and inclusive environment. Please be respectful and constructive in all interactions.

Getting Started

Prerequisites

Python 3.12 or higher
uv package manager
Docker and Docker Compose (for testing deployment)
Git

Fork and Clone

Fork the repository on GitHub

Clone your fork locally:

git clone git@github.com:YOUR_USERNAME/scraper-mcp.git
cd scraper-mcp

Add the upstream repository:

git remote add upstream git@github.com:carrotly-ai/scraper-mcp.git

Development Setup

Install Dependencies

# Install the package with development dependencies
uv pip install -e ".[dev]"

Run the Server Locally

# Run with default settings (stdio transport)
python -m scraper_mcp

# Run with HTTP transport
python -m scraper_mcp streamable-http 0.0.0.0 8000

Access the Dashboard

With the server running on the HTTP transport (port 8000 in the command above), open http://localhost:8000/ in your browser to access the monitoring dashboard, playground, and configuration interface.

Development Workflow

Create a Feature Branch

git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bug-fix

Make Your Changes

Write your code following the Code Standards
Add or update tests as needed
Update documentation (README, docstrings, comments)
Run tests and linting locally

Keep Your Branch Updated

git fetch upstream
git rebase upstream/main

Code Standards

Python Style

We use Ruff for linting and formatting:

# Check for linting issues
ruff check .

# Auto-fix linting issues
ruff check . --fix

# Format code
ruff format .

Type Hints

We use type hints throughout the codebase. Run type checking with:

mypy src/

All public functions and methods should have complete type annotations.

Code Organization

Provider Pattern: New scraping backends should implement the ScraperProvider interface
Utilities: HTML processing utilities go in src/scraper_mcp/utils.py
Models: Use Pydantic v2 models for all data structures
Async/Await: Use async patterns consistently throughout
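
The provider pattern above might look roughly like the sketch below. The real ScraperProvider interface is not shown in this guide, so the `scrape` signature and the `ScrapeResult` fields are assumptions, and `StaticProvider` is a made-up toy backend. A dataclass stands in for a Pydantic v2 model only to keep the sketch dependency-free.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeResult:
    """Hypothetical result type; the project would use a Pydantic v2 model."""

    url: str
    status_code: int
    content: str


class ScraperProvider(ABC):
    """Stand-in for the project's interface; the real signature may differ."""

    @abstractmethod
    async def scrape(self, url: str) -> ScrapeResult:
        """Fetch ``url`` and return its content."""


class StaticProvider(ScraperProvider):
    """Toy backend that serves canned pages instead of making network calls."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    async def scrape(self, url: str) -> ScrapeResult:
        await asyncio.sleep(0)  # yield control, as a real async backend would
        if url not in self._pages:
            return ScrapeResult(url=url, status_code=404, content="")
        return ScrapeResult(url=url, status_code=200, content=self._pages[url])


provider = StaticProvider({"https://example.com": "<h1>Hello</h1>"})
result = asyncio.run(provider.scrape("https://example.com"))
```

Because callers depend only on the abstract `scrape` method, a new backend can be swapped in without touching the code that consumes results.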

Documentation

Add docstrings to all public functions, classes, and methods
Update README.md for user-facing changes
Update CLAUDE.md for development guidance changes
Include inline comments for complex logic

Commit Messages

Use Conventional Commits format:

feat: add support for JavaScript rendering
fix: resolve timeout issue with slow sites
docs: update proxy configuration examples
refactor: simplify retry logic
test: add tests for batch operations
chore: update dependencies

Keep commits focused and atomic. Each commit should represent a single logical change.
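
A commit subject in this format can be checked mechanically. The sketch below is an illustration under stated assumptions, not a hook the project ships: it accepts only the six types listed above, plus the optional scope and `!` breaking-change marker described in the Conventional Commits specification.

```python
import re

# Types taken from the examples above; extend if the project accepts more.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# Shape: <type>[(scope)][!]: <description>
SUBJECT_RE = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if ``subject`` matches the Conventional Commits shape."""
    return SUBJECT_RE.match(subject) is not None
```

For example, `is_conventional("fix(cache)!: drop stale entries")` passes, while `is_conventional("Added new feature")` does not.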

Testing

Run Tests

# Run all tests with coverage
pytest

# Run specific test file
pytest tests/test_server.py

# Run specific test class
pytest tests/test_server.py::TestScrapeUrlTool

# Run with verbose output
pytest -v

# Run without coverage report
pytest --no-cov

Writing Tests

Use pytest with pytest-asyncio for async tests
Use pytest-mock for mocking
Place test fixtures in tests/conftest.py
Aim for >90% code coverage
Test both success and error cases
Test edge cases and boundary conditions

Test Structure

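import pytest
from unittest.mock import Mock, patch

# Assumed import path; adjust to wherever RequestsProvider is defined.
from scraper_mcp.providers import RequestsProvider


# The `provider` argument is a fixture, expected to live in tests/conftest.py.
@pytest.mark.asyncio
async def test_feature_name(provider: RequestsProvider) -> None:
    """Test description."""
    # Arrange: build a fake HTTP response
    mock_response = Mock()
    mock_response.status_code = 200

    # Act: replace the session's GET call so no network request is made
    with patch.object(provider.session, "get", return_value=mock_response):
        result = await provider.scrape("https://example.com")

    # Assert
    assert result.status_code == 200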
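
The example above covers a success case. An error case can follow the same Arrange/Act/Assert shape, as in the sketch below. Everything in it is hypothetical: `ScrapeError`, `scrape_status`, and the client object are invented for illustration. To stay runnable on its own, it uses only the standard library; inside the test suite the same check would normally be written with `pytest.mark.asyncio` and `pytest.raises`.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


class ScrapeError(Exception):
    """Hypothetical error type raised when a scrape fails."""


async def scrape_status(client: Any, url: str) -> int:
    """Hypothetical code under test: wrap client timeouts in ScrapeError."""
    try:
        response = await client.get(url)
    except TimeoutError as exc:
        raise ScrapeError(f"timed out fetching {url}") from exc
    return response.status_code


async def test_scrape_status_wraps_timeout() -> None:
    """Error case: a client timeout surfaces as ScrapeError."""
    # Arrange: a fake async client whose GET always times out
    client = AsyncMock()
    client.get.side_effect = TimeoutError

    # Act: capture the exception instead of letting it propagate
    error: ScrapeError | None = None
    try:
        await scrape_status(client, "https://example.com")
    except ScrapeError as exc:
        error = exc

    # Assert: the right error was raised and the original cause is preserved
    assert error is not None
    assert isinstance(error.__cause__, TimeoutError)
    client.get.assert_awaited_once_with("https://example.com")


asyncio.run(test_scrape_status_wraps_timeout())
```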

Submitting Changes

Before Submitting

Run all checks:

# Format code
ruff format .

# Fix linting issues
ruff check . --fix

# Type check
mypy src/

# Run tests
pytest

Update documentation if needed
Add tests for new features or bug fixes

Rebase on latest main:

git fetch upstream
git rebase upstream/main

Create a Pull Request

Push your branch to your fork:

git push origin feature/your-feature-name

Go to the repository on GitHub
Click "New Pull Request"
Select your fork and branch
Fill out the PR template:
- Title: Brief description (50 chars max)
- Description:
  - What does this PR do?
  - Why is this change needed?
  - How was it tested?
  - Any breaking changes?
- Link related issues: Use "Closes #123" or "Fixes #123"

PR Review Process

Maintainers will review your PR
Address feedback and push updates
Once approved, maintainers will merge your PR
Your contribution will be credited in the release notes

Areas for Contribution

We welcome contributions in these areas:

High Priority

New Scraping Providers: Implement ScraperProvider for Playwright, Selenium, or Scrapy
Performance Optimizations: Improve caching, concurrency, or memory usage
Documentation: Improve examples, tutorials, or API documentation
Test Coverage: Add tests for edge cases or untested code paths

Feature Ideas

Authentication Support: Add support for authenticated requests (OAuth, cookies, headers)
Screenshot Capture: Add tools for capturing page screenshots
Rate Limiting: Implement per-domain rate limiting
Request Pooling: Connection pooling for improved performance
Webhook Support: Trigger scrapes via webhooks
Scheduled Scraping: Cron-like scheduling for periodic scrapes
Export Formats: Add JSON, XML, or CSV export options
Browser Fingerprinting: Advanced anti-detection techniques
Sitemap Support: Parse and scrape from XML sitemaps
Mobile User Agents: Better mobile scraping support

Bug Fixes

Check the Issues page for open bugs
Look for issues labeled "good first issue" or "help wanted"

Documentation

Improve README examples
Add tutorials or guides
Document common use cases
Translate documentation (if applicable)

Questions?

Open an issue for questions
Check existing issues and discussions first
Be specific and provide context

License

By contributing, you agree that your contributions will be licensed under the MIT License.

Thank you for contributing to Scraper MCP! 🎉
