Thank you for considering contributing to pipescraper! This document provides guidelines and instructions for contributing.
1. Fork and clone the repository

       git clone https://github.com/Yasser03/pipescraper.git
       cd pipescraper

2. Create a virtual environment

       python -m venv venv
       source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install development dependencies

       pip install -e ".[dev]"

4. Run tests to verify setup

       pytest tests/ -v

Common test commands:

    # Run all tests
    pytest tests/ -v

    # Run with coverage report
    pytest tests/ --cov=pipescraper --cov-report=html

    # Run specific test
    pytest tests/test_pipescraper.py::TestPipes::test_fetch_links_pipe -v

- Place tests in `tests/test_pipescraper.py`
- Follow existing test structure and naming conventions
- Use pytest fixtures for common setup
- Mock external dependencies (HTTP requests, etc.)
- Aim for >80% code coverage
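
As a hedged illustration of the mocking guideline above, the following sketch patches an HTTP call with the standard library's `unittest.mock`, so the test never touches the network. The `fetch_json` helper and the URL are hypothetical stand-ins, not part of pipescraper; in the real suite the fake response could be supplied by a pytest fixture instead.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_json(url: str) -> dict:
    """Hypothetical helper (not a pipescraper API) that performs an HTTP request."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def test_fetch_json_uses_mocked_response():
    """The HTTP layer is replaced, so no network access happens."""
    fake_response = MagicMock()
    fake_response.read.return_value = b'{"title": "Hello"}'
    # urlopen() is used as a context manager, so __enter__ must return the fake.
    fake_response.__enter__.return_value = fake_response

    with patch("urllib.request.urlopen", return_value=fake_response) as mocked:
        result = fetch_json("https://example.com/article.json")

    # The helper called the (mocked) HTTP function exactly once with our URL.
    mocked.assert_called_once_with("https://example.com/article.json")
    assert result == {"title": "Hello"}
```

Patching at the point where the function is looked up (`urllib.request.urlopen` here) keeps the test fast and deterministic, which also makes coverage runs more reliable.
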

Example:

    def test_new_feature(sample_article):
        """Test description."""
        result = sample_article >> NewPipeVerb()
        assert isinstance(result, ExpectedType)

- Follow PEP 8
- Use Google-style docstrings
- Maximum line length: 88 characters (Black default)
- Use type hints where appropriate

We use Black for code formatting:

    # Format all files
    black pipescraper/ tests/

    # Check formatting without changes
    black --check pipescraper/ tests/

Run flake8 to check for common issues:

    flake8 pipescraper/ tests/

1. Create a feature branch

       git checkout -b feature/my-new-feature

2. Make your changes
   - Write code
   - Add/update tests
   - Update documentation
   - Follow code style guidelines

3. Test your changes

       pytest tests/ -v
       black pipescraper/ tests/
       flake8 pipescraper/

4. Commit your changes

       git add .
       git commit -m "Add feature: description of changes"

5. Push to your fork

       git push origin feature/my-new-feature

6. Open a Pull Request
   - Go to the original repository
   - Click "New Pull Request"
   - Select your branch
   - Describe your changes

Write clear, descriptive commit messages:
- Use present tense ("Add feature" not "Added feature")
- Be concise but descriptive
- Reference issues when applicable

Good examples:

    Add FilterByDate pipe verb for date-based filtering
    Fix robots.txt parsing bug in LinkFetcher
    Update documentation for ExtractArticles delay parameter

When reporting bugs, please include:
- Description — Clear description of the bug
- Reproduction steps — Minimal code to reproduce
- Expected behavior — What should happen
- Actual behavior — What actually happens
- Environment — Python version, OS, package versions
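
The environment details requested above can be gathered with a short script. This is only a sketch using the standard library; the `environment_report` function name is made up for illustration, and it assumes the installed distribution is named `pipescraper`.

```python
import platform
from importlib import metadata


def environment_report(packages=("pipescraper",)) -> str:
    """Collect the Python version, OS, and package versions for a bug report."""
    lines = [
        f"Python: {platform.python_version()}",
        f"OS: {platform.system()} {platform.release()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


# Paste the printed text into the "Environment" part of the report.
print(environment_report())
```

Passing extra names, for example `environment_report(("pipescraper", "pytest"))`, adds the versions of other packages that may be relevant to the bug.
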
Example:
**Bug Description**
FetchLinks raises TypeError when max_links=None

**Reproduction**
```python
"https://example.com" >> FetchLinks(max_links=None)
```

Expected: Should fetch unlimited links

Actual: TypeError: 'NoneType' object is not subscriptable

Environment: Python 3.10, macOS, pipescraper 0.1.0

## 💡 Feature Requests

### Suggesting Features

We welcome feature suggestions! Please:

1. Check existing issues to avoid duplicates
2. Describe the use case clearly
3. Provide examples of desired usage
4. Explain why it would benefit users

## 📚 Documentation

### Updating Documentation

- Update README.md for user-facing changes
- Add docstrings to new classes/functions
- Update examples if behavior changes
- Keep documentation clear and concise

### Docstring Format

Use Google-style docstrings:

```python
def my_function(param1: str, param2: int = 10) -> bool:
    """
    Short description of function.

    Longer description if needed, explaining behavior,
    edge cases, or important notes.

    Args:
        param1: Description of param1
        param2: Description of param2 (default: 10)

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is invalid

    Example:
        >>> result = my_function("test", 20)
        >>> print(result)
        True
    """
    # Implementation
    ...
```

- Tests pass locally
- Code follows style guidelines (Black, flake8)
- Added tests for new features
- Updated documentation
- No merge conflicts with main branch

Pull request template:

    ## Description
    Brief description of changes

    ## Type of Change
    - [ ] Bug fix
    - [ ] New feature
    - [ ] Breaking change
    - [ ] Documentation update

    ## Testing
    Describe testing performed

    ## Checklist
    - [ ] Tests pass
    - [ ] Code formatted with Black
    - [ ] Documentation updated
    - [ ] No breaking changes (or documented)

Project structure:

    pipescraper/
    ├── __init__.py     # Package exports
    ├── core.py         # Core classes (Article, Extractors)
    ├── pipes.py        # Pipe verb classes
    └── utils.py        # Utility functions

To add a new pipe verb:

- Create class in `pipes.py` inheriting from `PipeBase`
- Implement `_execute(self, data)` method
- Add docstring with examples
- Export from `__init__.py`
- Add tests
- Update README
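
To make the steps above concrete, here is a minimal, self-contained sketch of how a `>>` pipe verb can work in Python. It assumes the base class dispatches through `__rrshift__` to `_execute`; the `PipeBase` defined here is a hypothetical stand-in, and pipescraper's real base class may be implemented differently. `KeepLongTitles` is likewise an invented verb used only for illustration.

```python
class PipeBase:
    """Hypothetical stand-in for pipescraper's PipeBase (the real one may differ)."""

    def __rrshift__(self, data):
        # Python calls this for `data >> verb` when `data` itself
        # does not know how to handle the `>>` operator.
        return self._execute(data)

    def _execute(self, data):
        raise NotImplementedError("Subclasses must implement _execute")


class KeepLongTitles(PipeBase):
    """Illustrative verb: keep only titles with at least `min_length` characters."""

    def __init__(self, min_length: int):
        self.min_length = min_length

    def _execute(self, data):
        return [title for title in data if len(title) >= self.min_length]


titles = ["Short", "A much longer headline"]
result = titles >> KeepLongTitles(min_length=10)
print(result)  # ['A much longer headline']
```

Because all the verb-specific logic lives in `_execute`, a test for a new verb only needs to build an input, apply the verb with `>>`, and assert on the returned value.
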

Example:

    class MyNewVerb(PipeBase):
        """
        Description of what this verb does.

        Args:
            param1: Description

        Example:
            >>> result = data >> MyNewVerb(param1="value")
        """

        def __init__(self, param1: str):
            self.param1 = param1

        def _execute(self, data):
            """Execute the operation."""
            # Implementation: replace this pass-through with the real transformation
            transformed_data = data
            return transformed_data

By contributing, you agree that your contributions will be licensed under the MIT License.

Feel free to:
- Open an issue for questions
- Start a discussion on GitHub
- Contact maintainers
Thank you for contributing to pipescraper! 🎉