
Contributing Guide

Development Setup

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • Git

Installation

# Clone repository
git clone https://github.com/nolancacheux/AI-Product-Photo-Detector.git
cd AI-Product-Photo-Detector

# With uv (recommended)
uv venv
source .venv/bin/activate
uv pip install -e ".[dev,ui]"

# Or with pip
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,ui]"

# Install pre-commit hooks
pre-commit install
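The hooks themselves are defined in .pre-commit-config.yaml at the repository root. A minimal sketch of such a config using the Ruff hooks (the rev pins below are illustrative — check the actual file for the versions this project uses):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0          # illustrative pin; see the repo's actual config
    hooks:
      - id: ruff         # linting
      - id: ruff-format  # formatting
```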

Code Style

Formatting

We use Ruff for linting and formatting:

# Check linting
ruff check src/ tests/

# Format code
ruff format src/ tests/

Type Checking

We use mypy for static type checking:

mypy src/

Pre-commit Hooks

Pre-commit runs automatically on git commit. To run manually:

pre-commit run --all-files

Testing

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_model.py

# Run specific test
pytest tests/test_model.py::TestAIImageDetector::test_model_creation

Writing Tests

  • Place tests in tests/ directory
  • Use pytest fixtures for shared setup
  • Aim for >80% code coverage
  • Test both success and error cases
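A minimal test module following these conventions might look like the sketch below. The `classify` helper is a hypothetical stand-in for project code, used only to illustrate fixtures plus success/error cases:

```python
import pytest


def classify(pixels: list[float]) -> str:
    """Hypothetical stand-in: "ai" if the mean pixel value exceeds 0.5."""
    if not pixels:
        raise ValueError("empty image input")
    return "ai" if sum(pixels) / len(pixels) > 0.5 else "real"


@pytest.fixture
def bright_image() -> list[float]:
    """Shared setup via a pytest fixture: a small, all-bright dummy image."""
    return [0.9, 0.8, 0.95]


def test_classify_success(bright_image):
    # Success case: bright pixels classified as "ai"
    assert classify(bright_image) == "ai"


def test_classify_empty_input():
    # Error case: empty input must raise, not silently return a label
    with pytest.raises(ValueError):
        classify([])
```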

Git Workflow

Branch Naming

  • feature/description - New features
  • fix/description - Bug fixes
  • docs/description - Documentation
  • refactor/description - Code refactoring

Commit Messages

Follow Conventional Commits:

type(scope): description

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Formatting
  • refactor: Code restructuring
  • test: Tests
  • chore: Maintenance

Examples:

feat(api): add batch prediction endpoint
fix(model): handle empty image input
docs(readme): update installation instructions
test(api): add tests for health endpoint

Pull Requests

  1. Create a feature branch from main
  2. Make your changes with atomic commits
  3. Ensure tests pass: pytest
  4. Ensure linting passes: ruff check
  5. Update documentation if needed
  6. Submit PR with clear description

Project Structure

AI-Product-Photo-Detector/
├── .github/workflows/      # CI/CD pipelines (ci, cd, model-training, pr-preview)
├── src/
│   ├── data/               # Data download and validation
│   │   └── validate.py     # Dataset validation
│   ├── inference/          # API server
│   │   ├── routes/         # API route handlers
│   │   │   ├── v1/         # Versioned API routes
│   │   │   ├── info.py     # Info/health endpoints
│   │   │   ├── monitoring.py # Monitoring endpoints
│   │   │   └── predict.py  # Prediction endpoints
│   │   ├── api.py          # FastAPI application
│   │   ├── predictor.py    # Model loading and inference
│   │   ├── explainer.py    # Grad-CAM heatmap generation
│   │   ├── confidence.py   # Confidence calibration
│   │   ├── auth.py         # API key authentication
│   │   ├── validation.py   # Input validation
│   │   ├── schemas.py      # Pydantic models
│   │   ├── shadow.py       # Shadow model comparison
│   │   ├── state.py        # Application state
│   │   └── rate_limit.py   # Rate limiting
│   ├── training/           # Model training
│   │   ├── train.py        # Training loop with MLflow
│   │   ├── model.py        # EfficientNet-B0 architecture
│   │   ├── dataset.py      # PyTorch dataset
│   │   ├── augmentation.py # Data augmentation
│   │   ├── gcs.py          # GCS integration
│   │   └── vertex_submit.py # Vertex AI job submission
│   ├── pipelines/          # Pipeline orchestration
│   │   ├── training_pipeline.py # End-to-end training pipeline
│   │   └── evaluate.py     # Model evaluation pipeline
│   ├── monitoring/         # Observability
│   │   ├── drift.py        # Data/model drift detection
│   │   └── metrics.py      # Prometheus metrics
│   ├── ui/                 # Streamlit web interface
│   │   └── app.py          # Streamlit application
│   └── utils/              # Shared utilities
│       ├── config.py       # Configuration management
│       ├── constants.py    # Project constants
│       ├── logger.py       # Structured logging
│       └── model_loader.py # Model loading utilities
├── tests/                  # Unit and integration tests
├── configs/                # Configuration files
│   ├── train_config.yaml   # Training hyperparameters
│   ├── inference_config.yaml # Inference settings
│   ├── pipeline_config.yaml  # Pipeline configuration
│   ├── prometheus.yml      # Prometheus scrape config
│   └── grafana/            # Grafana dashboards
├── docker/                 # Dockerfiles (API, training, UI)
├── terraform/              # Infrastructure as Code
│   ├── modules/            # Reusable Terraform modules
│   │   ├── cloud-run/      # Cloud Run service
│   │   ├── storage/        # GCS buckets
│   │   ├── registry/       # Artifact Registry
│   │   ├── iam/            # Service accounts and roles
│   │   └── monitoring/     # Uptime checks and alerts
│   └── environments/       # Environment-specific configs (dev, prod)
├── scripts/                # Data download utilities
├── notebooks/              # Jupyter notebooks (Colab training)
├── dvc.yaml                # DVC pipeline definition
├── docker-compose.yml      # Local development stack
├── docker-compose.dev.yml  # Dev-specific overrides
├── docker-compose.prod.yml # Prod-specific overrides
├── Makefile                # Development commands
└── pyproject.toml          # Python dependencies

Makefile Commands

make help          # List all commands
make dev           # Install dev dependencies + pre-commit
make lint          # Ruff + mypy
make format        # Auto-format code
make test          # Run pytest with coverage
make data          # Download CIFAKE dataset
make train         # Train model
make serve         # Start API (dev mode)
make docker-up     # Start full stack (API + UI + MLflow)
make deploy        # Trigger Cloud Run deploy via GitHub Actions

Data Management (DVC)

Dataset files are tracked with DVC. Never commit raw data to Git.

# Pull existing data
dvc pull

# After adding new data
dvc add data/processed
git add data/processed.dvc
git commit -m "data: update processed dataset"
dvc push

Docker

# Build images
make docker-build

# Run full stack
make docker-up

# Check logs
make docker-logs

# Tear down
make docker-down

Infrastructure (Terraform)

The terraform/ directory provisions GCP resources using a modular structure. See INFRASTRUCTURE.md for full details.

# Choose environment
cd terraform/environments/dev   # or prod

# Configure
cp terraform.tfvars.example terraform.tfvars  # Edit with your project ID

# Deploy
terraform init
terraform plan
terraform apply

Documentation

  • Update README.md for user-facing changes
  • Update docs/ARCHITECTURE.md for system design changes
  • Add docstrings to all public functions
  • Use Google-style docstrings
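For reference, a Google-style docstring on a public function looks like this (the function, its parameters, and the placeholder scoring logic are purely illustrative, not the project's real API):

```python
def predict_image(image_bytes: bytes, threshold: float = 0.5) -> dict:
    """Classify an image as AI-generated or real.

    Args:
        image_bytes: Raw contents of the image file.
        threshold: Confidence cutoff above which the image is
            labelled "ai". Defaults to 0.5.

    Returns:
        A dict with two keys: "label" ("ai" or "real") and
        "confidence" (a float in [0, 1]).

    Raises:
        ValueError: If image_bytes is empty.
    """
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    confidence = len(image_bytes) % 100 / 100  # placeholder score
    return {
        "label": "ai" if confidence > threshold else "real",
        "confidence": confidence,
    }
```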

Release Process

  1. Update the version in pyproject.toml and src/__init__.py
  2. Update CHANGELOG.md with the new version and changes
  3. Create a release tag: git tag v1.x.x
  4. Push the tag: git push origin v1.x.x
  5. The CI/CD pipeline automatically builds and deploys