A comprehensive template repository for data science projects on the Analytical Platform, supporting both Python and R development with built-in code quality tools, testing frameworks, and CI/CD workflows.
Note
We are currently in development and suggestions are welcome! Open an issue here.
- Click "Use this template" → Create repository
- Clone:
  git clone https://github.com/moj-analytical-services/your-repo-name.git
- Follow post-clone checklist below. More info in Setup Instructions.
- Complete initial ethics scan (see Ethics & SAFE-D Framework)
- Create and activate virtual environment
- Install pre-commit hooks:
  pre-commit install
- Update this README with project details
- Update badge URLs in README
- Set GitHub repository description
- Grant team permissions (one Admin minimum)
- Review MoJ GitHub standards
This template provides a robust foundation for data science projects:
- Python, SQL & R Support: Pre-configured for all three languages with formatting and testing
- Code Quality Tools: Pre-commit hooks for automated formatting, linting, and security checks
- Testing Framework: Three-tier test structure (unit, integration, end-to-end) with pytest and testthat
- Security Scanning: Bandit for Python security checks, secrets detection, and container vulnerability scanning, plus large-file detection and nbstripout to catch data before it is committed
- Architecture Decision Records: Built-in ADR tooling for documenting important decisions
- Docker Ready: Dockerfile included for containerised deployments on Airflow using the Analytical Platform workflow
- CI/CD Workflows: GitHub Actions for automated testing, container builds, and releases to the ECS for use with Airflow on the Analytical Platform
- Comprehensive Documentation: README templates, ADR examples, and test documentation
- PR and Issue Templates: Matched to common data science ways of working
├── data/                # Data files (gitignored)
│   ├── raw/             # Original data
│   ├── processed/       # Cleaned data
│   └── external/        # Third-party data
├── docs/                # Documentation and ADRs
├── models/              # Trained models (gitignored)
├── notebooks/
├── references/          # Data dictionaries, manuals
├── reports/             # Generated outputs (gitignored)
│   └── figures/
├── scripts/             # Executable scripts
├── src/                 # Reusable source code
│   ├── data/            # Data processing
│   ├── features/        # Feature engineering
│   ├── models/          # Training and prediction
│   └── visualization/   # Plotting utilities
└── tests/
    ├── unit/
    ├── integration/
    └── e2e/             # End-to-end tests
See individual directories for detailed READMEs. Key points:
- data/, models/, reports/: Gitignored to prevent committing large or sensitive files
- src/: Installable as a package with pip install -e . (see src/README.md)
- scripts/: One-off executables that use src/ code (see scripts/README.md)
- notebooks/: Use numbered prefixes with snake_case (e.g., 01_data_exploration.ipynb)
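The notebook naming convention above can be checked automatically. The following is a minimal sketch, assuming "numbered prefix" means two or more digits followed by an underscore and lowercase snake_case words; the function names and the exact pattern are illustrative and not part of this template.

```python
import re

# Assumed convention: digits, an underscore, then lowercase snake_case words.
NOTEBOOK_NAME = re.compile(r"\d{2,}_[a-z0-9]+(?:_[a-z0-9]+)*\.ipynb")


def is_valid_notebook_name(filename: str) -> bool:
    """Return True if the whole filename follows the numbered snake_case pattern."""
    return NOTEBOOK_NAME.fullmatch(filename) is not None


def invalid_notebook_names(filenames):
    """Return the names that break the convention, preserving input order."""
    return [name for name in filenames if not is_valid_notebook_name(name)]
```

A check like this could be run over the contents of notebooks/ before committing, for example by passing it the names returned by `pathlib.Path("notebooks").glob("*.ipynb")`.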
Tip
When starting a new project, create your module structure inside src/ (e.g., src/fraud_detection/) to keep code organised and importable.
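As a rough illustration of this layout, the sketch below shows a small function of the kind that might live in src/fraud_detection/, together with unit tests of the kind that would sit in tests/unit/. The file names, the function `flag_high_value`, and the import path in the comment are all hypothetical; the actual import path depends on how the package in src/ is configured.

```python
# Hypothetical contents of src/fraud_detection/features.py
def flag_high_value(amounts, threshold=1000.0):
    """Return one boolean per amount: True where the amount exceeds threshold."""
    return [amount > threshold for amount in amounts]


# Hypothetical contents of tests/unit/test_features.py
# In a real project this file would begin with an import such as:
#     from fraud_detection.features import flag_high_value
def test_flag_high_value_marks_only_amounts_above_threshold():
    # An amount equal to the threshold is not flagged.
    assert flag_high_value([50.0, 1000.0, 2500.0]) == [False, False, True]


def test_flag_high_value_accepts_custom_threshold():
    assert flag_high_value([50.0, 75.0], threshold=60.0) == [False, True]
```

Keeping the logic in an importable module means scripts, notebooks, and tests can all share one implementation.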
This template includes pre-commit hooks for automated code quality checks. The hooks cover:
- Python: Black formatting, Flake8 linting, Bandit security checks
- R: styler formatting, lintr linting (requires R packages: install.packages(c("styler", "lintr")))
- SQL: SQLFluff linting and formatting
- Notebooks: nbstripout to remove outputs
- General: trailing whitespace, file size limits, secrets detection
After setting up your environment, the hooks will run automatically on each commit. You can also run them manually:
  pre-commit run --all-files  # Run manually
Testing:
  pytest tests/  # Python tests (unit, integration, e2e)
  testthat::test_dir("tests/unit")  # R tests
We use the MoJ Data Science and AI Playbook to guide projects through key stages.
Important
MANDATORY REQUIREMENT: All data science and AI projects must complete the MoJ AI & Data Science Ethics Framework Process. This is not optional guidance; it is a required process for responsible development of data-driven technologies.
For detailed guidance, see CONTRIBUTING.md.
Once you've created your repository using this template, complete the following steps:
We follow the Analytical Platform's guidance on setting up a Python development environment.
- Create a virtual environment (in your project directory):
    python3 -m venv venv
  Add venv to your .gitignore file (already included in this template).
- Activate the virtual environment:
    source venv/bin/activate  # On macOS/Linux
    # or
    venv\Scripts\activate  # On Windows
  You'll see (venv) in your terminal prompt when activated.
- Install development dependencies:
    pip install -r requirements-dev.txt
- Install pre-commit hooks (for code quality):
    pre-commit install
- Install project dependencies (when you have a requirements.txt):
    pip install -r requirements.txt
- Record your dependencies. When you add new packages, update the requirements file:
    pip freeze > requirements.txt
    git add requirements.txt
    ...
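Before installing packages, it can help to confirm that the virtual environment from the steps above is actually active. The sketch below relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `python3 -m venv`; the function name is illustrative, and the check may not detect environments created by other tools.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate venv before installing.")
```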
If you're working with R:
- Install R (if not already installed):
  Follow the R installation guide for your operating system.
- Install required R packages for code quality:
    # In R console
    install.packages(c("styler", "lintr", "testthat"))
- Install project-specific R packages:
  Create a file called install_packages.R in your project root and list your dependencies:
    # install_packages.R
    packages <- c(
      "dplyr",
      "ggplot2",
      "tidyr"
      # Add your packages here
    )
    install.packages(packages)
  Run it with:
    Rscript install_packages.R
- Alternative - Use renv for dependency management:
    # Initialise renv for your project
    install.packages("renv")
    renv::init()
    # Install packages (they'll be tracked by renv)
    install.packages("dplyr")
    # Create a snapshot of your dependencies
    renv::snapshot()
Edit this README.md file to document your project accurately. Take the time to create a clear, engaging, and informative README.md file. Include information like what your project does, how to install and run it, how to contribute, and any other pertinent details.
Also make sure the badge URLs are correct for your repository, e.g.:
[](https://github.com/moj-analytical-services/YOUR-REPO-HERE/actions/workflows/pre-commit.yml)
After you've created your repository, GitHub provides a brief description field that appears at the top of your repository's main page. This summary gives visitors quick insight into the project, so using it to provide a succinct overview of your repository is highly recommended.
Assign permissions to the appropriate Ministry of Justice teams. Ensure at least one team is granted Admin permissions. Whenever possible, assign permissions to teams rather than individual users.
Familiarise yourself with the GDS Way. These standards ensure consistency, maintainability, and best practices across all our repositories.
To add an Outside Collaborator to the repository, follow the guidelines for managing GitHub collaborators.
(Optional) Modify the CODEOWNERS file to specify the teams or users authorised to approve pull requests.
Adapt the dependabot.yml file to match your project's dependency manager and to enable automated pull requests for package updates.
If your repository is private with no GitHub Advanced Security license, remove the .github/workflows/dependency-review.yml file.
This project follows the Ministry of Justice's Code of Conduct. Please be respectful and professional in all interactions.
This project is licensed under the MIT License - see the LICENSE file for details.
This license may not be appropriate for all projects, because it gives permission to anyone with access to the repo to open source a copy of it. Please review and change the license as necessary for your project.
