```yaml
Name: "TODO: Name of the asset"
Category: "TODO: Model;Data pipeline;App"
Description: "TODO: Short description of what the asset is and how it's used."
Impact: "TODO: Short overview of the asset's business impact."
G6 lead: "TODO: Name of G6"
SRO: "TODO: Senior Responsible Owner for the project (usually SCS)."
Technical lead: "TODO: Data scientist responsible for the asset (usually G7 or G6)."
Business lead: "TODO: Lead from ops/policy/digital who commissioned or owns the product."
Last review date: "TODO: mmm-yy e.g. Jun-24"
Next review date: "TODO: mmm-yy e.g. Jun-24"
Outage Impact: "TODO: Red/Amber/Green"
Maintenance (FTE): "TODO: Minimum FTE required for ongoing support."
Documentation: "TODO: URL link to documentation."
Contact: "TODO: Contact email address for asset register communications."
```

A comprehensive template repository for data science projects on the Analytical Platform, supporting both Python and R development with built-in code quality tools, testing frameworks, and CI/CD workflows.
Note
We are currently in development and suggestions are welcome! Open an issue here.
- Click "Use this template" → Create repository
- Clone:

  ```shell
  git clone https://github.com/moj-analytical-services/your-repo-name.git
  ```

- Follow the post-clone checklist below. More info in Setup Instructions.
- Complete initial ethics scan (see Ethics & SAFE-D Framework)
- Create and activate virtual environment
- Install prek hooks:

  ```shell
  prek install
  ```

- Update this README with project details
- Register asset — complete the YAML block at the top of this README and add your repo to the Data Science Asset Register (see below)
- Update badge URLs in README
- Set GitHub repository description
- Grant team permissions (one Admin minimum)
- Review MoJ GitHub standards
This template provides a robust foundation for data science projects:
- 🐍 Python, 🗄️ SQL & 📊 R Support: Pre-configured for all three languages with formatting and testing
- ✅ Code Quality Tools: prek hooks for automated formatting, linting, and security checks
- 🧪 Testing Framework: Three-tier test structure (unit, integration, end-to-end) with pytest and testthat
- 🔒 Security Scanning: Bandit for Python security checks, secrets detection, and container vulnerability scanning, plus large-file detection and nbstripout to catch data outputs before they are committed
- 📝 Architecture Decision Records: Built-in ADR tooling for documenting important decisions
- 🐳 Docker Ready: Dockerfile included for containerised deployments on Airflow using the Analytical Platform workflow
- 🤖 CI/CD Workflows: GitHub Actions for automated testing, container builds, and releases to ECS for use with Airflow on the Analytical Platform
- 📚 Comprehensive Documentation: README templates, ADR examples, and test documentation
- 🔄 PR and issue templates to match common data science ways of working
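As a sketch of how the three-tier test structure is used, a minimal pytest-style unit test might look like the following (the helper function and filename are hypothetical, for illustration only):

```python
# tests/unit/test_cleaning.py — hypothetical example of a unit test
def strip_whitespace(values):
    """Toy data-cleaning helper used only for illustration."""
    return [v.strip() for v in values]


def test_strip_whitespace():
    # pytest discovers functions named test_* and runs their assertions
    assert strip_whitespace(["  a", "b  "]) == ["a", "b"]
```

Run it with `pytest tests/unit/`; integration and end-to-end tests follow the same pattern in their own directories.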
```text
├── data/                  # Data files (gitignored)
│   ├── raw/               # Original data
│   ├── processed/         # Cleaned data
│   └── external/          # Third-party data
├── docs/                  # Documentation and ADRs
├── models/                # Trained models (gitignored)
├── notebooks/
├── references/            # Data dictionaries, manuals
├── reports/               # Generated outputs (gitignored)
│   └── figures/
├── scripts/               # Executable scripts
├── src/                   # Reusable source code
│   ├── data/              # Data processing
│   ├── features/          # Feature engineering
│   ├── models/            # Training and prediction
│   └── visualization/     # Plotting utilities
└── tests/
    ├── unit/
    ├── integration/
    └── e2e/               # End-to-end tests
```
See individual directories for detailed READMEs. Key points:
- data/, models/, reports/: Gitignored to prevent committing large or sensitive files
- src/: Installable as a package with `pip install -e .` (see src/README.md)
- scripts/: One-off executables that use `src/` code (see scripts/README.md)
- notebooks/: Use numbered prefixes with snake_case (e.g., `01_data_exploration.ipynb`)
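To make the notebook naming convention concrete, here is a small illustrative check. The pattern is an assumption inferred from the example above (two digits, underscore, snake_case name), not an enforced rule in the template:

```python
import re

# Numbered snake_case notebook names, e.g. 01_data_exploration.ipynb (assumed convention)
NOTEBOOK_NAME = re.compile(r"^\d{2}_[a-z0-9_]+\.ipynb$")


def follows_convention(filename: str) -> bool:
    """Return True if a notebook filename matches the numbered snake_case style."""
    return bool(NOTEBOOK_NAME.match(filename))


print(follows_convention("01_data_exploration.ipynb"))  # True
print(follows_convention("Data Exploration.ipynb"))     # False
```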
Tip
When starting a new project, create your module structure inside src/ (e.g., src/fraud_detection/) to keep code organised and importable.
This template uses prek hooks for automated code quality checks. The hooks cover:
- Python: Black formatting, Flake8 linting, Bandit security checks
- R: styler formatting, lintr linting (requires R packages: `install.packages(c("styler", "lintr"))`)
- SQL: SQLFluff linting and formatting
- Notebooks: nbstripout to remove outputs
- General: trailing whitespace, file size limits, secrets detection
After setting up your environment, the hooks will run automatically on each commit. You can also run them manually:

```shell
prek run --all-files  # Run on all files manually
```

Testing:

```shell
# Using uv (recommended - faster dependency installation)
uv venv && source .venv/bin/activate
uv pip install -r requirements-dev.txt
uv run pytest tests/  # Python tests (unit, integration, e2e)

# Or using standard pip
pytest tests/  # Python tests (unit, integration, e2e)
```

```r
testthat::test_dir("tests/unit")  # R tests
```

All live data science products should be listed in the Data Science Asset Register. The asset register provides visibility of what is deployed, who owns it, and the support resource it requires.

Once you have filled in the YAML block at the top of this README, add your repository to the assets.yaml file in the asset register repo and open a pull request. Your asset will appear in the register the following morning.
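For illustration only, a completed version of the YAML block might look like the following — every name and value here is hypothetical:

```yaml
Name: "Reoffending Risk Model"
Category: "Model"
Description: "Gradient-boosted model scoring reoffending risk to support case prioritisation."
Impact: "Supports probation teams in prioritising casework."
G6 lead: "Jane Example"
SRO: "A. N. Other"
Technical lead: "Jane Example"
Business lead: "Probation Digital Team"
Last review date: "Jun-24"
Next review date: "Dec-24"
Outage Impact: "Amber"
Maintenance (FTE): "0.2"
Documentation: "https://github.com/moj-analytical-services/example-repo"
Contact: "example-team@justice.gov.uk"
```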
We use the MoJ Data Science and AI Playbook to guide projects through key stages.
Important
MANDATORY REQUIREMENT: All data science and AI projects must complete the MoJ AI & Data Science Ethics Framework Process. This is not optional guidance; it is a required process for responsible development of data-driven technologies.
For detailed guidance, see CONTRIBUTING.md.
Once you've created your repository using this template, complete the following steps:
We are aligned with the Analytical Platform's guidance on setting up a Python development environment.
- Create a virtual environment (in your project directory):

  ```shell
  python3 -m venv venv
  ```

  Add `venv` to your `.gitignore` file (already included in this template).

- Activate the virtual environment:

  ```shell
  source venv/bin/activate  # On macOS/Linux
  # or
  venv\Scripts\activate     # On Windows
  ```

  You'll see `(venv)` in your terminal prompt when activated.

- Install development dependencies:

  ```shell
  pip install -r requirements-dev.txt
  ```

- Install prek hooks (for code quality):

  ```shell
  prek install
  ```

- Install project dependencies (when you have a `requirements.txt`):

  ```shell
  pip install -r requirements.txt
  ```

- Record your dependencies. When you add new packages, update the requirements file:

  ```shell
  pip freeze > requirements.txt
  git add requirements.txt
  ...
  ```
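Each line that `pip freeze` writes pins a package name to an exact version with `==`. This hypothetical helper sketches that format; real requirements files also allow extras, environment markers, and other version specifiers that it ignores:

```python
def parse_pinned(line: str):
    """Split a pinned requirement such as 'pandas==2.2.0' into (name, version).

    Illustrative only — not a full requirements-file parser.
    """
    line = line.strip()
    if "==" in line:
        name, version = line.split("==", 1)
        return name, version
    return line, None  # unpinned requirement: no version recorded


print(parse_pinned("pandas==2.2.0"))  # ('pandas', '2.2.0')
print(parse_pinned("requests"))       # ('requests', None)
```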
If you're working with R:

- Install R (if not already installed):

  Follow the R installation guide for your operating system.

- Install required R packages for code quality:

  ```r
  # In R console
  install.packages(c("styler", "lintr", "testthat"))
  ```

- Install project-specific R packages:

  Create a file called `install_packages.R` in your project root and list your dependencies:

  ```r
  # install_packages.R
  packages <- c(
    "dplyr",
    "ggplot2",
    "tidyr"
    # Add your packages here
  )
  install.packages(packages)
  ```

  Run it with:

  ```shell
  Rscript install_packages.R
  ```

- Alternative - Use renv for dependency management:

  ```r
  # Initialise renv for your project
  install.packages("renv")
  renv::init()

  # Install packages (they'll be tracked by renv)
  install.packages("dplyr")

  # Create a snapshot of your dependencies
  renv::snapshot()
  ```
Edit this README.md file to document your project accurately. Take the time to create a clear, engaging, and informative README.md file. Include information like what your project does, how to install and run it, how to contribute, and any other pertinent details.
Also make sure the badge URLs are correct for your repository, e.g.:

[](https://github.com/moj-analytical-services/YOUR-REPO-HERE/actions/workflows/pre-commit.yml)

After you've created your repository, GitHub provides a brief description field that appears at the top of your repository's main page. This is a summary that gives visitors quick insight into the project. Using this field to provide a succinct overview of your repository is highly recommended.
Assign permissions to the appropriate Ministry of Justice teams. Ensure at least one team is granted Admin permissions. Whenever possible, assign permissions to teams rather than individual users.
Familiarise yourself with the GDS Way. These standards ensure consistency, maintainability, and best practices across all our repositories.
To add an Outside Collaborator to the repository, follow the guidelines for managing GitHub collaborators.
(Optional) Modify the CODEOWNERS file to specify the teams or users authorised to approve pull requests.
Adapt the dependabot.yml file to match your project's dependency manager and to enable automated pull requests for package updates.
If your repository is private with no GitHub Advanced Security license, remove the .github/workflows/dependency-review.yml file.
This project follows the Ministry of Justice's Code of Conduct. Please be respectful and professional in all interactions.
This project is licensed under the MIT License - see the LICENSE file for details.
The MIT License may not be appropriate for all projects, because it permits anyone with access to the repository to open source a copy of it. Please review and change the license as necessary for your project.
