A cookiecutter template for end-to-end ML projects with MLOps best practices.
This repository contains the cookiecutter template for generating a repository that provides boilerplate for the different components of an end-to-end ML project.
For now, this template is dedicated to AI Singapore's on-premise environment, where Run:ai is used as the MLOps platform. Other platforms and orchestrators will be integrated into this repository in the near future.
- End-to-end ML project structure
- MLOps best practices
- Run:ai integration
- Problem-specific templates
- Docker configuration
# Ensure that python>=3.10
pip install "cookiecutter>=2.2" unidiff
For macOS users who encounter issues with the cookiecutter CLI installation, you can use conda install (if using Conda), or follow the instructions on the cookiecutter guide site using pipx.
To use the template and create a repository:
cookiecutter https://github.com/aisingapore/kapitan-hull
For a specific version of Kapitan Hull:
cookiecutter https://github.com/aisingapore/kapitan-hull -c <tag>
You will be prompted to provide inputs that will populate different parts of the generated repository.
To update your repository with the latest utilities:
- First, commit and push all your changes
- Run:
# Add `-c <tag>` for the specific tag/branch
cookiecutter --replay-file cookiecutter.json \
https://github.com/aisingapore/kapitan-hull \
-o .. -f
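The update steps above can also be scripted. The following is a minimal sketch, not part of Kapitan Hull itself: the function names `build_update_command`, `has_uncommitted_changes`, and `update_repository` are hypothetical, and the check only confirms that the working tree is clean (it does not verify that your commits were pushed).

```python
import subprocess

TEMPLATE_URL = "https://github.com/aisingapore/kapitan-hull"


def build_update_command(tag=None, replay_file="cookiecutter.json"):
    """Build the cookiecutter replay command shown above as an argument list."""
    cmd = [
        "cookiecutter",
        "--replay-file", replay_file,
        TEMPLATE_URL,
        "-o", "..",  # write into the parent folder so the current repo is regenerated in place
        "-f",        # overwrite files that already exist
    ]
    if tag:
        # Optional: pin the template to a specific tag or branch
        cmd += ["-c", tag]
    return cmd


def has_uncommitted_changes(porcelain_output):
    """Return True if `git status --porcelain` printed any entries."""
    return bool(porcelain_output.strip())


def update_repository(tag=None):
    """Hypothetical helper: refuse to update when the working tree is dirty."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if has_uncommitted_changes(status.stdout):
        raise RuntimeError("Commit and push your changes before updating.")
    subprocess.run(build_update_command(tag), check=True)
```

Calling `update_repository()` from the root of a generated repository would run the same command as above; pass a tag or branch name to pin the template version.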
Parameter | Detail | Default | Regex Reference |
---|---|---|---|
project_name | Project name that will appear as the main header in README.md (must start with a letter, use only spaces as separators) | My Project | Link |
description | Brief project description for README.md (max 72 characters) | A short description of the project. | NIL |
repo_name | Repository folder name (must start with a letter, no spaces or underscores allowed, use hyphens instead) | project_name where whitespaces and underscores are replaced with hyphens. | Link |
src_package_name | Python package name for source code (must start with a letter, no spaces or hyphens allowed, use underscores instead) | repo_name where hyphens are replaced with underscores. | Link |
src_package_name_short | Short alias for the source package (must start with a letter, no spaces or hyphens allowed) | src_package_name | Link |
platform | Select the platform where this project will run (on-premise infrastructure or Google Cloud Platform) | onprem or gcp | NIL |
orchestrator | Select the orchestration system for this project (Run:ai with AISG Resources or No orchestrator) | runai or none | NIL |
aisg | Choose whether to add AISG context | true or false | NIL |
proj_name | Project name used by the orchestrator (for Run:ai, this will be the Run:ai project name) | sample-project | NIL |
registry_project_path | Full path to container registry (without trailing slash, e.g., registry.domain.tld/project/image) | registry.domain.tld/sample-project/my-project | Link |
problem_template | Select a problem template to initialize your repository (base or cv) | base or cv | NIL |
author_name | Your name or team name (no hyphens allowed) | AISG | Link |
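To make the derived defaults in the table concrete, here is a small sketch of how repo_name and src_package_name could follow from project_name. The function names are hypothetical, and the lowercasing step is an assumption inferred from the default registry path (my-project for "My Project"); the template's actual expressions and the regexes behind the links may differ.

```python
import re


def default_repo_name(project_name):
    """Replace whitespace and underscores with hyphens.

    Assumption: the name is also lowercased, as suggested by the
    default registry path ending in "my-project".
    """
    return re.sub(r"[\s_]", "-", project_name.strip().lower())


def default_src_package_name(repo_name):
    """Replace hyphens with underscores to get an importable package name."""
    return repo_name.replace("-", "_")


def description_is_valid(description, max_length=72):
    """Check the 72-character limit stated for the description parameter."""
    return len(description) <= max_length


repo = default_repo_name("My Project")
package = default_src_package_name(repo)
print(repo, package)
```

This only illustrates the replacement rules described in the Default column; cookiecutter computes the real defaults when it prompts you.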
Following the creation of your repository, initialise it with Git, push it to a remote, and follow its README.md document for a full guide on its usage.
To keep the repository small and make the changes between the base template and the various problem templates explicit, Kapitan Hull stores those differences as diff files. Developers therefore need to know how to apply patches during development and how to regenerate the diff files before committing their changes.
You can apply a specific patch as follows:
python hooks/pre_prompt.py apply <diff_file>
You can create a specific patch as follows:
python extras/generate_diffs.py create <alt_file>
This <alt_file> refers to the file that is to be committed as a diff file in the problem-templates/<template> folder.
Note: When creating a diff patch, ensure that the base file and the other file have an extra newline at the end to avoid patching issues using the scripts.
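The reason for the newline requirement can be illustrated with Python's standard difflib module. This is only a sketch and not how extras/generate_diffs.py or hooks/pre_prompt.py are implemented; it also assumes the note refers to a trailing newline at the end of each file. The helper names `make_diff` and `ensure_trailing_newline` are hypothetical.

```python
import difflib


def ensure_trailing_newline(text):
    """Append a final newline if the text does not already end with one."""
    return text if text.endswith("\n") else text + "\n"


def make_diff(base_text, alt_text, name="file"):
    """Return a unified diff between two versions of a file as one string."""
    return "".join(
        difflib.unified_diff(
            base_text.splitlines(keepends=True),
            alt_text.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


# Without trailing newlines, the removed and added lines run together,
# which produces a diff that patch tools are likely to reject.
broken = make_diff("a\nb", "a\nc")

# With trailing newlines, each diff line is properly terminated.
clean = make_diff(ensure_trailing_newline("a\nb"), ensure_trailing_newline("a\nc"))
print(clean)
```

In the broken case the last two diff lines are fused into `-b+c`, whereas the clean diff ends with `-b` and `+c` on separate lines.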
Those who plan to use AMD GPUs and ROCm can check the extras/rocm folder and copy its contents into the {{cookiecutter.repo_name}} folder before populating your template. This is experimental, so official support should not be expected any time soon. It is also kept out of the main template to avoid confusing users with multiple file variants.