Commit 5b9a57b

Update README and environment configuration for Azure ML
- Simplified instructions in README regarding environment updates by removing redundant information about dependencies in requirements.txt.
- Adjusted the build context in create-env.yaml to use the repository root for Dockerfile compatibility, ensuring correct path resolution for dependencies.
1 parent 812e2e0 commit 5b9a57b
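
The path-resolution point in the commit message can be illustrated with a small check. This is a minimal sketch and not part of the commit: the helper name `missing_bind_sources` is hypothetical, and the regular expression assumes bind mounts are written in the `--mount=type=bind,source=...` key order used by the Dockerfiles in this diff. It lists the bind-mount sources that would not resolve under a given build context directory.

```python
import re
from pathlib import Path

# Matches the `source=` value of a BuildKit bind mount written as
# --mount=type=bind,source=<path>,target=<path>
# (assumption: `type=bind` comes first, as in the Dockerfiles in this diff).
BIND_SOURCE = re.compile(r"--mount=type=bind,source=([^,\s]+)")


def missing_bind_sources(dockerfile: Path, context: Path) -> list[str]:
    """Return bind-mount sources in `dockerfile` that do not exist under `context`.

    `context` plays the role of the Docker build context: every `source=`
    path is resolved relative to it, not relative to the Dockerfile.
    """
    sources = BIND_SOURCE.findall(dockerfile.read_text())
    return [src for src in sources if not (context / src).exists()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    repo_root = Path(".")
    for dockerfile in repo_root.rglob(".devcontainer/Dockerfile"):
        missing = missing_bind_sources(dockerfile, repo_root)
        print(dockerfile, "OK" if not missing else f"missing: {missing}")
```

With the repository root as the context, a source such as `notebooks/.devcontainer/uv.lock` resolves; with the `.devcontainer` directory as the context, the same source would be reported as missing.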

16 files changed: +934 -31 lines changed

.azuredevops/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

 * [ ] No PII in logs or output
 * [ ] Made corresponding changes to the documentation
-* [ ] All new packages used are included in requirements.txt
+* [ ] All new packages used are included in pyproject.toml
 * [ ] Functions use type hints, and there are no type hint errors

 ## Pull Request Type

README.md

Lines changed: 8 additions & 7 deletions
@@ -12,6 +12,7 @@ A machine learning and data science project template that makes it easy to work
 - [Features](#features)
 - [Getting Started](#getting-started)
 - [How to setup dev environment?](#how-to-setup-dev-environment)
+- [How to create a new directory under src with a new environment](#how-to-create-a-new-directory-under-src-with-a-new-environment)
 - [How to update python packages in the dev container](#how-to-update-python-packages-in-the-dev-container)
 - [Directory Structure](#directory-structure)
 - [`notebooks` directory vs `src` directory](#notebooks-directory-vs-src-directory)
@@ -30,7 +31,7 @@ A machine learning and data science project template that makes it easy to work

 This repository provides a [VSCode Dev Container](https://code.visualstudio.com/docs/devcontainers/containers) based project template that can help accelerate your Machine Learning inner-loop development phase. The template covers the phases from early ML experimentation (local training/testing) until production oriented ML model training (cloud based training/testing with bigger CPUs and GPUs).

-During the early phase of Machine Learning project, you may face challenges such as each data scientist creating various different python environments that span across CPU and GPU that tend to have different setup procedures. With the power of Dev Containers, you can automate environment setup process across the team and every data scientist will get the exact same environment automatically. This template provides both CPU and GPU Dev Container setup as examples. To support multiple different ML approaches with different python environments to be experimented in one project, this solution allows multiple different Dev Containers to be used in one repository while having a "common" module that will be installed into all Dev Container to enable code reuse across different Dev Containers.
+During the early phase of Machine Learning project, you may face challenges such as each data scientist creating various different python environments that span across CPU and GPU that tend to have different setup procedures. With the power of Dev Containers, you can automate environment setup process across the team and every data scientist will get the exact same environment automatically. This template provides both CPU and GPU Dev Container setup as examples. To support multiple different ML approaches with different python environments to be experimented in one project, this solution allows multiple different Dev Containers to be used in one repository.

 Another challenge you may face is each data scientist creating a low quality codebase. That is fine during the experimentation stage to keep the team agility high and maximize your team’s experimentation throughput. But when you move to the model productionization stage, you experience the burden of bringing code quality up to production level. With the power of python tools and VSCode extensions configured for this template on top of Dev Containers, you can keep the code quality high automatically without losing your team’s agility and experimentation throughput and ease the transition to the productionization phase.

@@ -102,17 +103,17 @@ This section gives you overview of the directory structure of this template. Onl
 │ ├── .devcontainer # dev container related configuration files goes to here following VSCode convention
 │ │ ├── devcontainer.json # dev container configuration and VS Code settings, extensions etc.
 │ │ ├── Dockerfile # referred in devcontainer.json
-│ │ └── pyproject.toml # includes python package list for notebooks. used in Dockerfile
+│ │ ├── pyproject.toml # includes python package list for notebooks. used in Dockerfile
+│ │ └── uv.lock # lock file for python packages. used in Dockerfile
 │ └── sample_notebook.py # example of interactive python script
 ├── pyproject.toml # Setting file for ruff, pytest and pytest-cov
 └── src
-├── common # this module is accessible from all modules under src. put functions you want to import across the projects here
-│ └── requirements.txt # python package list for common module. installed in all Dockerfile under src. python tools for src goes to here too
 ├── sample_cpu_project # cpu project example. Setup process is covered in Section: How to setup dev environment?
 │ ├── .devcontainer # dev container related configuration files goes to here following VSCode convention
 │ │ ├── devcontainer.json # dev container configuration and VS Code settings, extensions etc.
 │ │ ├── Dockerfile # referred in devcontainer.json. Supports only CPU
-│ │ └── pyproject.toml # includes python package list for sample_cpu_project. used in Dockerfile
+│ │ ├── pyproject.toml # includes python package list for sample_cpu_project. used in Dockerfile
+│ │ └── uv.lock # lock file for python packages. used in Dockerfile
 │ ├── sample_main.py
 │ └── tests # pytest scripts for sample_cpu_project goes here
 │ └── test_dummy.py # pytest script example
@@ -121,7 +122,8 @@ This section gives you overview of the directory structure of this template. Onl
 ├── .devcontainer # dev container related configuration files goes to here following VSCode convention
 │ ├── devcontainer.json # dev container configuration and VS Code settings, extensions etc.
 │ ├── Dockerfile # referred in devcontainer.json. Supports GPU
-│ └── pyproject.toml # includes python package list for sample_pytorch_gpu_project. used in Dockerfile
+│ ├── pyproject.toml # includes python package list for sample_pytorch_gpu_project. used in Dockerfile
+│ └── uv.lock # lock file for python packages. used in Dockerfile
 ├── aml_example/ # Sample AML CLI v2 Components-based pipeline, including setup YAML. See sample_pytorch_gpu_project/README for full details of files in this directory.
 ├── sample_main.py
 ├── inference.py # Example pytorch inference/eval script that also works with aml_example
@@ -224,7 +226,6 @@ ssh-add
 ## Future Roadmap

 - Add Docker build caching to Azure DevOps MS hosted CI pipeline
-- Investigate making `src/common` installed with `pip -e`

 ## Contributing

notebooks/.devcontainer/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -26,7 +26,8 @@ ENV UV_PROJECT_FILE=.devcontainer/pyproject.toml

 # Changing the default UV_LINK_MODE silences warnings about not being able to use hard links since the cache and sync target are on separate file systems
 ENV UV_LINK_MODE=copy
-# Install dependencies (as root for simplicity; devuser can still use installed packages)
+# Install dependencies using bind mounts instead of COPY to avoid extra layers
+# This is the recommended approach by uv: https://docs.astral.sh/uv/guides/integration/docker/#installing-a-project
 RUN --mount=type=cache,target=/root/.cache/uv \
 --mount=type=bind,source=notebooks/.devcontainer/uv.lock,target=uv.lock \
 --mount=type=bind,source=notebooks/.devcontainer/pyproject.toml,target=pyproject.toml \

notebooks/.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "name": "DSToolkit Notebooks Dev Container",
 "build": {
-// use root directory as build context so that requirements-dev.txt is accessible during build
+// use root directory as build context so that pyproject.toml and uv.lock are accessible during build
 "context": "../../",
 "dockerfile": "Dockerfile"
 },

src/common/__init__.py

Whitespace-only changes.

src/common/requirements.txt

Lines changed: 0 additions & 2 deletions
This file was deleted.

src/sample_cpu_project/.devcontainer/Dockerfile

Lines changed: 2 additions & 7 deletions
@@ -26,18 +26,13 @@ ENV UV_PROJECT_FILE=.devcontainer/pyproject.toml

 # Changing the default UV_LINK_MODE silences warnings about not being able to use hard links since the cache and sync target are on separate file systems
 ENV UV_LINK_MODE=copy
-# Install dependencies (as root for simplicity; devuser can still use installed packages)
+# Install dependencies using bind mounts instead of COPY to avoid extra layers
+# This is the recommended approach by uv: https://docs.astral.sh/uv/guides/integration/docker/#installing-a-project
 RUN --mount=type=cache,target=/root/.cache/uv \
 --mount=type=bind,source=src/sample_cpu_project/.devcontainer/uv.lock,target=uv.lock \
 --mount=type=bind,source=src/sample_cpu_project/.devcontainer/pyproject.toml,target=pyproject.toml \
 uv sync --locked --project $UV_PROJECT_FILE

-# install common module related packages
-# This part can be potentially improved by https://docs.astral.sh/uv/concepts/projects/workspaces/#when-not-to-use-workspaces to move away from requirements.txt and gets its own lock file
-COPY src/common/requirements.txt .
-RUN --mount=type=cache,target=/root/.cache/uv \
-uv pip install -r requirements.txt --system
-
 # Allow devuser to manage packages at runtime without sudo (e.g. uv add)
 RUN chown -R $USERNAME:$USERNAME /usr/local
 USER $USERNAME

src/sample_cpu_project/.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "name": "Sample CPU Project Dev Container",
 "build": {
-// use root directory as build context so that requirements-dev.txt is accessible during build
+// use root directory as build context so that pyproject.toml and uv.lock are accessible during build
 "context": "../../../",
 "dockerfile": "Dockerfile"
 },

src/sample_cpu_project/.devcontainer/pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ requires-python = ">=3.11"

 [dependency-groups]
 dev = [
+"ipykernel==7.2.0",
 "mypy==1.20.0",
 "pytest==9.0.2",
 "pre-commit==4.5.1",
