Skip to content

Commit b89f83d

Browse files
authored
Rework of mila init following changes (#154)
* WIP: Rework of `mila init` following changes Signed-off-by: Fabrice Normandin <[email protected]> * temp Signed-off-by: Fabrice Normandin <[email protected]> * temp Signed-off-by: Fabrice Normandin <[email protected]> * TEMP Signed-off-by: Fabrice Normandin <[email protected]> * WIP: New `mila init` that mainly only prints Signed-off-by: Fabrice Normandin <[email protected]> * Make some progress on the rework of mila init Signed-off-by: Fabrice Normandin <[email protected]> * Remove overcomplicated, broken tests Signed-off-by: Fabrice Normandin <[email protected]> * Add todo for new 'cn-????' entry for compute nodes Signed-off-by: Fabrice Normandin <[email protected]> * Remove leftover todo Signed-off-by: Fabrice Normandin <[email protected]> * `StrictHostKeyChecking=no` for mila compute nodes Signed-off-by: Fabrice Normandin <[email protected]> * Remove flaky check_passwordless and test Signed-off-by: Fabrice Normandin <[email protected]> * Explicit async fixture scope to avoid warning Signed-off-by: Fabrice Normandin <[email protected]> * Update regression files for compute node entry Signed-off-by: Fabrice Normandin <[email protected]> * Fix bug in test for config file permissions Signed-off-by: Fabrice Normandin <[email protected]> * Add xfail on Windows test (?) Signed-off-by: Fabrice Normandin <[email protected]> * Another fix for Windows permission check (?) Signed-off-by: Fabrice Normandin <[email protected]> * Make test slightly less flaky Signed-off-by: Fabrice Normandin <[email protected]> * Remove mentions of the temporary form to add keys Signed-off-by: Fabrice Normandin <[email protected]> * Dont add StrictHostKeyChecking=no for compute node Signed-off-by: Fabrice Normandin <[email protected]> * Add new tests for mila init Signed-off-by: Fabrice Normandin <[email protected]> * WIP: Simplify/remove unnecessary tests Signed-off-by: Fabrice Normandin <[email protected]> * Xfail tests that are broken Signed-off-by: Fabrice Normandin <[email protected]> * Remove unnecessary regression files, add todos Signed-off-by: Fabrice Normandin <[email protected]> * Adjust tests for init_command Signed-off-by: Fabrice Normandin <[email protected]> * Simplify input stream mocks in tests for mila init Signed-off-by: Fabrice Normandin <[email protected]> * Fix some tests that use input_stream twice Signed-off-by: Fabrice Normandin <[email protected]> * Start adding new tests for `mila init` Signed-off-by: Fabrice Normandin <[email protected]> * Make new integration tests for `mila init` Signed-off-by: Fabrice Normandin <[email protected]> * Add temporary modifications to init_command.py Signed-off-by: Fabrice Normandin <[email protected]> * Add todo for disabling assert rewriting Signed-off-by: Fabrice Normandin <[email protected]> * Add full test for compute node SSH access setup Signed-off-by: Fabrice Normandin <[email protected]> * Add todo for later test Signed-off-by: Fabrice Normandin <[email protected]> * Fix test collection error in test_init.py Signed-off-by: Fabrice Normandin <[email protected]> * Fix test for add_ssh_entry Signed-off-by: Fabrice Normandin <[email protected]> * Tweak to not crash during test collection Signed-off-by: Fabrice Normandin <[email protected]> * Use the '-f' flag with ssh-copy-id on mila Signed-off-by: Fabrice Normandin <[email protected]> * Add the new DRAC clusters configs Signed-off-by: Fabrice Normandin <[email protected]> * Update regression files, make _some_ progress Signed-off-by: Fabrice Normandin <[email protected]> * Use different pubkey for drac Signed-off-by: Fabrice Normandin <[email protected]> * Fix bug in can_access_compute_nodes Signed-off-by: Fabrice Normandin <[email protected]> * Fix bug in can_access_compute_nodes Signed-off-by: Fabrice Normandin <[email protected]> * Integration tests with real SSH dir work Signed-off-by: Fabrice Normandin <[email protected]> * Add more tests Signed-off-by: Fabrice Normandin <[email protected]> * Fix issue with controlpath, ssh key Signed-off-by: Fabrice Normandin <[email protected]> * Fix display of public key having extra newline Signed-off-by: Fabrice Normandin <[email protected]> * add IdentityFile in config if key is non-standard Signed-off-by: Fabrice Normandin <[email protected]> * Rename the ssh key used in tests Signed-off-by: Fabrice Normandin <[email protected]> * Copy keys between windows<->WSL in mila init Signed-off-by: Fabrice Normandin <[email protected]> * Add tests for copying keys from wsl<->win32 Signed-off-by: Fabrice Normandin <[email protected]> * Try to patch up the experimental int tests Signed-off-by: Fabrice Normandin <[email protected]> * Try to make int tests work (they dont) Signed-off-by: Fabrice Normandin <[email protected]> * Fix formatting bug in build.yml Signed-off-by: Fabrice Normandin <[email protected]> * Only try creating keys if they don't exist Signed-off-by: Fabrice Normandin <[email protected]> * Add a bit more useful info for debugging Signed-off-by: Fabrice Normandin <[email protected]> * Try to disambiguate the Mila and DRAC key paths Signed-off-by: Fabrice Normandin <[email protected]> * Make DRAC setup warning on Windows more visible Signed-off-by: Fabrice Normandin <[email protected]> * Use bold red (more readable in Powershell) Signed-off-by: Fabrice Normandin <[email protected]> * Always add IdentityFile with keypath Signed-off-by: Fabrice Normandin <[email protected]> * Set StrictHostKeyChecking=no for localhost Signed-off-by: Fabrice Normandin <[email protected]> * Fix bug with relative path of IdentityFile Signed-off-by: Fabrice Normandin <[email protected]> * Fix regression files Signed-off-by: Fabrice Normandin <[email protected]> * Fix bug in get_windows_home_path_in_wsl Signed-off-by: Fabrice Normandin <[email protected]> * Fix drac default private key path in IdentityFile Signed-off-by: Fabrice Normandin <[email protected]> * Create ssh config at ~/.ssh/config before tests Signed-off-by: Fabrice Normandin <[email protected]> * Attempt to fix bug in mila init on WSL Signed-off-by: Fabrice Normandin <[email protected]> * Tweak text when copying keys from WSL<->windows Signed-off-by: Fabrice Normandin <[email protected]> * If private key exists, chmod 600 it Signed-off-by: Fabrice Normandin <[email protected]> * Change the path to the key used in unit tests Signed-off-by: Fabrice Normandin <[email protected]> * Mock the $HOME and default key paths during tests Signed-off-by: Fabrice Normandin <[email protected]> * Try to fix connection issue on Windows Signed-off-by: Fabrice Normandin <[email protected]> * Don't wait for passphrase when creating test key Signed-off-by: Fabrice Normandin <[email protected]> * Remove leftover debugging assert False Signed-off-by: Fabrice Normandin <[email protected]> * Add URL to ask for access to DRAC clusters Signed-off-by: Fabrice Normandin <[email protected]> * Use paths with '~/.ssh' for IdentityFile Signed-off-by: Fabrice Normandin <[email protected]> * Add an annoying xfail for RemoteV1 test Signed-off-by: Fabrice Normandin <[email protected]> * Skip tests with file permissions on Win32 Signed-off-by: Fabrice Normandin <[email protected]> * Replace computecanada with alliancecan Signed-off-by: Fabrice Normandin <[email protected]> * xfail for weird socket.getfqdn bug in MacOS in CI Signed-off-by: Fabrice Normandin <[email protected]> * Remove experimental integration tests Signed-off-by: Fabrice Normandin <[email protected]> * Fix wrong IdentityFile copied to Windows config Signed-off-by: Fabrice Normandin <[email protected]> * Fix other broken test Signed-off-by: Fabrice Normandin <[email protected]> * '~' actually works in IdentityFile on Windows! Signed-off-by: Fabrice Normandin <[email protected]> * Add conditional skip to test on MacOS in CI Signed-off-by: Fabrice Normandin <[email protected]> * Add a conditional skip for buggy, unnecessary test Signed-off-by: Fabrice Normandin <[email protected]> * Setup compute node access in `mila code` as well Signed-off-by: Fabrice Normandin <[email protected]> * Move block after disk quota check Signed-off-by: Fabrice Normandin <[email protected]> --------- Signed-off-by: Fabrice Normandin <[email protected]>
1 parent 1c4653f commit b89f83d

File tree

83 files changed

+2386
-4928
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

83 files changed

+2386
-4928
lines changed

.github/workflows/build.yml

Lines changed: 42 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,10 @@ on:
55
branches:
66
- master
77
paths-ignore:
8-
- 'README.md'
8+
- "README.md"
99
pull_request:
1010
paths-ignore:
11-
- 'README.md'
11+
- "README.md"
1212

1313
# https://stackoverflow.com/a/72408109/6388696
1414
# https://docs.github.com/en/actions/using-jobs/using-concurrency#example-using-concurrency-to-cancel-any-in-progress-job-or-run
@@ -24,7 +24,7 @@ jobs:
2424
- uses: actions/checkout@v5
2525
- uses: actions/setup-python@v6
2626
with:
27-
python-version: '3.10'
27+
python-version: "3.10"
2828
- run: pip install "pre-commit<4.0.0"
2929
- run: pre-commit --version
3030
- run: pre-commit install
@@ -37,49 +37,50 @@ jobs:
3737
max-parallel: 4
3838
matrix:
3939
platform: [ubuntu-latest, windows-latest, macos-latest]
40-
python-version: ['3.9', '3.10', '3.11', '3.12', '3.13']
40+
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
4141
env:
4242
PLATFORM: ${{ matrix.platform }}
4343
steps:
44-
- uses: actions/checkout@v5
45-
- name: Install the latest version of uv
46-
uses: astral-sh/setup-uv@v7
47-
with:
48-
version: "latest"
49-
enable-cache: true
50-
# https://github.com/astral-sh/setup-uv?tab=readme-ov-file#github-authentication-token
51-
github-token: ${{ secrets.GITHUB_TOKEN }}
52-
cache-suffix: ${{ matrix.python-version }}
53-
- name: Pin Python version to ${{ matrix.python-version }}
54-
run: uv python pin ${{ matrix.python-version }}
55-
- name: Install dependencies
56-
run: uv sync
57-
- name: Setup passwordless SSH access to localhost for tests
58-
# Adapted from https://stackoverflow.com/a/60367309/6388696
59-
if: runner.os == 'Linux'
60-
run: |
61-
ssh-keygen -t ed25519 -f ~/.ssh/testkey -N ''
62-
cat > ~/.ssh/config <<EOF
63-
Host localhost
64-
User $USER
65-
HostName 127.0.0.1
66-
IdentityFile ~/.ssh/testkey
67-
EOF
68-
echo -n 'from="127.0.0.1" ' | cat - ~/.ssh/testkey.pub > ~/.ssh/authorized_keys
69-
chmod og-rw ~
70-
ssh -o 'StrictHostKeyChecking no' localhost id
44+
- uses: actions/checkout@v5
45+
- name: Install the latest version of uv
46+
uses: astral-sh/setup-uv@v7
47+
with:
48+
version: "latest"
49+
enable-cache: true
50+
# https://github.com/astral-sh/setup-uv?tab=readme-ov-file#github-authentication-token
51+
github-token: ${{ secrets.GITHUB_TOKEN }}
52+
cache-suffix: ${{ matrix.python-version }}
53+
- name: Pin Python version to ${{ matrix.python-version }}
54+
run: uv python pin ${{ matrix.python-version }}
55+
- name: Install dependencies
56+
run: uv sync
57+
- name: Setup passwordless SSH access to localhost for tests
58+
# Adapted from https://stackoverflow.com/a/60367309/6388696
59+
if: runner.os == 'Linux'
60+
run: |
61+
ssh-keygen -t ed25519 -f ~/_id_ed25519_test -N ''
62+
mkdir -p ~/.ssh
63+
chmod 700 ~/.ssh
64+
cat > ~/.ssh/config <<EOF
65+
Host localhost
66+
User $USER
67+
HostName 127.0.0.1
68+
IdentityFile ~/_id_ed25519_test
69+
EOF
70+
echo -n 'from="127.0.0.1" ' | cat - ~/_id_ed25519_test.pub > ~/.ssh/authorized_keys
71+
chmod og-rw ~
72+
ssh -o 'StrictHostKeyChecking no' localhost id
7173
72-
- name: Test with pytest
73-
run: uv run pytest --cov=milatools --cov-report=xml --cov-append
74+
- name: Test with pytest
75+
run: uv run pytest --cov=milatools --cov-report=xml --cov-append
7476

75-
- name: Store coverage report as an artifact
76-
uses: actions/upload-artifact@v4
77-
with:
78-
name: coverage-reports-unit-${{ matrix.platform }}-${{ matrix.python-version }}
79-
path: ./coverage.xml
77+
- name: Store coverage report as an artifact
78+
uses: actions/upload-artifact@v4
79+
with:
80+
name: coverage-reports-unit-${{ matrix.platform }}-${{ matrix.python-version }}
81+
path: ./coverage.xml
8082

8183
real-slurm-integration-tests:
82-
8384
name: integration tests with a real SLURM cluster
8485
needs: [unit-tests]
8586

@@ -88,8 +89,8 @@ jobs:
8889
matrix:
8990
# TODO: We should ideally also run this with Windows/Mac clients and a Linux
9091
# server. Unsure how to set that up with GitHub Actions though.
91-
python-version: ['3.11']
92-
cluster: ['mila']
92+
python-version: ["3.11"]
93+
cluster: ["mila"]
9394
uses: ./.github/workflows/testing.yml
9495
with:
9596
cluster: ${{ matrix.cluster }}

milatools/cli/code.py

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,10 @@
1+
"""
2+
3+
TODO: If we need to add a ssh-key on the login node and add it to authorized_keys to get
4+
ssh access to compute nodes, do it here instead of in mila init, since we're already
5+
going to go through 2FA once.
6+
"""
7+
18
from __future__ import annotations
29

310
import asyncio
@@ -7,10 +14,19 @@
714
from pathlib import PurePosixPath
815

916
from milatools.cli import console
10-
from milatools.cli.init_command import DRAC_CLUSTERS
17+
from milatools.cli.init_command import (
18+
DEFAULT_DRAC_PUBKEY_PATH,
19+
DEFAULT_MILA_PUBKEY_PATH,
20+
DRAC_CLUSTERS,
21+
can_access_compute_nodes,
22+
get_ssh_public_key_path,
23+
setup_access_to_compute_nodes,
24+
)
1125
from milatools.cli.utils import (
26+
SSH_CONFIG_FILE,
1227
CommandNotFoundError,
1328
MilatoolsUserError,
29+
SSHConfig,
1430
currently_in_a_test,
1531
internet_on_compute_nodes,
1632
running_inside_WSL,
@@ -83,6 +99,15 @@ async def code(
8399
f"Unable to check the disk-quota on the {cluster} cluster: {exc}"
84100
)
85101

102+
public_key_path = get_ssh_public_key_path(
103+
cluster, ssh_config=SSHConfig(SSH_CONFIG_FILE)
104+
) or (DEFAULT_MILA_PUBKEY_PATH if cluster == "mila" else DEFAULT_DRAC_PUBKEY_PATH)
105+
if not can_access_compute_nodes(login_node, public_key_path=public_key_path):
106+
logger.info("Setting up access to compute nodes...")
107+
setup_access_to_compute_nodes(
108+
cluster, remote=login_node, public_key_path=public_key_path
109+
)
110+
86111
# NOTE: Perhaps we could eventually do this check dynamically, if the cluster is an
87112
# unknown cluster?
88113
sync_vscode_extensions_task = None

milatools/cli/commands.py

Lines changed: 2 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -34,12 +34,7 @@
3434
from milatools.cli import console
3535
from milatools.cli.code import code
3636
from milatools.cli.init_command import (
37-
print_welcome_message,
38-
setup_keys_on_login_node,
39-
setup_passwordless_ssh_access,
40-
setup_ssh_config,
41-
setup_vscode_settings,
42-
setup_windows_ssh_config_from_wsl,
37+
init,
4338
)
4439
from milatools.cli.profile import ensure_program, setup_profile
4540
from milatools.cli.utils import (
@@ -54,7 +49,6 @@
5449
get_fully_qualified_name,
5550
get_hostname_to_use_for_compute_node,
5651
randname,
57-
running_inside_WSL,
5852
with_control_file,
5953
)
6054
from milatools.utils.disk_quota import check_disk_quota_v1
@@ -108,6 +102,7 @@ def main():
108102
github_issue_url = (
109103
f"https://github.com/mila-iqia/milatools/issues/new?{urlencode(options)}"
110104
)
105+
111106
print(
112107
T.bold_yellow(
113108
f"An error occurred during the execution of the command `{command}`. "
@@ -508,33 +503,6 @@ def intranet(search: Sequence[str]) -> None:
508503
webbrowser.open(url)
509504

510505

511-
def init():
512-
"""Set up your configuration and credentials."""
513-
514-
#############################
515-
# Step 1: SSH Configuration #
516-
#############################
517-
518-
print("Checking ssh config")
519-
520-
ssh_config = setup_ssh_config()
521-
522-
success = setup_passwordless_ssh_access(ssh_config=ssh_config)
523-
if not success:
524-
exit()
525-
526-
# if we're running on WSL, we actually just copy the id_rsa + id_rsa.pub and the
527-
# ~/.ssh/config to the Windows ssh directory (taking care to remove the
528-
# ControlMaster-related entries) so that the user doesn't need to install Python on
529-
# the Windows side.
530-
if running_inside_WSL():
531-
setup_windows_ssh_config_from_wsl(linux_ssh_config=ssh_config)
532-
533-
setup_keys_on_login_node()
534-
setup_vscode_settings()
535-
print_welcome_message()
536-
537-
538506
def forward(
539507
remote: str,
540508
page: str | None,

0 commit comments

Comments
 (0)