FirecREST by AryanAhadinia · Pull Request #12 · swiss-ai/model-launch

AryanAhadinia · 2026-03-06T18:13:14Z

No description provided.

Copilot

Pull request overview

This PR restructures the repository into an installable swiss-ai-model-launch Python package providing an interactive sml CLI to launch models (primarily via FirecREST), replacing the previous serving/ scripts and related docs.

Changes:

Introduces a new src/ package with FirecREST launcher, configuration wizard (keyring-backed), healthcheck, and a Textual live display UI.
Adds packaged assets (job template + environment TOML files + preconfigured models list) used for SLURM submission.
Adds Python tooling/CI (pyproject, pre-commit, ruff, mypy, GitHub Actions) and replaces prior unit test scaffolding with an integration test.

Reviewed changes

Copilot reviewed 38 out of 48 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `.github/workflows/ci.yml` | Adds CI for lint/format, mypy strict typechecking, and pytest+coverage. |
| `.gitignore` | Expands ignore rules for Python/IDE/OS artifacts. |
| `.pre-commit-config.yaml` | Adds ruff + mypy pre-commit hooks. |
| `Makefile` | Adds dev convenience targets (format/clean). |
| `README.md` | Updates docs toward the new `sml` CLI + integration test instructions. |
| `RUNNING_OPENCODE.md` | Removes OpenCode running instructions (legacy). |
| `download/README.md` | Removes legacy model download instructions. |
| `download/download_model.py` | Removes legacy model download script. |
| `images/README.md` | Removes legacy image build documentation. |
| `images/sglang_glm5/Dockerfile` | Removes legacy Dockerfile. |
| `images/sglang_kimi_k2.5/Dockerfile` | Removes legacy Dockerfile. |
| `pyproject.toml` | Introduces packaging metadata, dependencies, ruff/mypy/pytest/coverage configuration. |
| `requirements.txt` | Removes old requirements file (migrated to pyproject). |
| `serving/README.md` | Removes legacy serving documentation. |
| `serving/submit_job.py` | Removes legacy SLURM submission script. |
| `serving/utils.py` | Removes legacy utilities (bootstrap fetch, script rendering, etc.). |
| `src/swiss_ai_model_launch/__init__.py` | Adds package root marker. |
| `src/swiss_ai_model_launch/assets/__init__.py` | Adds assets package marker. |
| `src/swiss_ai_model_launch/assets/models.json` | Adds preconfigured model launch presets for the CLI. |
| `src/swiss_ai_model_launch/assets/template.jinja` | Adds/updates SLURM job script template used by FirecREST submissions. |
| `src/swiss_ai_model_launch/assets/envs/__init__.py` | Adds env assets package marker. |
| `src/swiss_ai_model_launch/assets/envs/sglang.toml` | Updates sglang container env configuration. |
| `src/swiss_ai_model_launch/assets/envs/sglang_kimi.toml` | Updates kimi env config (mounts/env/annotations). |
| `src/swiss_ai_model_launch/assets/envs/sglang_glm.toml` | Updates GLM image path. |
| `src/swiss_ai_model_launch/assets/envs/vllm.toml` | Adds vLLM container env configuration. |
| `src/swiss_ai_model_launch/assets/envs/vllm_qwen35.toml` | Adds vLLM nightly env configuration. |
| `src/swiss_ai_model_launch/cli/__init__.py` | Exposes `main` entrypoint. |
| `src/swiss_ai_model_launch/cli/main.py` | Implements interactive CLI flow: init wizard, model selection, launch + live monitoring. |
| `src/swiss_ai_model_launch/cli/configuration/__init__.py` | Exposes InitConfig. |
| `src/swiss_ai_model_launch/cli/configuration/init_wizard.py` | Adds init wizard and config persistence to `~/.sml/config.yml`. |
| `src/swiss_ai_model_launch/cli/configuration/models.py` | Adds configuration model types using Pydantic + Questionary + Keyring. |
| `src/swiss_ai_model_launch/cli/display/__init__.py` | Exposes display state and live UI. |
| `src/swiss_ai_model_launch/cli/display/live.py` | Adds Textual UI for status + stdout/stderr logs. |
| `src/swiss_ai_model_launch/cli/display/state.py` | Adds in-memory state model feeding the UI. |
| `src/swiss_ai_model_launch/cli/healthcheck/__init__.py` | Adds OpenAI-compatible endpoint healthcheck via httpx. |
| `src/swiss_ai_model_launch/launchers/__init__.py` | Exposes launcher types. |
| `src/swiss_ai_model_launch/launchers/firecrest_launcher.py` | Implements FirecREST-based model launch + job status/log retrieval. |
| `src/swiss_ai_model_launch/launchers/launch_args.py` | Defines template arguments for job script rendering. |
| `src/swiss_ai_model_launch/launchers/launch_request.py` | Defines user-facing LaunchRequest (Pydantic). |
| `src/swiss_ai_model_launch/launchers/launcher.py` | Defines Launcher ABC + JobStatus enum. |
| `src/swiss_ai_model_launch/launchers/utils.py` | Adds random salt helper used for naming. |
| `src/swiss_ai_model_launch/telemetry/__init__.py` | Adds telemetry package marker (no implementation yet). |
| `tests/__init__.py` | Adds tests package marker. |
| `tests/integration/__init__.py` | Adds integration tests package marker. |
| `tests/integration/test_launch_apertus.py` | Adds a FirecREST-based integration test that launches Apertus and checks health. |
| `tests/unit.py` | Removes legacy unittest-based unit test. |
| `tests/bootstrap_payload.json` | Removes legacy unit-test fixture payload. |

Comments suppressed due to low confidence (3)

src/swiss_ai_model_launch/assets/template.jinja:10

The SLURM error output is configured to write to the same file as stdout (`--error=logs/%j/log.out`), so stderr and stdout will interleave or overwrite each other, which makes debugging difficult. This looks like a copy/paste mistake; stderr should go to a separate log (e.g. `log.err`).

src/swiss_ai_model_launch/assets/template.jinja:10

`--output=logs/%j/log.out` (and `--error=...`) writes into a per-job subdirectory under `logs/`, but SLURM opens these files before the script executes; it won't create missing directories like `logs/%j`. Since the script no longer creates the log directory before job start, this can cause the job to fail immediately. Use an output path that doesn't require pre-created directories, or ensure the required directories exist before submission (e.g., create `logs/` upfront and avoid `%j` as a directory component).

src/swiss_ai_model_launch/assets/template.jinja:8

The job script hardcodes a reservation (`#SBATCH --reservation=PA-2338-RL`). This will break launches for users/projects that don't have access to that reservation and makes the CLI non-portable. Consider making this configurable (LaunchRequest/LaunchArgs/config) or omitting it by default.
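
The three template comments above could be addressed together. The following is a minimal sketch only: it uses plain Python string formatting rather than the project's Jinja template, and the function name `render_sbatch_header` and all of its parameters are hypothetical, not taken from the PR.

```python
from typing import Optional


def render_sbatch_header(
    job_name: str,
    account: str,
    partition: str,
    log_dir: str = "logs",
    reservation: Optional[str] = None,
) -> str:
    """Build the #SBATCH header lines (illustrative names, not the PR's API)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        # %j stays in the file name, so only log_dir has to exist beforehand.
        f"#SBATCH --output={log_dir}/%j.out",
        # stderr gets its own file instead of sharing the stdout path.
        f"#SBATCH --error={log_dir}/%j.err",
    ]
    if reservation:
        # Emitted only when the caller supplies a reservation.
        lines.append(f"#SBATCH --reservation={reservation}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch_header("sml-demo", "my-account", "normal"))
```

With this shape the reservation becomes opt-in, and the only directory that must exist before submission is `log_dir` itself.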

+_missing_env_vars = [v for v in _REQUIRED_ENV_VARS if os.environ.get(v) is None]
+if _missing_env_vars:
+    pytest.fail(
+        "Missing required environment variables: " + ", ".join(_missing_env_vars),
+        pytrace=False,
+    )


+    return FirecRESTLauncher(
+        client=client,
+        system_name=os.environ["FIRECREST_SYSTEM"],
+        username=os.environ.get("FIRECREST_USERNAME"),
+        account=os.environ["FIRECREST_ACCOUNT"],
+        partition=os.environ["FIRECREST_PARTITION"],
+    )


+    async def get_job_status(self, job_id: int) -> JobStatus:
+        job_info = await self.client.job_info(
+            system_name=self.system_name,
+            jobid=job_id,
+            # account=self.account,  # TODO
+        )
+        return JobStatus(str(job_info[0]["status"]["state"]))
+
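
As quoted, this lookup raises `IndexError` when the response list is empty and `ValueError` when the scheduler reports a state that the enum does not define. A more defensive parse might look like the sketch below. The enum members, the `UNKNOWN` fallback, and the helper `parse_job_status` are assumptions for illustration; only the `[0]["status"]["state"]` response shape comes from the hunk.

```python
from enum import Enum
from typing import Any


class JobStatus(str, Enum):
    # Assumed members; UNKNOWN is a hypothetical fallback, not from the PR.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"

    @classmethod
    def _missing_(cls, value: object) -> "JobStatus":
        # Enum calls this hook when `value` matches no member.
        return cls.UNKNOWN


def parse_job_status(job_info: list[dict[str, Any]]) -> JobStatus:
    """Map a job_info-style response to a JobStatus without raising."""
    if not job_info:
        return JobStatus.UNKNOWN
    state = job_info[0].get("status", {}).get("state")
    return JobStatus(str(state))
```

The caller can then decide how to display or retry on `UNKNOWN` instead of crashing the monitoring loop.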


+            raise ValueError(
+                "`environment` is not provided in the launch request, "
+                "and no default environment is available for the specified framework."
+            )


+    def _get_launch_args_from_request(
+        self,
+        launch_request: LaunchRequest,
+    ) -> LaunchArgs:


+async def _get_firecrest_launcher_with_client(client: f7t.v2.AsyncFirecrest):
+    async def _get_systems() -> dict[str, tuple[str, str]]:
+        return {
+            sys["name"]: (sys["name"], sys["ssh"]["host"])
+            for sys in await client.systems()
+        }
+
+    async def _get_partitions(get_value_from_context) -> dict[str, tuple[str, str]]:
+        system = get_value_from_context("firecrest_system")


+    state = DisplayState()
+    state.update(cluster=launcher.system_name, partition=launcher.partition)
+


+        while True:
+            await asyncio.sleep(5)
+
+            job_status = await launcher.get_job_status(job_id)
+            state.update(job_status=job_status)
+
+            model_health = await check_model_health(served_model_name, cscs_api_key)
+            state.update(model_health=model_health)
+
+            o, e = await launcher.get_job_logs(job_id)
+            state.append_out_log(o)
+            state.append_err_log(e)
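
The quoted hunk shows no exit condition; if none follows in the full file, the loop keeps polling after the job has completed or failed. One possible shape for a bounded loop is sketched below. The terminal state names, the function `poll_until_done`, and its callback parameters are hypothetical and stand in for the launcher and display state used in the PR.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Assumed terminal state names; the real JobStatus enum may differ.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"})


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    on_update: Callable[[str], None],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> str:
    """Poll get_status until a terminal state or max_polls is reached."""
    polls = 0
    while True:
        status = await get_status()
        on_update(status)  # e.g. push the new status into the display state
        polls += 1
        if status in TERMINAL_STATES:
            return status
        if max_polls is not None and polls >= max_polls:
            return status
        await asyncio.sleep(interval)
```

Passing the status source and the update sink as callables keeps the loop testable without a cluster or a running UI.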


+
+[tool.coverage.report]
+show_missing = true
+fail_under = 100


+        return LaunchArgs(
+            job_name=job_name,
+            account=self.account,
+            partition=self.partition,
+            workers=launch_request.workers,
+            nodes_per_worker=launch_request.nodes_per_worker,
+            time=launch_request.time,
+            environment=launch_request.environment,
+            framework=launch_request.framework,
+            served_model_name=served_model_name,


AryanAhadinia · 2026-03-16T23:47:44Z

Moved to #14.

Clean Up the Repo

c448366

AryanAhadinia self-assigned this Mar 6, 2026

AryanAhadinia marked this pull request as ready for review March 9, 2026 15:26

AryanAhadinia marked this pull request as draft March 11, 2026 11:51

AryanAhadinia marked this pull request as ready for review March 16, 2026 10:51

AryanAhadinia requested review from Copilot and robmsmt March 16, 2026 10:51

Copilot started reviewing on behalf of AryanAhadinia March 16, 2026 10:52

Copilot AI reviewed Mar 16, 2026

Add SML Package

c89ed12

AryanAhadinia force-pushed the AryanAhadinia/FirecREST branch from b446efc to c89ed12 March 16, 2026 23:42

AryanAhadinia closed this Mar 16, 2026

AryanAhadinia deleted the AryanAhadinia/FirecREST branch March 17, 2026 20:29

AryanAhadinia wants to merge 2 commits into main from AryanAhadinia/FirecREST

2 participants
