Skip to content

FirecREST#12

Closed
AryanAhadinia wants to merge 2 commits into
mainfrom
AryanAhadinia/FirecREST
Closed

FirecREST#12
AryanAhadinia wants to merge 2 commits into
mainfrom
AryanAhadinia/FirecREST

Conversation

@AryanAhadinia
Copy link
Copy Markdown
Member

No description provided.

@AryanAhadinia AryanAhadinia self-assigned this Mar 6, 2026
@AryanAhadinia AryanAhadinia marked this pull request as ready for review March 9, 2026 15:26
@AryanAhadinia AryanAhadinia marked this pull request as draft March 11, 2026 11:51
@AryanAhadinia AryanAhadinia marked this pull request as ready for review March 16, 2026 10:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the repository into an installable swiss-ai-model-launch Python package providing an interactive sml CLI to launch models (primarily via FirecREST), replacing the previous serving/ scripts and related docs.

Changes:

  • Introduces a new src/ package with FirecREST launcher, configuration wizard (keyring-backed), healthcheck, and a Textual live display UI.
  • Adds packaged assets (job template + environment TOML files + preconfigured models list) used for SLURM submission.
  • Adds Python tooling/CI (pyproject, pre-commit, ruff, mypy, GitHub Actions) and replaces prior unit test scaffolding with an integration test.

Reviewed changes

Copilot reviewed 38 out of 48 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
.github/workflows/ci.yml Adds CI for lint/format, mypy strict typechecking, and pytest+coverage.
.gitignore Expands ignore rules for Python/IDE/OS artifacts.
.pre-commit-config.yaml Adds ruff + mypy pre-commit hooks.
Makefile Adds dev convenience targets (format/clean).
README.md Updates docs toward the new sml CLI + integration test instructions.
RUNNING_OPENCODE.md Removes OpenCode running instructions (legacy).
download/README.md Removes legacy model download instructions.
download/download_model.py Removes legacy model download script.
images/README.md Removes legacy image build documentation.
images/sglang_glm5/Dockerfile Removes legacy Dockerfile.
images/sglang_kimi_k2.5/Dockerfile Removes legacy Dockerfile.
pyproject.toml Introduces packaging metadata, dependencies, ruff/mypy/pytest/coverage configuration.
requirements.txt Removes old requirements file (migrated to pyproject).
serving/README.md Removes legacy serving documentation.
serving/submit_job.py Removes legacy SLURM submission script.
serving/utils.py Removes legacy utilities (bootstrap fetch, script rendering, etc.).
src/swiss_ai_model_launch/__init__.py Adds package root marker.
src/swiss_ai_model_launch/assets/__init__.py Adds assets package marker.
src/swiss_ai_model_launch/assets/models.json Adds preconfigured model launch presets for the CLI.
src/swiss_ai_model_launch/assets/template.jinja Adds/updates SLURM job script template used by FirecREST submissions.
src/swiss_ai_model_launch/assets/envs/__init__.py Adds env assets package marker.
src/swiss_ai_model_launch/assets/envs/sglang.toml Updates sglang container env configuration.
src/swiss_ai_model_launch/assets/envs/sglang_kimi.toml Updates kimi env config (mounts/env/annotations).
src/swiss_ai_model_launch/assets/envs/sglang_glm.toml Updates GLM image path.
src/swiss_ai_model_launch/assets/envs/vllm.toml Adds vLLM container env configuration.
src/swiss_ai_model_launch/assets/envs/vllm_qwen35.toml Adds vLLM nightly env configuration.
src/swiss_ai_model_launch/cli/__init__.py Exposes main entrypoint.
src/swiss_ai_model_launch/cli/main.py Implements interactive CLI flow: init wizard, model selection, launch + live monitoring.
src/swiss_ai_model_launch/cli/configuration/__init__.py Exposes InitConfig.
src/swiss_ai_model_launch/cli/configuration/init_wizard.py Adds init wizard and config persistence to ~/.sml/config.yml.
src/swiss_ai_model_launch/cli/configuration/models.py Adds configuration model types using Pydantic + Questionary + Keyring.
src/swiss_ai_model_launch/cli/display/__init__.py Exposes display state and live UI.
src/swiss_ai_model_launch/cli/display/live.py Adds Textual UI for status + stdout/stderr logs.
src/swiss_ai_model_launch/cli/display/state.py Adds in-memory state model feeding the UI.
src/swiss_ai_model_launch/cli/healthcheck/__init__.py Adds OpenAI-compatible endpoint healthcheck via httpx.
src/swiss_ai_model_launch/launchers/__init__.py Exposes launcher types.
src/swiss_ai_model_launch/launchers/firecrest_launcher.py Implements FirecREST-based model launch + job status/log retrieval.
src/swiss_ai_model_launch/launchers/launch_args.py Defines template arguments for job script rendering.
src/swiss_ai_model_launch/launchers/launch_request.py Defines user-facing LaunchRequest (Pydantic).
src/swiss_ai_model_launch/launchers/launcher.py Defines Launcher ABC + JobStatus enum.
src/swiss_ai_model_launch/launchers/utils.py Adds random salt helper used for naming.
src/swiss_ai_model_launch/telemetry/__init__.py Adds telemetry package marker (no implementation yet).
tests/__init__.py Adds tests package marker.
tests/integration/__init__.py Adds integration tests package marker.
tests/integration/test_launch_apertus.py Adds a FirecREST-based integration test that launches Apertus and checks health.
tests/unit.py Removes legacy unittest-based unit test.
tests/bootstrap_payload.json Removes legacy unit-test fixture payload.
Comments suppressed due to low confidence (3)

src/swiss_ai_model_launch/assets/template.jinja:10

  • The SLURM error output is configured to write to the same file as stdout (--error=logs/%j/log.out), which will interleave/overwrite stderr and makes debugging difficult. This looks like a copy/paste mistake; stderr should go to a separate log (e.g. log.err).
    src/swiss_ai_model_launch/assets/template.jinja:10
  • --output=logs/%j/log.out (and --error=...) writes into a per-job subdirectory under logs/, but SLURM opens these files before the script executes; it won't create missing directories like logs/%j. Since the script no longer creates the log directory before job start, this can cause the job to fail immediately. Use an output path that doesn't require pre-created directories, or ensure the required directories exist before submission (e.g., create logs/ upfront and avoid %j as a directory component).
    src/swiss_ai_model_launch/assets/template.jinja:8
  • The job script hardcodes a reservation (#SBATCH --reservation=PA-2338-RL). This will break launches for users/projects that don't have access to that reservation and makes the CLI non-portable. Consider making this configurable (LaunchRequest/LaunchArgs/config) or omitting it by default.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +51 to +56
_missing_env_vars = [v for v in _REQUIRED_ENV_VARS if os.environ.get(v) is None]
if _missing_env_vars:
pytest.fail(
"Missing required environment variables: " + ", ".join(_missing_env_vars),
pytrace=False,
)
Comment on lines +69 to +75
return FirecRESTLauncher(
client=client,
system_name=os.environ["FIRECREST_SYSTEM"],
username=os.environ.get("FIRECREST_USERNAME"),
account=os.environ["FIRECREST_ACCOUNT"],
partition=os.environ["FIRECREST_PARTITION"],
)
Comment on lines +148 to +155
async def get_job_status(self, job_id: int) -> JobStatus:
job_info = await self.client.job_info(
system_name=self.system_name,
jobid=job_id,
# account=self.account, # TODO
)
return JobStatus(str(job_info[0]["status"]["state"]))

Comment on lines +90 to +93
raise ValueError(
"`envionment` is not provided in the launch request, "
"and no default environment is available for the specified framework."
)
Comment on lines +53 to +56
def _get_laucnh_args_from_request(
self,
launch_request: LaunchRequest,
) -> LaunchArgs:
Comment thread src/swiss_ai_model_launch/cli/main.py Outdated
Comment on lines +39 to +47
async def _get_firecrest_launcher_with_client(client: f7t.v2.AsyncFirecrest):
async def _get_systems() -> dict[str, tuple[str, str]]:
return {
sys["name"]: (sys["name"], sys["ssh"]["host"])
for sys in await client.systems()
}

async def _get_partitions(get_value_from_context) -> dict[str, tuple[str, str]]:
system = get_value_from_context("firecrest_system")
Comment on lines +225 to +227
state = DisplayState()
state.update(cluster=launcher.system_name, partition=launcher.partition)

Comment on lines +231 to +242
while True:
await asyncio.sleep(5)

job_status = await launcher.get_job_status(job_id)
state.update(job_status=job_status)

model_health = await check_model_health(served_model_name, cscs_api_key)
state.update(model_health=model_health)

o, e = await launcher.get_job_logs(job_id)
state.append_out_log(o)
state.append_err_log(e)
Comment thread pyproject.toml

[tool.coverage.report]
show_missing = true
fail_under = 100
Comment on lines +64 to +73
return LaunchArgs(
job_name=job_name,
account=self.account,
partition=self.partition,
workers=launch_request.workers,
nodes_per_worker=launch_request.nodes_per_worker,
time=launch_request.time,
environment=launch_request.environment,
framework=launch_request.framework,
served_model_name=served_model_name,
@AryanAhadinia AryanAhadinia force-pushed the AryanAhadinia/FirecREST branch from b446efc to c89ed12 Compare March 16, 2026 23:42
@AryanAhadinia
Copy link
Copy Markdown
Member Author

Moved to #14 .

@AryanAhadinia AryanAhadinia deleted the AryanAhadinia/FirecREST branch March 17, 2026 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants