quick-model-tests

Lightweight tests that are 100% deterministic to prove the model (and system around it) will work in production. Designed to be prereq to evals and benchmarks.

Quickstart

export CSCS_SERVING_API=...   # your bearer token
curl -fsSL https://raw.githubusercontent.com/swiss-ai/quick-model-tests/main/run.sh | bash

Scope to specific areas and pick a model:

curl -fsSL https://raw.githubusercontent.com/swiss-ai/quick-model-tests/main/run.sh | bash -s -- --model swiss-ai/Apertus-8B-Instruct-2509

Every check is deterministic (status codes, response schema, token counts, substring / regex / closed-set membership) — there is intentionally no LLM-as-judge. Semantic/quality evaluation belongs in LLM evals, not in a functional pass/fail gate.

Usage

One command. It runs the deterministic suites against a model and prints a ✔/✗ capability table, exiting non-zero on any failure.

quick-model-tests                              # run ALL checks, default model
quick-model-tests --model swiss-ai/Apertus-1.5-8B-Instruct-sft-dpo-tools
quick-model-tests --capability tools           # just one capability
quick-model-tests --model A --model B          # compare models (table)
quick-model-tests --model A --model B --detail # + per-model failure reasons
quick-model-tests --json                       # machine-readable

Status per check: ✔ pass · ✗ an assertion failed (a real gap) · ⚠ the check errored (e.g. the server returned an HTTP error) · – skipped. The capabilities are: core · streaming · tools · multimodal · multiturn · reasoning · robustness (· perf, opt-in).

Local development:

uv venv && source .venv/bin/activate   # or python -m venv .venv && source .venv/bin/activate
make install                      # editable install (auto-detects `uv pip` / `pip`)
make run MODEL=Qwen/Qwen3.5-27B   # run the checks
make format                       # ruff auto-fix + format
make check                        # ruff lint + format check (the CI PR gate)

Every PR runs ruff check + ruff format --check (see .github/workflows/ci.yml); keep the tree clean with make format before pushing.

Configuration

Env var	Default
`QMT_API_BASE`	`https://api.swissai.svc.cscs.ch/v1`
`QMT_API_KEY` (or `CSCS_SERVING_API`)	— (required)
`QMT_MODEL`	`swiss-ai/Apertus-8B-Instruct-2509`

Suites

core · streaming · tools · multimodal · multiturn · reasoning · robustness · perf (opt-in).

Only core is implemented today; the rest are stubs. See SPEC.md for the full specification, test catalog, open questions to probe against the live API, and implementation milestones — it is written so another engineer can take over the build from it directly.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
quick_model_tests		quick_model_tests
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SPEC.md		SPEC.md
pyproject.toml		pyproject.toml
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

quick-model-tests

Quickstart

Usage

Configuration

Suites

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

quick-model-tests

Quickstart

Usage

Configuration

Suites

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages