
Commit 68c2d8e

Document canonical RankLLM CLI and CI smoke checks (#367)

* Add packaged `rank-llm` entrypoint baseline
* Fix ruff formatting in CLI packaging test
* Add RankLLM legacy CLI wrappers
* Document canonical RankLLM CLI and CI smoke checks
* Clean RankLLM CLI help output
* Pin fastmcp in CLI smoke workflow

1 parent bd9ff2a commit 68c2d8e

File tree: 5 files changed (+111 −17 lines)

.github/workflows/pr-format.yml

Lines changed: 42 additions & 0 deletions

```diff
@@ -75,3 +75,45 @@ jobs:
 
       - name: Run tests
         run: .venv/bin/python -m unittest discover -s "test/${{ matrix.test-suite }}"
+
+  cli_smoke:
+    needs: lint
+    runs-on: ubuntu-latest
+    env:
+      TMPDIR: /mnt/tmp
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Prepare temp directory on /mnt
+        run: |
+          sudo mkdir -p "$TMPDIR"
+          sudo chown "$USER":"$USER" "$TMPDIR"
+          df -h /
+          df -h /mnt
+
+      - uses: astral-sh/setup-uv@v6
+        with:
+          python-version: "3.11"
+
+      - name: Create CLI environment
+        run: uv venv --python 3.11
+
+      - name: Install CLI smoke environment
+        run: |
+          uv pip install --python .venv/bin/python -e .
+          uv pip install --python .venv/bin/python fastapi uvicorn "fastmcp>=2.0,<3"
+
+      - name: Run CLI smoke tests
+        run: |
+          .venv/bin/python -m unittest \
+            test.test_cli_packaging \
+            test.test_cli_scaffolding \
+            test.test_cli_rerank_command \
+            test.test_cli_validation \
+            test.test_cli_prompt \
+            test.test_cli_view \
+            test.test_cli_introspection \
+            test.test_cli_utilities \
+            test.test_cli_http \
+            test.test_cli_mcp \
+            test.test_cli_legacy_wrappers
```
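The workflow's final step can also be mirrored programmatically with `unittest`'s loader API. A minimal sketch with a stand-in test case (locally you would pass the repo's real dotted `test.test_cli_*` module names to `loadTestsFromNames` instead):

```python
import unittest


class FakeCliSmokeTest(unittest.TestCase):
    # Stand-in for one of the repo's test.test_cli_* modules.
    def test_unittest_entrypoint_is_callable(self):
        self.assertTrue(callable(unittest.main))


loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(FakeCliSmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())  # → 1 True
```

This is the same mechanism `python -m unittest <names>` uses under the hood, which is why the CI step can list modules explicitly instead of relying on discovery.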

README.md

Lines changed: 21 additions & 6 deletions

````diff
@@ -105,7 +105,7 @@ hosted-provider stacks.
 | Listwise reranking with open-source models via vLLM | `vllm` | Builds on `local` and adds the vLLM backend |
 | Batched SGLang inference | `sglang` | Install `flashinfer` separately when needed |
 | Batched TensorRT-LLM inference | `tensorrt-llm` | Install `flash-attn` separately when needed |
-| Flask and MCP server surfaces | `server` | Pulls the server-only dependency set |
+| HTTP and MCP server surfaces | `server` | Pulls the packaged `serve http` and `serve mcp` dependency set |
 | Finetuning and training scripts | `training` | Keeps training-only deps out of base installs |
 | Everything | `all` | Aggregate of all extras |
 
@@ -150,6 +150,21 @@ pip install flash-attn --no-build-isolation
 
 <a id="quick-start"></a>
 # ⏳ Quick Start
+The packaged `rank-llm` command is the canonical CLI surface for this repository.
+The legacy scripts under `src/rank_llm/scripts/` still work, but they now act as
+compatibility wrappers over the same CLI.
+
+```bash
+rank-llm rerank --model-path castorini/rank_zephyr_7b_v1_full --dataset dl20 \
+  --retrieval-method bm25 --top-k-candidates 100
+
+rank-llm prompt list
+rank-llm view demo_outputs/rerank_results.jsonl
+rank-llm evaluate --model-name castorini/rank_zephyr_7b_v1_full
+rank-llm serve http --model-path castorini/rank_zephyr_7b_v1_full --port 8082
+rank-llm serve mcp --transport stdio
+```
+
 The following code snippet is a minimal walk-through of retrieval, reranking, evaluation, and invocation analysis of the top 100 retrieved documents for queries from `DL19`. In this example `BM25` is used as the retriever and `RankZephyr` as the reranker. Additional sample snippets are available under the `src/rank_llm/demo` directory.
 ```python
 from pathlib import Path
@@ -234,15 +249,15 @@ writer.write_inference_invocations_history(
 ```
 
 # End-to-end Run and 2CR
-If you are interested in running retrieval and reranking end-to-end or reproducing the results from the [reference papers](#✨-references), `run_rank_llm.py` is a convenient wrapper script that combines these two steps.
+If you are interested in running retrieval and reranking end-to-end or reproducing the results from the [reference papers](#✨-references), `rank-llm rerank` is the canonical command. `run_rank_llm.py` remains available as a compatibility wrapper for older automation.
 
 The comprehensive list of our two-click reproduction commands is available on the [MS MARCO V1](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v1-passage.html) and [MS MARCO V2](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v2-passage.html) webpages for the DL19/DL20 and DL21-23 datasets, respectively. Moving forward, we plan to cover more datasets and retrievers in our 2CR pages. The rest of this section provides some sample e2e runs.
 ## RankZephyr
 
 We can run the RankZephyr model with the following command:
 ```bash
-python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
-    --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml --context_size=4096 --variable_passages
+rank-llm rerank --model-path castorini/rank_zephyr_7b_v1_full --top-k-candidates 100 --dataset dl20 \
+    --retrieval-method SPLADE++_EnsembleDistil_ONNX --prompt-template-path src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml --context-size 4096 --variable-passages
 ```
 
 Including the `--sglang_batched` flag will allow you to run the model in batched mode using the `SGLang` library.
@@ -255,8 +270,8 @@ If you want to run multiple passes of the model, you can use the `--num_passes`
 
 We can run the RankGPT4-o model with the following command:
 ```bash
-python src/rank_llm/scripts/run_rank_llm.py --model_path=gpt-4o --top_k_candidates=100 --dataset=dl20 \
-    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml --context_size=4096 --use_azure_openai
+rank-llm rerank --model-path gpt-4o --top-k-candidates 100 --dataset dl20 \
+    --retrieval-method bm25 --prompt-template-path src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml --context-size 4096 --use-azure-openai
 ```
 Note that the `--prompt_template_path` is set to `rank_gpt_apeer` to use the LLM-refined prompt from [APEER](https://arxiv.org/abs/2406.14449).
 This can be changed to `rank_GPT` to use the original prompt.
````
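As a rough illustration of what a viewer like `rank-llm view` might do with a rerank-results JSONL file, here is a hedged sketch; the record schema used below (`query`, `candidates`, `docid`, `score`) is an assumption for illustration, not the repository's documented output format:

```python
import io
import json

# A fabricated single-record JSONL stream; real files live under paths like
# demo_outputs/rerank_results.jsonl (schema here is assumed, not documented).
sample = io.StringIO(
    '{"query": "what is a lobster roll", "candidates": ['
    '{"docid": "d2", "score": 0.31}, {"docid": "d1", "score": 0.92}]}\n'
)

for line in sample:
    record = json.loads(line)
    # Sort candidates by descending score, as a viewer would display them.
    ranked = sorted(record["candidates"], key=lambda c: c["score"], reverse=True)
    print(record["query"], "->", [c["docid"] for c in ranked])
    # → what is a lobster roll -> ['d1', 'd2']
```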

src/rank_llm/cli/main.py

Lines changed: 37 additions & 10 deletions

```diff
@@ -213,7 +213,10 @@ def build_parser() -> argparse.ArgumentParser:
         type=int,
         default=300,
     )
-    validate_parser = subparsers.add_parser("validate", help=argparse.SUPPRESS)
+    validate_parser = subparsers.add_parser(
+        "validate",
+        help="Validate inputs without executing a model.",
+    )
     validate_subparsers = validate_parser.add_subparsers(
         dest="validate_target",
         required=True,
@@ -226,7 +229,10 @@ def build_parser() -> argparse.ArgumentParser:
     validate_rerank_parser.add_argument("--stdin", action="store_true")
     validate_rerank_parser.add_argument("--requests-file", dest="requests_file")
 
-    prompt_parser = subparsers.add_parser("prompt", help=argparse.SUPPRESS)
+    prompt_parser = subparsers.add_parser(
+        "prompt",
+        help="Inspect bundled prompt templates.",
+    )
     prompt_subparsers = prompt_parser.add_subparsers(
         dest="prompt_command",
         required=True,
@@ -245,19 +251,34 @@ def build_parser() -> argparse.ArgumentParser:
     prompt_render_parser.add_argument("--input-json", dest="input_json")
     prompt_render_parser.add_argument("--stdin", action="store_true")
 
-    view_parser = subparsers.add_parser("view", help=argparse.SUPPRESS)
+    view_parser = subparsers.add_parser(
+        "view",
+        help="Inspect RankLLM artifacts and outputs.",
+    )
     view_parser.add_argument("path")
     view_parser.add_argument("--records", type=int, default=1)
 
-    describe_parser = subparsers.add_parser("describe", help=argparse.SUPPRESS)
+    describe_parser = subparsers.add_parser(
+        "describe",
+        help="Show structured metadata for a CLI command.",
+    )
     describe_parser.add_argument("name", choices=sorted(COMMAND_DESCRIPTIONS))
 
-    schema_parser = subparsers.add_parser("schema", help=argparse.SUPPRESS)
+    schema_parser = subparsers.add_parser(
+        "schema",
+        help="Show JSON schemas for supported contracts.",
+    )
     schema_parser.add_argument("name", choices=sorted(SCHEMAS))
 
-    subparsers.add_parser("doctor", help=argparse.SUPPRESS)
+    subparsers.add_parser(
+        "doctor",
+        help="Report environment and dependency readiness.",
+    )
 
-    serve_parser = subparsers.add_parser("serve", help=argparse.SUPPRESS)
+    serve_parser = subparsers.add_parser(
+        "serve",
+        help="Start RankLLM transport servers.",
+    )
     serve_subparsers = serve_parser.add_subparsers(
         dest="serve_target",
         required=True,
@@ -383,7 +404,10 @@ def build_parser() -> argparse.ArgumentParser:
         default="stdio",
     )
     serve_mcp_parser.add_argument("--port", type=int, default=8000)
-    evaluate_parser = subparsers.add_parser("evaluate", help=argparse.SUPPRESS)
+    evaluate_parser = subparsers.add_parser(
+        "evaluate",
+        help="Aggregate trec_eval metrics across rerank outputs.",
+    )
     evaluate_parser.add_argument("--model-name", required=True, dest="model_name")
     evaluate_parser.add_argument("--context-size", type=int, default=4096)
     evaluate_parser.add_argument(
@@ -392,13 +416,16 @@ def build_parser() -> argparse.ArgumentParser:
         default="rerank_results",
     )
 
-    analyze_parser = subparsers.add_parser("analyze", help=argparse.SUPPRESS)
+    analyze_parser = subparsers.add_parser(
+        "analyze",
+        help="Analyze stored RankLLM responses.",
+    )
    analyze_parser.add_argument("--files", nargs="+", required=True)
     analyze_parser.add_argument("--verbose", action="store_true")
 
     retrieve_cache_parser = subparsers.add_parser(
         "retrieve-cache",
-        help=argparse.SUPPRESS,
+        help="Generate cached retrieval JSON from run files.",
     )
     retrieve_cache_parser.add_argument("--trec-file", required=True, dest="trec_file")
     retrieve_cache_parser.add_argument(
```
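The motivation for these changes can be seen in a toy parser. With `help=argparse.SUPPRESS`, subparser choices are not filtered the way regular suppressed arguments are, so the literal sentinel string can surface in `--help` output (which is what the commit's new test guards against); a real help string keeps the command discoverable. A minimal sketch, not RankLLM's actual parser:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
sub = parser.add_subparsers(dest="command")

# Old style: intended to hide the subcommand from --help.
sub.add_parser("hidden", help=argparse.SUPPRESS)

# New style from this commit: a real, visible help string.
sub.add_parser("doctor", help="Report environment and dependency readiness.")

help_text = parser.format_help()
print("doctor" in help_text)  # → True
```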

src/rank_llm/cli/operations.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -339,7 +339,7 @@ def run_evaluate_aggregate(
     if runner is None:
         from argparse import Namespace
 
-        from rank_llm.scripts.run_trec_eval import main as runner
+        from rank_llm.scripts.run_trec_eval import evaluate_aggregate
 
     args = Namespace(
         model_name=model_name,
```
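The pattern here is a lazily imported default runner that receives CLI options packed into an `argparse.Namespace`. A self-contained sketch of that shape; the `evaluate_aggregate` stand-in below is hypothetical and does not reflect the real signature or return value of `rank_llm.scripts.run_trec_eval`:

```python
from argparse import Namespace


def evaluate_aggregate(args: Namespace) -> str:
    # Hypothetical stand-in for the real trec_eval aggregation runner.
    return f"{args.model_name} (context={args.context_size})"


def run_evaluate_aggregate(model_name, context_size=4096, runner=None):
    if runner is None:
        # Lazy default, mirroring the patched import in operations.py;
        # tests can inject a fake runner instead.
        runner = evaluate_aggregate
    return runner(Namespace(model_name=model_name, context_size=context_size))


print(run_evaluate_aggregate("castorini/rank_zephyr_7b_v1_full"))
# → castorini/rank_zephyr_7b_v1_full (context=4096)
```

Keeping `runner` injectable is what lets the CLI tests exercise this path without importing the heavyweight evaluation dependencies.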

test/test_cli_scaffolding.py

Lines changed: 10 additions & 0 deletions

```diff
@@ -18,6 +18,16 @@ def test_command_response_envelope(self):
 
 
 class TestCLIParserAndOutput(unittest.TestCase):
+    def test_top_level_help_does_not_expose_suppress_sentinels(self):
+        stdout = io.StringIO()
+        stderr = io.StringIO()
+        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
+            with self.assertRaises(SystemExit) as raised:
+                main(["--help"])
+        self.assertEqual(raised.exception.code, 0)
+        self.assertEqual("", stderr.getvalue())
+        self.assertNotIn("==SUPPRESS==", stdout.getvalue())
+
     def test_missing_command_text_error(self):
         stdout = io.StringIO()
         stderr = io.StringIO()
```