
Commit 68c2d8e

Document canonical RankLLM CLI and CI smoke checks (#367)

* Add packaged `rank-llm` entrypoint baseline
* Fix ruff formatting in CLI packaging test
* Add RankLLM legacy CLI wrappers
* Document canonical RankLLM CLI and CI smoke checks
* Clean RankLLM CLI help output
* Pin fastmcp in CLI smoke workflow

1 parent bd9ff2a commit 68c2d8e

File tree: 5 files changed (+111 −17 lines)

.github/workflows/pr-format.yml

Lines changed: 42 additions & 0 deletions

```diff
@@ -75,3 +75,45 @@ jobs:
 
       - name: Run tests
         run: .venv/bin/python -m unittest discover -s "test/${{ matrix.test-suite }}"
+
+  cli_smoke:
+    needs: lint
+    runs-on: ubuntu-latest
+    env:
+      TMPDIR: /mnt/tmp
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Prepare temp directory on /mnt
+        run: |
+          sudo mkdir -p "$TMPDIR"
+          sudo chown "$USER":"$USER" "$TMPDIR"
+          df -h /
+          df -h /mnt
+
+      - uses: astral-sh/setup-uv@v6
+        with:
+          python-version: "3.11"
+
+      - name: Create CLI environment
+        run: uv venv --python 3.11
+
+      - name: Install CLI smoke environment
+        run: |
+          uv pip install --python .venv/bin/python -e .
+          uv pip install --python .venv/bin/python fastapi uvicorn "fastmcp>=2.0,<3"
+
+      - name: Run CLI smoke tests
+        run: |
+          .venv/bin/python -m unittest \
+            test.test_cli_packaging \
+            test.test_cli_scaffolding \
+            test.test_cli_rerank_command \
+            test.test_cli_validation \
+            test.test_cli_prompt \
+            test.test_cli_view \
+            test.test_cli_introspection \
+            test.test_cli_utilities \
+            test.test_cli_http \
+            test.test_cli_mcp \
+            test.test_cli_legacy_wrappers
```
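The workflow's final step can also be mirrored programmatically with `unittest`'s loader API. A minimal sketch with a stand-in test case (locally you would pass the repo's real dotted `test.test_cli_*` module names to `loadTestsFromNames` instead):

```python
import unittest


class FakeCliSmokeTest(unittest.TestCase):
    # Stand-in for one of the repo's test.test_cli_* modules.
    def test_unittest_entrypoint_is_callable(self):
        self.assertTrue(callable(unittest.main))


loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(FakeCliSmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())  # → 1 True
```

This is the same mechanism `python -m unittest <names>` uses under the hood, which is why the CI step can list modules explicitly instead of relying on discovery.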

README.md

Lines changed: 21 additions & 6 deletions

````diff
@@ -105,7 +105,7 @@ hosted-provider stacks.
 | Listwise reranking with open-source models via vLLM | `vllm` | Builds on `local` and adds the vLLM backend |
 | Batched SGLang inference | `sglang` | Install `flashinfer` separately when needed |
 | Batched TensorRT-LLM inference | `tensorrt-llm` | Install `flash-attn` separately when needed |
-| Flask and MCP server surfaces | `server` | Pulls the server-only dependency set |
+| HTTP and MCP server surfaces | `server` | Pulls the packaged `serve http` and `serve mcp` dependency set |
 | Finetuning and training scripts | `training` | Keeps training-only deps out of base installs |
 | Everything | `all` | Aggregate of all extras |
 
@@ -150,6 +150,21 @@ pip install flash-attn --no-build-isolation
 
 <a id="quick-start"></a>
 # ⏳ Quick Start
+The packaged `rank-llm` command is the canonical CLI surface for this repository.
+The legacy scripts under `src/rank_llm/scripts/` still work, but they now act as
+compatibility wrappers over the same CLI.
+
+```bash
+rank-llm rerank --model-path castorini/rank_zephyr_7b_v1_full --dataset dl20 \
+  --retrieval-method bm25 --top-k-candidates 100
+
+rank-llm prompt list
+rank-llm view demo_outputs/rerank_results.jsonl
+rank-llm evaluate --model-name castorini/rank_zephyr_7b_v1_full
+rank-llm serve http --model-path castorini/rank_zephyr_7b_v1_full --port 8082
+rank-llm serve mcp --transport stdio
+```
+
 The following code snippet is a minimal walk-through of retrieval, reranking, evaluation, and invocation analysis of the top 100 retrieved documents for queries from `DL19`. In this example `BM25` is used as the retriever and `RankZephyr` as the reranker. Additional sample snippets are available under the `src/rank_llm/demo` directory.
 ```python
 from pathlib import Path
@@ -234,15 +249,15 @@ writer.write_inference_invocations_history(
 ```
 
 # End-to-end Run and 2CR
-If you are interested in running retrieval and reranking end-to-end or reproducing the results from the [reference papers](#✨-references), `run_rank_llm.py` is a convenient wrapper script that combines these two steps.
+If you are interested in running retrieval and reranking end-to-end or reproducing the results from the [reference papers](#✨-references), `rank-llm rerank` is the canonical command. `run_rank_llm.py` remains available as a compatibility wrapper for older automation.
 
 The comprehensive list of our two-click reproduction commands is available on the [MS MARCO V1](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v1-passage.html) and [MS MARCO V2](https://castorini.github.io/rank_llm/src/rank_llm/2cr/msmarco-v2-passage.html) webpages for the DL19/DL20 and DL21-23 datasets, respectively. Moving forward, we plan to cover more datasets and retrievers in our 2CR pages. The rest of this section provides some sample e2e runs.
 ## RankZephyr
 
 We can run the RankZephyr model with the following command:
 ```bash
-python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 \
-    --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml --context_size=4096 --variable_passages
+rank-llm rerank --model-path castorini/rank_zephyr_7b_v1_full --top-k-candidates 100 --dataset dl20 \
+    --retrieval-method SPLADE++_EnsembleDistil_ONNX --prompt-template-path src/rank_llm/rerank/prompt_templates/rank_zephyr_template.yaml --context-size 4096 --variable-passages
 ```
 
 Including the `--sglang_batched` flag will allow you to run the model in batched mode using the `SGLang` library.
@@ -255,8 +270,8 @@ If you want to run multiple passes of the model, you can use the `--num_passes`
 
 We can run the RankGPT4-o model with the following command:
 ```bash
-python src/rank_llm/scripts/run_rank_llm.py --model_path=gpt-4o --top_k_candidates=100 --dataset=dl20 \
-    --retrieval_method=bm25 --prompt_template_path=src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml --context_size=4096 --use_azure_openai
+rank-llm rerank --model-path gpt-4o --top-k-candidates 100 --dataset dl20 \
+    --retrieval-method bm25 --prompt-template-path src/rank_llm/rerank/prompt_templates/rank_gpt_apeer_template.yaml --context-size 4096 --use-azure-openai
 ```
 Note that the `--prompt_template_path` is set to `rank_gpt_apeer` to use the LLM-refined prompt from [APEER](https://arxiv.org/abs/2406.14449).
 This can be changed to `rank_GPT` to use the original prompt.
````
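As a rough illustration of what a viewer like `rank-llm view` might do with a rerank-results JSONL file, here is a hedged sketch; the record schema used below (`query`, `candidates`, `docid`, `score`) is an assumption for illustration, not the repository's documented output format:

```python
import io
import json

# A fabricated single-record JSONL stream; real files live under paths like
# demo_outputs/rerank_results.jsonl (schema here is assumed, not documented).
sample = io.StringIO(
    '{"query": "what is a lobster roll", "candidates": ['
    '{"docid": "d2", "score": 0.31}, {"docid": "d1", "score": 0.92}]}\n'
)

for line in sample:
    record = json.loads(line)
    # Sort candidates by descending score, as a viewer would display them.
    ranked = sorted(record["candidates"], key=lambda c: c["score"], reverse=True)
    print(record["query"], "->", [c["docid"] for c in ranked])
    # → what is a lobster roll -> ['d1', 'd2']
```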

src/rank_llm/cli/main.py

Lines changed: 37 additions & 10 deletions

```diff
@@ -213,7 +213,10 @@ def build_parser() -> argparse.ArgumentParser:
         type=int,
         default=300,
     )
-    validate_parser = subparsers.add_parser("validate", help=argparse.SUPPRESS)
+    validate_parser = subparsers.add_parser(
+        "validate",
+        help="Validate inputs without executing a model.",
+    )
     validate_subparsers = validate_parser.add_subparsers(
         dest="validate_target",
         required=True,
@@ -226,7 +229,10 @@ def build_parser() -> argparse.ArgumentParser:
     validate_rerank_parser.add_argument("--stdin", action="store_true")
     validate_rerank_parser.add_argument("--requests-file", dest="requests_file")
 
-    prompt_parser = subparsers.add_parser("prompt", help=argparse.SUPPRESS)
+    prompt_parser = subparsers.add_parser(
+        "prompt",
+        help="Inspect bundled prompt templates.",
+    )
     prompt_subparsers = prompt_parser.add_subparsers(
         dest="prompt_command",
         required=True,
@@ -245,19 +251,34 @@ def build_parser() -> argparse.ArgumentParser:
     prompt_render_parser.add_argument("--input-json", dest="input_json")
     prompt_render_parser.add_argument("--stdin", action="store_true")
 
-    view_parser = subparsers.add_parser("view", help=argparse.SUPPRESS)
+    view_parser = subparsers.add_parser(
+        "view",
+        help="Inspect RankLLM artifacts and outputs.",
+    )
     view_parser.add_argument("path")
     view_parser.add_argument("--records", type=int, default=1)
 
-    describe_parser = subparsers.add_parser("describe", help=argparse.SUPPRESS)
+    describe_parser = subparsers.add_parser(
+        "describe",
+        help="Show structured metadata for a CLI command.",
+    )
     describe_parser.add_argument("name", choices=sorted(COMMAND_DESCRIPTIONS))
 
-    schema_parser = subparsers.add_parser("schema", help=argparse.SUPPRESS)
+    schema_parser = subparsers.add_parser(
+        "schema",
+        help="Show JSON schemas for supported contracts.",
+    )
     schema_parser.add_argument("name", choices=sorted(SCHEMAS))
 
-    subparsers.add_parser("doctor", help=argparse.SUPPRESS)
+    subparsers.add_parser(
+        "doctor",
+        help="Report environment and dependency readiness.",
+    )
 
-    serve_parser = subparsers.add_parser("serve", help=argparse.SUPPRESS)
+    serve_parser = subparsers.add_parser(
+        "serve",
+        help="Start RankLLM transport servers.",
+    )
     serve_subparsers = serve_parser.add_subparsers(
         dest="serve_target",
         required=True,
@@ -383,7 +404,10 @@ def build_parser() -> argparse.ArgumentParser:
         default="stdio",
     )
     serve_mcp_parser.add_argument("--port", type=int, default=8000)
-    evaluate_parser = subparsers.add_parser("evaluate", help=argparse.SUPPRESS)
+    evaluate_parser = subparsers.add_parser(
+        "evaluate",
+        help="Aggregate trec_eval metrics across rerank outputs.",
+    )
     evaluate_parser.add_argument("--model-name", required=True, dest="model_name")
     evaluate_parser.add_argument("--context-size", type=int, default=4096)
     evaluate_parser.add_argument(
@@ -392,13 +416,16 @@ def build_parser() -> argparse.ArgumentParser:
         default="rerank_results",
     )
 
-    analyze_parser = subparsers.add_parser("analyze", help=argparse.SUPPRESS)
+    analyze_parser = subparsers.add_parser(
+        "analyze",
+        help="Analyze stored RankLLM responses.",
+    )
    analyze_parser.add_argument("--files", nargs="+", required=True)
     analyze_parser.add_argument("--verbose", action="store_true")
 
     retrieve_cache_parser = subparsers.add_parser(
         "retrieve-cache",
-        help=argparse.SUPPRESS,
+        help="Generate cached retrieval JSON from run files.",
     )
     retrieve_cache_parser.add_argument("--trec-file", required=True, dest="trec_file")
     retrieve_cache_parser.add_argument(
```
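The motivation for these changes can be seen in a toy parser. With `help=argparse.SUPPRESS`, subparser choices are not filtered the way regular suppressed arguments are, so the literal sentinel string can surface in `--help` output (which is what the commit's new test guards against); a real help string keeps the command discoverable. A minimal sketch, not RankLLM's actual parser:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
sub = parser.add_subparsers(dest="command")

# Old style: intended to hide the subcommand from --help.
sub.add_parser("hidden", help=argparse.SUPPRESS)

# New style from this commit: a real, visible help string.
sub.add_parser("doctor", help="Report environment and dependency readiness.")

help_text = parser.format_help()
print("doctor" in help_text)  # → True
```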

src/rank_llm/cli/operations.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -339,7 +339,7 @@ def run_evaluate_aggregate(
     if runner is None:
         from argparse import Namespace
 
-        from rank_llm.scripts.run_trec_eval import main as runner
+        from rank_llm.scripts.run_trec_eval import evaluate_aggregate
 
     args = Namespace(
         model_name=model_name,
```
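The pattern here is a lazily imported default runner that receives CLI options packed into an `argparse.Namespace`. A self-contained sketch of that shape; the `evaluate_aggregate` stand-in below is hypothetical and does not reflect the real signature or return value of `rank_llm.scripts.run_trec_eval`:

```python
from argparse import Namespace


def evaluate_aggregate(args: Namespace) -> str:
    # Hypothetical stand-in for the real trec_eval aggregation runner.
    return f"{args.model_name} (context={args.context_size})"


def run_evaluate_aggregate(model_name, context_size=4096, runner=None):
    if runner is None:
        # Lazy default, mirroring the patched import in operations.py;
        # tests can inject a fake runner instead.
        runner = evaluate_aggregate
    return runner(Namespace(model_name=model_name, context_size=context_size))


print(run_evaluate_aggregate("castorini/rank_zephyr_7b_v1_full"))
# → castorini/rank_zephyr_7b_v1_full (context=4096)
```

Keeping `runner` injectable is what lets the CLI tests exercise this path without importing the heavyweight evaluation dependencies.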

test/test_cli_scaffolding.py

Lines changed: 10 additions & 0 deletions

```diff
@@ -18,6 +18,16 @@ def test_command_response_envelope(self):
 
 
 class TestCLIParserAndOutput(unittest.TestCase):
+    def test_top_level_help_does_not_expose_suppress_sentinels(self):
+        stdout = io.StringIO()
+        stderr = io.StringIO()
+        with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
+            with self.assertRaises(SystemExit) as raised:
+                main(["--help"])
+        self.assertEqual(raised.exception.code, 0)
+        self.assertEqual("", stderr.getvalue())
+        self.assertNotIn("==SUPPRESS==", stdout.getvalue())
+
     def test_missing_command_text_error(self):
         stdout = io.StringIO()
         stderr = io.StringIO()
```