
docs: a claude demo of using torch-tensorrt to compile qwen3-reranker #4085

Open
narendasan wants to merge 1 commit into main from narendasan/push-vyowusqpovxt

Conversation

@narendasan
Collaborator

Description

An LLM-generated demo for this new model class.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@meta-cla meta-cla bot added the cla signed label Feb 18, 2026

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/examples/dynamo/torch_export_qwen3_reranker.py	2026-02-18 17:30:20.512233+00:00
+++ /home/runner/work/TensorRT/TensorRT/examples/dynamo/torch_export_qwen3_reranker.py	2026-02-18 17:30:54.330404+00:00
@@ -125,11 +125,13 @@

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

-    def forward(self, input_ids: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
+    def forward(
+        self, input_ids: torch.Tensor, position_ids: torch.Tensor
+    ) -> torch.Tensor:
        out = self.model(input_ids=input_ids, position_ids=position_ids)
        return out.logits


# ---------------------------------------------------------------------------
@@ -202,11 +204,13 @@
# Main
# ---------------------------------------------------------------------------


def parse_args():
-    p = argparse.ArgumentParser(description="Compile Qwen3-Reranker with Torch-TensorRT")
+    p = argparse.ArgumentParser(
+        description="Compile Qwen3-Reranker with Torch-TensorRT"
+    )
    p.add_argument("--model", default="Qwen/Qwen3-Reranker-0.6B", help="HF model name")
    p.add_argument("--precision", default="FP16", choices=["FP16", "BF16", "FP32"])
    p.add_argument(
        "--max_length",
        type=int,
@@ -231,11 +235,13 @@
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    token_true_id = tokenizer.convert_tokens_to_ids("yes")
    token_false_id = tokenizer.convert_tokens_to_ids("no")
-    print(f"  token_true_id (yes) = {token_true_id}, token_false_id (no) = {token_false_id}")
+    print(
+        f"  token_true_id (yes) = {token_true_id}, token_false_id (no) = {token_false_id}"
+    )

    base_model = (
        AutoModelForCausalLM.from_pretrained(
            args.model,
            use_cache=False,
@@ -252,22 +258,26 @@
    base_model = base_model.to(dtype_map[args.precision])

    # ------------------------------------------------------------------
    # 2. Build test inputs
    # ------------------------------------------------------------------
-    instruction = "Given a web search query, retrieve relevant passages that answer the query"
+    instruction = (
+        "Given a web search query, retrieve relevant passages that answer the query"
+    )
    queries = [
        "What is the capital of China?",
        "How does photosynthesis work?",
    ]
    documents = [
        "The capital of China is Beijing.",
        "Photosynthesis is the process by which plants convert sunlight into glucose.",
    ]

    print("Tokenizing inputs ...")
-    inputs = build_inputs(tokenizer, queries, documents, instruction, max_length=args.max_length)
+    inputs = build_inputs(
+        tokenizer, queries, documents, instruction, max_length=args.max_length
+    )
    input_ids = inputs["input_ids"]
    print(f"  input_ids shape: {input_ids.shape}")

    # ------------------------------------------------------------------
    # 3. PyTorch baseline
