NeMo AutoModel provides a bidirectional variant of Meta's Llama for reranking tasks. Unlike the standard causal (left-to-right) Llama used for text generation, this variant uses bidirectional attention, allowing the query and document to interact across the full sequence before a classification head produces a relevance score.
For the bi-encoder variant, see Llama (Bidirectional) for Embedding.
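The difference between the two attention patterns can be sketched with masks: a causal mask lets token *i* attend only to positions at or before *i*, while the bidirectional variant exposes the full sequence, so query and document tokens can attend to each other in both directions. A minimal NumPy illustration (the `attention_mask` helper below is ours for illustration, not part of the NeMo AutoModel API):

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean mask where entry (i, j) is True if position i may attend to j."""
    if causal:
        # Standard Llama: lower-triangular mask, token i sees only j <= i.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Bidirectional variant: every token sees the whole sequence.
    return np.ones((seq_len, seq_len), dtype=bool)

causal = attention_mask(4, causal=True)
bidir = attention_mask(4, causal=False)
print(int(causal.sum()))  # 10 visible pairs (1 + 2 + 3 + 4)
print(int(bidir.sum()))   # 16 visible pairs (4 x 4)
```

With the causal mask, a document token early in the sequence can never attend to query tokens that follow it; the bidirectional mask removes that restriction before the classification head reads out a score.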
:::{card}

| | |
|---|---|
| Tasks | Reranking |
| Architecture | LlamaBidirectionalForSequenceClassification |
| Parameters | 1B – 8B |
| HF Org | meta-llama |

:::
Any Llama checkpoint can be loaded as a bidirectional reranking backbone. The following configurations have been tested:
- Llama 3.2 1B — fast iteration, fits on a single GPU
- Llama 3.1 8B — higher-quality reranking for production use
The cross-encoder path is used for pairwise relevance scoring and reranking.
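In outline, the cross-encoder scores each (query, document) pair jointly and returns documents sorted by score. The sketch below uses a toy term-overlap function as a stand-in for the model's classification head; `rerank` and `overlap_score` are illustrative helpers, not NeMo AutoModel API:

```python
from typing import Callable

def rerank(query: str, documents: list[str],
           score_fn: Callable[[str, str], float]) -> list[tuple[str, float]]:
    """Score each (query, document) pair, then sort by descending relevance."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in scorer: fraction of query terms found in the document.
    The real model encodes the concatenated pair with bidirectional
    attention and reads the score from the classification head."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["llama runs on a single gpu", "the weather is nice", "gpu memory usage"]
print(rerank("llama gpu", docs, overlap_score)[0][0])  # "llama runs on a single gpu"
```

Because query and document are encoded together, a cross-encoder is typically more accurate than a bi-encoder but must run the model once per pair, so it is usually applied to rerank a shortlist rather than to search the full corpus.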
| Architecture | Task | Wrapper Class | Description |
|---|---|---|---|
| LlamaBidirectionalForSequenceClassification | Reranking | NeMoAutoModelCrossEncoder | Bidirectional Llama with classification head for relevance scoring |
| Model | HF ID |
|---|---|
| Llama 3.2 1B | meta-llama/Llama-3.2-1B |
| Llama 3.1 8B | meta-llama/Llama-3.1-8B |
| Recipe | Description |
|---|---|
| {download}`llama3_2_1b.yaml <../../../../examples/retrieval/cross_encoder/llama3_2_1b.yaml>` | Cross-encoder: Llama 3.2 1B reranker |
1. Install NeMo AutoModel. Refer to the Installation Guide for details:

   ```bash
   uv pip install nemo-automodel
   ```

2. Clone the repo to get the example recipes:

   ```bash
   git clone https://github.com/NVIDIA-NeMo/Automodel.git
   cd Automodel
   ```

3. Run the recipe from inside the repo:

   ```bash
   automodel examples/retrieval/cross_encoder/llama3_2_1b.yaml --nproc-per-node 8
   ```

:::{dropdown} Run with Docker

1. Pull the container and mount a checkpoint directory:

   ```bash
   docker run --gpus all -it --rm \
     --shm-size=8g \
     -v $(pwd)/checkpoints:/opt/Automodel/checkpoints \
     nvcr.io/nvidia/nemo-automodel:26.02.00
   ```

2. Navigate to the AutoModel directory (where the recipes are):

   ```bash
   cd /opt/Automodel
   ```

3. Run the recipe:

   ```bash
   automodel examples/retrieval/cross_encoder/llama3_2_1b.yaml --nproc-per-node 8
   ```

:::