(reranking-models)=

Reranking Models

Introduction

Reranking models are cross-encoders: they read a query and a document together as one input and output a relevance score for the pair, rather than embedding each side separately. They are typically applied after an embedding model has produced an initial candidate set, to reorder those candidates. NeMo AutoModel supports optimized bidirectional Llama rerankers and falls back to Hugging Face `AutoModelForSequenceClassification` for other architectures.
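To make the two-stage flow concrete, here is a minimal sketch of the reranking step in plain Python. It is not NeMo AutoModel's API: `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer only stands in for a real cross-encoder so the example runs without any model.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair jointly and keep the best top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


# Pretend these came back from a first-stage embedding search.
candidates = [
    "Llama is a family of language models",
    "Rerankers score query document pairs",
    "Bananas are rich in potassium",
]
top = rerank("how do rerankers score pairs", candidates, toy_overlap_score, top_k=2)
```

In practice `score_fn` would wrap a trained reranker; the surrounding logic (score each pair, sort, truncate) stays the same.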

For first-stage dense retrieval, see Embedding Models.

Optimized Backbones (Bidirectional Attention)

| Owner | Model | Architecture | Wrapper Class | Tasks |
|---|---|---|---|---|
| Meta | Llama (Bidirectional) | `LlamaBidirectionalForSequenceClassification` | `NeMoAutoModelCrossEncoder` | Reranking |

Hugging Face Auto Backbones

Any Hugging Face model that can be loaded with `AutoModelForSequenceClassification` can be used as a reranking backbone. This fallback path uses the model's native attention; no bidirectional conversion is applied.
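As an illustration of what a sequence-classification backbone does at inference time, the sketch below scores query-document pairs with the plain Hugging Face `transformers` API. It does not go through NeMo AutoModel's own wrappers. `build_pairs` and `score_with_hf` are hypothetical helper names, `model_name` is a placeholder for a checkpoint you choose, and the code assumes that checkpoint has a single output label and a tokenizer with a padding token.

```python
from typing import List, Tuple


def build_pairs(query: str, documents: List[str]) -> List[Tuple[str, str]]:
    """Pair the query with every candidate so each pair is scored jointly."""
    return [(query, doc) for doc in documents]


def score_with_hf(model_name: str, query: str, documents: List[str]) -> List[float]:
    """Return one relevance logit per (query, document) pair.

    model_name is a placeholder: pass any checkpoint that
    AutoModelForSequenceClassification can load.
    """
    # Imported lazily so build_pairs stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    pairs = build_pairs(query, documents)
    queries = [q for q, _ in pairs]
    docs = [d for _, d in pairs]
    # Passing two lists encodes each query and document as one joint sequence.
    inputs = tokenizer(queries, docs, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumes num_labels == 1, so each pair yields a single logit.
    return logits.squeeze(-1).tolist()
```

Calling `score_with_hf` downloads the checkpoint on first use. A model trained with more than one label would need a different reduction than `squeeze(-1)`.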

Supported Workflows

- Fine-tuning (Cross-Encoder): Cross-entropy training on query-document pairs to produce rerankers
- LoRA/PEFT: Parameter-efficient fine-tuning for reranking backbones
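One common way to apply cross-entropy to reranking is to treat the scores of a query's candidate documents as logits and take the positive document as the target class. The sketch below works through that arithmetic in plain Python. Both the formulation and the convention that the positive sits at index 0 are assumptions for illustration; the source does not specify how NeMo AutoModel arranges its loss.

```python
import math
from typing import List


def cross_entropy_loss(logits: List[float], target_index: int = 0) -> float:
    """Softmax cross-entropy over one query's candidate scores.

    logits: reranker scores for [positive, negative_1, negative_2, ...]
    target_index: position of the positive document (assumed to be 0 here).
    """
    # Subtract the max before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # -log softmax(logits)[target_index]
    return log_sum_exp - logits[target_index]


# The loss shrinks as the positive's score pulls ahead of the negatives.
undecided = cross_entropy_loss([0.0, 0.0, 0.0])
confident = cross_entropy_loss([5.0, 0.0, 0.0])
```

With three equally scored candidates the loss is `log(3)`; raising only the positive's score drives it toward zero, which is the signal that pushes positives above negatives during training.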

Dataset

Retrieval fine-tuning requires query-document pairs: each example is a query paired with one positive document and one or more negative documents. Both inline JSONL and corpus ID-based JSON formats are supported. See the Retrieval Dataset guide.
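To illustrate the shape of such an example, the sketch below parses one inline JSONL record and expands it into labeled query-document pairs. The field names `query`, `pos_doc`, and `neg_docs` are hypothetical, chosen only for this illustration; the Retrieval Dataset guide defines the actual schema.

```python
import json
from typing import Dict, List, Tuple

# One JSONL line with hypothetical field names (not the library's real schema).
line = (
    '{"query": "what is a reranker", '
    '"pos_doc": "A reranker scores query-document pairs.", '
    '"neg_docs": ["Bananas are yellow.", "Paris is in France."]}'
)


def expand_example(record: Dict) -> List[Tuple[str, str, int]]:
    """Turn one record into (query, document, label) triples.

    The positive document gets label 1 and every negative gets label 0.
    """
    documents = [record["pos_doc"]] + list(record["neg_docs"])
    labels = [1] + [0] * len(record["neg_docs"])
    return [(record["query"], doc, label) for doc, label in zip(documents, labels)]


pairs = expand_example(json.loads(line))
```

Each record yields one positive pair plus one pair per negative, so this example expands to three pairs.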

