(reranking-models)=
Reranking models are cross-encoders: they score a query and a document jointly, encoding the pair together rather than embedding each text separately. They are typically applied after an embedding model has retrieved an initial candidate set, to reorder that short list. NeMo AutoModel provides optimized bidirectional Llama rerankers and falls back to Hugging Face `AutoModelForSequenceClassification` for other architectures.
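The retrieve-then-rerank flow can be sketched in plain Python. This is a toy illustration, not NeMo AutoModel code: `embed`, `dot`, `cross_encoder_score`, and `retrieve_then_rerank` are hypothetical stand-ins for an embedding model and a cross-encoder. The point is the structure: stage 1 scores each document independently of the query text, while stage 2 scores each (query, document) pair together and can therefore reorder the candidates.

```python
from typing import Dict, List

# Hypothetical toy stand-ins: a real pipeline would call an embedding model
# for stage 1 and a cross-encoder reranker for stage 2.

def embed(text: str) -> Dict[str, int]:
    # Toy bag-of-words "embedding", computed for each text independently.
    vec: Dict[str, int] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def dot(a: Dict[str, int], b: Dict[str, int]) -> int:
    return sum(v * b.get(k, 0) for k, v in a.items())

def cross_encoder_score(query: str, doc: str) -> float:
    # Toy joint scorer: it sees the query and document together, so it can
    # reward pair-level evidence (an exact phrase match) that the
    # bag-of-words vectors above cannot express.
    q, d = query.lower(), doc.lower()
    overlap = len(set(q.split()) & set(d.split()))
    return overlap + (5.0 if q in d else 0.0)

def retrieve_then_rerank(query: str, corpus: List[str], k: int = 3) -> List[str]:
    qv = embed(query)
    # Stage 1: cheap, independent scoring to build a top-k candidate set.
    candidates = sorted(corpus, key=lambda d: dot(qv, embed(d)), reverse=True)[:k]
    # Stage 2: joint scoring of each (query, candidate) pair.
    return sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)

corpus = ["mat mat mat mat", "the mat is red", "buy a red mat today", "blue sky"]
print(retrieve_then_rerank("red mat", corpus))
# ['buy a red mat today', 'the mat is red', 'mat mat mat mat']
```

Stage 1 ranks `"mat mat mat mat"` first on raw term overlap; the pair-aware scorer in stage 2 moves it to last place.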
For first-stage dense retrieval, see Embedding Models.
| Owner | Model | Architecture | Wrapper Class | Tasks |
|---|---|---|---|---|
| Meta | Llama (Bidirectional) | `LlamaBidirectionalForSequenceClassification` | `NeMoAutoModelCrossEncoder` | Reranking |
Any Hugging Face model that loads with `AutoModelForSequenceClassification` can serve as a reranking backbone. This fallback path keeps the model's native attention pattern (for example, causal attention in a decoder-only model); no bidirectional conversion is applied.
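To make the difference concrete, the sketch below builds both attention masks as 0/1 matrices for a short query-document input. Under the causal mask that decoder-only models use by default, only the final position attends to the whole pair; under a bidirectional mask, every token does. The helper names are illustrative and are not part of NeMo AutoModel or Hugging Face Transformers.

```python
from typing import List

def causal_mask(n: int) -> List[List[int]]:
    # mask[i][j] == 1 means position i may attend to position j.
    # Decoder-style causal attention: only positions j <= i are visible.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> List[List[int]]:
    # Bidirectional (encoder-style) attention: every position sees every other.
    return [[1] * n for _ in range(n)]

def sees_whole_input(mask: List[List[int]], i: int) -> bool:
    # True if position i can attend to every position in the sequence.
    return all(mask[i])

n = 4  # e.g. two query tokens followed by two document tokens
print([sees_whole_input(causal_mask(n), i) for i in range(n)])
# [False, False, False, True]
print([sees_whole_input(bidirectional_mask(n), i) for i in range(n)])
# [True, True, True, True]
```

With a causal mask, early query tokens never see the document, so any pair-level signal has to be read from later positions; a bidirectional mask removes that restriction.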
- Fine-tuning (Cross-Encoder): Cross-entropy training on query-document pairs to produce rerankers
- LoRA/PEFT: Parameter-efficient fine-tuning for reranking backbones
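A common way to implement cross-entropy training for rerankers is to score one query against a group consisting of its positive document followed by its negatives, then apply softmax cross-entropy with the positive as the target class. The sketch below assumes that layout (positive at index 0) and plain Python floats for the model scores; the exact loss configuration in NeMo AutoModel (for example, any temperature scaling) may differ.

```python
import math
from typing import List

def group_cross_entropy(scores: List[float]) -> float:
    """Loss for one query group: scores[0] is the positive document's score,
    scores[1:] are the negatives. Computes -log(softmax(scores)[0])."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]

def batch_loss(batch: List[List[float]]) -> float:
    # Mean of the per-query group losses.
    return sum(group_cross_entropy(g) for g in batch) / len(batch)

# Positive clearly ahead -> small loss; all scores tied -> log(group size).
print(group_cross_entropy([4.0, 0.0, -1.0]))
print(group_cross_entropy([0.0, 0.0, 0.0, 0.0]))
```

Training pushes the positive's score above the negatives' scores; with one positive and one negative this reduces to binary logistic loss on the score difference.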
Retrieval fine-tuning requires grouped training examples: each example pairs a query with one positive document and one or more negative documents. Both inline JSONL and corpus-ID-based JSON formats are supported; see the Retrieval Dataset guide.
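The two dataset styles differ mainly in whether document text is stored inline or referenced by ID. The sketch below normalizes both into `(query, [positive, negatives...])` groups. All field names (`query`, `pos_doc`, `neg_doc`, `pos_doc_id`, `neg_doc_ids`) and helper functions are hypothetical placeholders; the Retrieval Dataset guide defines the actual schema.

```python
import json
from typing import Dict, List, Tuple

# (query, positive document text, negative document texts)
Example = Tuple[str, str, List[str]]

def load_inline_jsonl(lines: List[str]) -> List[Example]:
    # Inline style (hypothetical fields): each JSONL line carries document text.
    examples: List[Example] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        examples.append((rec["query"], rec["pos_doc"], list(rec["neg_doc"])))
    return examples

def resolve_corpus_ids(records: List[dict], corpus: Dict[str, str]) -> List[Example]:
    # ID style (hypothetical fields): records reference documents by ID,
    # and the text is looked up in a separate corpus mapping.
    return [
        (r["query"], corpus[r["pos_doc_id"]], [corpus[i] for i in r["neg_doc_ids"]])
        for r in records
    ]

def to_group(example: Example) -> Tuple[str, List[str]]:
    # Place the positive first so it can serve as the target at index 0.
    query, pos, negs = example
    return query, [pos] + negs

inline = ['{"query": "red mat", "pos_doc": "buy a red mat", "neg_doc": ["blue sky"]}']
corpus = {"d1": "buy a red mat", "d2": "blue sky"}
by_id = json.loads('[{"query": "red mat", "pos_doc_id": "d1", "neg_doc_ids": ["d2"]}]')
print(to_group(load_inline_jsonl(inline)[0]))
# ('red mat', ['buy a red mat', 'blue sky'])
```

Both inputs describe the same example, so after resolution they produce identical groups; the ID-based style avoids repeating long document text across examples that share negatives.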
```{toctree}
:hidden:

meta/llama-bidirectional
```