Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/styles/base/Dictionary.txt
Original file line number Diff line number Diff line change
Expand Up @@ -709,7 +709,9 @@ requeue
request_counts
requestable
RequestResponse
rerank
Rerank
reranked
reranking
resourceVersion
response_format
Expand Down
5 changes: 5 additions & 0 deletions app/_data/entity_examples/gateway/routes/rerank-route.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
name: rerank-route
paths:
- /rerank
service:
name: rerank-service
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
name: rerank-service
url: http://httpbin.konghq.com/rerank
294 changes: 294 additions & 0 deletions app/_how-tos/use-bedrock-rerank-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
---
title: Use AWS Bedrock rerank API with AI Proxy
content_type: how_to
related_resources:
- text: AI Gateway
url: /ai-gateway/
- text: AI Proxy
url: /plugins/ai-proxy/
- text: AWS Bedrock Rerank API
url: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_Rerank.html
breadcrumbs:
- /ai-gateway/

description: "Configure the AI Proxy plugin to use AWS Bedrock's Rerank API for improving document retrieval relevance in RAG pipelines."

products:
- gateway
- ai-gateway

works_on:
- on-prem
- konnect

min_version:
gateway: '3.10'

plugins:
- ai-proxy

entities:
- service
- route
- plugin

tags:
- ai

tldr:
q: How do I use AWS Bedrock Rerank with the AI Proxy plugin?
a: Configure AI Proxy with provider bedrock and route_type llm/v1/rerank. Send a query and candidate documents to the /rerank endpoint. The API returns documents reordered by relevance score.

tools:
- deck

prereqs:
inline:
- title: AWS credentials
content: |
Before you begin, you must have AWS credentials with Bedrock permissions:

- **AWS Access Key ID**: Your AWS access key
- **AWS Secret Access Key**: Your AWS secret key
- **Region**: AWS region where Bedrock is available (for example, `us-west-2`)

Export these values as environment variables:
```sh
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_REGION="us-west-2"
```
Ensure the rerank model is enabled in your AWS Bedrock console under "Model Access".
icon_url: /assets/icons/aws.svg
- title: Python and requests library
content: |
Install Python 3 and the requests library:
```sh
pip install requests
```
icon_url: /assets/icons/python.svg
entities:
services:
- rerank-service
routes:
- rerank-route

cleanup:
inline:
- title: Clean up Konnect environment
include_content: cleanup/platform/konnect
icon_url: /assets/icons/gateway.svg
- title: Destroy the {{site.base_gateway}} container
include_content: cleanup/products/gateway
icon_url: /assets/icons/gateway.svg

faqs:
- q: What version of {{site.base_gateway}} supports Bedrock Rerank?
a: |
AWS Bedrock Rerank requires {{site.base_gateway}} 3.13 or later.
- q: What is reranking and why is it useful?
a: |
Reranking takes a list of search results and reorders them by semantic relevance to a query. This improves retrieval quality in RAG pipelines by ensuring the most relevant documents are sent to the LLM for generation.
- q: How many documents can I rerank at once?
a: |
AWS Bedrock's Rerank API supports reranking up to 1,000 documents per request. The `numberOfResults` parameter controls how many of the highest-ranked results are returned.
- q: What rerank models are available?
a: |
AWS Bedrock offers `cohere.rerank-v3-5:0` and `amazon.rerank-v1:0`. Cohere Rerank 3.5 is available in most regions, while Amazon Rerank 1.0 is not available in us-east-1.
---

## Configure the plugin

Configure AI Proxy to use AWS Bedrock's Rerank API. This requires creating a dedicated route with the `/rerank` path:

{% entity_examples %}
entities:
plugins:
- name: ai-proxy
route: rerank-route
config:
llm_format: bedrock
route_type: llm/v1/chat
logging:
log_payloads: false
log_statistics: true
auth:
allow_override: false
aws_access_key_id: ${aws_access_key_id}
aws_secret_access_key: ${aws_secret_access_key}
model:
provider: bedrock
name: cohere.rerank-v3-5:0
options:
bedrock:
aws_region: ${aws_region}
variables:
aws_access_key_id:
value: $AWS_ACCESS_KEY_ID
aws_secret_access_key:
value: $AWS_SECRET_ACCESS_KEY
aws_region:
value: $AWS_REGION
{% endentity_examples %}

{:.info}
> The `config.llm_format: bedrock` setting enables Kong to accept native AWS Bedrock API requests. Kong detects the `/rerank` URI pattern and automatically routes requests to the Bedrock Agent Runtime service.

## Use AWS Bedrock Rerank API

AWS Bedrock's Rerank API reorders candidate documents by semantic relevance to a query. Send a query and document list (typically from vector or keyword search). The API returns the top N documents ordered by relevance score. This reduces context size before LLM generation and prioritizes relevant information. The rerank API scores and orders documents. It does not generate answers or citations.

The following script sends a query about exercise benefits with five candidate documents. Three documents contain relevant information about exercise and health. Two documents discuss unrelated topics (the Eiffel Tower and Python programming). The script shows the original retrieval order, then the reranked order with relevance scores.

Create the script:

```sh
cat > grounded-chat-demo.py << 'EOF'
#!/usr/bin/env python3
"""Demonstrate AWS Bedrock Rerank for improving RAG retrieval quality"""

import requests
import json

RERANK_URL = "http://localhost:8000/rerank"

print("AWS Bedrock Rerank Demo: RAG Pipeline Improvement")
print("=" * 60)

# Simulate documents retrieved from vector search
query = "What are the health benefits of regular exercise?"
documents = [
"Regular exercise can improve cardiovascular health and reduce the risk of heart disease.",
"The Eiffel Tower was completed in 1889 and stands 324 meters tall.",
"Exercise helps maintain healthy weight by burning calories and building muscle mass.",
"Python is a high-level programming language known for its simplicity and readability.",
"Physical activity strengthens bones and muscles, reducing the risk of osteoporosis and falls in older adults."
]

print(f"\nQuery: {query}")
print(f"\nCandidate documents: {len(documents)}")

# Before rerank: show original order
print("\n--- BEFORE RERANK (Original retrieval order) ---")
for idx, doc in enumerate(documents):
print(f"{idx}. {doc[:80]}...")

# Rerank the documents
print("\n--- RERANKING ---")
try:
# Build Bedrock rerank request
sources = []
for doc in documents:
sources.append({
"type": "INLINE",
"inlineDocumentSource": {
"type": "TEXT",
"textDocument": {
"text": doc
}
}
})

response = requests.post(
RERANK_URL,
headers={"Content-Type": "application/json"},
json={
"queries": [
{
"type": "TEXT",
"textQuery": {
"text": query
}
}
],
"sources": sources,
"rerankingConfiguration": {
"type": "BEDROCK_RERANKING_MODEL",
"bedrockRerankingConfiguration": {
"numberOfResults": 3,
"modelConfiguration": {
"modelArn": "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
}
}
}
}
)

response.raise_for_status()
result = response.json()

print("✓ Reranking complete")

# After rerank: show reordered results
print("\n--- AFTER RERANK (Ordered by relevance) ---")
for item in result['results']:
idx = item['index']
score = item['relevanceScore']
print(f"{idx}. [Relevance: {score:.3f}] {documents[idx][:80]}...")

# Show the top document that should be sent to LLM
print("\n--- TOP RESULT FOR LLM CONTEXT ---")
top_idx = result['results'][0]['index']
top_score = result['results'][0]['relevanceScore']
print(f"Relevance Score: {top_score:.3f}")
print(f"Document: {documents[top_idx]}")

except Exception as e:
print(f"✗ Failed: {e}")

print("\n" + "=" * 60)
print("Demo complete")
EOF
```

This script sends a query with 5 candidate documents to AWS Bedrock's rerank endpoint. Three documents discuss exercise and health benefits. Two documents are intentionally irrelevant (Eiffel Tower, Python programming).

The script shows the original document order, then the reranked order with relevance scores. The `numberOfResults: 3` parameter limits the response to the top 3 documents. This demonstrates how reranking filters and reorders documents by semantic relevance before LLM generation.

{:.info}
> Verify that the response structure includes `results` with `index` and `relevanceScore` fields. Check AWS Bedrock's API documentation or test the script to confirm this behavior.

## Validate the configuration

Now, let's run the script we created in the previous step:

```sh
python3 bedrock-rerank-demo.py
```

Example output:

```text
AWS Bedrock Rerank Demo: RAG Pipeline Improvement
============================================================

Query: What are the health benefits of regular exercise?

Candidate documents: 5

--- BEFORE RERANK (Original retrieval order) ---
0. Regular exercise can improve cardiovascular health and reduce the risk of hea...
1. The Eiffel Tower was completed in 1889 and stands 324 meters tall....
2. Exercise helps maintain healthy weight by burning calories and building muscl...
3. Python is a high-level programming language known for its simplicity and read...
4. Physical activity strengthens bones and muscles, reducing the risk of osteopo...

--- RERANKING ---
✓ Reranking complete

--- AFTER RERANK (Ordered by relevance) ---
0. [Relevance: 0.989] Regular exercise can improve cardiovascular health and reduce the risk of hea...
2. [Relevance: 0.876] Exercise helps maintain healthy weight by burning calories and building muscl...
4. [Relevance: 0.823] Physical activity strengthens bones and muscles, reducing the risk of osteopo...

--- TOP RESULT FOR LLM CONTEXT ---
Relevance Score: 0.989
Document: Regular exercise can improve cardiovascular health and reduce the risk of heart disease.

============================================================
Demo complete
```

The output shows how reranking improves retrieval quality. The three exercise-related documents (indices 0, 2, 4) are correctly identified as most relevant with high scores above 0.82. The irrelevant documents about the Eiffel Tower and Python programming are filtered out, not appearing in the top 3 results.

This reranking step ensures that when you send context to an LLM for generation, you're providing the most semantically relevant information, improving answer quality and reducing hallucinations.
Loading
Loading