feat: Add task-aware embedding support by StoreksFeed · Pull Request #2560 · HKUDS/LightRAG

StoreksFeed · 2025-12-30T14:20:32Z

Description

Modern embedding models (e.g., Gemini and FRIDA) support asymmetric embeddings through task types or task-specific prefixes, which can improve retrieval accuracy by generating different embeddings for queries and documents. This PR implements this capability in LightRAG.

Related Issues

N/A

Changes Made

Configuration & Documentation

Added EMBEDDING_DOCUMENT_PREFIX and EMBEDDING_QUERY_PREFIX environment variables to lightrag/api/config.py
Updated docs/DockerDeployment.md and env.example with new configuration options
Added example demonstrating prefix usage: examples/unofficial-sample/lightrag_embedding_prefixes.py
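
A minimal sketch of how the two variables could be read and applied. This is not the code from the PR: `load_prefixes` and `apply_prefix` are hypothetical helpers, the prefix string is only an illustration, and treating an empty value as "no prefix" is an assumption.

```python
import os
from typing import List, Mapping, Optional


def load_prefixes(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two optional prefix variables.

    Assumption: an unset or empty variable means "no prefix".
    """
    return {
        "document": env.get("EMBEDDING_DOCUMENT_PREFIX") or None,
        "query": env.get("EMBEDDING_QUERY_PREFIX") or None,
    }


def apply_prefix(texts: List[str], prefix: Optional[str]) -> List[str]:
    """Prepend the prefix to every text, or return the texts unchanged."""
    if not prefix:
        return list(texts)
    return [prefix + text for text in texts]


# Only the query prefix is configured here; documents pass through untouched.
prefixes = load_prefixes({"EMBEDDING_QUERY_PREFIX": "search_query: "})
print(apply_prefix(["what is LightRAG?"], prefixes["query"]))
print(apply_prefix(["LightRAG is a RAG framework."], prefixes["document"]))
```

With neither variable set, both calls return their input unchanged, which matches the opt-in behaviour described under Backward Compatibility.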

Core Infrastructure

Enhanced EmbeddingFunc wrapper in lightrag/utils.py with supports_context parameter
Updated wrap_embedding_func_with_attrs decorator to support context-aware functions
Modified lightrag/operate.py to pass context
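
The idea behind a `supports_context` flag can be sketched as follows. `ContextAwareEmbedding` is a hypothetical stand-in, not the actual `EmbeddingFunc` class, and the two toy embedding functions exist only for illustration; the real signatures in lightrag/utils.py may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ContextAwareEmbedding:
    """Hypothetical wrapper that forwards `context` only when supported."""

    func: Callable[..., Awaitable[List[List[float]]]]
    supports_context: bool = False

    async def __call__(self, texts: List[str], context: Optional[str] = None):
        # Functions that never declared context support keep their
        # original call signature and simply ignore the context.
        if self.supports_context and context is not None:
            return await self.func(texts, context=context)
        return await self.func(texts)


async def legacy_embed(texts):
    # Toy embedding without any notion of context.
    return [[float(len(t))] for t in texts]


async def task_aware_embed(texts, context="document"):
    # Toy embedding whose second component records the requested context.
    marker = 1.0 if context == "query" else 0.0
    return [[float(len(t)), marker] for t in texts]


async def main():
    legacy = ContextAwareEmbedding(legacy_embed)
    aware = ContextAwareEmbedding(task_aware_embed, supports_context=True)
    return (
        await legacy(["abc"], context="query"),  # context is dropped
        await aware(["abc"], context="query"),   # context is forwarded
        await aware(["abc"]),                    # function default applies
    )


results = asyncio.run(main())
print(results)
```

Gating on the flag means an embedding function written before this change is never called with a keyword argument it does not accept.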

Vector Storage Backends

Updated all storage implementations to use context parameter:

lightrag/kg/faiss_impl.py
lightrag/kg/milvus_impl.py
lightrag/kg/mongo_impl.py
lightrag/kg/nano_vector_db_impl.py
lightrag/kg/postgres_impl.py
lightrag/kg/qdrant_impl.py
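
As a rough illustration of what "use context parameter" could mean for a storage backend, the toy store below embeds with a document context on upsert and a query context on search. `ToyVectorStore`, `toy_embed`, and the context values `"document"` and `"query"` are assumptions made for this sketch, not names taken from the PR.

```python
import asyncio
import math
from typing import Dict, List

calls: List[str] = []  # records which context each embedding call received


async def toy_embed(texts, context="document"):
    # Hypothetical embedding: a one-dimensional vector holding the text length.
    calls.append(context)
    return [[float(len(t))] for t in texts]


class ToyVectorStore:
    """In-memory stand-in for a vector storage backend."""

    def __init__(self, embed):
        self.embed = embed
        self.rows: Dict[str, List[float]] = {}

    async def upsert(self, docs: Dict[str, str]) -> None:
        ids = list(docs)
        # Stored content is embedded as documents.
        vectors = await self.embed([docs[i] for i in ids], context="document")
        self.rows.update(zip(ids, vectors))

    async def query(self, text: str, top_k: int = 1) -> List[str]:
        # The search text is embedded as a query.
        (qvec,) = await self.embed([text], context="query")
        ranked = sorted(self.rows, key=lambda i: math.dist(self.rows[i], qvec))
        return ranked[:top_k]


async def main():
    store = ToyVectorStore(toy_embed)
    await store.upsert({"a": "hi there", "b": "hey"})
    return await store.query("hello")


hits = asyncio.run(main())
print(hits, calls)
```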

LLM Provider Bindings

Updated embedding functions with context support:

lightrag/llm/openai.py - Prefix support
lightrag/llm/ollama.py - Prefix support
lightrag/llm/gemini.py - Automatic task_type selection
lightrag/llm/jina.py - Automatic task selection
lightrag/llm/hf.py - Prefix support
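
The split between prefix-based bindings and automatic task selection might look roughly like this. `build_embedding_request` is a hypothetical helper that only builds a dictionary; the keys are illustrative rather than exact API payloads, and omitting the task when no context is given is an assumption. The task names are the retrieval task values documented by Gemini and Jina for their embedding APIs.

```python
from typing import Dict, List, Optional

# Retrieval task names as documented by the respective providers.
GEMINI_TASK_TYPES = {"document": "RETRIEVAL_DOCUMENT", "query": "RETRIEVAL_QUERY"}
JINA_TASKS = {"document": "retrieval.passage", "query": "retrieval.query"}


def build_embedding_request(
    provider: str,
    texts: List[str],
    context: Optional[str] = None,
    prefixes: Optional[Dict[str, str]] = None,
) -> dict:
    """Return an illustrative request description; no network call is made."""
    if provider == "gemini":
        request = {"input": list(texts)}
        if context in GEMINI_TASK_TYPES:
            request["task_type"] = GEMINI_TASK_TYPES[context]
        return request
    if provider == "jina":
        request = {"input": list(texts)}
        if context in JINA_TASKS:
            request["task"] = JINA_TASKS[context]
        return request
    # Prefix-based bindings (openai, ollama, hf in the list above):
    # prepend the configured prefix for this context, if any.
    prefix = (prefixes or {}).get(context) or ""
    return {"input": [prefix + t for t in texts]}


print(build_embedding_request("gemini", ["x"], "query"))
print(build_embedding_request("openai", ["x"], "query", {"query": "q: "}))
```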

Binding Options

Updated GeminiEmbeddingOptions to support automatic task_type selection

API Server

Integrated prefix configuration into lightrag/api/lightrag_server.py
Updated lightrag/api/utils_api.py splash screen to display prefix settings

Checklist

Changes tested locally
Code reviewed
Documentation updated (if necessary)
Unit tests added (if applicable)

Additional Notes

Backward Compatibility

Fully backward compatible: no task or prefix is injected unless explicitly requested
Existing deployments without prefix configuration should continue to work unchanged
Optional feature activated only when EMBEDDING_DOCUMENT_PREFIX or EMBEDDING_QUERY_PREFIX environment variables are set

danielaskdd · 2026-03-03T09:34:04Z

Thank you for your valuable contribution. Please resolve the existing conflicts so we can proceed with the review and merge.

danielaskdd · 2026-03-24T04:26:35Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93d5ce0283

lightrag/api/lightrag_server.py

lightrag/llm/jina.py

examples/unofficial-sample/lightrag_embedding_prefixes.py

StoreksFeed added 4 commits December 30, 2025 18:47

Add context-aware embedding support with prefixes

c19acc7

Fix default jina embedding task

f15ed25

Fix formatting

b98eeca

Fix typos in sample script

5ce8c1e

StoreksFeed changed the title from "feat: Add context-aware embedding support with prefixes" to "feat: Add task-aware embedding support" on Jan 16, 2026

danielaskdd added the enhancement and tracked labels on Mar 2, 2026

Merge remote-tracking branch 'origin/main' into embedding-prefixes

93d5ce0

chatgpt-codex-connector bot reviewed Mar 24, 2026

Addressed review suggestions by Codex, updated naming, implemented cl…

907e008

…ear opt-in

StoreksFeed wants to merge 6 commits into HKUDS:main from StoreksFeed:embedding-prefixes

2 participants