Langfuse tracing across RAG ingestion pipeline by kushidhar-in · Pull Request #31516 · yugabyte/yugabyte-db

kushidhar-in · 2026-05-08T12:55:01Z

Add Langfuse tracing across RAG ingestion pipeline with datapack-scoped project resolution

Instrument the RAG agent pipeline end-to-end with Langfuse spans/observations so ingestion,
chunking, embedding generation, vector writes, and pipeline state updates are traceable in a
single observability flow. This adds method-level @observe coverage to core pipeline layers
(document_preprocessor, rag_handler, partition_chunk_pipeline, chunk, process_pdf,
embed, embedding_user_promt, and active pipeline tracking DB methods), and introduces a
wrapped SQL executor in yugabytedb_vector_store to capture query metadata and row counts for
database operations.
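A minimal sketch of what a wrapped SQL executor that captures query metadata and row counts could look like. This is not the PR's code: `TracedExecutor` and the `record` callback are hypothetical names, `sqlite3` stands in for the YugabyteDB connection so the example runs on its own, and the callback stands in for whatever Langfuse observation the real wrapper updates.

```python
import sqlite3
import time
from typing import Any, Callable, Sequence


class TracedExecutor:
    """Hypothetical wrapper: runs SQL and reports metadata to a callback."""

    def __init__(self, conn: sqlite3.Connection, record: Callable[[dict], None]):
        self._conn = conn
        self._record = record  # stand-in for a tracing backend

    def execute(self, sql: str, params: Sequence[Any] = ()) -> list:
        started = time.perf_counter()
        # Record the statement kind and parameter count, not the values,
        # so document contents do not leak into traces.
        meta = {
            "statement": sql.strip().split(None, 1)[0].upper(),
            "param_count": len(params),
        }
        try:
            cur = self._conn.execute(sql, params)
            if cur.description:  # statement returned a result set
                rows = cur.fetchall()
                meta["row_count"] = len(rows)
            else:  # DML/DDL: use the driver's affected-row count
                rows = []
                meta["row_count"] = cur.rowcount
            meta["status"] = "ok"
            return rows
        except sqlite3.Error as exc:
            meta["status"] = "error"
            meta["error"] = type(exc).__name__
            raise
        finally:
            # Always emit one event per query, on success or failure.
            meta["duration_ms"] = (time.perf_counter() - started) * 1000
            self._record(meta)


events: list = []
conn = sqlite3.connect(":memory:")
db = TracedExecutor(conn, events.append)
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO chunks (body) VALUES (?)", ("hello",))
rows = db.execute("SELECT id, body FROM chunks")
```

Emitting the event in `finally` keeps failed queries visible in the trace while still re-raising the original exception to the caller.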

In DocumentPreprocessor, add datapack-aware Langfuse key resolution from document URI and
meko_system.langfuse_project_mapping, then bind the active public key context so nested
observations are routed to the correct Langfuse project. Also wrap top-level task processing in
an observation span and ensure structured success/error output is emitted to tracing. Improve
failure handling by updating document status to FAILED only when document_id is available.
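The failure-handling rule described above can be sketched as follows. The function and parameter names (`process_task`, `run_pipeline`, `set_status`) are hypothetical and the task is assumed to be a plain dict; the point is only that the FAILED update is skipped when no `document_id` is known, and that both outcomes return a structured result a tracing layer could record.

```python
from typing import Callable, Optional


def process_task(
    task: dict,
    run_pipeline: Callable[[dict], int],
    set_status: Callable[[str, str], None],
) -> dict:
    """Run one ingestion task and return a structured success/error result."""
    document_id: Optional[str] = task.get("document_id")
    try:
        chunk_count = run_pipeline(task)
        return {"status": "success", "document_id": document_id, "chunks": chunk_count}
    except Exception as exc:
        # Only mark the document FAILED if we know which document it is;
        # otherwise there is no row to update.
        if document_id is not None:
            set_status(document_id, "FAILED")
        return {"status": "error", "document_id": document_id, "error": str(exc)}


status_updates: list = []


def failing_pipeline(task: dict) -> int:
    raise ValueError("unreadable PDF")


with_id = process_task(
    {"document_id": "doc-1"},
    failing_pipeline,
    lambda doc_id, status: status_updates.append((doc_id, status)),
)
without_id = process_task(
    {},
    failing_pipeline,
    lambda doc_id, status: status_updates.append((doc_id, status)),
)
```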

Update dependencies by adding langfuse>=3.0.0 to support the new tracing instrumentation.

CLAassistant · 2026-05-08T12:55:23Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

kushidhar-in seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

gemini-code-assist

Code Review

This pull request integrates Langfuse for observability across the RAG pipeline, adding @observe decorators to various database, embedding, and processing functions. It also introduces dynamic Langfuse client resolution based on document metadata. Feedback focuses on critical runtime errors in the new _resolve_langfuse_client method, specifically regarding missing null checks for document_uri and incorrect return types that would cause unpacking failures. Additionally, there is a concern regarding the use of a private Langfuse API for context binding.
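To illustrate the two failure modes the review names, here is a hedged sketch of a resolver that guards against a missing `document_uri` and always returns a value of the same shape, so callers can unpack it unconditionally. It is not the PR's `_resolve_langfuse_client`: the function name, the URI layout (datapack as the first path segment), and the in-memory `mapping` standing in for `meko_system.langfuse_project_mapping` are all assumptions.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse

# Always a 2-tuple, so `public_key, secret_key = resolve(...)` never fails to unpack.
Keys = Tuple[Optional[str], Optional[str]]


def resolve_langfuse_keys(
    document_uri: Optional[str],
    mapping: Mapping[str, Tuple[str, str]],
) -> Keys:
    """Return (public_key, secret_key) for the datapack, or (None, None)."""
    if not document_uri:  # null/empty check before any parsing
        return (None, None)
    # Assumption: the datapack name is the first path segment of the URI.
    segments = [s for s in urlparse(document_uri).path.split("/") if s]
    if not segments:
        return (None, None)
    return mapping.get(segments[0], (None, None))


mapping = {"datapack-a": ("pk-example", "sk-example")}  # placeholder keys
found = resolve_langfuse_keys("s3://bucket/datapack-a/doc.pdf", mapping)
missing_uri = resolve_langfuse_keys(None, mapping)
unknown_pack = resolve_langfuse_keys("s3://bucket/other/doc.pdf", mapping)
```

Returning `(None, None)` rather than a bare `None` lets the caller fall back to a default project with a simple `if public_key is None` check.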

netlify · 2026-05-08T12:56:30Z

✅ Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`f3eddde`
🔍 Latest deploy log	https://app.netlify.com/projects/infallible-bardeen-164bc9/deploys/6a01bbc25e41e2000862cdfd
😎 Deploy Preview	https://deploy-preview-31516--infallible-bardeen-164bc9.netlify.app

ashetkar · 2026-05-08T16:05:50Z

 boto3==1.34.81  # AWS SDK for Python (latest as of 2024-06)
+
+# Langfuse
+langfuse>=3.0.0


Should we lock it to a specific version since, as per gemini, we are using an internal API get_client()?

Tracked in a separate ticket. We will lock langfuse to a specific version across all components.

ashetkar

Please address/respond to gemini comments as well.

gemini-code-assist Bot reviewed May 8, 2026

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py Outdated

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py

kushidhar-in force-pushed the feat/rag-worker-tracing branch from ee08636 to e3d654a Compare May 8, 2026 15:52

kushidhar-in added 2 commits May 11, 2026 14:40

add fallback methods for rag worker tracing using lf

62156f9

Resolve Langfuse client by tenant_id from task details instead of par…

f3eddde

…sing document_uri

ashetkar reviewed May 11, 2026

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py

Comment thread python/ai/rag_agent/rag_pipeline/document_preprocessor.py

kushidhar-in wants to merge 3 commits into yugabyte:master from kushidhar-in:feat/rag-worker-tracing
