Langfuse tracing across RAG ingestion pipeline#31516
Conversation
kushidhar-in seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
Code Review
This pull request integrates Langfuse for observability across the RAG pipeline, adding @observe decorators to various database, embedding, and processing functions. It also introduces dynamic Langfuse client resolution based on document metadata. Feedback focuses on critical runtime errors in the new _resolve_langfuse_client method, specifically regarding missing null checks for document_uri and incorrect return types that would cause unpacking failures. Additionally, there is a concern regarding the use of a private Langfuse API for context binding.
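The review's two critical findings — a missing null check on `document_uri` and a return type that breaks tuple unpacking in callers — suggest a defensive shape for the resolver. A minimal sketch of that shape, assuming the mapping is keyed by a datapack prefix of the URI (the constructor arguments and mapping layout here are illustrative, not the PR's actual code):

```python
from typing import Dict, Optional, Tuple

KeyPair = Tuple[Optional[str], Optional[str]]

class DocumentPreprocessor:
    def __init__(self, project_mapping: Dict[str, KeyPair], default_keys: KeyPair):
        # project_mapping: datapack name -> (public_key, secret_key); assumed shape
        self._project_mapping = project_mapping
        self._default_keys = default_keys

    def _resolve_langfuse_client(self, metadata: dict) -> KeyPair:
        """Resolve Langfuse keys from document metadata.

        Addresses the two review findings: guard against a missing
        document_uri instead of dereferencing it, and always return a
        2-tuple so callers can unpack without a runtime failure.
        """
        document_uri = metadata.get("document_uri")
        if not document_uri:  # null check the review asks for
            return self._default_keys
        datapack = document_uri.strip("/").split("/")[0]
        return self._project_mapping.get(datapack, self._default_keys)
```

The key point is the invariant on the return type: every path yields the same tuple shape, so `public_key, secret_key = self._resolve_langfuse_client(meta)` cannot raise.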
```diff
 boto3==1.34.81  # AWS SDK for Python (latest as of 2024-06)
+
+# Langfuse
+langfuse>=3.0.0
\ No newline at end of file
```
Should we lock it to a specific version since, as per Gemini, we are using an internal API, `get_client()`?
Tracked in a separate ticket. We will lock langfuse to a specific version across all components.
ashetkar left a comment
Please address/respond to the Gemini comments as well.
Add Langfuse tracing across RAG ingestion pipeline with datapack-scoped project resolution

Instrument the RAG agent pipeline end-to-end with Langfuse spans/observations so ingestion, chunking, embedding generation, vector writes, and pipeline state updates are traceable in a single observability flow. This adds method-level `@observe` coverage to core pipeline layers (`document_preprocessor`, `rag_handler`, `partition_chunk_pipeline`, `chunk`, `process_pdf`, `embed`, `embedding_user_promt`, and active pipeline tracking DB methods), and introduces a wrapped SQL executor in `yugabytedb_vector_store` to capture query metadata and row counts for database operations.
In `DocumentPreprocessor`, add datapack-aware Langfuse key resolution from document URI and `meko_system.langfuse_project_mapping`, then bind the active public key context so nested observations are routed to the correct Langfuse project. Also wrap top-level task processing in an observation span and ensure structured success/error output is emitted to tracing. Improve failure handling by updating document status to FAILED only when `document_id` is available.

Update dependencies by adding `langfuse>=3.0.0` to support the new tracing instrumentation.
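The "bind the active public key context" step — the part Gemini flags as using a private Langfuse API — amounts to task-scoped context propagation: everything created inside a bound block should inherit that project's public key. A minimal sketch of the pattern using `contextvars`, independent of Langfuse internals (all names here are illustrative, not the PR's actual code):

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Active Langfuse public key for the current task; illustrative stand-in
# for the PR's context binding.
_active_public_key: ContextVar[str] = ContextVar(
    "langfuse_public_key", default="pk-default"
)

@contextmanager
def bind_langfuse_project(public_key: str):
    """Route observations created inside this block to `public_key`."""
    token = _active_public_key.set(public_key)
    try:
        yield
    finally:
        # Restore the previous binding even if the task raises
        _active_public_key.reset(token)

def emit_observation(name: str) -> dict:
    """A nested observation picks up whichever key is bound in context."""
    return {"name": name, "public_key": _active_public_key.get()}
```

Because `ContextVar` is respected by asyncio tasks and can be propagated to threads, concurrent ingestion tasks for different datapacks do not clobber each other's bindings — which is the property the datapack-scoped routing needs.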