feat(eval): enqueue call trace ids onto evaluation redis queue#831
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Walkthrough
ChangesLLM-as-judge Evaluation Enqueue Pipeline
Sequence DiagramsequenceDiagram
participant end_conversation
participant get_trace_id
participant enqueue_trace_for_evaluation
participant Redis
end_conversation->>end_conversation: update_span_with_evaluation_data(context)
end_conversation->>get_trace_id: context.root_span
get_trace_id-->>end_conversation: evaluation_trace_id (32-hex or None)
alt trace id present
end_conversation->>enqueue_trace_for_evaluation: evaluation_trace_id
enqueue_trace_for_evaluation->>Redis: SET evaluation:enqueued:{id} NX EX 7d
enqueue_trace_for_evaluation->>Redis: RPUSH evaluation:trace_queue id
enqueue_trace_for_evaluation-->>end_conversation: True / False
end
end_conversation->>end_conversation: end_conversation_callbacks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_evaluation_queue.py (1)
38-40: ⚡ Quick winAdd type hints to the new helper/fixture/test function signatures in this module.
Several new function signatures are untyped (e.g., fixtures and test functions), which violates the repo’s Python typing rule.
As per coding guidelines,
**/*.pymust “Add type hints on all function signatures.”Also applies to: 49-55, 59-86, 93-204
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_evaluation_queue.py` around lines 38 - 40, Add type hints to all function signatures that are missing them in tests/test_evaluation_queue.py. At lines 38-40, the _new_trace_id function already has a return type hint (-> str) but ensure all parameters and return types are typed. At lines 49-55, 59-86, and 93-204, add complete type hints to all function signatures including parameters and return types for fixtures, test functions, and any helper functions that currently lack type annotations. This includes adding parameter type hints and return type hints (use None for functions that don't return a value) to comply with the repository's Python typing requirements.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/services/langfuse/tasks/evaluation/queue.py`:
- Around line 46-51: The marker is set before the queue operation, creating an
atomicity issue where if the RPUSH fails, the marker remains set for 7 days and
blocks future retry attempts, causing traces to be lost from evaluation. To fix
this, move the marker set operation to occur after the successful RPUSH in the
section where the dedup marker is created with client.set() and the trace is
enqueued with client.rpush(). This ensures that the marker is only persisted
once the trace has been successfully added to the queue, allowing retries if the
enqueue operation fails.
In `@tests/test_evaluation_queue.py`:
- Line 198: The end_conversation function call on line 198 is passing a
SimpleNamespace object as the context parameter, but the function is typed to
accept a TemplateContext object, causing a type checker failure. Replace the
SimpleNamespace context with a properly constructed TemplateContext instance
that satisfies the function's type requirements.
---
Nitpick comments:
In `@tests/test_evaluation_queue.py`:
- Around line 38-40: Add type hints to all function signatures that are missing
them in tests/test_evaluation_queue.py. At lines 38-40, the _new_trace_id
function already has a return type hint (-> str) but ensure all parameters and
return types are typed. At lines 49-55, 59-86, and 93-204, add complete type
hints to all function signatures including parameters and return types for
fixtures, test functions, and any helper functions that currently lack type
annotations. This includes adding parameter type hints and return type hints
(use None for functions that don't return a value) to comply with the
repository's Python typing requirements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8cb36779-0558-400e-be08-ff527683452b
📒 Files selected for processing (6)
app/ai/voice/agents/breeze_buddy/handlers/internal/end_conversation.pyapp/ai/voice/agents/breeze_buddy/observability/tracing_setup.pyapp/services/langfuse/tasks/evaluation/__init__.pyapp/services/langfuse/tasks/evaluation/queue.pytests/conftest.pytests/test_evaluation_queue.py
911ae1d to
82a88d8
Compare
Push the bare OTEL trace_id to a Redis list at call end (atomic SETNX+RPUSH via Lua) for the evaluation worker to drain later. Co-Authored-By: Claude <noreply@anthropic.com>
82a88d8 to
f58aa20
Compare
What
PR 1 of the internal LLM-as-judge evaluation system. On call end, extract the OTEL/Langfuse
trace_idand push the bare id onto a Redis list (evaluation:trace_queue) for the evaluation worker (later phase) to drain. Only the trace_id is stored — every other field (tags, call_sid, transcription, payload, …) is fetched from Langfuse by the worker.Changes
get_trace_id(span)intracing_setup.py— returns the 32-char hex OTEL trace_id (1:1 with Langfusetrace.id), guarded byENABLE_BREEZE_BUDDY_TRACING;Nonewhen tracing is off.enqueue_trace_for_evaluation(trace_id)inservices/langfuse/tasks/evaluation/queue.py—SETNXdedup marker (evaluation:enqueued:{trace_id}, 7d) +RPUSHthe bare trace_id; best-effort (swallows Redis errors, never breaks call teardown).end_conversationright afterupdate_span_with_evaluation_data.tests/conftest.pypre-imports the template package so the test process importstracing_setupin the same order as the running app — sidesteps the latenttemplate <-> handlers <-> tracing_setupimport cycle without touching production imports (noTYPE_CHECKING).Out of scope (later PRs)
Schemas/DB layer, the worker (drain queue → Langfuse fetch → LLM judge), actions/Slack, the REST API. The queue is written but not yet consumed — safe to merge standalone.
Testing
get_trace_id(none-span → None; real span → 32-hex), enqueue (empty-noop, push, dedup).end_conversationhandler with a real OTEL span + real Redis (bot/context/task stubbed) → asserts the trace_id lands on the queue.import app.mainOK; server boots to "Application startup complete."black,isort,autoflake,pyrefly(0 errors on these files); 6/6 tests pass against local Redis (skip if Redis down).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests