
@abhay-sheshadri

Added support for passing LoRA requests through to `vllm.trace`:

```python
# Test with a LoRA adapter
with vllm.trace(test_prompts, temperature=0.0, max_tokens=500, lora_request=lora_request) as tracer:
    lora_results = vllm.generator.output.save()
```
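
For context, `lora_request` above would be a vLLM `LoRARequest`. A minimal construction sketch, assuming a locally saved adapter (the adapter name, integer ID, and path are placeholders):

```python
from vllm.lora.request import LoRARequest

# Placeholder adapter: (name, unique integer id, local path to the adapter weights)
lora_request = LoRARequest("my_adapter", 1, "/path/to/lora_adapter")
```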

abhay-sheshadri and others added 2 commits September 2, 2025 23:29
This commit adds the ability to pass custom metadata (like request_id) to
tracer.invoke() calls and access it within the trace context. This solves
issues with request/response alignment in batched inference scenarios where
lazy evaluation can cause mismatched completions.

Changes:
- Mediator: Add custom_data parameter to store arbitrary metadata
- Invoker: Extract custom parameters (like request_id) from kwargs before
  passing to batcher, store in mediator's custom_data
- Access pattern: Use vllm._interleaver.current.custom_data.get('request_id')
  within the trace context (see the sketch after this list)
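
A minimal sketch of how these pieces fit together. Here `vllm` is the traced model wrapper and `prompts` is a list of strings (both assumed for illustration); the `request_id` kwarg on `invoke()` and the `custom_data` access path come from the commit notes above, everything else is hypothetical:

```python
# Hedged sketch: tag each invoke with a request_id and read it back inside
# the trace context to realign completions after batched, lazy evaluation.
results = {}

with vllm.trace(temperature=0.0, max_tokens=500) as tracer:
    for i, prompt in enumerate(prompts):
        # request_id is not a generation kwarg: per this commit, the Invoker
        # extracts it from kwargs before passing the rest to the batcher and
        # stores it in the mediator's custom_data.
        with tracer.invoke(prompt, request_id=f"req-{i}"):
            # Read the metadata back inside the trace context.
            rid = vllm._interleaver.current.custom_data.get('request_id')
            results[rid] = vllm.generator.output.save()
```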

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>