How to separately track token usage for multiple DSPy calls in Langfuse? #10653
**Describe your question**

**Problem:** I have two sequential DSPy calls in my pipeline (see code below), but Langfuse currently displays only the token usage from the last call. I need to track token consumption for each DSPy call separately and also see the aggregated total.

**Expected Behavior:** I want token usage displayed separately for each DSPy call in the Langfuse tracing UI, similar to the demo screenshot below, where both `ai.streamText.doStream` calls show their individual token consumption (2,988 → 19 and 6,703 → 1,214) and the total is correctly aggregated at the top (9,691 → 1,233). *(screenshot)*

**Current Behavior:** Only the last DSPy call's token usage is shown. *(screenshot)*

**Question:** How can I properly instrument my code so that each DSPy call creates a separate span/generation in Langfuse with its own token usage metrics?

**Additional Context:** In my current code, the `update_langfuse_usage` function calls `langfuse.update_current_generation` to track token usage. During execution, the terminal output correctly prints token usage statistics for both calls, confirming that the usage data is being captured. However, in the Langfuse tracing UI only the token usage from the second (last) call is displayed; the first call's token usage is missing from the trace.

**My Current Code:** *(code not shown)*

**Langfuse Cloud or Self-Hosted?** Langfuse Cloud

**If self-hosted, what version are you running?** No response

**SDK and integration versions** No response

**Pre-Submission Checklist**
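To make the failure mode concrete, here is a dependency-free stand-in for what the question describes: both DSPy calls run inside a single decorated function, so both usage updates target the same "current" generation and the second write overwrites the first. All names here are illustrative, not the author's actual code:

```python
# Stand-in for a single Langfuse generation's metadata.
current_generation = {}

def update_langfuse_usage(usage):
    # Mirrors langfuse.update_current_generation(...): it updates whichever
    # observation is "current". With both DSPy calls inside one decorated
    # function, "current" is the same generation both times.
    current_generation["usage"] = usage

def main():
    update_langfuse_usage({"input": 2988, "output": 19})    # first DSPy call
    update_langfuse_usage({"input": 6703, "output": 1214})  # second DSPy call
    return current_generation

print(main())  # only the second call's usage survives
```

This is exactly the symptom in the screenshots: the terminal can print both usage reports, but the trace only ever stores the last one.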
Replies: 1 comment 1 reply


Solution: Create Separate Observations for Each DSPy Call
The issue is that both `my_llm_pipeline()` and `my_llm_pipeline222()` are updating the same generation (the `main` function decorated with `@observe(as_type="generation")`). To track token usage separately for each DSPy call, you need to create separate observations (spans or generations) for each call.

**Recommended Approach: Use the `@observe()` Decorator**

Wrap each DSPy pipeline function with the `@observe()` decorator to create separate spans with their own token usage tracking: