Telemetry for the Agentic World: TruLens + OpenTelemetry

Agents are rapidly gaining traction across AI applications. With this growth comes a new set of challenges: how do we trace, observe, and evaluate these dynamic, distributed systems?
Today, we’re excited to share that TruLens now supports OpenTelemetry (OTel), unlocking powerful, interoperable observability for the agentic world.

Challenges in Tracing Agents

Tracing agentic applications is fundamentally different from tracing traditional software systems:

Language-agnostic: Agents can be written in Python, Go, Java, or other languages, so tracing must work across language boundaries.
Distributed by nature: Multi-agent systems often span multiple machines or processes.
Existing telemetry stacks: Many developers and enterprises already use OpenTelemetry, so tracing compatibility is essential.
Dynamic execution: Unlike traditional apps, agents often make decisions on the fly, with branching workflows that can’t be fully known in advance.
Interoperability standards: As protocols like the Model Context Protocol (MCP) and the Agent2Agent (A2A) Protocol emerge, tracing must support agents working across different systems.
Repeated tool usage: Agents may call the same function or tool multiple times in a single execution trace, requiring fine-grained visibility into span grouping to understand what’s happening and why.

What is TruLens?

TruLens is an open-source library for evaluating and tracing AI agents, including RAG systems and other LLM applications. It combines OpenTelemetry-based tracing with trustworthy evaluations, including both ground truth metrics and reference-free (LLM-as-a-Judge) feedback.

TruLens pioneered the RAG Triad—a structured evaluation of:

Context relevance
Groundedness
Answer relevance

These evaluations provide a foundation for understanding the performance of RAG and agentic RAG systems, supported by benchmarks like LLM-AggreFact, TREC-DL, and HotpotQA.

This combination of trusted evaluators and open-standard tracing gives you the tools to improve your application offline and to monitor it once it reaches production.

How TruLens Augments OpenTelemetry

As AI applications become increasingly agentic, TruLens’ shift to OpenTelemetry enables observability that is:

Interoperable with existing telemetry stacks
Compatible across languages and frameworks
Capable of tracing dynamic agent workflows

TruLens now accepts any span that adheres to the OTel standard.

What is OpenTelemetry?

OpenTelemetry (OTel) is an open-source observability framework for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.

In LLM and agentic contexts, OpenTelemetry enables language-agnostic, interoperable tracing for:

Multi-agent systems
Distributed environments
Tooling interoperability

What is a span?
A span represents a single unit of work. In LLM apps, this might be: planning, routing, retrieval, tool usage, or generation.
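
To make the idea concrete, here is a minimal sketch of how spans relate to each other within one trace. It is a simplified stand-in written with the standard library only, not the OpenTelemetry SDK: the Span class, its fields, and the children_of helper are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Simple counter used to hand out unique span ids in this sketch.
_ids = itertools.count(1)


@dataclass
class Span:
    """Simplified span: one unit of work inside a trace (illustrative only)."""

    name: str
    trace_id: int
    parent_id: Optional[int] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    span_id: int = field(default_factory=lambda: next(_ids))


def children_of(spans: List[Span], parent: Span) -> List[str]:
    """Return the names of the spans whose parent is `parent`."""
    return [s.name for s in spans if s.parent_id == parent.span_id]


# One trace for a RAG request: a root span with two child steps.
root = Span(name="query", trace_id=1)
retrieval = Span(
    name="retrieve",
    trace_id=1,
    parent_id=root.span_id,
    attributes={"query_text": "What is OTel?"},
)
generation = Span(name="generate", trace_id=1, parent_id=root.span_id)
spans = [root, retrieval, generation]

print(children_of(spans, root))
```

All three spans share a trace id, and the parent ids are what allow a viewer to rebuild the execution tree.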

TruLens Defines Semantic Conventions for the Agentic World

TruLens maps span attributes to common definitions using semantic conventions to ensure:

Cross-framework interoperability
Shared instrumentation for MCP and A2A
Consistent evaluation across implementations
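
As a rough illustration of why shared conventions help, the sketch below normalizes span attributes from two hypothetical frameworks onto one set of keys, so that a single evaluator can read either. The key names, framework names, and the normalize helper are all invented for this example; they are not the official TruLens attribute names.

```python
from typing import Any, Dict

# Illustrative shared keys (assumed names, not the official conventions).
QUERY_TEXT = "retrieval.query_text"
RETRIEVED_CONTEXTS = "retrieval.retrieved_contexts"

# Hypothetical framework-specific attribute names mapped to the shared keys.
FRAMEWORK_MAPPINGS: Dict[str, Dict[str, str]] = {
    "framework_a": {"input.question": QUERY_TEXT, "output.docs": RETRIEVED_CONTEXTS},
    "framework_b": {"search.q": QUERY_TEXT, "search.hits": RETRIEVED_CONTEXTS},
}


def normalize(framework: str, attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known attributes to shared keys; leave unknown keys untouched."""
    mapping = FRAMEWORK_MAPPINGS[framework]
    return {mapping.get(key, key): value for key, value in attributes.items()}


span_a = normalize("framework_a", {"input.question": "hi", "output.docs": ["d1"]})
span_b = normalize("framework_b", {"search.q": "hi", "search.hits": ["d1"]})

# Both spans now expose the same keys, so one evaluator can consume either.
print(span_a == span_b)
```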

Using Semantic Conventions to Compute Evaluation Metrics

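TruLens computes evaluation metrics directly from instrumented span attributes. In the example below, the decorator records the query and the retrieved contexts on a RETRIEVAL span, and the feedback function reads those attributes to score context relevance.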

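import numpy as np
from trulens.core import Feedback
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# `vector_store` and `provider` are assumed to be defined elsewhere
# (for example, a vector database collection and a TruLens feedback provider).

# Record the `query` argument and the return value as RETRIEVAL span attributes.
@instrument(
    span_type=SpanAttributes.SpanType.RETRIEVAL,
    attributes={
        SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
        SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
    },
)
def retrieve(self, query: str) -> list:
    results = vector_store.query(query_texts=query, n_results=4)
    # Flatten the per-query lists of documents into a single list.
    return [doc for sublist in results["documents"] for doc in sublist]

# Score each retrieved context against the input, then average the scores.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    # Renamed from call_feedback_function_per_entry_in_list=True in #1.5.0 (see #2033).
    .on_context(collect_list=False)
    .aggregate(np.mean)
)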
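
Conceptually, the feedback definition above scores each retrieved context against the query and then averages the results. The following sketch shows that pattern in plain Python. The toy_relevance scorer is a made-up word-overlap heuristic standing in for the LLM judge, and the function names are assumptions for illustration, not TruLens APIs.

```python
from statistics import mean
from typing import Callable, List


def toy_relevance(query: str, context: str) -> float:
    """Toy stand-in for an LLM judge: fraction of query words found in the context."""
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & context_words) / len(query_words)


def per_context_scores(
    query: str,
    contexts: List[str],
    scorer: Callable[[str, str], float] = toy_relevance,
) -> List[float]:
    """Call the scorer once per retrieved context."""
    return [scorer(query, context) for context in contexts]


def context_relevance(query: str, contexts: List[str]) -> float:
    """Aggregate the per-context scores with a mean (0.0 if nothing was retrieved)."""
    scores = per_context_scores(query, contexts)
    return mean(scores) if scores else 0.0


contexts = ["open telemetry is a tracing standard", "bananas are yellow"]
scores = per_context_scores("open telemetry tracing", contexts)
score = context_relevance("open telemetry tracing", contexts)
print(scores, score)
```

Scoring per context keeps one irrelevant chunk from hiding behind several relevant ones, and the aggregate gives a single number to track per record.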

Computing Metrics on Complex Execution Flows

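TruLens introduces span groups to handle repeated tool calls within a trace. In the example below, each call to clean_up_question and clean_up_response is tagged with the loop index idx, so the question and response spans from the same iteration can be matched to each other.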

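from typing import List

from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# `break_question_down` and `call_llm` are assumed to be defined elsewhere.

class App:

    # The value passed as `idx` is recorded as this span's group.
    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_question(self, question: str, idx: str) -> str:
        ...

    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_response(self, response: str, idx: str) -> str:
        ...

    @instrument()
    def combine_responses(self, cleaned_responses: List[str]) -> str:
        ...

    @instrument()
    def query(self, complex_question: str) -> str:
        questions = break_question_down(complex_question)
        cleaned_responses = []
        for i, question in enumerate(questions):
            # Use the same group id for both steps of one sub-question.
            cleaned_question = self.clean_up_question(question, str(i))
            response = call_llm(cleaned_question)
            cleaned_response = self.clean_up_response(response, str(i))
            cleaned_responses.append(cleaned_response)
        return self.combine_responses(cleaned_responses)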
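
One way to picture what span groups make possible is to pair up spans that carry the same group value. The sketch below does this over plain tuples. It is an illustration of the idea only, not TruLens' implementation, and the tuple layout and pair_by_group helper are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each entry: (instrumented function name, span group, recorded output).
recorded_spans: List[Tuple[str, str, str]] = [
    ("clean_up_question", "0", "q0"),
    ("clean_up_response", "0", "r0"),
    ("clean_up_question", "1", "q1"),
    ("clean_up_response", "1", "r1"),
]


def pair_by_group(spans: List[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Match question and response spans that share the same span group."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for function_name, group, value in spans:
        groups[group][function_name] = value
    return {
        group: (values["clean_up_question"], values["clean_up_response"])
        for group, values in groups.items()
        # Skip groups that are missing one side of the pair.
        if "clean_up_question" in values and "clean_up_response" in values
    }


pairs = pair_by_group(recorded_spans)
print(pairs)
```

Without the group value, four spans from two function names would be ambiguous: there would be no reliable way to tell which response belongs to which question.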

How to Examine Execution Flows in TruLens

Run:

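from trulens.core import TruSession
from trulens.dashboard import run_dashboard

session = TruSession()
run_dashboard(session)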

…and visually inspect execution traces. Span types are shown directly in the dashboard to help identify branching, errors, or performance issues.

How to Get Started

Install TruLens:

pip install trulens-core==1.5.0

Enable OpenTelemetry:

import os

# Opt in to OpenTelemetry-based tracing.
os.environ["TRULENS_OTEL_TRACING"] = "1"

Instrument Methods:

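from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Record the `query` argument and the return value on the record root span.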

@instrument(
    attributes={
        SpanAttributes.RECORD_ROOT.INPUT: "query",
        SpanAttributes.RECORD_ROOT.OUTPUT: "return",
    },
)
def query(self, query: str) -> str:
    context_str = self.retrieve(query=query)
    completion = self.generate_completion(query=query, context_str=context_str)
    return completion

Add Evaluations:

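from trulens.core import Feedback

# `provider` is assumed to be a TruLens feedback provider created elsewhere.
f_answer_relevance = (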
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)

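Alternatively, use selectors to choose explicitly which span attributes feed each argument:

from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.otel.semconv.trace import SpanAttributes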

f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on({
        "prompt": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.INPUT,
        ),
    })
    .on({
        "response": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.OUTPUT,
        ),
    })
)
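
The selector form above names a span type and a span attribute for each feedback argument. A minimal sketch of that lookup, using plain dictionaries instead of real spans, might look like the following. The select helper, the dictionary layout, and the attribute names are assumptions for illustration and do not reflect how TruLens resolves selectors internally.

```python
from typing import Any, Dict, List

Span = Dict[str, Any]


def select(spans: List[Span], span_type: str, span_attribute: str) -> List[Any]:
    """Return the named attribute from every span of the given type."""
    return [
        span["attributes"][span_attribute]
        for span in spans
        if span["span_type"] == span_type and span_attribute in span["attributes"]
    ]


# A toy trace with one root span and one retrieval span.
trace = [
    {
        "span_type": "record_root",
        "attributes": {"input": "What is OTel?", "output": "An observability framework."},
    },
    {"span_type": "retrieval", "attributes": {"query_text": "What is OTel?"}},
]

# Build the keyword arguments a feedback function would receive.
feedback_kwargs = {
    "prompt": select(trace, "record_root", "input")[0],
    "response": select(trace, "record_root", "output")[0],
}
print(feedback_kwargs)
```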

Register Your App:

from trulens.apps.app import TruApp

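# `RAG` is the application class that holds the instrumented methods above;
# `f_groundedness` is assumed to be defined like the other feedback functions.
rag = RAG(model_name="gpt-4.1-mini")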

tru_rag = TruApp(
    rag,
    app_name="OTEL-RAG",
    app_version="4.1-mini",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)

Run the Dashboard:

from trulens.core import TruSession
from trulens.dashboard import run_dashboard

session = TruSession()
run_dashboard(session)

Concluding Thoughts

By building on top of OpenTelemetry, TruLens delivers a universal tracing and evaluation platform for modern AI systems. Whether your agents are built in Python, composed via MCP, or distributed across systems, TruLens provides a common observability layer for telemetry and evaluation.

Let’s build the future of trustworthy agentic AI together.

Full Release Details

Evaluate Weaviate Query Agents by @sfc-gh-jreini in #1896
[SNOW-2005227, SNOW-2037060] Add HuggingFace pytest annotation and deflake E2E test issues by @sfc-gh-nvytla in #1907
move notebooks that shouldn't be in quickstarts to experimental by @sfc-gh-jreini in #1908
[SNOW-2005236, SNOW-2040963]: Deprecate outdated/obsolete notebooks by @sfc-gh-nvytla in #1909
Fix test_dummy.py and test_session.py E2E tests. by @sfc-gh-dkurokawa in #1910
snowflake ai stack by @sfc-gh-jreini in #1912
[E2E] Revert temporary change for e2e testing by @sfc-gh-nvytla in #1913
gpt-4.1 by @sfc-gh-jreini in #1914
Update contribution guide by @sfc-gh-jreini in #1915
Restore rag evaluation quickstart notebook by @sfc-gh-nvytla in #1917
Fix test_deprecation.py. by @sfc-gh-dkurokawa in #1918
Fix test_serial.py. by @sfc-gh-dkurokawa in #1916
Set up algorithm for feedback selectors 2.0. by @sfc-gh-dkurokawa in #1905
Nit: Flakiness fix for test_tru_llama by @sfc-gh-nvytla in #1919
Add some tests I forgot to add in PR#1905 on feedback computation. by @sfc-gh-dkurokawa in #1920
Add function name to span attribute ai.observability.call.function. by @sfc-gh-dkurokawa in #1922
Fix "future" issue on E2E tests. by @sfc-gh-dkurokawa in #1921
Add experimental langgraph multi-agent demo by @sfc-gh-dhuang in #1923
Add dashboard GIFs to weaviate query agents notebook by @sfc-gh-jreini in #1925
Track langgraph saved images with Git LFS, fix formatting nit by @sfc-gh-nvytla in #1926
Unlock poetry > 2.0 by @sfc-gh-chu in #1930
Small E2E test fixes. by @sfc-gh-dkurokawa in #1931
Allow enabling/disabling OTEL instrumentation. by @sfc-gh-dkurokawa in #1932
Fix test_tru_llama.py tests. by @sfc-gh-dkurokawa in #1933
Clear dummy stack after test_dummy.py so that it doesn't affect other tests. by @sfc-gh-dkurokawa in #1935
add backlinks to weaviate by @sfc-gh-jreini in #1936
Allow feedback Selector for OTEL selectors to have more options. by @sfc-gh-dkurokawa in #1938
Add missing dependency in requirements.txt (chromadb) for snowflake ai stack demo by @sfc-gh-dhuang in #1942
Update CALL semantic convention. by @sfc-gh-dkurokawa in #1943
[SNOW-2005237] Fix notebook E2E tests by @sfc-gh-dhuang in #1941
Increase Cortex complete timeout to 60s for tests. by @sfc-gh-dkurokawa in #1944
trulens-core tile in readme by @sfc-gh-jreini in #1947
Initial draft of OTEL-specific Record Viewer by @sfc-gh-gtokernliang in #1948
Convert ai.observability.eval_root.result to ai.observability.eval_root.score as that's what Snowflake metric computation does. by @sfc-gh-dkurokawa in #1946
Pin certifi version to unblock e2e pipeline by @sfc-gh-dhuang in #1945
[SNOW-2005227, SNOW-2061190] v0 OTEL implementation of get_records_and_feedback by @sfc-gh-nvytla in #1939
When converting a span attribute to a string, try to JSONify before stringifying. by @sfc-gh-dkurokawa in #1949
Updating gitignore files by @sfc-gh-dhuang in #1953
Add storybook tests by @sfc-gh-gtokernliang in #1952
Introduce snapshot testing for React components by @sfc-gh-gtokernliang in #1954
Update OTEL semantic convention in EVAL_ROOT to hold more info about feedback function metadata. by @sfc-gh-dkurokawa in #1955
Split Selector into its own file so it's easier for users to use and doesn't easily induce cyclic imports. by @sfc-gh-dkurokawa in #1957
Add accordion functionality to Panel. by @sfc-gh-dkurokawa in #1956
Write app to App defn table in OTEL again. by @sfc-gh-dkurokawa in #1958
[SNOW-2081987] Enable OTEL record pagination based on unique, sorted record_ids from event table by @sfc-gh-nvytla in #1951
Update Jenkinsfile to add Jest + Storybook tests by @sfc-gh-gtokernliang in #1950
langgraph + snowflake tools by @sfc-gh-jreini in #1940
Use the full function name for span attributes. by @sfc-gh-dkurokawa in #1966
Write app id to span attributes. by @sfc-gh-dkurokawa in #1967
Add more storybook tests for RecordTree by @sfc-gh-gtokernliang in #1961
Add valid app info columns for get_records_and_feedback by @sfc-gh-nvytla in #1965
Add storybook tests for deep nodes + test refactoring by @sfc-gh-gtokernliang in #1962
[Feat] Add Report a Bug button by @sfc-gh-nvytla in #1970
Allow for feedbacks in OTEL and kick them off via a new TruApp::compute_feedback function. by @sfc-gh-dkurokawa in #1971
Consolidate checking of TRULENS_OTEL_TRACING env variable to check if running in OTel mode. by @sfc-gh-dkurokawa in #1975
[Fix] Update latency from milliseconds to seconds by @sfc-gh-nvytla in #1968
[Fix] Allow _json_extract_otel helper method to access other JSON columns in the Event ORM table by @sfc-gh-nvytla in #1972
Add handling for orphaned nodes by @sfc-gh-gtokernliang in #1963
Add newly supported Cortex models in cost tracking table by @sfc-gh-dhuang in #1974
Connect Streamlit <> OTEL Record viewer by @sfc-gh-gtokernliang in #1964
[Fix] Update groundedness measure return to 0.0 instead of NaN when no non-trivial statements are found. by @sfc-gh-nvytla in #1973
Don't recompute feedback functions if already done. by @sfc-gh-dkurokawa in #1977
Deprecate SnowflakeFeedback by @sfc-gh-jreini in #1979
Update langgraph/cortex example span attributes by @sfc-gh-jreini in #1976
Create automatic job to compute feedbacks. by @sfc-gh-dkurokawa in #1982
Check that wrapped apps are garbage collected correctly. by @sfc-gh-dkurokawa in #1983
Start evaluator immediately and also require that FeedbackMode must be WITH_APP_THREAD (the default). by @sfc-gh-dkurokawa in #1984
Allow use of Feedback's on_input, on_output, on_input_output, on_default methods. by @sfc-gh-dkurokawa in #1985
[SNOW-2061186, SNOW-2105931] Connect record_viewer_otel (FE) to Event table (BE) by @sfc-gh-nvytla in #1981
Update vscode settings to automatically handle line length better for python files. by @sfc-gh-dkurokawa in #1986
[SNOW-2111562] Render feedback visualizations for OTEL records by @sfc-gh-nvytla in #1988
Make old @instrument decorator work in OTel world. by @sfc-gh-dkurokawa in #1989
Consistently use TruApp in place of TruCustomApp by @sfc-gh-jreini in #1980
Progress towards otel quickstart readiness by @sfc-gh-jreini in #1991
let dashboard pick up env variable instead of using flag by @sfc-gh-jreini in #1992
Make sure to clean up context manager after using it once. by @sfc-gh-dkurokawa in #1993
[Minor] Update WITH_OTEL_TRACING to TRULENS_OTEL_TRACING by @sfc-gh-nvytla in #1995
Create Recording when using a App as a context manager. by @sfc-gh-dkurokawa in #1997
[Minor] Allow trulens_trace to handle both record (pre-otel) and record id string (otel) by @sfc-gh-nvytla in #1996
Use ai.observability.eval_root.higher_is_better. by @sfc-gh-dkurokawa in #1998
Update OTEL feedback score rendering and columns by @sfc-gh-nvytla in #1994
Work on streamlit UI components, streaming for OTEL by @sfc-gh-jreini in #2000
Handle aggregations and also emit eval sub-step spans. by @sfc-gh-dkurokawa in #2004
Have recording.get wait for record to show up in event table before giving to user. by @sfc-gh-dkurokawa in #2006
Fix OpenAI provider in python 3.12+ to work with the changed @cached_property behavior by @sfc-gh-dhuang in #2005
Summit demo streamlit app: OTEL + multi-agentic evals by @sfc-gh-dhuang in #2007
Don't require a main method. by @sfc-gh-dkurokawa in #2009
UI updates to agent demo by @sfc-gh-jreini in #2013
Summit demo streamlit app: convert trajectory eval to custom feedback function with the new Selector by @sfc-gh-dhuang in #2011
Fix TruLens nits by @sfc-gh-gtokernliang in #2010
Fix Feedback function histogram filtering out 1 by @sfc-gh-gtokernliang in #2012
Create Feedback::on_context and Selector::select_context methods to make it easier to create feedback functions on the context. by @sfc-gh-dkurokawa in #2015
Display generated charts in the agent demo by @sfc-gh-jreini in #2016
Fix more TruLens UI nits by @sfc-gh-gtokernliang in #2019
Add spinners to evals in agent demo ui by @sfc-gh-jreini in #2021
Remove "name" span attribute. by @sfc-gh-dkurokawa in #2017
fix app version search and advanced filters search by @sfc-gh-chu in #2027
use on_context for ffs in quickstart by @sfc-gh-chu in #2022
Remove old "selector" span attribute stuff. by @sfc-gh-dkurokawa in #2020
Fix how OTEL mode is being checked by @sfc-gh-gtokernliang in #2029
Don't use session state api for metadata dropdown by @sfc-gh-chu in #2023
Nit: Clean up and standardize otel tracing params by @sfc-gh-nvytla in #2031
Move app info to resource attributes. by @sfc-gh-dkurokawa in #1999
set otel os env var from streamlit flag by @sfc-gh-chu in #2028
Add semantic conventions to docs. by @sfc-gh-dkurokawa in #2036
Add explanations and fix groundedness calls in streamlit by @sfc-gh-chu in #2014
Add more functionality to the Recording class to mimic pre-OTel behavior. Also add a retrieve_feeback_results function. by @sfc-gh-dkurokawa in #2032
Rename call_feedback_function_per_entry_in_list to collect_list and change polarity of the boolean. by @sfc-gh-dkurokawa in #2033
Fix some e2e tests. by @sfc-gh-dkurokawa in #2034
Prepare agent for demo by @sfc-gh-jreini in #2026
Fail if trying to run a pre-OTel only function in OTel mode. by @sfc-gh-dkurokawa in #2038
demo: add dummy keys for openai, change localhost port by @sfc-gh-jreini in #2041
Fix record table nits by @sfc-gh-gtokernliang in #2025
Put metric name in ai.observability.eval_root.metric_name as well as ai.observability.eval.metric_name. by @sfc-gh-dkurokawa in #2039

Full Changelog: trulens-1.4.9...trulens-1.5.0

Trulens v1.5.0
