TruLens v1.5.0
Telemetry for the Agentic World: TruLens + OpenTelemetry
Agents are rapidly gaining traction across AI applications. With this growth comes a new set of challenges: how do we trace, observe, and evaluate these dynamic, distributed systems?
Today, we’re excited to share that TruLens now supports OpenTelemetry (OTel), unlocking powerful, interoperable observability for the agentic world.
Challenges of Tracing Agents
Tracing agentic applications is fundamentally different from tracing traditional software systems:
- Language-agnostic: Agents can be written in Python, Go, Java, or other languages, so tracing must work across language boundaries.
- Distributed by nature: Multi-agent systems often span multiple machines or processes.
- Existing telemetry stacks: Many developers and enterprises already use OpenTelemetry, so tracing compatibility is essential.
- Dynamic execution: Unlike traditional apps, agents often make decisions on the fly, with branching workflows that can’t be fully known in advance.
- Interoperability standards: As protocols like Model Context Protocol (MCP) and Agent2Agent Protocol (A2A) emerge, tracing must support agents working across different systems.
- Repeated tool usage: Agents may call the same function or tool multiple times in a single execution trace, requiring fine-grained visibility into span grouping to understand what’s happening and why.
What is TruLens
TruLens is an open source library for evaluating and tracing AI agents, including RAG systems and other LLM applications. It combines OpenTelemetry-based tracing with trustworthy evaluations, including both ground truth metrics and reference-free (LLM-as-a-Judge) feedback.
TruLens pioneered the RAG Triad, a structured evaluation of:
- Context relevance
- Groundedness
- Answer relevance
These evaluations provide a foundation for understanding the performance of RAGs and agentic RAGs, supported by benchmarks like LLM-AggreFact, TREC-DL, and HotPotQA.
This combination of trusted evaluators and open-standard tracing gives you the tools to improve your application offline and to monitor it once it reaches production.
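To make the triad concrete: each metric compares one pair out of the query, the retrieved context, and the response. The sketch below is plain Python rather than TruLens API. `RAG_TRIAD` and `weakest_link` are hypothetical names, and the scores are made-up numbers used only for illustration.

```python
# Illustrative only: these names are hypothetical and not part of TruLens.
# Each triad metric compares one pair of (query, context, response).
RAG_TRIAD = {
    "context_relevance": ("query", "context"),
    "groundedness": ("context", "response"),
    "answer_relevance": ("query", "response"),
}


def weakest_link(scores: dict) -> str:
    """Return the name of the triad metric with the lowest score."""
    missing = set(RAG_TRIAD) - set(scores)
    if missing:
        raise ValueError(f"missing triad scores: {sorted(missing)}")
    return min(RAG_TRIAD, key=lambda name: scores[name])


# Made-up scores: retrieval looks fine, but the answer is poorly grounded.
example_scores = {
    "context_relevance": 0.9,
    "groundedness": 0.4,
    "answer_relevance": 0.8,
}
print(weakest_link(example_scores))
```

Looking at which leg of the triad scores lowest is one simple way to decide whether to work on retrieval or on generation first.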
How TruLens Augments OpenTelemetry
As AI applications become increasingly agentic, TruLens’ shift to OpenTelemetry enables observability that is:
- Interoperable with existing telemetry stacks
- Compatible across languages and frameworks
- Capable of tracing dynamic agent workflows
TruLens now accepts any span that adheres to the OTel standard.
What is OpenTelemetry?
OpenTelemetry (OTel) is an open-source observability framework for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.
In LLM and agentic contexts, OpenTelemetry enables language-agnostic, interoperable tracing for:
- Multi-agent systems
- Distributed environments
- Tooling interoperability
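One piece of background, not taken from this post, helps explain how a trace can cross process and language boundaries. OpenTelemetry propagators commonly carry trace context in the W3C Trace Context `traceparent` header. The sketch below parses and forwards that header using only the standard library. The helper names are hypothetical and the validation is simplified compared with the full specification.

```python
import re
import secrets

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its fields (simplified validation)."""
    match = _TRACEPARENT.match(header.strip().lower())
    if not match:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace or parent id is invalid")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def child_traceparent(header: str) -> str:
    """Keep the trace id, mint a new span id for the outgoing call."""
    ctx = parse_traceparent(header)
    return make_traceparent(ctx["trace_id"], secrets.token_hex(8), ctx["sampled"])


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
```

Because the trace id survives every hop, an agent written in Go can continue a trace started by a Python agent as long as both sides read and write the same header.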
What is a span?
A span represents a single unit of work. In LLM apps, a span might cover planning, routing, retrieval, tool usage, or generation.
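As a rough mental model, a span is a named, timed record that knows which trace it belongs to and which span is its parent. The sketch below is a toy data structure built with the standard library. It is not the OpenTelemetry SDK's span class, and the field names are assumptions chosen for readability.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: one unit of work inside a trace (not the OTel SDK class)."""

    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    attributes: dict = field(default_factory=dict)
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None

    def child(self, name: str) -> "Span":
        """Start a child span in the same trace."""
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def end(self) -> None:
        """Mark the unit of work as finished."""
        self.end_ns = time.time_ns()


# One trace with a root span and a retrieval step nested under it.
root = Span(name="query", trace_id=secrets.token_hex(16))
retrieval = root.child("retrieve")
retrieval.attributes["query_text"] = "What is OpenTelemetry?"
retrieval.end()
root.end()
```

The parent link is what turns a flat list of spans into the tree of planning, retrieval, and generation steps that a trace viewer displays.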
TruLens Defines Semantic Conventions for the Agentic World
TruLens maps span attributes to common definitions using semantic conventions to ensure:
- Cross-framework interoperability
- Shared instrumentation for MCP and A2A
- Consistent evaluation across implementations
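The idea behind a semantic convention can be sketched in a few lines: different frameworks may name the same thing differently, and a shared vocabulary lets one evaluator read them all. In the sketch below the `framework_a` and `framework_b` attribute names are invented for illustration. The shared keys follow the `ai.observability.*` naming that appears in the release details further down, but the exact key strings here are assumptions rather than a statement of TruLens's convention.

```python
# Assumed shared keys, modeled on the ai.observability.* naming style.
COMMON_QUERY = "ai.observability.retrieval.query_text"
COMMON_CONTEXTS = "ai.observability.retrieval.retrieved_contexts"

# Hypothetical framework-specific names mapped onto the shared keys.
ALIASES = {
    "framework_a.search.input": COMMON_QUERY,
    "framework_a.search.hits": COMMON_CONTEXTS,
    "framework_b.retriever.question": COMMON_QUERY,
    "framework_b.retriever.documents": COMMON_CONTEXTS,
}


def normalize(attributes: dict) -> dict:
    """Rename known framework-specific keys; pass everything else through."""
    return {ALIASES.get(key, key): value for key, value in attributes.items()}


span_a = {"framework_a.search.input": "What is a span?", "latency_ms": 12}
span_b = {"framework_b.retriever.question": "What is a span?"}

normalized_a = normalize(span_a)
normalized_b = normalize(span_b)
```

After normalization, an evaluator only needs to know the shared key for the query text, regardless of which framework emitted the span.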
Using Semantic Conventions to Compute Evaluation Metrics
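TruLens computes evaluation metrics from instrumented span attributes. In the example below, the retrieve method records its query argument and its return value as retrieval span attributes, and the context relevance feedback is then scored for each retrieved context and averaged.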
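import numpy as np
from trulens.core import Feedback
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Method on the app class; vector_store is defined elsewhere.
@instrument(
    span_type=SpanAttributes.SpanType.RETRIEVAL,
    attributes={
        # Record the "query" argument and the return value on the span.
        SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
        SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
    },
)
def retrieve(self, query: str) -> list:
    results = vector_store.query(query_texts=query, n_results=4)
    # Flatten the nested list of documents.
    return [doc for sublist in results["documents"] for doc in sublist]

# provider is an LLM feedback provider created elsewhere.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    # Score each context separately; collect_list=False replaces
    # call_feedback_function_per_entry_in_list=True (renamed in #2033).
    .on_context(collect_list=False)
    .aggregate(np.mean)
)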
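Conceptually, the context relevance feedback above does three things: it finds the retrieval spans, scores every retrieved context against the query, and averages the scores. The sketch below mirrors that flow with plain dictionaries. The `word_overlap` scorer is a deliberately crude stand-in for an LLM judge, and the span layout and function names are assumptions rather than TruLens internals.

```python
from statistics import mean
from typing import Optional


def word_overlap(query: str, context: str) -> float:
    """Toy relevance score: fraction of query words found in the context."""
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    return len(query_words & context_words) / len(query_words) if query_words else 0.0


def context_relevance(spans: list) -> Optional[float]:
    """Score each retrieved context on its own, then average the scores."""
    scores = []
    for span in spans:
        if span.get("span_type") != "retrieval":
            continue  # only retrieval spans carry query and contexts
        for context in span["retrieved_contexts"]:
            scores.append(word_overlap(span["query_text"], context))
    return mean(scores) if scores else None


trace = [
    {
        "span_type": "retrieval",
        "query_text": "what is otel",
        "retrieved_contexts": [
            "otel is an observability framework",
            "bananas are yellow",
        ],
    },
    {"span_type": "generation", "output": "OTel is an observability framework."},
]
score = context_relevance(trace)
```

Scoring each context separately and then aggregating keeps one irrelevant chunk from hiding behind several good ones, which is why the feedback definition ends with an aggregation step.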
Computing Metrics on Complex Execution Flows
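TruLens introduces span groups to handle repeated tool calls within a trace. Tagging spans with a shared group identifier (here, the loop index passed as idx) lets an evaluation pair the clean_up_question and clean_up_response calls that belong to the same iteration.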
from typing import List

class App:
    # idx tags each call with its loop iteration so related spans share a group.
    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_question(self, question: str, idx: str) -> str:
        ...

    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_response(self, response: str, idx: str) -> str:
        ...

    @instrument()
    def combine_responses(self, cleaned_responses: List[str]) -> str:
        ...

    @instrument()
    def query(self, complex_question: str) -> str:
        # break_question_down and call_llm are helpers defined elsewhere.
        questions = break_question_down(complex_question)
        cleaned_responses = []
        for i, question in enumerate(questions):
            cleaned_question = self.clean_up_question(question, str(i))
            response = call_llm(cleaned_question)
            cleaned_response = self.clean_up_response(response, str(i))
            cleaned_responses.append(cleaned_response)
        return self.combine_responses(cleaned_responses)
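To see why the group identifier matters, consider an evaluation that needs the cleaned question and the cleaned response from the same loop iteration. Without a shared tag, every question span could be matched with every response span. The sketch below shows one plausible way to pair spans by group using plain dictionaries. It is not how TruLens implements span groups, and the span layout is an assumption.

```python
from collections import defaultdict


def spans_by_group(spans: list) -> dict:
    """Index spans as {group_id: {span_name: span}}."""
    groups = defaultdict(dict)
    for span in spans:
        for group_id in span.get("span_groups", []):
            groups[group_id][span["name"]] = span
    return dict(groups)


def paired_outputs(spans: list, first: str, second: str) -> list:
    """Pair the outputs of two span names that share a group id."""
    pairs = []
    for _group_id, members in sorted(spans_by_group(spans).items()):
        if first in members and second in members:
            pairs.append((members[first]["output"], members[second]["output"]))
    return pairs


trace = [
    {"name": "clean_up_question", "span_groups": ["0"], "output": "What is OTel?"},
    {"name": "clean_up_response", "span_groups": ["0"], "output": "An observability framework."},
    {"name": "clean_up_question", "span_groups": ["1"], "output": "What is a span?"},
    {"name": "clean_up_response", "span_groups": ["1"], "output": "A single unit of work."},
    {"name": "combine_responses", "output": "..."},
]
pairs = paired_outputs(trace, "clean_up_question", "clean_up_response")
```

Each pair now contains a question and the response produced for that same question, so a per-iteration metric never mixes iteration 0 with iteration 1.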
How to Examine Execution Flows in TruLens
Run:
session.run_dashboard()
…and visually inspect execution traces. Span types are shown directly in the dashboard to help identify branching, errors, or performance issues.
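For a quick look at a trace outside the dashboard, the same parent and child links can be printed as an indented tree. The sketch below is a standalone illustration using dictionaries with assumed keys (`span_id`, `parent_id`, `name`, `span_type`). It does not read from TruLens storage, and treating spans with an unknown parent as roots is a choice made for this example.

```python
def render_tree(spans: list) -> str:
    """Render spans as an indented tree, labeling each with its span type."""
    known_ids = {span["span_id"] for span in spans}
    children = {}
    for span in spans:
        parent = span.get("parent_id")
        # Spans whose parent is missing from the trace are shown as roots.
        key = parent if parent in known_ids else None
        children.setdefault(key, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f"{span['name']} [{span['span_type']}]")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return "\n".join(lines)


trace = [
    {"span_id": "a", "parent_id": None, "name": "query", "span_type": "record_root"},
    {"span_id": "b", "parent_id": "a", "name": "retrieve", "span_type": "retrieval"},
    {"span_id": "c", "parent_id": "a", "name": "generate", "span_type": "generation"},
]
tree = render_tree(trace)
print(tree)
```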
How to Get Started
- Install TruLens:
pip install trulens-core==1.5.0
- Enable OpenTelemetry:
import os
os.environ["TRULENS_OTEL_TRACING"] = "1"
- Instrument Methods:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Method on the app class.
@instrument(
    attributes={
        SpanAttributes.RECORD_ROOT.INPUT: "query",
        SpanAttributes.RECORD_ROOT.OUTPUT: "return",
    },
)
def query(self, query: str) -> str:
    context_str = self.retrieve(query=query)
    completion = self.generate_completion(query=query, context_str=context_str)
    return completion
- Add Evaluations:
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)
Using selectors:
from trulens.core.feedback.selector import Selector

f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on({
        "prompt": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.INPUT,
        ),
    })
    .on({
        "response": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.OUTPUT,
        ),
    })
)
- Register Your App:
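from trulens.apps.app import TruApp

# RAG is the instrumented app class; f_groundedness and f_context_relevance
# are Feedback objects defined alongside f_answer_relevance.
rag = RAG(model_name="gpt-4.1-mini")
tru_rag = TruApp(
    rag,
    app_name="OTEL-RAG",
    app_version="4.1-mini",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)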
- Run the Dashboard:
from trulens.core import TruSession
from trulens.dashboard import run_dashboard
session = TruSession()
run_dashboard(session)
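The selector form of the answer relevance feedback in the evaluation step can be read as a lookup: for each argument of the feedback function, find the span of a given type and pull one attribute from it. The sketch below imitates that lookup with a toy class. `ToySelector` and `build_kwargs` are hypothetical names, the span layout is assumed, and requiring exactly one match is a simplification made for this example rather than TruLens behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySelector:
    """Toy stand-in for a selector: a span type plus an attribute name."""

    span_type: str
    span_attribute: str

    def resolve(self, spans: list) -> list:
        """Return the attribute value from every span of the selected type."""
        return [
            span["attributes"][self.span_attribute]
            for span in spans
            if span["span_type"] == self.span_type
            and self.span_attribute in span["attributes"]
        ]


def build_kwargs(selectors: dict, spans: list) -> dict:
    """Resolve each selector into the keyword argument it feeds."""
    kwargs = {}
    for arg_name, selector in selectors.items():
        values = selector.resolve(spans)
        if len(values) != 1:
            raise ValueError(f"{arg_name}: expected 1 match, got {len(values)}")
        kwargs[arg_name] = values[0]
    return kwargs


trace = [
    {
        "span_type": "record_root",
        "attributes": {"input": "What is a span?", "output": "A unit of work."},
    },
    {"span_type": "retrieval", "attributes": {"query_text": "What is a span?"}},
]
selectors = {
    "prompt": ToySelector("record_root", "input"),
    "response": ToySelector("record_root", "output"),
}
kwargs = build_kwargs(selectors, trace)
```

The resulting dictionary is what a feedback function with `prompt` and `response` parameters would be called with, which is why the selector keys match the function's argument names.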
Concluding Thoughts
By building on top of OpenTelemetry, TruLens delivers a universal tracing and evaluation platform for modern AI systems. Whether your agents are built in Python, composed via MCP, or distributed across systems, TruLens provides a common observability layer for telemetry and evaluation.
Let’s build the future of trustworthy agentic AI together.
Full Release Details
- Evaluate Weaviate Query Agents by @sfc-gh-jreini in #1896
- [SNOW-2005227, SNOW-2037060] Add HuggingFace pytest annotation and deflake E2E test issues by @sfc-gh-nvytla in #1907
- move notebooks that shouldn't be in quickstarts to experimental by @sfc-gh-jreini in #1908
- [SNOW-2005236, SNOW-2040963]: Deprecate outdated/obsolete notebooks by @sfc-gh-nvytla in #1909
- Fix test_dummy.py and test_session.py E2E tests. by @sfc-gh-dkurokawa in #1910
- snowflake ai stack by @sfc-gh-jreini in #1912
- [E2E] Revert temporary change for e2e testing by @sfc-gh-nvytla in #1913
- gpt-4.1 by @sfc-gh-jreini in #1914
- Update contribution guide by @sfc-gh-jreini in #1915
- Restore rag evaluation quickstart notebook by @sfc-gh-nvytla in #1917
- Fix test_deprecation.py. by @sfc-gh-dkurokawa in #1918
- Fix test_serial.py. by @sfc-gh-dkurokawa in #1916
- Set up algorithm for feedback selectors 2.0. by @sfc-gh-dkurokawa in #1905
- Nit: Flakiness fix for test_tru_llama by @sfc-gh-nvytla in #1919
- Add some tests I forgot to add in PR#1905 on feedback computation. by @sfc-gh-dkurokawa in #1920
- Add function name to span attribute ai.observability.call.function. by @sfc-gh-dkurokawa in #1922
- Fix "future" issue on E2E tests. by @sfc-gh-dkurokawa in #1921
- Add experimental langgraph multi-agent demo by @sfc-gh-dhuang in #1923
- Add dashboard GIFs to weaviate query agents notebook by @sfc-gh-jreini in #1925
- Track langgraph saved images with Git LFS, fix formatting nit by @sfc-gh-nvytla in #1926
- Unlock poetry > 2.0 by @sfc-gh-chu in #1930
- Small E2E test fixes. by @sfc-gh-dkurokawa in #1931
- Allow enabling/disabling OTEL instrumentation. by @sfc-gh-dkurokawa in #1932
- Fix test_tru_llama.py tests. by @sfc-gh-dkurokawa in #1933
- Clear dummy stack after test_dummy.py so that it doesn't affect other tests. by @sfc-gh-dkurokawa in #1935
- add backlinks to weaviate by @sfc-gh-jreini in #1936
- Allow feedback Selector for OTEL selectors to have more options. by @sfc-gh-dkurokawa in #1938
- Add missing dependency in requirements.txt (chromadb) for snowflake ai stack demo by @sfc-gh-dhuang in #1942
- Update CALL semantic convention. by @sfc-gh-dkurokawa in #1943
- [SNOW-2005237] Fix notebook E2E tests by @sfc-gh-dhuang in #1941
- Increase Cortex complete timeout to 60s for tests. by @sfc-gh-dkurokawa in #1944
- trulens-core tile in readme by @sfc-gh-jreini in #1947
- Initial draft of OTEL-specific Record Viewer by @sfc-gh-gtokernliang in #1948
- Convert ai.observability.eval_root.result to ai.observability.eval_root.score as that's what Snowflake metric computation does. by @sfc-gh-dkurokawa in #1946
- Pin certifi version to unblock e2e pipeline by @sfc-gh-dhuang in #1945
- [SNOW-2005227, SNOW-2061190] v0 OTEL implementation of get_records_and_feedback by @sfc-gh-nvytla in #1939
- When converting a span attribute to a string, try to JSONify before stringifying. by @sfc-gh-dkurokawa in #1949
- Updating gitignore files by @sfc-gh-dhuang in #1953
- Add storybook tests by @sfc-gh-gtokernliang in #1952
- Introduce snapshot testing for React components by @sfc-gh-gtokernliang in #1954
- Update OTEL semantic convention in EVAL_ROOT to hold more info about feedback function metadata. by @sfc-gh-dkurokawa in #1955
- Split Selector into its own file so it's easier for users to use and doesn't easily induce cyclic imports. by @sfc-gh-dkurokawa in #1957
- Add accordion functionality to Panel. by @sfc-gh-dkurokawa in #1956
- Write app to App defn table in OTEL again. by @sfc-gh-dkurokawa in #1958
- [SNOW-2081987] Enable OTEL record pagination based on unique, sorted record_ids from event table by @sfc-gh-nvytla in #1951
- Update Jenkinsfile to add Jest + Storybook tests by @sfc-gh-gtokernliang in #1950
- langgraph + snowflake tools by @sfc-gh-jreini in #1940
- Use the full function name for span attributes. by @sfc-gh-dkurokawa in #1966
- Write app id to span attributes. by @sfc-gh-dkurokawa in #1967
- Add more storybook tests for RecordTree by @sfc-gh-gtokernliang in #1961
- Add valid app info columns for get_records_and_feedback by @sfc-gh-nvytla in #1965
- Add storybook tests for deep nodes + test refactoring by @sfc-gh-gtokernliang in #1962
- [Feat] Add Report a Bug button by @sfc-gh-nvytla in #1970
- Allow for feedbacks in OTEL and kick them off via a new TruApp::compute_feedback function. by @sfc-gh-dkurokawa in #1971
- Consolidate checking of TRULENS_OTEL_TRACING env variable to check if running in OTel mode. by @sfc-gh-dkurokawa in #1975
- [Fix] Update latency from milliseconds to seconds by @sfc-gh-nvytla in #1968
- [Fix] Allow _json_extract_otel helper method to access other JSON columns in the Event ORM table by @sfc-gh-nvytla in #1972
- Add handling for orphaned nodes by @sfc-gh-gtokernliang in #1963
- Add newly supported Cortex models in cost tracking table by @sfc-gh-dhuang in #1974
- Connect Streamlit <> OTEL Record viewer by @sfc-gh-gtokernliang in #1964
- [Fix] Update groundedness measure return to 0.0 instead of NaN when no non-trivial statements are found. by @sfc-gh-nvytla in #1973
- Don't recompute feedback functions if already done. by @sfc-gh-dkurokawa in #1977
- Deprecate SnowflakeFeedback by @sfc-gh-jreini in #1979
- Update langgraph/cortex example span attributes by @sfc-gh-jreini in #1976
- Create automatic job to compute feedbacks. by @sfc-gh-dkurokawa in #1982
- Check that wrapped apps are garbage collected correctly. by @sfc-gh-dkurokawa in #1983
- Start evaluator immediately and also require that FeedbackMode must be WITH_APP_THREAD (the default). by @sfc-gh-dkurokawa in #1984
- Allow use of Feedback's on_input, on_output, on_input_output, on_default methods. by @sfc-gh-dkurokawa in #1985
- [SNOW-2061186, SNOW-2105931] Connect record_viewer_otel (FE) to Event table (BE) by @sfc-gh-nvytla in #1981
- Update vscode settings to automatically handle line length better for python files. by @sfc-gh-dkurokawa in #1986
- [SNOW-2111562] Render feedback visualizations for OTEL records by @sfc-gh-nvytla in #1988
- Make old @instrument decorator work in OTel world. by @sfc-gh-dkurokawa in #1989
- Consistently use TruApp in place of TruCustomApp by @sfc-gh-jreini in #1980
- Progress towards otel quickstart readiness by @sfc-gh-jreini in #1991
- let dashboard pick up env variable instead of using flag by @sfc-gh-jreini in #1992
- Make sure to clean up context manager after using it once. by @sfc-gh-dkurokawa in #1993
- [Minor] Update WITH_OTEL_TRACING to TRULENS_OTEL_TRACING by @sfc-gh-nvytla in #1995
- Create Recording when using a App as a context manager. by @sfc-gh-dkurokawa in #1997
- [Minor] Allow trulens_trace to handle both record (pre-otel) and record id string (otel) by @sfc-gh-nvytla in #1996
- Use ai.observability.eval_root.higher_is_better. by @sfc-gh-dkurokawa in #1998
- Update OTEL feedback score rendering and columns by @sfc-gh-nvytla in #1994
- Work on streamlit UI components, streaming for OTEL by @sfc-gh-jreini in #2000
- Handle aggregations and also emit eval sub-step spans. by @sfc-gh-dkurokawa in #2004
- Have recording.get wait for record to show up in event table before giving to user. by @sfc-gh-dkurokawa in #2006
- Fix OpenAI provider in python 3.12+ to work with the changed @cached_property behavior by @sfc-gh-dhuang in #2005
- Summit demo streamlit app: OTEL + multi-agentic evals by @sfc-gh-dhuang in #2007
- Don't require a main method. by @sfc-gh-dkurokawa in #2009
- UI updates to agent demo by @sfc-gh-jreini in #2013
- Summit demo streamlit app: convert trajectory eval to custom feedback function with the new Selector by @sfc-gh-dhuang in #2011
- Fix TruLens nits by @sfc-gh-gtokernliang in #2010
- Fix Feedback function histogram filtering out 1 by @sfc-gh-gtokernliang in #2012
- Create Feedback::on_context and Selector::select_context methods to make it easier to create feedback functions on the context. by @sfc-gh-dkurokawa in #2015
- Display generated charts in the agent demo by @sfc-gh-jreini in #2016
- Fix more TruLens UI nits by @sfc-gh-gtokernliang in #2019
- Add spinners to evals in agent demo ui by @sfc-gh-jreini in #2021
- Remove "name" span attribute. by @sfc-gh-dkurokawa in #2017
- fix app version search and advanced filters search by @sfc-gh-chu in #2027
- use on_context for ffs in quickstart by @sfc-gh-chu in #2022
- Remove old "selector" span attribute stuff. by @sfc-gh-dkurokawa in #2020
- Fix how OTEL mode is being checked by @sfc-gh-gtokernliang in #2029
- Don't use session state api for metadata dropdown by @sfc-gh-chu in #2023
- Nit: Clean up and standardize otel tracing params by @sfc-gh-nvytla in #2031
- Move app info to resource attributes. by @sfc-gh-dkurokawa in #1999
- set otel os env var from streamlit flag by @sfc-gh-chu in #2028
- Add semantic conventions to docs. by @sfc-gh-dkurokawa in #2036
- Add explanations and fix groundedness calls in streamlit by @sfc-gh-chu in #2014
- Add more functionality to the Recording class to mimic pre-OTel behavior. Also add a retrieve_feeback_results function. by @sfc-gh-dkurokawa in #2032
- Rename call_feedback_function_per_entry_in_list to collect_list and change polarity of the boolean. by @sfc-gh-dkurokawa in #2033
- Fix some e2e tests. by @sfc-gh-dkurokawa in #2034
- Prepare agent for demo by @sfc-gh-jreini in #2026
- Fail if trying to run a pre-OTel only function in OTel mode. by @sfc-gh-dkurokawa in #2038
- demo: add dummy keys for openai, change localhost port by @sfc-gh-jreini in #2041
- Fix record table nits by @sfc-gh-gtokernliang in #2025
- Put metric name in ai.observability.eval_root.metric_name as well as ai.observability.eval.metric_name. by @sfc-gh-dkurokawa in #2039
Full Changelog: trulens-1.4.9...trulens-1.5.0