Releases: truera/trulens
TruLens v2.1.2
What's Changed
- Address feedback on the data agents notebook by @sfc-gh-jreini in #2146
- Make empty events warning quieter by @sfc-gh-jreini in #2145
- improve groundedness reliability by @sfc-gh-jreini in #2147
- Sort siblings in the record viewer by their start time. by @sfc-gh-dkurokawa in #2149
- Lower Langgraph version required in trulens-apps-langgraph by @sfc-gh-dhuang in #2150
- Make trulens-core no longer depend on trulens-feedback by @sfc-gh-dhuang in #2151
- Update span name to task name for Langgraph tracing by @sfc-gh-dhuang in #2154
Full Changelog: trulens-2.1.1...trulens-2.1.2
TruLens v2.1.1
What's Changed
- Allow TruLens to use reasoning models in OpenAI provider by @sfc-gh-dhuang in #2138
- Fixed Score Parsing error & Added Plan Adherence and Plan Quality by @sfc-gh-ajia in #2137
- Improve main_input guessing logic by @sfc-gh-dhuang in #2142
- In-line evals notebook by @sfc-gh-jreini in #2143
Full Changelog: trulens-2.1.0...trulens-2.1.1
TruLens v2.1.0
TruLens v2.1
TruLens 2.1 includes new features and bug fixes that support tracing and evaluation of agents, including inline evals, trajectory evals, and native LangGraph instrumentation (via TruGraph). We also made a variety of stability improvements to evaluators that benefit both OSS and Snowflake users, including structured output support and a shift to new, more stable server-side metric computation in Snowflake.
New Features
- Create @inline_evaluations decorator. by @sfc-gh-dkurokawa in #2127
- Allow for trace level metrics. by @sfc-gh-dkurokawa in #2075
- Add structured outputs to feedback providers by @sfc-gh-chu in #2098
- Validate openai models support for structured output by @sfc-gh-chu in #2102
- Enable OTel event queries with app_version by @sfc-gh-nvytla in #2099
- Implement TruGraph for Langgraph instrumentation by @sfc-gh-dhuang in #2114
- Support externalbrowser authentication in Snowflake for the Streamlit UI. by @sfc-gh-dkurokawa in #2130
- Create trulens.apps.langgraph package. by @sfc-gh-dkurokawa in #2119
- Minor evaluator improvements. by @sfc-gh-dkurokawa in #2123
- Trajectory Evals (experimental) by @sfc-gh-ajia in #2108
- Run feedback function on everything in event table initially. by @sfc-gh-dkurokawa in #2118
- [SDK][Major Version Change] SDK (Snowflake) uses LLM orchestration layer by @sfc-gh-dhuang in #1969
Bug Fixes
- Feat: Modernize trajectory evals to v2 feedback system with few-shot example support by @sfc-gh-nvytla in #2121
- Fix issues with using JSON in Snowflake regular tables. by @sfc-gh-dkurokawa in #2095
- [Snowflake] [major version release] Remove deprecated SDK orchestration code. by @sfc-gh-dhuang in #2091
- Allow ignoring None values. by @sfc-gh-dkurokawa in #2096
- Fix optionals for criteria ranges by @sfc-gh-chu in #2097
- Update snowflake-connector-python version dependency as the old one doesn't work for some auth stuff. by @sfc-gh-dkurokawa in #2125
- Accept Anaconda's TOS. by @sfc-gh-dkurokawa in #2128
- Fix: Handle division by zero errors in feedback, groundtruth, and hotspots, improve test coverage for groundtruth by @sfc-gh-nvytla in #2124
- Fix conda build issues by restricting smmap version. by @sfc-gh-dkurokawa in #2133
- Test: Add test coverage for GroundTruthAgreement by @sfc-gh-nvytla in #2132
- Handle when the spans don't have app ids in the UI. by @sfc-gh-dkurokawa in #2131
- update meta version by @sfc-gh-chu in #2074
- [Snowflake] Add test dataset notebook for LLM GA work with orchestration layer by @sfc-gh-dhuang in #2086
- [Snowflake][Major version change] Use new sprocs to ensure correct batch ingestion behavior by @sfc-gh-dhuang in #2090
- set alembic path_separator by @sfc-gh-chu in #2106
- Use pydantic ConfigDict and inherit by @sfc-gh-chu in #2105
- Allow ignore_none_values to be set for Selector and set to False by default. by @sfc-gh-dkurokawa in #2111
Docs
- [Docs] Nit: Update grammar for guardrails doc by @sfc-gh-nvytla in #2077
- [Docs] Improve and standardize language for instrumentation-specific docs, fix grammar and spelling errors by @sfc-gh-nvytla in #2078
- [Docs] Fix code examples, update grammar, spelling for logging-specific docs by @sfc-gh-nvytla in #2079
- [Docs] Fix grammatical and spelling errors and standardize language for evaluation-specific component guides and notebooks by @sfc-gh-nvytla in #2072
- remove online link for grit migration by @sfc-gh-jreini in #2085
- Add docs for inline evaluations by @sfc-gh-jreini in #2129
- Add DeepWiki badge to README by @sfc-gh-nvytla in #2088
- Feat: Update and standardize docstrings for llm_provider.py by @sfc-gh-nvytla in #2117
- Docs: Update docs website year, fix homepage link by @sfc-gh-nvytla in #2122
- More updates to selecting span components guides by @sfc-gh-jreini in #2089
- trace level selection docs by @sfc-gh-jreini in #2103
- Update trulens.org homepage with agent & otel copy by @sfc-gh-jreini in #2081
- [Docs] Standardize language, improve formatting for instrumentation, contributing, other docs by @sfc-gh-nvytla in #2080
- Update instrumentation component guides for OTEL by @sfc-gh-jreini in #2082
- remove outdated eval benchmarks from nav by @sfc-gh-jreini in #2084
- update selecting components page for OTel by @sfc-gh-jreini in #2083
Examples
- [Tutorials] Updates for LangChain notebooks by @sfc-gh-nvytla in #2092
- Notebook OTEL conversions and cleanup by @sfc-gh-jreini in #2107
- [Experimental] Add WIP notebook to directly compute feedbacks on OTel spans by @sfc-gh-nvytla in #2100
- Update snowflake data agent/langgraph example by @sfc-gh-jreini in #2120
- Formatting for snowflaketools nb by @sfc-gh-jreini in #2126
- Get more quickstarts OTEL-ready by @sfc-gh-jreini in #2076
- Update example notebooks to use the new TruGraph by @sfc-gh-dhuang in #2134
New Contributors
- @sfc-gh-ajia made their first contribution in #2108
Full Changelog: trulens-1.5.3...trulens-2.1.0
TruLens v1.5.3
What's Changed
- Get more quickstarts OTEL-ready by @sfc-gh-jreini in #2076
- [Docs] Nit: Update grammar for guardrails doc by @sfc-gh-nvytla in #2077
- [Docs] Improve and standardize language for instrumentation-specific docs, fix grammar and spelling errors by @sfc-gh-nvytla in #2078
- [Docs] Fix code examples, update grammar, spelling for logging-specific docs by @sfc-gh-nvytla in #2079
- Update trulens.org homepage with agent & otel copy by @sfc-gh-jreini in #2081
- [Docs] Standardize language, improve formatting for instrumentation, contributing, other docs by @sfc-gh-nvytla in #2080
- Update instrumentation component guides for OTEL by @sfc-gh-jreini in #2082
- remove outdated eval benchmarks from nav by @sfc-gh-jreini in #2084
- update selecting components page for OTel by @sfc-gh-jreini in #2083
- remove online link for grit migration by @sfc-gh-jreini in #2085
- [Docs] Fix grammatical and spelling errors and standardize language for evaluation-specific component guides and notebooks by @sfc-gh-nvytla in #2072
- Add DeepWiki badge to README by @sfc-gh-nvytla in #2088
- update meta version by @sfc-gh-chu in #2074
- [Snowflake] Add test dataset notebook for LLM GA work with orchestration layer by @sfc-gh-dhuang in #2086
- More updates to selecting span components guides by @sfc-gh-jreini in #2089
- [Tutorials] Updates for LangChain notebooks by @sfc-gh-nvytla in #2092
- Allow for trace level metrics. by @sfc-gh-dkurokawa in #2075
- Fix issues with using JSON in Snowflake regular tables. by @sfc-gh-dkurokawa in #2095
- Allow ignoring None values. by @sfc-gh-dkurokawa in #2096
- Fix optionals for criteria ranges by @sfc-gh-chu in #2097
- [Experimental] Add WIP notebook to directly compute feedbacks on OTel spans by @sfc-gh-nvytla in #2100
- Enable OTel event queries with app_version by @sfc-gh-nvytla in #2099
- Add structured outputs to feedback providers by @sfc-gh-chu in #2098
- Validate openai models support for structured output by @sfc-gh-chu in #2102
- trace level selection docs by @sfc-gh-jreini in #2103
- set alembic path_separator by @sfc-gh-chu in #2106
Full Changelog: trulens-1.5.2...trulens-1.5.3
TruLens v1.5.2
What's Changed
- Call model_fields and model_computed_fields from the class and not the instance. by @sfc-gh-dkurokawa in #2018
- [Nit] Fix grammar for otel blog banner shoutout by @sfc-gh-nvytla in #2053
- [Nit] Sans serif font fix by @sfc-gh-nvytla in #2054
- Improve error messaging for no records found error (check cross-format records) by @sfc-gh-nvytla in #2047
- Clear query params and set value from session state after refresh by @sfc-gh-chu in #2056
- [Bugfix] Update groundedness parallelization to use TruLens custom ThreadPoolExecutor by @sfc-gh-nvytla in #2059
- If we try to get app from a TruApp and it doesn't yet exist, don't go into an infinite loop. by @sfc-gh-dkurokawa in #2055
- [Feat] Dedup feedback calls when given same metric name and inputs (for Streamlit UI) by @sfc-gh-nvytla in #2003
- Add initial UI tests for Streamlit components, interactions, and Leaderboard by @sfc-gh-nvytla in #2057
- Add tooltips for diff, change variance to standard deviation by @sfc-gh-nvytla in #2058
- Allow Snowflake to use non-account level event table. by @sfc-gh-dkurokawa in #2064
- Update feat/bug assignee to Nikhil for triage by @sfc-gh-nvytla in #2065
- [Bugfix] Update dedup logic for feedback calls to gracefully handle both OTel and non-OTel spans by @sfc-gh-nvytla in #2066
- Consider Record Root Span to be Main Method (Main Method required for Snowflake usage via runs) by @sfc-gh-dkurokawa in #2067
- Add makefile recipe to install all subpackages by @sfc-gh-chu in #2070
- Support python3.12 for snowflake-adjacent packages by @sfc-gh-chu in #2069
- Remove deprecated trulens-core extras by @sfc-gh-chu in #2068
Full Changelog: trulens-1.5.1...trulens-1.5.2
TruLens v1.5.1
What's Changed
- Clean up TruSession before tests as previous tests can interfere with each other. by @sfc-gh-dkurokawa in #2044
- Have eval spans hold all direct function calls. by @sfc-gh-dkurokawa in #2043
- Summit agentic eval demo - streamlit feedback pill and result display by @sfc-gh-dhuang in #2035
- OTel Blog + Examples for pre-release by @sfc-gh-jreini in #2037
- blog link fix by @sfc-gh-jreini in #2046
- Check for any invalid Selector when giving feedbacks to an App by @sfc-gh-dkurokawa in #2045
- [SNOW-2126669] Nit: Update cost delta with currency by @sfc-gh-nvytla in #2030
- Update tooltips for cost and latency based on delta value by @sfc-gh-nvytla in #2048
- Ensure backward compatibility of trulens_feedback streamlit pill by @sfc-gh-dhuang in #2049
- Stop having tests shutdown the span processor. by @sfc-gh-dkurokawa in #2051
- Safely drop columns as sometimes the "output" column isn't there. by @sfc-gh-dkurokawa in #2050
Full Changelog: trulens-1.5.0...trulens-1.5.1
TruLens v1.5.0
Telemetry for the Agentic World: TruLens + OpenTelemetry
Agents are rapidly gaining traction across AI applications. With this growth comes a new set of challenges: how do we trace, observe, and evaluate these dynamic, distributed systems?
Today, we’re excited to share that TruLens now supports OpenTelemetry (OTel), unlocking powerful, interoperable observability for the agentic world.
Challenges of Tracing Agents
Tracing agentic applications is fundamentally different from tracing traditional software systems:
- Language-agnostic: Agents can be written in Python, Go, Java, or more, requiring tracing that transcends language boundaries.
- Distributed by nature: Multi-agent systems often span multiple machines or processes.
- Existing telemetry stacks: Many developers and enterprises already use OpenTelemetry, so tracing compatibility is essential.
- Dynamic execution: Unlike traditional apps, agents often make decisions on the fly, with branching workflows that can’t be fully known in advance.
- Interoperability standards: As protocols like Model Context Protocol (MCP) and Agent2Agent Protocol (A2A) emerge, tracing must support agents working across different systems.
- Repeated tool usage: Agents may call the same function or tool multiple times in a single execution trace, requiring fine-grained visibility into span grouping to understand what’s happening and why.
What is TruLens
TruLens is an open source library for evaluating and tracing AI agents, including RAG systems and other LLM applications. It combines OpenTelemetry-based tracing with trustworthy evaluations, including both ground truth metrics and reference-free (LLM-as-a-Judge) feedback.
TruLens pioneered the RAG Triad—a structured evaluation of:
- Context relevance
- Groundedness
- Answer relevance
These evaluations provide a foundation for understanding the performance of RAGs and agentic RAGs, supported by benchmarks like LLM-AggreFact, TREC-DL, and HotPotQA.
This combination of trusted evaluators and open-standard tracing gives you tools both to improve your application offline and to monitor it once it reaches production.
How TruLens Augments OpenTelemetry
As AI applications become increasingly agentic, TruLens’ shift to OpenTelemetry enables observability that is:
- Interoperable with existing telemetry stacks
- Compatible across languages and frameworks
- Capable of tracing dynamic agent workflows
TruLens now accepts any span that adheres to the OTel standard.
What is OpenTelemetry?
OpenTelemetry (OTel) is an open-source observability framework for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.
In LLM and agentic contexts, OpenTelemetry enables language-agnostic, interoperable tracing for:
- Multi-agent systems
- Distributed environments
- Tooling interoperability
What is a span?
A span represents a single unit of work. In LLM apps, this might be: planning, routing, retrieval, tool usage, or generation.
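To make the parent/child structure concrete, here is a minimal sketch of a trace as a list of spans. The `Span` dataclass and `children_of` helper are hypothetical simplifications written for this illustration; they are not the OpenTelemetry SDK types or any TruLens API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

_span_ids = itertools.count(1)  # simple counter standing in for real span IDs


@dataclass
class Span:
    """Hypothetical, simplified stand-in for an OTel span."""

    name: str
    trace_id: int
    parent_id: Optional[int] = None  # None marks the root span of a trace
    span_id: int = field(default_factory=lambda: next(_span_ids))
    attributes: Dict[str, str] = field(default_factory=dict)


def children_of(spans: List[Span], parent: Span) -> List[Span]:
    """Return the spans whose parent is `parent`, in recorded order."""
    return [s for s in spans if s.parent_id == parent.span_id]


# One trace: a root "query" span with a retrieval step and a generation step.
root = Span("query", trace_id=1)
retrieval = Span(
    "retrieval",
    trace_id=1,
    parent_id=root.span_id,
    attributes={"query_text": "What is OTel?"},
)
generation = Span("generation", trace_id=1, parent_id=root.span_id)

trace = [root, retrieval, generation]
child_names = [s.name for s in children_of(trace, root)]
```

Each unit of work becomes one span, and the `parent_id` links are what let a viewer rebuild the execution tree.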
TruLens Defines Semantic Conventions for the Agentic World
TruLens maps span attributes to common definitions using semantic conventions to ensure:
- Cross-framework interoperability
- Shared instrumentation for MCP and A2A
- Consistent evaluation across implementations
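As a rough illustration of why shared conventions help, the sketch below renames framework-specific attribute keys to one common key so that a single evaluator can read spans from either source. The framework prefixes and the shared key names are hypothetical; they are not the actual TruLens semantic conventions.

```python
from typing import Any, Dict

# Hypothetical mapping from framework-specific attribute names to shared keys.
SHARED_KEYS: Dict[str, str] = {
    "framework_a.retriever.question": "retrieval.query_text",
    "framework_b.search.input": "retrieval.query_text",
    "framework_a.retriever.docs": "retrieval.retrieved_contexts",
    "framework_b.search.results": "retrieval.retrieved_contexts",
}


def normalize(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known framework-specific keys to shared keys; keep others as-is."""
    return {SHARED_KEYS.get(key, key): value for key, value in attributes.items()}


# Two spans from different (hypothetical) frameworks end up with the same key.
span_a = normalize({"framework_a.retriever.question": "What is OTel?"})
span_b = normalize({"framework_b.search.input": "What is OTel?"})
```

Once both spans expose the same key, an evaluator only needs to know the shared name, not the framework that produced the span.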
Using Semantic Conventions to Compute Evaluation Metrics
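TruLens computes evaluation metrics from the attributes recorded on instrumented spans. In the example below, the retrieval span records the query text and the retrieved contexts, and the context relevance feedback scores each retrieved context and averages the results.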
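import numpy as np
from trulens.core import Feedback
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# `vector_store` and `provider` are assumed to be defined elsewhere.

# Method of the application class: record the query and the returned contexts.
@instrument(
    span_type=SpanAttributes.SpanType.RETRIEVAL,
    attributes={
        SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
        SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
    },
)
def retrieve(self, query: str) -> list:
    results = vector_store.query(query_texts=query, n_results=4)
    # Flatten the nested list of documents into a single list.
    return [doc for sublist in results["documents"] for doc in sublist]

# Score each retrieved context against the input, then average the scores.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on_context(call_feedback_function_per_entry_in_list=True)
    .aggregate(np.mean)
)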
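The per-entry scoring followed by aggregation can be sketched without any TruLens dependency. In the sketch below, `toy_relevance` is a hypothetical word-overlap scorer standing in for an LLM-based provider, and `score_per_context` is an illustrative helper, not a TruLens function.

```python
from statistics import mean
from typing import Callable, List


def toy_relevance(query: str, context: str) -> float:
    """Hypothetical scorer: fraction of query words that appear in the context."""
    query_words = set(query.lower().split())
    context_words = set(context.lower().split())
    return len(query_words & context_words) / len(query_words) if query_words else 0.0


def score_per_context(
    query: str,
    contexts: List[str],
    scorer: Callable[[str, str], float],
) -> float:
    """Score each context separately, then aggregate with the mean."""
    scores = [scorer(query, context) for context in contexts]
    return mean(scores) if scores else 0.0


contexts = ["trulens traces agents", "bananas are yellow"]
result = score_per_context("trulens agents", contexts, toy_relevance)
```

Scoring each context on its own keeps one irrelevant chunk from hiding behind a relevant one; the aggregate then summarizes the whole retrieval step.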
Computing Metrics on Complex Execution Flows
TruLens introduces span groups to handle repeated tool calls within a trace.
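from typing import List

from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# `break_question_down` and `call_llm` are assumed to be defined elsewhere.

class App:
    # Spans that share the same `idx` value belong to the same span group.
    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_question(self, question: str, idx: str) -> str:
        ...

    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_response(self, response: str, idx: str) -> str:
        ...

    @instrument()
    def combine_responses(self, cleaned_responses: List[str]) -> str:
        ...

    @instrument()
    def query(self, complex_question: str) -> str:
        questions = break_question_down(complex_question)
        cleaned_responses = []
        for i, question in enumerate(questions):
            # Use the loop index as the group id for this question/response pair.
            cleaned_question = self.clean_up_question(question, str(i))
            response = call_llm(cleaned_question)
            cleaned_response = self.clean_up_response(response, str(i))
            cleaned_responses.append(cleaned_response)
        return self.combine_responses(cleaned_responses)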
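The sketch below shows one way a group id could be used to pair each cleaned question with its matching cleaned response when the same functions run several times in one trace. The dictionary layout of the spans and the `pair_by_group` helper are assumptions made for illustration; they do not describe how TruLens stores or matches span groups internally.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical flattened spans from one trace; `group` plays the role of `idx`.
spans: List[Dict[str, Optional[str]]] = [
    {"name": "clean_up_question", "group": "0", "output": "q0"},
    {"name": "clean_up_response", "group": "0", "output": "r0"},
    {"name": "clean_up_question", "group": "1", "output": "q1"},
    {"name": "clean_up_response", "group": "1", "output": "r1"},
    {"name": "combine_responses", "group": None, "output": "r0 r1"},
]


def pair_by_group(
    spans: List[Dict[str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Pair question and response outputs that share the same group id."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for span in spans:
        if span["group"] is not None:  # ungrouped spans are skipped
            grouped[span["group"]][span["name"]] = span["output"]

    pairs = []
    for _, by_name in sorted(grouped.items()):
        # Only emit a pair when both halves were recorded for the group.
        if "clean_up_question" in by_name and "clean_up_response" in by_name:
            pairs.append((by_name["clean_up_question"], by_name["clean_up_response"]))
    return pairs


pairs = pair_by_group(spans)
```

Without the group id, an evaluator would have no reliable way to tell which response span answers which question span.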
How to Examine Execution Flows in TruLens
Run:
session.run_dashboard()
…and visually inspect execution traces. Span types are shown directly in the dashboard to help identify branching, errors, or performance issues.
How to Get Started
- Install TruLens:
pip install trulens-core==1.5.0
- Enable OpenTelemetry:
import os
os.environ["TRULENS_OTEL_TRACING"] = "1"
- Instrument Methods:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Method of the application class: mark it as the record root and
# record its input and output.
@instrument(
    attributes={
        SpanAttributes.RECORD_ROOT.INPUT: "query",
        SpanAttributes.RECORD_ROOT.OUTPUT: "return",
    },
)
def query(self, query: str) -> str:
    context_str = self.retrieve(query=query)
    completion = self.generate_completion(query=query, context_str=context_str)
    return completion
- Add Evaluations:
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)
Using selectors:
from trulens.core.feedback.selector import Selector
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on({
        "prompt": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.INPUT,
        ),
    })
    .on({
        "response": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.OUTPUT,
        ),
    })
)
- Register Your App:
from trulens.apps.app import TruApp
rag = RAG(model_name="gpt-4.1-mini")
tru_rag = TruApp(
    rag,
    app_name="OTEL-RAG",
    app_version="4.1-mini",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)
- Run the Dashboard:
from trulens.dashboard import run_dashboard
run_dashboard(session)
Concluding Thoughts
By building on OpenTelemetry, TruLens delivers a universal tracing and evaluation platform for modern AI systems. Whether your agents are built in Python, composed via MCP, or distributed across systems, TruLens provides a common observability layer for telemetry and evaluation.
Let’s build the future of trustworthy agentic AI together.
Full Release Details
- Evaluate Weaviate Query Agents by @sfc-gh-jreini in #1896
- [SNOW-2005227, SNOW-2037060] Add HuggingFace pytest annotation and deflake E2E test issues by @sfc-gh-nvytla in #1907
- move notebooks that shouldn't be in quickstarts to experimental by @sfc-gh-jreini in #1908
- [SNOW-2005236, SNOW-2040963]: Deprecate outdated/obsolete notebooks by @sfc-gh-nvytla in #1909
- Fix test_dummy.py and test_session.py E2E tests. by @sfc-gh-dkurokawa in #1910
- snowflake ai stack by @sfc-gh-jreini in #1912
- [E2E] Revert temporary change for e2e testing by @sfc-gh-nvytla in #1913
- gpt-4.1 by @sfc-gh-jreini in #1914
- Update contribution guide by @sfc-gh-jreini in #1915
- Restore rag evaluation quickstart notebook by @sfc-gh-nvytla in #1917
- Fix test_deprecation.py. by @sfc-gh-dkurokawa in #1918
- Fix test_serial.py. by @sfc-gh-dkurokawa in #1916
- Set up algorithm for feedback selectors 2.0. by @sfc-gh-dkurokawa in #1905
- Nit: Flakiness fix for test_tru_llama by @sfc-gh-nvytla in #1919
- Add some tests I forgot to add in PR#1905 on feedback computation. by @sfc-gh-dkurokawa in #1920
- Add function name to span attribute ai.observability.call.function. by @sfc-gh-dkurokawa in #1922
- Fix "future" issue on E2E tests. by @sfc-gh-dkurokawa in #1921
- Add experimental langgraph multi-agent demo by @sfc-gh-dhuang in #1923
- Add dashboard GIFs to weaviate query agents notebook by @sfc-gh-jreini in #1925
- Track langgraph saved images with Git LFS, fix formatting nit by @sf...
TruLens 1.4.9
What's Changed
- Update poetry to trulens 1.4.8 by @sfc-gh-nvytla in #1898
- Doc string spelling by @cronoik-inceptionai in #1895
- [SNOW-2030130] Standardize proper names across TruLens docs by @sfc-gh-nvytla in #1899
- Add Dashboard GIF to Viewing Results Page + update pypi downloads icon to trulens-core by @sfc-gh-jreini in #1900
- [SNOW-2031737] Refresh symbolic links for notebooks to test by @sfc-gh-nvytla in #1901
- [SNOW-2005231, SNOW-2033054, SNOW-2033730] Deflake E2E pipeline issues by @sfc-gh-nvytla in #1904
- Handle missing cortex costtracking response by @sfc-gh-chu in #1902
New Contributors
- @cronoik-inceptionai made their first contribution in #1895
Full Changelog: trulens-1.4.8...trulens-1.4.9
TruLens v1.4.8
What's Changed
- Fix tests/e2e/test_providers.py::TestProviders::test_llmcompletion_calibration. by @sfc-gh-dkurokawa in #1881
- Fix azure openai e2e tests by @sfc-gh-dhuang in #1886
- Fix test_endpoints.py. by @sfc-gh-dkurokawa in #1883
- [Nit] Update maintainer list, fix broken link to contributing guide by @sfc-gh-nvytla in #1889
- Fix litellm e2e tests with Azure Openai by @sfc-gh-dhuang in #1887
- Add documentation on how to run single tests. by @sfc-gh-dkurokawa in #1891
- SDK: Add compute metric check to avoid re-computation and refactor run status flow by @sfc-gh-dhuang in #1869
- [Docs] Fix minor typos by @sfc-gh-nvytla in #1894
- SDK: Fix logic for ingestion E2E timeout by @sfc-gh-dhuang in #1885
New Contributors
- @sfc-gh-nvytla made their first contribution in #1889
Full Changelog: trulens-1.4.7...trulens-1.4.8
TruLens v1.4.7
What's Changed
- Fix some e2e test issues. by @sfc-gh-dkurokawa in #1875
- Add quickstart notebook for PuPr by @sfc-gh-dhuang in #1876
- Rename PuPr quickstart path to be lowercase by @sfc-gh-dhuang in #1877
- [SNOW-1967293] Fix cortex cost tracking - only tallying requests at the end of the streamed SSE by @sfc-gh-dhuang in #1878
- [SNOW-1983345] SDK: app deletion and delete current version by @sfc-gh-dhuang in #1879
- [SNOW-1956008] SDK: implement run.cancel() and cancellation logic by @sfc-gh-dhuang in #1880
Full Changelog: trulens-1.4.6...trulens-1.4.7