Releases: truera/trulens

TruLens v2.1.2

31 Jul 18:09

What's Changed

Full Changelog: trulens-2.1.1...trulens-2.1.2

TruLens v2.1.1

25 Jul 00:37

What's Changed

Full Changelog: trulens-2.1.0...trulens-2.1.1

TruLens v2.1.0

22 Jul 22:07

TruLens v2.1

TruLens 2.1 includes a number of new features and bug fixes to support tracing and evaluation of agents, including inline evals, trajectory evals, and native LangGraph instrumentation (via TruGraph). We also made a variety of stability improvements to evaluators that benefit both OSS and Snowflake users, including structured output support and a shift to new, more stable server-side metric computation in Snowflake.

New Features

Bug Fixes

Docs

Examples

New Contributors

Full Changelog: trulens-1.5.3...trulens-2.1.0

TruLens v1.5.3

01 Jul 20:47
26f7466

What's Changed

Full Changelog: trulens-1.5.2...trulens-1.5.3

TruLens v1.5.2

18 Jun 22:04
a09ae2b

What's Changed

Full Changelog: trulens-1.5.1...trulens-1.5.2

TruLens v1.5.1

05 Jun 22:51

What's Changed

Full Changelog: trulens-1.5.0...trulens-1.5.1

TruLens v1.5.0

02 Jun 18:53

Telemetry for the Agentic World: TruLens + OpenTelemetry

Agents are rapidly gaining traction across AI applications. With this growth comes a new set of challenges: how do we trace, observe, and evaluate these dynamic, distributed systems?
Today, we’re excited to share that TruLens now supports OpenTelemetry (OTel), unlocking powerful, interoperable observability for the agentic world.


Challenges of Tracing Agents

Tracing agentic applications is fundamentally different from tracing traditional software systems:

  • Language-agnostic: Agents can be written in Python, Go, Java, or other languages, so tracing must work across language boundaries.
  • Distributed by nature: Multi-agent systems often span multiple machines or processes.
  • Existing telemetry stacks: Many developers and enterprises already use OpenTelemetry, so tracing compatibility is essential.
  • Dynamic execution: Unlike traditional apps, agents often make decisions on the fly, with branching workflows that can’t be fully known in advance.
  • Interoperability standards: As protocols like Model Context Protocol (MCP) and Agent2Agent Protocol (A2A) emerge, tracing must support agents working across different systems.
  • Repeated tool usage: Agents may call the same function or tool multiple times in a single execution trace, requiring fine-grained visibility into span grouping to understand what’s happening and why.

What is TruLens

TruLens is an open-source library for evaluating and tracing AI agents, including RAG systems and other LLM applications. It combines OpenTelemetry-based tracing with trustworthy evaluations, including both ground truth metrics and reference-free (LLM-as-a-Judge) feedback.

TruLens pioneered the RAG Triad—a structured evaluation of:

  • Context relevance
  • Groundedness
  • Answer relevance

These evaluations provide a foundation for understanding the performance of RAGs and agentic RAGs, supported by benchmarks like LLM-AggreFact, TREC-DL, and HotPotQA.

This combination of trusted evaluators and open-standard tracing gives you tools to both improve your application offline and monitor it once it reaches production.


How TruLens Augments OpenTelemetry

As AI applications become increasingly agentic, TruLens’ shift to OpenTelemetry enables observability that is:

  • Interoperable with existing telemetry stacks
  • Compatible across languages and frameworks
  • Capable of tracing dynamic agent workflows

TruLens now accepts any span that adheres to the OTel standard.


What is OpenTelemetry?

OpenTelemetry (OTel) is an open-source observability framework for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.

In LLM and agentic contexts, OpenTelemetry enables language-agnostic, interoperable tracing for:

  • Multi-agent systems
  • Distributed environments
  • Tooling interoperability

What is a span?
A span represents a single unit of work. In LLM apps, this might be: planning, routing, retrieval, tool usage, or generation.
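
To make the idea concrete, here is a minimal, standard-library-only sketch of spans as nested units of work. It is not the OpenTelemetry or TruLens API: `ToySpan`, `ToyTracer`, and the span names are hypothetical and exist only for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class ToySpan:
    """Hypothetical stand-in for a span: one named, timed unit of work."""
    name: str
    parent: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class ToyTracer:
    """Hypothetical tracer that records spans and their parent span."""

    def __init__(self) -> None:
        self.finished: List[ToySpan] = []
        self._stack: List[ToySpan] = []

    @contextmanager
    def span(self, name: str, **attributes: str) -> Iterator[ToySpan]:
        # The innermost open span (if any) becomes the parent.
        parent = self._stack[-1].name if self._stack else None
        current = ToySpan(name, parent, dict(attributes), start=time.perf_counter())
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


tracer = ToyTracer()
with tracer.span("query", question="What is a span?"):
    with tracer.span("retrieval"):
        pass  # a retrieval step would run here
    with tracer.span("generation"):
        pass  # a generation step would run here

# Child spans finish before their parent, so the root span is recorded last.
span_summary = [(s.name, s.parent) for s in tracer.finished]
```

Each step of the application becomes one span, and the parent links are what let a trace viewer reconstruct the execution tree.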


TruLens Defines Semantic Conventions for the Agentic World

TruLens maps span attributes to common definitions using semantic conventions to ensure:

  • Cross-framework interoperability
  • Shared instrumentation for MCP and A2A
  • Consistent evaluation across implementations
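
As a rough illustration of why shared attribute names help, the sketch below renames framework-specific span attributes to one common vocabulary so the same evaluation logic can read spans from either source. Every key name here (`framework_a.*`, `framework_b.*`, `retrieval.*`) is made up for the example and is not one of TruLens's actual attribute strings.

```python
from typing import Any, Dict

# Hypothetical mapping from per-framework attribute names to shared names.
CONVENTION_MAP: Dict[str, str] = {
    "framework_a.search.input": "retrieval.query_text",
    "framework_b.retriever.question": "retrieval.query_text",
    "framework_a.search.hits": "retrieval.retrieved_contexts",
    "framework_b.retriever.docs": "retrieval.retrieved_contexts",
}


def normalize(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known keys to their shared names; keep unknown keys as-is."""
    return {CONVENTION_MAP.get(key, key): value for key, value in attributes.items()}


span_a = {"framework_a.search.input": "q", "framework_a.search.hits": ["d1"]}
span_b = {"framework_b.retriever.question": "q", "framework_b.retriever.docs": ["d1"]}

# After normalization, both spans expose the same keys to an evaluator.
normalized_a = normalize(span_a)
normalized_b = normalize(span_b)
```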

Using Semantic Conventions to Compute Evaluation Metrics

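TruLens computes evaluation metrics from instrumented span attributes. In the example below, the retrieval span records the query text and the retrieved contexts, which a context relevance feedback function can then score.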

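import numpy as np
from trulens.core import Feedback
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Method of the RAG application class; `vector_store` is assumed to be defined elsewhere.
@instrument(
    span_type=SpanAttributes.SpanType.RETRIEVAL,
    attributes={
        SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
        SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
    },
)
def retrieve(self, query: str) -> list:
    results = vector_store.query(query_texts=query, n_results=4)
    # Flatten the nested list of documents into a single list.
    return [doc for sublist in results["documents"] for doc in sublist]

# `provider` is assumed to be a feedback provider created elsewhere.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on_context(call_feedback_function_per_entry_in_list=True)
    .aggregate(np.mean)
)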
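
The feedback definition above scores each retrieved context separately and then averages the scores. A minimal sketch of that pattern, using only the standard library, might look like the following. `toy_relevance` and `score_per_entry` are hypothetical helpers and the word-overlap scorer is a placeholder, not how TruLens or an LLM judge computes relevance.

```python
from statistics import mean
from typing import Callable, List


def toy_relevance(query: str, context: str) -> float:
    """Hypothetical scorer: fraction of query words that appear in the context."""
    words = query.lower().split()
    if not words:
        return 0.0
    text = context.lower()
    return sum(word in text for word in words) / len(words)


def score_per_entry(
    query: str,
    contexts: List[str],
    scorer: Callable[[str, str], float],
    aggregate: Callable[[List[float]], float] = mean,
) -> float:
    """Score every context on its own, then combine the scores."""
    scores = [scorer(query, context) for context in contexts]
    return aggregate(scores)


contexts = ["open telemetry spans", "unrelated text"]
result = score_per_entry("telemetry spans", contexts, toy_relevance)
```

Scoring per entry keeps one irrelevant chunk from hiding behind a relevant one; the aggregate then summarizes retrieval quality for the whole call.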

Computing Metrics on Complex Execution Flows

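TruLens introduces span groups to handle repeated tool calls within a trace. In the example below, each call is tagged with a group identifier (`idx`), so the question and response spans from the same loop iteration share a group.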

from typing import List

from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# `break_question_down` and `call_llm` are helper functions assumed to be defined elsewhere.
class App:

    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_question(self, question: str, idx: str) -> str:
        ...

    @instrument(attributes={SpanAttributes.SPAN_GROUPS: "idx"})
    def clean_up_response(self, response: str, idx: str) -> str:
        ...

    @instrument()
    def combine_responses(self, cleaned_responses: List[str]) -> str:
        ...

    @instrument()
    def query(self, complex_question: str) -> str:
        questions = break_question_down(complex_question)
        cleaned_responses = []
        for i, question in enumerate(questions):
            # Use the loop index as the span group so matching spans can be paired.
            cleaned_question = self.clean_up_question(question, str(i))
            response = call_llm(cleaned_question)
            cleaned_response = self.clean_up_response(response, str(i))
            cleaned_responses.append(cleaned_response)
        return self.combine_responses(cleaned_responses)
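
To show what a shared group identifier buys you, the sketch below pairs question and response spans that carry the same group value. The span dictionaries and `pair_by_group` are hypothetical simplifications for illustration and do not reflect how TruLens stores or matches spans internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flattened spans from one trace with two loop iterations.
spans: List[Dict[str, str]] = [
    {"name": "clean_up_question", "group": "0", "output": "q0"},
    {"name": "clean_up_response", "group": "0", "output": "r0"},
    {"name": "clean_up_question", "group": "1", "output": "q1"},
    {"name": "clean_up_response", "group": "1", "output": "r1"},
]


def pair_by_group(spans: List[Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Pair each cleaned question with the cleaned response from the same group."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for span in spans:
        grouped[span["group"]][span["name"]] = span["output"]
    return {
        group: (by_name["clean_up_question"], by_name["clean_up_response"])
        for group, by_name in grouped.items()
        if "clean_up_question" in by_name and "clean_up_response" in by_name
    }


pairs = pair_by_group(spans)
```

Without the group value, an evaluator would see two question spans and two response spans with no way to tell which belong together.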

How to Examine Execution Flows in TruLens

Run:

session.run_dashboard()

…and visually inspect execution traces. Span types are shown directly in the dashboard to help identify branching, errors, or performance issues.


How to Get Started

  1. Install TruLens:
pip install trulens-core==1.5.0

  2. Enable OpenTelemetry:
import os

os.environ["TRULENS_OTEL_TRACING"] = "1"

  3. Instrument Methods:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

@instrument(
    attributes={
        SpanAttributes.RECORD_ROOT.INPUT: "query",
        SpanAttributes.RECORD_ROOT.OUTPUT: "return",
    },
)
def query(self, query: str) -> str:
    context_str = self.retrieve(query=query)
    completion = self.generate_completion(query=query, context_str=context_str)
    return completion

  4. Add Evaluations:
from trulens.core import Feedback

# `provider` is assumed to be a feedback provider created elsewhere.
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)

Using selectors:

from trulens.core.feedback.selector import Selector

f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on({
        "prompt": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.INPUT,
        ),
    })
    .on({
        "response": Selector(
            span_type=SpanAttributes.SpanType.RECORD_ROOT,
            span_attribute=SpanAttributes.RECORD_ROOT.OUTPUT,
        ),
    })
)

  5. Register Your App:
from trulens.apps.app import TruApp

# `RAG` is the instrumented application class from step 3.
rag = RAG(model_name="gpt-4.1-mini")

tru_rag = TruApp(
    rag,
    app_name="OTEL-RAG",
    app_version="4.1-mini",
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)

  6. Run the Dashboard:
from trulens.dashboard import run_dashboard

# `session` is assumed to be the TruLens session created for this app.
run_dashboard(session)
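
The `Selector`-based feedback definition above picks one attribute from spans of a given type and passes it to the feedback function as a named argument. The sketch below imitates that lookup with plain dictionaries. `ToySelector`, the span layout, and the attribute names are hypothetical and are not the TruLens implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ToySelector:
    """Hypothetical selector: choose one attribute from spans of one type."""
    span_type: str
    span_attribute: str

    def select(self, spans: List[Dict[str, Any]]) -> List[Any]:
        # Keep only spans of the requested type that carry the attribute.
        return [
            span["attributes"][self.span_attribute]
            for span in spans
            if span["type"] == self.span_type
            and self.span_attribute in span["attributes"]
        ]


# Hypothetical spans from a single record.
spans = [
    {
        "type": "record_root",
        "attributes": {"input": "What is OTel?", "output": "An observability framework."},
    },
    {"type": "retrieval", "attributes": {"query_text": "What is OTel?"}},
]

prompt = ToySelector("record_root", "input").select(spans)
response = ToySelector("record_root", "output").select(spans)
```

Selecting by span type and attribute, rather than by function name, is what lets the same feedback definition apply to any app that emits spans following the same conventions.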

Concluding Thoughts

By building on top of OpenTelemetry, TruLens delivers a universal tracing and evaluation platform for modern AI systems. Whether your agents are built in Python, composed via MCP, or distributed across systems, TruLens provides a common observability layer for telemetry and evaluation.

Let’s build the future of trustworthy agentic AI together.

Full Release Details

TruLens 1.4.9

11 Apr 17:07
bf8bd9c

What's Changed

New Contributors

Full Changelog: trulens-1.4.8...trulens-1.4.9

TruLens v1.4.8

03 Apr 20:09

What's Changed

New Contributors

Full Changelog: trulens-1.4.7...trulens-1.4.8

TruLens v1.4.7

20 Mar 19:52

What's Changed

Full Changelog: trulens-1.4.6...trulens-1.4.7