- Install deps:
poetry install - Install with dev tools:
poetry install --with dev - Install with all optional packages:
poetry install --with dev,apps,providers - Install pre-commit hooks:
pre-commit install - Start documentation server:
make docs-serve
- Line length: 80 characters
- Formatter/linter:
ruff(runmake formatandmake lint) - Google-style docstrings
- Import modules, not classes directly:
# ✓ Do this from trulens.schema import record as record_schema from trulens.providers.openai import provider as openai_provider # ✗ Not this from trulens.schema.record import Record
- Standard module rename patterns:
from trulens.schema import X as X_schema from trulens.utils import X as X_utils from trulens.providers.X import provider as X_provider from trulens.apps.X import Y as Y_app from trulens.core import X as core_X from trulens.core.database import base as core_db from trulens.feedback.templates import rag as templates_rag from trulens.feedback.templates import base as templates_base
- Use
TYPE_CHECKINGblocks for type-only imports - Use
from __future__ import annotationsfor forward references - Call
model_rebuild()after Pydantic models with forward refs
- Run all unit tests:
make test-unit - Run single test file:
TEST_OPTIONAL=true poetry run pytest tests/unit/test_file.py -v - Run specific test:
TEST_OPTIONAL=true poetry run pytest tests/unit/test_file.py::TestClass::test_method - OTEL tests require isolation (uses pytest-xdist):
TEST_OPTIONAL=1 poetry run pytest tests/unit/test_otel*.py -n auto --dist=loadscope - Regenerate golden files:
WRITE_GOLDEN=1 TEST_OPTIONAL=1 poetry run pytest <test_path>
@pytest.mark.optional- requires optional dependencies@pytest.mark.snowflake- requires Snowflake credentials@pytest.mark.huggingface- requires HuggingFace access
Enable optional tests: TEST_OPTIONAL=true
- Format code:
make format - Lint code:
make lint - Build all packages:
make build - Build docs:
make docs - Update poetry locks:
make lock - Generate coverage:
make coverage
src/
├── core/ # trulens-core: Core abstractions, session, database
├── feedback/ # trulens-feedback: Feedback function implementations
│ ├── templates/ # Prompt template classes organized by domain:
│ │ ├── base.py # FeedbackTemplate base class, shared scaffolding
│ │ ├── rag.py # RAG evals (Groundedness, ContextRelevance, …)
│ │ ├── safety.py # Moderation (Harmfulness, Toxicity, …)
│ │ ├── quality.py # Text quality (Coherence, Sentiment, …)
│ │ └── agent.py # Agentic evals (ToolSelection, PlanAdherence, …)
│ ├── llm_provider.py # LLMProvider base — orchestrates template → LLM calls
│ ├── v2/ # Backward-compat shim (re-exports from templates/)
│ └── prompts.py # Backward-compat shim (re-exports from templates/)
├── dashboard/ # trulens-dashboard: Streamlit UI + React components
├── apps/ # App integrations (langchain, langgraph, llamaindex)
├── providers/ # LLM providers (openai, bedrock, cortex, huggingface, litellm)
├── connectors/ # Database connectors (snowflake)
└── otel/semconv/ # OpenTelemetry semantic conventions
from trulens.core import TruSession
session = TruSession() # Default SQLite
session = TruSession(database_url="postgresql://...")from trulens.apps.langchain import TruChain
tru_app = TruChain(chain, app_name="MyApp", app_version="v1", feedbacks=[...])
with tru_app as recording:
result = chain.invoke("query")Basic instrumentation - captures function args and return as span attributes:
from trulens.core.otel.instrument import instrument
@instrument()
def my_function():
pass # Automatically tracedUse span_type to categorize spans for semantic meaning:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes
@instrument(span_type=SpanAttributes.SpanType.RETRIEVAL)
def retrieve(self, query: str) -> list:
pass
@instrument(span_type=SpanAttributes.SpanType.GENERATION)
def generate(self, prompt: str) -> str:
passAvailable span types: RETRIEVAL, GENERATION, RERANKING, TOOL, AGENT, WORKFLOW, GRAPH_NODE, GRAPH_TASK, MCP, GUARDRAIL, RECORD_ROOT, EVAL_ROOT, UNKNOWN
Map function args/return to semantic attributes:
@instrument(
span_type=SpanAttributes.SpanType.RETRIEVAL,
attributes={
SpanAttributes.RETRIEVAL.QUERY_TEXT: "query", # maps "query" arg
SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return", # maps return value
},
)
def retrieve(self, query: str) -> list:
return ["context1", "context2"]Common attribute namespaces:
SpanAttributes.RETRIEVAL-QUERY_TEXT,RETRIEVED_CONTEXTS,NUM_CONTEXTSSpanAttributes.RECORD_ROOT-INPUT,OUTPUT,ERRORSpanAttributes.MCP-TOOL_NAME,SERVER_NAME,INPUT_ARGUMENTS,OUTPUT_CONTENTSpanAttributes.RERANKING-QUERY_TEXT,MODEL_NAME,TOP_N,INPUT_CONTEXT_TEXTS
For complex data extraction, use a lambda with signature (ret, exception, *args, **kwargs):
@instrument(
attributes=lambda ret, exception, *args, **kwargs: {
SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: [doc["text"] for doc in ret],
SpanAttributes.RETRIEVAL.QUERY_TEXT: kwargs["query"].upper(),
}
)
def retrieve_contexts(self, query: str) -> list:
return [{"text": "ctx1", "source": "doc.pdf"}, {"text": "ctx2", "source": "doc2.pdf"}]Lambda parameters:
ret- function return valueexception- any exception raised (None if successful)*args- positional arguments**kwargs- keyword arguments (includes positional args by name)
Use instrument_method() when you can't modify source code:
from trulens.core.otel.instrument import instrument_method
from somepackage import CustomRetriever
instrument_method(
cls=CustomRetriever,
method_name="retrieve",
span_type=SpanAttributes.SpanType.RETRIEVAL,
attributes={
SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
}
)Feedback functions evaluate LLM app quality. Must return float in [0.0, 1.0] or dict[str, float].
from trulens.core import Feedback
from trulens.providers.openai import OpenAI
provider = OpenAI()
f_relevance = Feedback(provider.relevance_with_cot_reasons).on_input().on_output()Shortcuts:
on_input()- selectsRECORD_ROOT.INPUTon_output()- selectsRECORD_ROOT.OUTPUTon_context()- selectsRETRIEVAL.RETRIEVED_CONTEXTS
Use Selector to explicitly select instrumented span attributes for evaluation:
from trulens.core import Feedback
from trulens.core.feedback.selector import Selector
from trulens.otel.semconv.trace import SpanAttributes
f_answer_relevance = (
Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
.on({
"prompt": Selector(
span_type=SpanAttributes.SpanType.RECORD_ROOT,
span_attribute=SpanAttributes.RECORD_ROOT.INPUT,
),
})
.on({
"response": Selector(
span_type=SpanAttributes.SpanType.RECORD_ROOT,
span_attribute=SpanAttributes.RECORD_ROOT.OUTPUT,
),
})
)collect_list=False- evaluate each context individually (for context relevance)collect_list=True- concatenate all contexts for single evaluation (for groundedness)
# Evaluate each retrieved context individually
f_context_relevance = (
Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
.on_input()
.on({
"context": Selector(
span_type=SpanAttributes.SpanType.RETRIEVAL,
span_attribute=SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS,
collect_list=False
),
})
)
# Evaluate groundedness against all contexts combined
f_groundedness = (
Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
.on({
"context": Selector(
span_type=SpanAttributes.SpanType.RETRIEVAL,
span_attribute=SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS,
collect_list=True
),
})
.on_output()
)from trulens.core.experimental import Feature
session = TruSession(experimental_feature_flags=[Feature.OTEL_TRACING])- Create
src/providers/<name>/withpyproject.toml - Extend
trulens.feedback.LLMProvider - Implement
_create_chat_completion(self, prompt, messages, **kwargs) - Add endpoint class for API interactions
- Create
src/apps/<name>/withpyproject.toml - Extend
trulens.core.app.App - Define
Default.CLASSESandDefault.METHODSfor instrumentation - Implement
main_input()andmain_output()methods
- Add a prompt template class to the appropriate
templates/domain file (e.g.rag.py,safety.py)- Extend the relevant base (e.g.
Semantics,WithPrompt,CriteriaOutputSpaceMixin) - Define
system_prompt,user_prompt,criteria_template, andoutput_space
- Extend the relevant base (e.g.
- Add the evaluation method to the provider class or
LLMProvider - Return float [0, 1] or dict of floats
- Add Google-style docstring with example
- Add re-exports to
templates/__init__.py, and toprompts.py/v2/feedback.pyshims if needed for backward compat - Add tests
- Circular imports: Use
from __future__ import annotationsandTYPE_CHECKINGblocks - OTEL tests failing in batch: Install pytest-xdist (
poetry install --with dev) and usemake test-unit - Missing optional deps: TruLens uses lazy imports - install specific packages as needed