Run LangSmith evaluate() on a full dataset while only tracing a configurable subset of runs to LangSmith.
- All dataset examples are evaluated, with complete results available locally
- Only a configurable subset of runs (and their associated evaluator traces + feedback) are sent to LangSmith
- For any run that is traced, its evaluator traces and feedback are correlated and inspectable in the UI
A two-pass approach:
- Partition the dataset into a traced subset and an untraced remainder based on a sample rate
- Pass 1: `evaluate(..., upload_results=True)` on the subset. Runs, evaluator traces, and feedback all land in LangSmith, fully correlated
- Pass 2: `evaluate(..., upload_results=False)` on the remainder. Everything runs locally; nothing is sent to LangSmith
- Merge both result sets so the caller has the full evaluation locally
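
The two-pass flow above can be sketched in a few lines. This is a minimal illustration, not the code in `main.py`: the names `partition_examples`, `two_pass_evaluate`, and `evaluate_fn` are hypothetical, and the partition strategy (seeded shuffle, then take the first `round(n * sample_rate)` examples) is an assumption, since the text only says the split is based on a sample rate. The evaluation call is injected as `evaluate_fn` so the sketch runs without LangSmith installed.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def partition_examples(
    examples: Sequence[Any], sample_rate: float, seed: int = 0
) -> Tuple[List[Any], List[Any]]:
    """Split examples into (traced, untraced) using a seeded shuffle.

    Assumption: the traced subset has round(n * sample_rate) examples.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    k = round(len(shuffled) * sample_rate)
    return shuffled[:k], shuffled[k:]


def two_pass_evaluate(
    examples: Sequence[Any],
    sample_rate: float,
    evaluate_fn: Callable[..., List[Any]],
    seed: int = 0,
) -> List[Any]:
    """Run evaluate_fn once per partition and merge the results.

    evaluate_fn(subset, upload_results=...) is a hypothetical wrapper
    that returns a list of per-example results.
    """
    traced, untraced = partition_examples(examples, sample_rate, seed)
    merged: List[Any] = []
    # Pass 1: traced subset, results uploaded.
    if traced:
        merged.extend(evaluate_fn(traced, upload_results=True))
    # Pass 2: remainder, evaluated locally only.
    if untraced:
        merged.extend(evaluate_fn(untraced, upload_results=False))
    return merged
```

In a real run, `evaluate_fn` would presumably wrap LangSmith's `evaluate()` with the target function, evaluators, and experiment prefix already bound, and convert its results to a list. Skipping an empty partition is a design choice made here so that a sample rate of 0.0 or 1.0 never triggers an evaluation over zero examples.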
`uv sync`

Create a `.env` file with your LangSmith credentials:

`LANGSMITH_API_KEY=lsv2_...`

Create the sample dataset (20 Q&A examples):

`uv run python create_dataset.py`

Run the evaluation:

`uv run python main.py`

Set these in `.env` or as environment variables:

| Variable | Default | Description |
|---|---|---|
| `TRACING_SAMPLE_RATE` | `0.1` | Fraction of examples to trace (0.0–1.0) |
| `DATASET_NAME` | `my-dataset` | LangSmith dataset name |
| `EXPERIMENT_PREFIX` | `subset-tracing` | Prefix for experiments in LangSmith |
| `LANGSMITH_API_KEY` | (none) | Your LangSmith API key |
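
As an illustration of how the variables in the table might be read, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names, not taken from `main.py`, and the range check on the sample rate is an added assumption. Loading the `.env` file into the environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    tracing_sample_rate: float
    dataset_name: str
    experiment_prefix: str
    langsmith_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the table's defaults."""
    rate = float(env.get("TRACING_SAMPLE_RATE", "0.1"))
    # Assumption: reject values outside the documented 0.0-1.0 range.
    if not 0.0 <= rate <= 1.0:
        raise ValueError("TRACING_SAMPLE_RATE must be between 0.0 and 1.0")
    return Settings(
        tracing_sample_rate=rate,
        dataset_name=env.get("DATASET_NAME", "my-dataset"),
        experiment_prefix=env.get("EXPERIMENT_PREFIX", "subset-tracing"),
        # No default: None means the key was not provided.
        langsmith_api_key=env.get("LANGSMITH_API_KEY"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"TRACING_SAMPLE_RATE": "0.25"})`.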