Submission checklist
Package (Required)
Related Issues / PRs
No response
Reproduction Steps / Example Code (Python)
import asyncio
import json
import os
import random
import time
from decimal import Decimal
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_mistralai.chat_models import ChatMistralAI
from langsmith import Client, traceable
load_dotenv()
SYSTEM_PROMPT = """You are a knowledgeable travel planning assistant. Your role is to help users plan detailed itineraries for trips around the world. You understand geography, local customs, transportation options, seasonal weather patterns, and popular attractions. You can suggest accommodations ranging from budget hostels to luxury resorts, recommend local cuisine and restaurants, and help optimize travel routes for efficiency and enjoyment. You always consider the traveler's budget, interests, physical abilities, and time constraints when making recommendations. You provide practical tips about visa requirements, currency exchange, and local etiquette. You prioritize authentic local experiences over tourist traps whenever possible."""
USER_MESSAGES = [
"""I'm planning a two-week trip to Japan in April. I'll be arriving in Tokyo and want to experience both the major cities and some rural areas. My budget is moderate — I'm happy to stay in a mix of traditional ryokans and business hotels. I'm particularly interested in cherry blossom viewing, traditional temples, and local food culture. I also enjoy hiking and would love to include at least one or two nature-focused days. Can you help me outline a rough itinerary that balances urban exploration with countryside experiences?""",
"""Thanks for the suggestions. I've decided to spend the first four days in Tokyo, then head to Hakone for a night before going to Kyoto. For Tokyo, I want to cover Shibuya, Shinjuku, Asakusa, and Akihabara at minimum. I'm also curious about the Tsukiji outer market for breakfast — is it still worth visiting or has everything moved to Toyosu? I heard the tuna auction is at the new location now. Also, for transportation between cities, should I get a 14-day Japan Rail Pass or would individual shinkansen tickets be more cost-effective for my specific route? I'll be traveling with one large suitcase and a backpack.""",
"""Great advice on the rail pass — I'll go with the 14-day option. Now for the Kyoto portion, I'm thinking of spending five days there using it as a base for day trips. I definitely want to visit Fushimi Inari early in the morning to avoid crowds, and I'd like to see Arashiyama bamboo grove as well. I've also heard that Nara is an easy day trip from Kyoto with the famous deer park and Todai-ji temple. For one of the days, I'm considering a day trip to Hiroshima and Miyajima Island — is that feasible in a single day from Kyoto? And what about the food scene in Kyoto — any must-try dishes that are specific to the Kansai region?""",
"""Perfect, the Hiroshima day trip sounds doable. For the final stretch of the trip, I'm torn between spending the last three days in Osaka or splitting them between Osaka and a more off-the-beaten-path destination like Kanazawa or Takayama. I love street food so Osaka's Dotonbori appeals to me, but I also want to avoid ending the trip in just another big city. Kanazawa's Kenroku-en garden and the samurai district sound fascinating, and Takayama's old town and morning markets seem charming. What would you recommend given that I'll already have had plenty of urban time in Tokyo and Kyoto? Also, are there any festivals or special events happening in late April in any of these areas?""",
]
CACHE_KEY = f"test-mistral-cache-demo-{random.randint(0, 1000000)}"
MODEL = "mistral-small-latest"
@traceable(name="mistral-cache-test", run_type="chain")
async def run_conversation():
    model = ChatMistralAI(
        name=MODEL,
        api_key=os.getenv("MISTRAL_API_KEY"),
        model=MODEL,
        temperature=0.1,
    )
    messages = [SystemMessage(content=SYSTEM_PROMPT)]
    all_responses = []
    for i, user_msg in enumerate(USER_MESSAGES):
        messages.append(HumanMessage(content=user_msg))
        print(f"\n{'='*80}")
        print(f"MESSAGE {i+1}")
        print(f"{'='*80}")
        response = await model.ainvoke(messages, prompt_cache_key=CACHE_KEY)
        print(
            json.dumps(
                {
                    "content": response.content[:200] + "..."
                    if len(response.content) > 200
                    else response.content,
                    "usage_metadata": response.usage_metadata,
                    "response_metadata": response.response_metadata,
                    "type": response.type,
                    "id": response.id,
                },
                indent=2,
                default=str,
            )
        )
        all_responses.append(response)
        messages.append(response)
    return all_responses
PRICE_INPUT = 0.15 / 1_000_000 # $0.15 per 1M input tokens
PRICE_CACHED_INPUT = 0.015 / 1_000_000 # $0.015 per 1M cached input tokens
PRICE_OUTPUT = 0.6 / 1_000_000 # $0.60 per 1M output tokens
async def main():
    responses = await run_conversation()
    # --- Compute actual cost from Mistral response metadata ---
    actual_total_input = 0
    actual_total_cached = 0
    actual_total_output = 0
    print(f"\n\n{'='*80}")
    print("ACTUAL TOKEN USAGE (from Mistral response_metadata)")
    print(f"{'='*80}")
    for i, resp in enumerate(responses):
        token_usage = resp.response_metadata.get("token_usage", {})
        prompt_tokens = token_usage.get("prompt_tokens", 0)
        completion_tokens = token_usage.get("completion_tokens", 0)
        cached_tokens = token_usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        non_cached_input = prompt_tokens - cached_tokens
        actual_total_input += non_cached_input
        actual_total_cached += cached_tokens
        actual_total_output += completion_tokens
        print(
            f"\n Run {i+1}: input={prompt_tokens} (cached={cached_tokens}, non-cached={non_cached_input}), output={completion_tokens}"
        )
    actual_input_cost = actual_total_input * PRICE_INPUT
    actual_cached_cost = actual_total_cached * PRICE_CACHED_INPUT
    actual_output_cost = actual_total_output * PRICE_OUTPUT
    actual_total_cost = actual_input_cost + actual_cached_cost + actual_output_cost
    print(
        f"\n Totals: non-cached input={actual_total_input}, cached input={actual_total_cached}, output={actual_total_output}"
    )
    print(
        f" Cost: input=${actual_input_cost:.6f} + cached=${actual_cached_cost:.6f} + output=${actual_output_cost:.6f} = ${actual_total_cost:.6f}"
    )
    # --- Fetch Langsmith costs ---
    print("\n\nWaiting 20s for traces to flush to Langsmith...")
    time.sleep(20)
    client = Client()
    project_name = os.getenv("LANGSMITH_PROJECT", "default")
    runs = list(
        client.list_runs(
            project_name=project_name,
            filter='eq(name, "mistral-cache-test")',
            limit=1,
        )
    )
    if not runs:
        print("ERROR: Could not find the parent trace in Langsmith")
        return
    parent_run = runs[0]
    print(f"\nTrace ID: {parent_run.id}")
    print(f"Trace URL: {parent_run.url}")
    # Use trace_id to find all LLM runs in the trace tree
    child_runs = list(
        client.list_runs(
            project_name=project_name,
            trace_id=parent_run.trace_id,
            run_type="llm",
        )
    )
    child_runs.sort(key=lambda r: r.start_time)
    ls_total_input_cost = Decimal(0)
    ls_total_output_cost = Decimal(0)
    ls_total_cost = Decimal(0)
    print(f"\n{'='*80}")
    print(f"LANGSMITH TOKEN USAGE & COSTS ({len(child_runs)} LLM runs)")
    print(f"{'='*80}")
    for i, run in enumerate(child_runs):
        input_t = run.prompt_tokens or 0
        output_t = run.completion_tokens or 0
        run_total_cost = run.total_cost or Decimal(0)
        run_input_cost = run.prompt_cost or Decimal(0)
        run_output_cost = run.completion_cost or Decimal(0)
        ls_total_cost += run_total_cost
        ls_total_input_cost += run_input_cost
        ls_total_output_cost += run_output_cost
        print(f"\n Run {i+1}: input={input_t}, output={output_t}")
        print(f" input_cost=${run_input_cost:.6f}, output_cost=${run_output_cost:.6f}, total_cost=${run_total_cost:.6f}")
    print(
        f"\n Langsmith totals: input_cost=${ls_total_input_cost:.6f}, output_cost=${ls_total_output_cost:.6f}, total=${ls_total_cost:.6f}"
    )
    # --- Comparison ---
    print(f"\n\n{'='*80}")
    print("COMPARISON: Actual (with cache pricing) vs Langsmith")
    print(f"{'='*80}")
    ls_in = float(ls_total_input_cost)
    ls_out = float(ls_total_output_cost)
    ls_tot = float(ls_total_cost)
    print(f" {'':30s} {'Actual':>12s} {'Langsmith':>12s} {'Difference':>12s}")
    print(f" {'Input cost':30s} ${actual_input_cost:>11.6f} ${ls_in:>11.6f} ${ls_in - actual_input_cost:>+11.6f}")
    print(f" {'Cached input cost':30s} ${actual_cached_cost:>11.6f} {'$ N/A':>12s} {'':>12s}")
    print(f" {'Output cost':30s} ${actual_output_cost:>11.6f} ${ls_out:>11.6f} ${ls_out - actual_output_cost:>+11.6f}")
    print(f" {'TOTAL':30s} ${actual_total_cost:>11.6f} ${ls_tot:>11.6f} ${ls_tot - actual_total_cost:>+11.6f}")
    if ls_tot and actual_total_cost:
        overcharge_pct = (ls_tot - actual_total_cost) / actual_total_cost * 100
        print(f"\n Langsmith overestimates cost by {overcharge_pct:+.1f}% (treats cached tokens as full-price input)")
    elif not ls_tot:
        print("\n NOTE: Langsmith returned $0 total cost - it may not have pricing data for this Mistral model yet.")


if __name__ == "__main__":
    asyncio.run(main())
Error Message and Stack Trace (if applicable)
Description
I think the token usage format that ChatMistralAI emits is deprecated, and LangSmith fails to parse the "cached_tokens" field.
I attached a Python script that prints the cached token counts across sequential invocations and compares the manually calculated cost with the cost LangSmith reports.
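To make the size of the gap concrete, here is a minimal sketch of the two ways of pricing a single call. The per-token prices are the ones hard-coded in the script; the token counts and the helper names `cache_aware_cost` and `naive_cost` are hypothetical and only for illustration. The "naive" variant reflects my assumption about what happens when cached tokens are billed as full-price input.

```python
from decimal import Decimal

# Prices per token, copied from the reproduction script.
PRICE_INPUT = Decimal("0.15") / 1_000_000
PRICE_CACHED_INPUT = Decimal("0.015") / 1_000_000
PRICE_OUTPUT = Decimal("0.60") / 1_000_000


def cache_aware_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> Decimal:
    """Bill cached prompt tokens at the discounted rate, the rest at full price."""
    non_cached = prompt_tokens - cached_tokens
    return (
        non_cached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion_tokens * PRICE_OUTPUT
    )


def naive_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Bill every prompt token at the full input rate, ignoring the cache."""
    return prompt_tokens * PRICE_INPUT + completion_tokens * PRICE_OUTPUT


# Hypothetical token counts for one call, not taken from a real run.
aware = cache_aware_cost(prompt_tokens=2000, cached_tokens=1500, completion_tokens=400)
naive = naive_cost(prompt_tokens=2000, completion_tokens=400)
overestimate_pct = (naive - aware) / aware * 100
print(f"cache-aware=${aware:.7f} naive=${naive:.7f} overestimate={overestimate_pct:+.1f}%")
```

The more of the prompt is served from cache, the larger the overestimate becomes, which is why it grows over a multi-turn conversation like the one in the script.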
As a result, I cannot get correct pricing information in LangSmith. The problem stems from the output format that ChatMistralAI produces.
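For reference, this is roughly the mapping I would expect, written as a standalone sketch. The input shape ("prompt_tokens_details" containing "cached_tokens") is what the script reads from response_metadata["token_usage"]; the output keys follow the UsageMetadata shape in langchain_core, where cached reads go under input_token_details["cache_read"]. The function name `to_usage_metadata` is hypothetical and not part of langchain_mistralai, and whether LangSmith then applies the cached price also depends on the model pricing configuration, which I am assuming here.

```python
from typing import Any


def to_usage_metadata(token_usage: dict[str, Any]) -> dict[str, Any]:
    """Convert a Mistral-style token_usage dict into a UsageMetadata-shaped dict.

    Hypothetical helper for illustration only.
    """
    prompt_tokens = token_usage.get("prompt_tokens", 0)
    completion_tokens = token_usage.get("completion_tokens", 0)
    # The details block may be missing or None when nothing was cached.
    details = token_usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0

    usage: dict[str, Any] = {
        "input_tokens": prompt_tokens,
        "output_tokens": completion_tokens,
        "total_tokens": token_usage.get("total_tokens", prompt_tokens + completion_tokens),
    }
    if cached_tokens:
        # Cached prompt tokens are a subset of input_tokens, reported as a detail.
        usage["input_token_details"] = {"cache_read": cached_tokens}
    return usage


# Hypothetical payload in the shape the script reads from response_metadata.
example = {
    "prompt_tokens": 2000,
    "completion_tokens": 400,
    "total_tokens": 2400,
    "prompt_tokens_details": {"cached_tokens": 1500},
}
print(to_usage_metadata(example))
```

If usage_metadata on the returned message carried the cached count this way, the trace would at least contain the information needed to price cached tokens separately.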
Here is the model pricing configuration from LangSmith:

System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 25.4.0: Thu Mar 19 19:33:25 PDT 2026; root:xnu-12377.101.15~1/RELEASE_ARM64_T6041
Python Version: 3.11.9 (v3.11.9:de54cf5be3, Apr 2 2024, 07:12:50) [Clang 13.0.0 (clang-1300.0.29.30)]
Package Information
langchain_core: 1.4.8
langchain: 1.2.15
langchain_community: 0.4.1
langsmith: 0.7.32
langchain_anthropic: 1.4.1
langchain_classic: 1.0.4
langchain_google_genai: 4.2.2
langchain_google_vertexai: 3.2.2
langchain_mistralai: 1.1.5
langchain_ollama: 1.1.0
langchain_openai: 1.1.14
langchain_protocol: 0.0.18
langchain_tests: 1.1.6
langchain_text_splitters: 1.1.2
langgraph_sdk: 0.3.13
Optional packages not installed
deepagents
deepagents-cli
Other Dependencies
aiohttp: 3.13.5
anthropic: 0.96.0
bottleneck: 1.6.0
claude-agent-sdk: 0.1.48
dataclasses-json: 0.6.7
filetype: 1.2.0
google-cloud-aiplatform: 1.148.0
google-cloud-storage: 3.10.1
google-cloud-vectorsearch: 0.10.0
google-genai: 1.73.1
httpx: 0.28.1
httpx-sse: 0.4.3
jsonpatch: 1.33
langgraph: 1.1.7
numexpr: 2.14.1
numpy: 2.4.4
ollama: 0.6.1
openai: 2.32.0
opentelemetry-api: 1.41.0
opentelemetry-exporter-otlp-proto-http: 1.41.0
opentelemetry-sdk: 1.41.0
orjson: 3.11.8
packaging: 25.0
pyarrow: 22.0.0
pydantic: 2.13.2
pydantic-settings: 2.13.1
pytest: 8.4.2
pytest-asyncio: 1.3.0
pytest-benchmark: 5.2.3
pytest-codspeed: 4.4.0
pytest-recording: 0.13.4
pytest-socket: 0.7.0
pyyaml: 6.0.3
PyYAML: 6.0.3
requests: 2.33.1
requests-toolbelt: 1.0.0
rich: 14.3.4
sqlalchemy: 2.0.49
SQLAlchemy: 2.0.49
syrupy: 5.1.0
tenacity: 9.1.4
tiktoken: 0.12.0
tokenizers: 0.23.1
typing-extensions: 4.15.0
uuid-utils: 0.14.1
validators: 0.35.0
vcrpy: 8.1.1
websockets: 16.0
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.23.0