Type-safe LLM outputs across any provider. Track every call and its cost.
```python
from covenance import ask_llm

review = ask_llm("Write a short review of Inception", model="gpt-4.1-nano")

is_positive = ask_llm(
    f"Is this review positive? '{review}'",
    model="gemini-2.5-flash-lite",
    response_type=bool,
)
print(is_positive)  # True / False
```

- **Structured outputs that work**: same code, any provider. Pydantic models, primitives, lists, tuples.
- **Zero routing code**: the model name determines the provider automatically (`gemini-*`, `claude-*`, `gpt-*`).
- **Convenience**: automatic retries on TPM (tokens-per-minute) rate limits, and when the LLM fails to return the type you requested.
- **Visibility**: know what you're calling and spending. Every call is logged with token counts and cost: `print_usage()` for totals, `print_call_timeline()` for a visual waterfall.
Install only the providers you need:
```shell
pip install covenance[openai]     # OpenAI, Grok, OpenRouter
pip install covenance[anthropic]  # Anthropic Claude
pip install covenance[google]     # Google Gemini
pip install covenance[mistral]    # Mistral

# Multiple providers
pip install covenance[openai,anthropic]

# All providers
pip install covenance[all]
```

Pass `response_type` to get validated, typed results:
```python
from pydantic import BaseModel

# Pydantic models
class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = ask_llm("Is 2+2=5?", model="gemini-2.5-flash-lite", response_type=Evaluation)
print(result.reasoning)   # "2+2 equals 4, not 5"
print(result.is_correct)  # False

# Primitives
answer = ask_llm("Is Python interpreted?", model="gpt-4.1-nano", response_type=bool)
print(answer)  # True

# Collections
items = ask_llm("List 3 prime numbers", model="claude-sonnet-4-20250514", response_type=list[int])
print(items)  # [2, 3, 5]
```

Works identically across OpenAI, Gemini, Anthropic, Mistral, Grok, and OpenRouter.
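For primitive response types, the idea reduces to coercing the model's raw text into the requested Python type. A minimal sketch of that pattern, using a hypothetical `coerce` helper (not part of the covenance API):

```python
import json

def coerce(raw: str, response_type: type):
    """Coerce a model's raw text reply into the requested primitive type."""
    raw = raw.strip()
    if response_type is bool:
        if raw.lower() in ("true", "yes"):
            return True
        if raw.lower() in ("false", "no"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if response_type in (int, float, str):
        return response_type(raw)
    # Fall back to JSON parsing for collections such as list[int]
    return json.loads(raw)

print(coerce("True", bool))       # True
print(coerce("[2, 3, 5]", list))  # [2, 3, 5]
```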
Every call is recorded with token counts and cost:
```python
from covenance import ask_llm, print_usage, get_records

ask_llm("Hello", model="gpt-4.1-nano")
ask_llm("Hello", model="gemini-2.5-flash-lite")

print_usage()
# ==================================================
# LLM Usage Summary (default client)
# ==================================================
# Calls:  2
# Tokens: 45 (In: 12, Out: 33)
# Cost:   $0.0001
# Models: gemini/gemini-2.5-flash-lite, openai/gpt-4.1-nano

# Access individual records
for record in get_records():
    print(f"{record.model}: {record.cost_usd}")
```

Persist records by setting `COVENANCE_RECORDS_DIR` or calling `set_llm_call_records_dir()`.
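Records like these are easy to aggregate yourself. A sketch using a stand-in record type (`CallRecord` below is hypothetical; the real record objects expose at least `model` and `cost_usd`, per the loop above):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallRecord:
    # Stand-in for covenance's call records; field names follow the examples above.
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float

records = [
    CallRecord("openai/gpt-4.1-nano", 12, 18, 0.00006),
    CallRecord("gemini/gemini-2.5-flash-lite", 12, 15, 0.00004),
]

total_cost = sum(r.cost_usd for r in records)
total_tokens = sum(r.tokens_in + r.tokens_out for r in records)
print(f"Calls: {len(records)}, Tokens: {total_tokens}, Cost: ${total_cost:.4f}")
```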
Visualize call sequences and parallelism in your terminal:
```python
from covenance import print_call_timeline

print_call_timeline()
# LLM Call Timeline (4.4s total, 5 calls)
#                    |0s                         4.4s|
# gpt-4.1-nano  1.3s |████████████████                |
# g2.5-flash-l  1.1s |    ████████████                |
# g2.5-flash-l  1.1s |    ████████████                |
# g2.5-flash-l  1.5s |       ████████████████         |
# g2.5-flash-l  1.5s |               █████████████████|
```

Each line is a call, sorted by start time. Blocks show when each call was active - parallel calls appear as overlapping bars on different rows.
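Rendering such a waterfall is straightforward: scale each call's start and end times onto a fixed-width row. A sketch of that idea (not covenance's actual renderer):

```python
def render_timeline(calls, width=40):
    """calls: list of (label, start_s, end_s) tuples. Returns one bar row per call."""
    total = max(end for _, _, end in calls)
    rows = []
    for label, start, end in sorted(calls, key=lambda c: c[1]):
        left = int(start / total * width)           # leading gap before the bar
        filled = max(1, int((end - start) / total * width))  # bar length
        bar = " " * left + "█" * filled
        rows.append(f"{label:<14} {end - start:.1f}s |{bar:<{width}}|")
    return rows

for row in render_timeline([("gpt-4.1-nano", 0.0, 1.3), ("g2.5-flash-l", 0.4, 1.5)]):
    print(row)
```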
Run parallel LLM calls and integrate results for higher quality:
```python
from covenance import llm_consensus

# Evaluation is the Pydantic model defined above
result = llm_consensus(
    "Explain quantum entanglement",
    model="gpt-4.1-nano",
    response_type=Evaluation,
    num_candidates=3,  # 3 parallel calls + integration
)
```

Provider is determined by the model name prefix:
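The fan-out/reduce shape of consensus can be sketched with the standard library alone. Here the reduction is a simple majority vote rather than covenance's integration step, and the `consensus` helper is hypothetical:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def consensus(calls):
    """Run every candidate call in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        answers = list(pool.map(lambda call: call(), calls))
    return Counter(answers).most_common(1)[0][0]

# Stand-ins for three independent LLM calls that disagree
votes = [lambda: True, lambda: True, lambda: False]
print(consensus(votes))  # True (2 of 3 votes)
```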
| Prefix | Provider |
|---|---|
| `gpt-*`, `o1-*`, `o3-*` | OpenAI |
| `gemini-*` | Google Gemini |
| `claude-*` | Anthropic |
| `mistral-*`, `codestral-*` | Mistral |
| `grok-*` | xAI Grok |
| `org/model` (contains `/`) | OpenRouter |
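The routing in the table above amounts to prefix matching. A sketch of the rule (the `provider_for` helper is hypothetical, not covenance's internals):

```python
def provider_for(model: str) -> str:
    """Map a model name to its provider by prefix, mirroring the table above."""
    if "/" in model:
        return "openrouter"
    prefixes = {
        ("gpt-", "o1-", "o3-"): "openai",
        ("gemini-",): "google",
        ("claude-",): "anthropic",
        ("mistral-", "codestral-"): "mistral",
        ("grok-",): "xai",
    }
    for keys, provider in prefixes.items():
        if model.startswith(keys):
            return provider
    raise ValueError(f"unknown model prefix: {model}")

print(provider_for("gpt-4.1-nano"))        # openai
print(provider_for("meta-llama/llama-3"))  # openrouter
```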
Providers differ in how they enforce JSON schema compliance:
| Provider | Method | Guarantee |
|---|---|---|
| OpenAI | Constrained decoding | 100% schema-valid JSON |
| Google Gemini | Controlled generation | 100% schema-valid JSON |
| Grok | Constrained decoding | 100% schema-valid JSON |
| Anthropic | Structured outputs beta | 100% schema-valid JSON* |
| Mistral | Best-effort | Probabilistic |
| OpenRouter | Varies | Depends on underlying model |
\*Anthropic structured outputs require SDK >= 0.74.1 (uses the `anthropic-beta: structured-outputs-2025-11-13` header). Mistral uses probabilistic generation; Covenance retries automatically (up to 3 times) on JSON parse errors for Mistral.
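The retry-on-parse-failure behavior can be sketched with the standard library (the `parse_with_retries` helper is hypothetical, not the covenance internals):

```python
import json

def parse_with_retries(call, max_attempts=3):
    """Call an LLM-like function until its reply parses as JSON, or give up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return json.loads(call())
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise

# Stand-in for a model that returns malformed JSON on the first try
replies = iter(["not json", '{"is_correct": false}'])
print(parse_with_retries(lambda: next(replies)))  # {'is_correct': False}
```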
Set environment variables for the providers you use:
- `OPENAI_API_KEY`
- `GOOGLE_API_KEY` (or `GEMINI_API_KEY`)
- `ANTHROPIC_API_KEY`
- `MISTRAL_API_KEY`
- `OPENROUTER_API_KEY`
- `XAI_API_KEY` (for Grok)
A .env file in the working directory is loaded automatically.
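For example, a minimal `.env` might look like this (placeholder values, not real keys):

```shell
OPENAI_API_KEY=sk-your-key-here
GOOGLE_API_KEY=your-google-key
```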
Use Covenance instances for separate API keys and call records per subsystem:
```python
from pydantic import BaseModel

from covenance import Covenance

# Each client tracks its own usage
question_client = Covenance(label="questions")
review_client = Covenance(label="review")

answer = question_client.ask_llm("Who is David Blaine?", model="gpt-4.1-nano")

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

evaluation = review_client.llm_consensus(
    f"Is this accurate? '''{answer}'''",
    model="gemini-2.5-flash-lite",
    response_type=Evaluation,
)

question_client.print_usage()  # Shows only the question call
review_client.print_usage()    # Shows only the review call
```

Covenance uses two backends for structured output and picks the better one per provider:
- **Native SDK**: calls the provider's API directly (e.g., the OpenAI Responses API with `responses.parse`)
- **pydantic-ai**: uses pydantic-ai as a unified layer
The default routing:
| Provider | Backend | Why |
|---|---|---|
| OpenAI | Native | Responses API with constrained decoding handles enums, recursive types, and large schemas more reliably |
| Grok | Native | OpenAI-compatible API, same benefits |
| Gemini | pydantic-ai | Native SDK hits RecursionError on self-referencing types (e.g., tree nodes) |
| Anthropic | pydantic-ai | No native client implemented |
| Mistral | pydantic-ai | Similar pass rates; pydantic-ai handles recursive types better |
| OpenRouter | pydantic-ai | No native client implemented |
These defaults are based on a stress test suite that runs 14 test categories across providers with both backends. The results for the cheapest model per provider:
- OpenAI (`gpt-4.1-nano`): native 14/14, pydantic-ai 10/14
- Gemini (`gemini-2.5-flash-lite`): native 11/14, pydantic-ai 13/14
- Mistral (`mistral-small-latest`): native 9/14, pydantic-ai 8/14
Where native beats pydantic-ai on OpenAI: enum adherence (strict values vs. hallucinated ones), recursive types (deeper trees), real-world schemas (fewer empty fields), and extreme schema limits (100+ fields with Literal types).
Where pydantic-ai beats native on Gemini: recursive/self-referencing types (native Google SDK crashes with RecursionError).
Each `Covenance` instance has a `backends` object with a field per provider. You can inspect and override them:
```python
from covenance import Covenance

client = Covenance()
print(client.backends)
# Backends(native=[openai, grok], pydantic=[gemini, anthropic, mistral, openrouter])

# Override a specific provider
client.backends.anthropic = "native"

# Force all providers to one backend (useful for benchmarking)
client.backends.set_all("native")
```

Only `"native"` and `"pydantic"` are accepted; anything else raises `ValueError`.
Every call records which backend was used:
```python
for record in client.get_records():
    print(f"{record.model}: {record.backend}")  # "native" or "pydantic"
```

The backend also shows in `print_call_timeline()` as `(N)` or `(P)`:
```python
print_call_timeline()
# LLM Call Timeline (2.1s total, 2 calls)
#                       |0s                        2.1s|
# gpt-4.1-nano(N)  0.8s |█████████████████              |
# g2.5-flash-l(P)  1.1s |    ██████████████████████████ |
```

To see routing decisions in real time, enable debug logging:
```python
import logging

logging.basicConfig(level=logging.DEBUG)
# DEBUG:covenance:ask_llm: model=gpt-4.1-nano provider=openai backend=native
```