covenance


Type-safe LLM outputs across any provider. Track every call and its cost.

from covenance import ask_llm

review = ask_llm("Write a short review of Inception", model="gpt-4.1-nano")
is_positive = ask_llm(
    f"Is this review positive? '{review}'",
    model="gemini-2.5-flash-lite",
    response_type=bool)
print(is_positive)  # True/False

Use cases

  • Structured outputs that work - Same code, any provider. Pydantic models, primitives, lists, tuples.
  • Zero routing code - The model name determines the provider automatically (gemini-*, claude-*, gpt-*).
  • Convenience - Automatic retries on TPM (tokens-per-minute) rate limits, and when the LLM fails to return the requested type.
  • Visibility - Know what you're calling and spending: every call is logged with token counts and cost. print_usage() for totals, print_call_timeline() for a visual waterfall.

Installation

Install only the providers you need:

pip install covenance[openai]      # OpenAI, Grok, OpenRouter
pip install covenance[anthropic]   # Anthropic Claude
pip install covenance[google]      # Google Gemini
pip install covenance[mistral]     # Mistral

# Multiple providers
pip install covenance[openai,anthropic]

# All providers
pip install covenance[all]

Structured outputs

Pass response_type to get validated, typed results:

from pydantic import BaseModel
from covenance import ask_llm

# Pydantic models
class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = ask_llm("Is 2+2=5?", model="gemini-2.5-flash-lite", response_type=Evaluation)
print(result.reasoning)  # "2+2 equals 4, not 5"
print(result.is_correct)  # False

# Primitives
answer = ask_llm("Is Python interpreted?", model="gpt-4.1-nano", response_type=bool)
print(answer)  # True

# Collections
items = ask_llm("List 3 prime numbers", model="claude-sonnet-4-20250514", response_type=list[int])
print(items)  # [2, 3, 5]

Works identically across OpenAI, Gemini, Anthropic, Mistral, Grok, and OpenRouter.
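Conceptually, a provider's text response has to be parsed and checked against response_type before it is returned. A minimal standard-library sketch of that idea (covenance itself validates with pydantic, which also handles full BaseModel schemas; coerce is a hypothetical name, not part of the API):

```python
import json
from typing import get_args, get_origin

def coerce(text: str, response_type):
    """Parse model output as JSON and check it against the requested type (sketch)."""
    value = json.loads(text)
    # For generics like list[int], check the container first, then the items.
    origin = get_origin(response_type) or response_type
    if not isinstance(value, origin):
        raise TypeError(f"expected {response_type}, got {type(value).__name__}")
    if origin is list:
        (item_type,) = get_args(response_type) or (object,)
        if not all(isinstance(v, item_type) for v in value):
            raise TypeError("list items do not match the requested item type")
    return value
```

When the check fails, the library's behavior (per the use cases above) is to retry the call rather than raise immediately.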

Cost tracking

Every call is recorded with token counts and cost:

from covenance import ask_llm, print_usage, print_call_timeline, get_records

ask_llm("Hello", model="gpt-4.1-nano")
ask_llm("Hello", model="gemini-2.5-flash-lite")

print_usage()
# ==================================================
# LLM Usage Summary (default client)
# ==================================================
#   Calls: 2
#   Tokens: 45 (In: 12, Out: 33)
#   Cost: $0.0001
#   Models: gemini/gemini-2.5-flash-lite, openai/gpt-4.1-nano

# Access individual records
for record in get_records():
    print(f"{record.model}: {record.cost_usd}")

Persist records by setting COVENANCE_RECORDS_DIR or calling set_llm_call_records_dir().

Call timeline

Visualize call sequences and parallelism in your terminal:

from covenance import print_call_timeline

print_call_timeline()
# LLM Call Timeline (4.4s total, 5 calls)
#                         |0s                                            4.4s|
#   gpt-4.1-nano    1.3s  |████████████████                                  |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.5s  |                 ████████████████                 |
#   g2.5-flash-l    1.5s  |                                 █████████████████|

Each line is a call, sorted by start time. Blocks show when each call was active - parallel calls appear as overlapping bars on different rows.
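A waterfall like the one above can be rendered from plain (label, start, end) timings. A minimal sketch of the rendering idea; render_timeline is a hypothetical helper, not part of the covenance API:

```python
def render_timeline(calls, width=40):
    """calls: list of (label, start_s, end_s) tuples; returns ASCII bar rows (sketch)."""
    total = max(end for _, _, end in calls)
    rows = []
    for label, start, end in sorted(calls, key=lambda c: c[1]):
        begin = int(start / total * width)          # bar offset in columns
        length = max(1, int((end - start) / total * width))  # bar length, min 1 column
        bar = " " * begin + "\u2588" * length
        rows.append(f"{label:<16}{end - start:>5.1f}s  |{bar:<{width}}|")
    return "\n".join(rows)
```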

Consensus for quality

Run parallel LLM calls and integrate results for higher quality:

from covenance import llm_consensus

result = llm_consensus(
    "Explain quantum entanglement",
    model="gpt-4.1-nano",
    response_type=Evaluation,
    num_candidates=3,  # 3 parallel calls + integration
)
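Conceptually, llm_consensus fans out identical calls in parallel and then integrates the candidate answers. The sketch below uses a thread pool with a simple majority vote standing in for the integration step (the README states integration happens but not how; consensus and ask are hypothetical names):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def consensus(ask, prompt, num_candidates=3):
    """Run `num_candidates` identical calls in parallel, then integrate (sketch)."""
    with ThreadPoolExecutor(max_workers=num_candidates) as pool:
        answers = list(pool.map(ask, [prompt] * num_candidates))
    # Stand-in integration strategy: majority vote over candidate answers.
    return Counter(answers).most_common(1)[0][0]
```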

Supported providers

Provider is determined by model name prefix:

Prefix                  Provider
gpt-*, o1-*, o3-*       OpenAI
gemini-*                Google Gemini
claude-*                Anthropic
mistral-*, codestral-*  Mistral
grok-*                  xAI Grok
org/model (contains /)  OpenRouter
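The prefix dispatch above can be expressed as a small lookup. This is a sketch of the routing rule, not the library's code (resolve_provider is a hypothetical name):

```python
def resolve_provider(model: str) -> str:
    """Map a model name to its provider by prefix (sketch of the routing table)."""
    if "/" in model:  # org/model names go to OpenRouter
        return "openrouter"
    prefixes = {
        "gpt-": "openai", "o1-": "openai", "o3-": "openai",
        "gemini-": "google",
        "claude-": "anthropic",
        "mistral-": "mistral", "codestral-": "mistral",
        "grok-": "xai",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unrecognized model name: {model!r}")
```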

Structured output reliability

Providers differ in how they enforce JSON schema compliance:

Provider       Method                   Guarantee
OpenAI         Constrained decoding     100% schema-valid JSON
Google Gemini  Controlled generation    100% schema-valid JSON
Grok           Constrained decoding     100% schema-valid JSON
Anthropic      Structured outputs beta  100% schema-valid JSON*
Mistral        Best-effort              Probabilistic
OpenRouter     Varies                   Depends on underlying model

*Anthropic structured outputs require SDK >= 0.74.1 (uses anthropic-beta: structured-outputs-2025-11-13). For Mistral's probabilistic generation, covenance retries automatically (up to 3 times) on JSON parse errors.
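The retry behavior for best-effort providers boils down to a simple loop: call, try to parse, retry on failure up to the attempt limit. A sketch under that assumption (ask_with_retries is a hypothetical name; covenance additionally validates the parsed value against response_type):

```python
import json

def ask_with_retries(call, max_attempts=3):
    """Invoke `call`, retrying on JSON parse errors up to `max_attempts` times (sketch)."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return json.loads(call())
        except json.JSONDecodeError as err:
            last_err = err  # malformed output; try again
    raise last_err
```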

API keys

Set environment variables for the providers you use:

  • OPENAI_API_KEY
  • GOOGLE_API_KEY (or GEMINI_API_KEY)
  • ANTHROPIC_API_KEY
  • MISTRAL_API_KEY
  • OPENROUTER_API_KEY
  • XAI_API_KEY (for Grok)

A .env file in the working directory is loaded automatically.

Isolated clients

Use Covenance instances for separate API keys and call records per subsystem:

from covenance import Covenance
from pydantic import BaseModel

# Each client tracks its own usage
question_client = Covenance(label="questions")
review_client = Covenance(label="review")

answer = question_client.ask_llm("Who is David Blaine?", model="gpt-4.1-nano")

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

evaluation = review_client.llm_consensus(
    f"Is this accurate? '''{answer}'''",
    model="gemini-2.5-flash-lite",
    response_type=Evaluation,
)

question_client.print_usage()  # Shows only the question call
review_client.print_usage()    # Shows only the review call

How it works: dual backend

Covenance uses two backends for structured output and picks the better one per provider:

  • Native SDK — calls the provider's API directly (e.g., OpenAI Responses API with responses.parse)
  • pydantic-ai — uses pydantic-ai as a unified layer

The default routing:

Provider    Backend      Why
OpenAI      Native       Responses API with constrained decoding handles enums, recursive types, and large schemas more reliably
Grok        Native       OpenAI-compatible API, same benefits
Gemini      pydantic-ai  Native SDK hits RecursionError on self-referencing types (e.g., tree nodes)
Anthropic   pydantic-ai  No native client implemented
Mistral     pydantic-ai  Similar pass rates; pydantic-ai handles recursive types better
OpenRouter  pydantic-ai  No native client implemented

These defaults are based on a stress test suite that runs 14 test categories across providers with both backends. The results for the cheapest model per provider:

OpenAI  (gpt-4.1-nano):          native 14/14, pydantic-ai 10/14
Gemini  (gemini-2.5-flash-lite): native 11/14, pydantic-ai 13/14
Mistral (mistral-small-latest):  native  9/14, pydantic-ai  8/14

Where native beats pydantic-ai on OpenAI: enum adherence (strict values vs. hallucinated ones), recursive types (deeper trees), real-world schemas (fewer empty fields), and extreme schema limits (100+ fields with Literal types).

Where pydantic-ai beats native on Gemini: recursive/self-referencing types (native Google SDK crashes with RecursionError).

Overriding the backend

Each Covenance instance has a backends object with a field per provider. You can inspect and override them:

from covenance import Covenance

client = Covenance()
print(client.backends)
# Backends(native=[openai, grok], pydantic=[gemini, anthropic, mistral, openrouter])

# Override a specific provider
client.backends.anthropic = "native"

# Force all providers to one backend (useful for benchmarking)
client.backends.set_all("native")

Only "native" and "pydantic" are accepted — anything else raises ValueError.

Every call records which backend was used:

for record in client.get_records():
    print(f"{record.model}: {record.backend}")  # "native" or "pydantic"

The backend also shows in print_call_timeline() as (N) or (P):

print_call_timeline()
# LLM Call Timeline (2.1s total, 2 calls)
#                            |0s                                       2.1s|
#   gpt-4.1-nano(N)    0.8s  |█████████████████                            |
#   g2.5-flash-l(P)    1.1s  |                  ██████████████████████████  |

To see routing decisions in real time, enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)
# DEBUG:covenance:ask_llm: model=gpt-4.1-nano provider=openai backend=native
