covenance


Type-safe LLM outputs across any provider. Track every call and its cost.

from covenance import ask_llm

review = ask_llm("Write a short review of Inception", model="gpt-4.1-nano")
is_positive = ask_llm(
    f"Is this review positive? '{review}'",
    model="gemini-2.5-flash-lite",
    response_type=bool)
print(is_positive)  # True/False

Use cases

  • Structured outputs that work - Same code, any provider. Pydantic models, primitives, lists, tuples.
  • Zero routing code - The model name determines the provider automatically (gemini-*, claude-*, gpt-*).
  • Convenience - Automatic retries on TPM (tokens-per-minute) rate limits, and when the LLM fails to return the requested type.
  • Visibility - Know what you're calling and spending: every call is logged with token counts and cost. print_usage() for totals, print_call_timeline() for a visual waterfall.

Installation

Install only the providers you need:

pip install covenance[openai]      # OpenAI, Grok, OpenRouter
pip install covenance[anthropic]   # Anthropic Claude
pip install covenance[google]      # Google Gemini
pip install covenance[mistral]     # Mistral

# Multiple providers
pip install covenance[openai,anthropic]

# All providers
pip install covenance[all]

Structured outputs

Pass response_type to get validated, typed results:

from pydantic import BaseModel
from covenance import ask_llm

# Pydantic models
class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = ask_llm("Is 2+2=5?", model="gemini-2.5-flash-lite", response_type=Evaluation)
print(result.reasoning)  # "2+2 equals 4, not 5"
print(result.is_correct)  # False

# Primitives
answer = ask_llm("Is Python interpreted?", model="gpt-4.1-nano", response_type=bool)
print(answer)  # True

# Collections
items = ask_llm("List 3 prime numbers", model="claude-sonnet-4-20250514", response_type=list[int])
print(items)  # [2, 3, 5]

Works identically across OpenAI, Gemini, Anthropic, Mistral, Grok, and OpenRouter.
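Conceptually, a provider's text response has to be parsed and checked against response_type before it is returned. A minimal standard-library sketch of that idea (covenance itself validates with pydantic, which also handles full BaseModel schemas; coerce is a hypothetical name, not part of the API):

```python
import json
from typing import get_args, get_origin

def coerce(text: str, response_type):
    """Parse model output as JSON and check it against the requested type (sketch)."""
    value = json.loads(text)
    # For generics like list[int], check the container first, then the items.
    origin = get_origin(response_type) or response_type
    if not isinstance(value, origin):
        raise TypeError(f"expected {response_type}, got {type(value).__name__}")
    if origin is list:
        (item_type,) = get_args(response_type) or (object,)
        if not all(isinstance(v, item_type) for v in value):
            raise TypeError("list items do not match the requested item type")
    return value
```

When the check fails, the library's behavior (per the use cases above) is to retry the call rather than raise immediately.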

Cost tracking

Every call is recorded with token counts and cost:

from covenance import ask_llm, print_usage, print_call_timeline, get_records

ask_llm("Hello", model="gpt-4.1-nano")
ask_llm("Hello", model="gemini-2.5-flash-lite")

print_usage()
# ==================================================
# LLM Usage Summary (default client)
# ==================================================
#   Calls: 2
#   Tokens: 45 (In: 12, Out: 33)
#   Cost: $0.0001
#   Models: gemini/gemini-2.5-flash-lite, openai/gpt-4.1-nano

# Access individual records
for record in get_records():
    print(f"{record.model}: {record.cost_usd}")

Persist records by setting COVENANCE_RECORDS_DIR or calling set_llm_call_records_dir().

Call timeline

Visualize call sequences and parallelism in your terminal:

from covenance import print_call_timeline

print_call_timeline()
# LLM Call Timeline (4.4s total, 5 calls)
#                         |0s                                            4.4s|
#   gpt-4.1-nano    1.3s  |████████████████                                  |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.1s  |                 ████████████                     |
#   g2.5-flash-l    1.5s  |                 ████████████████                 |
#   g2.5-flash-l    1.5s  |                                 █████████████████|

Each line is a call, sorted by start time. Blocks show when each call was active - parallel calls appear as overlapping bars on different rows.
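A waterfall like the one above can be rendered from plain (label, start, end) timings. A minimal sketch of the rendering idea; render_timeline is a hypothetical helper, not part of the covenance API:

```python
def render_timeline(calls, width=40):
    """calls: list of (label, start_s, end_s) tuples; returns ASCII bar rows (sketch)."""
    total = max(end for _, _, end in calls)
    rows = []
    for label, start, end in sorted(calls, key=lambda c: c[1]):
        begin = int(start / total * width)          # bar offset in columns
        length = max(1, int((end - start) / total * width))  # bar length, min 1 column
        bar = " " * begin + "\u2588" * length
        rows.append(f"{label:<16}{end - start:>5.1f}s  |{bar:<{width}}|")
    return "\n".join(rows)
```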

Consensus for quality

Run parallel LLM calls and integrate results for higher quality:

from covenance import llm_consensus

result = llm_consensus(
    "Explain quantum entanglement",
    model="gpt-4.1-nano",
    response_type=Evaluation,
    num_candidates=3,  # 3 parallel calls + integration
)
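Conceptually, llm_consensus fans out identical calls in parallel and then integrates the candidate answers. The sketch below uses a thread pool with a simple majority vote standing in for the integration step (the README states integration happens but not how; consensus and ask are hypothetical names):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def consensus(ask, prompt, num_candidates=3):
    """Run `num_candidates` identical calls in parallel, then integrate (sketch)."""
    with ThreadPoolExecutor(max_workers=num_candidates) as pool:
        answers = list(pool.map(ask, [prompt] * num_candidates))
    # Stand-in integration strategy: majority vote over candidate answers.
    return Counter(answers).most_common(1)[0][0]
```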

Supported providers

Provider is determined by model name prefix:

Prefix                  Provider
gpt-*, o1-*, o3-*       OpenAI
gemini-*                Google Gemini
claude-*                Anthropic
mistral-*, codestral-*  Mistral
grok-*                  xAI Grok
org/model (contains /)  OpenRouter
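The prefix dispatch above can be expressed as a small lookup. This is a sketch of the routing rule, not the library's code (resolve_provider is a hypothetical name):

```python
def resolve_provider(model: str) -> str:
    """Map a model name to its provider by prefix (sketch of the routing table)."""
    if "/" in model:  # org/model names go to OpenRouter
        return "openrouter"
    prefixes = {
        "gpt-": "openai", "o1-": "openai", "o3-": "openai",
        "gemini-": "google",
        "claude-": "anthropic",
        "mistral-": "mistral", "codestral-": "mistral",
        "grok-": "xai",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unrecognized model name: {model!r}")
```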

Structured output reliability

Providers differ in how they enforce JSON schema compliance:

Provider       Method                   Guarantee
OpenAI         Constrained decoding     100% schema-valid JSON
Google Gemini  Controlled generation    100% schema-valid JSON
Grok           Constrained decoding     100% schema-valid JSON
Anthropic      Structured outputs beta  100% schema-valid JSON*
Mistral        Best-effort              Probabilistic
OpenRouter     Varies                   Depends on underlying model

*Anthropic structured outputs require SDK >= 0.74.1 (uses anthropic-beta: structured-outputs-2025-11-13). For Mistral's probabilistic generation, covenance retries automatically (up to 3 times) on JSON parse errors.
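The retry behavior for best-effort providers boils down to a simple loop: call, try to parse, retry on failure up to the attempt limit. A sketch under that assumption (ask_with_retries is a hypothetical name; covenance additionally validates the parsed value against response_type):

```python
import json

def ask_with_retries(call, max_attempts=3):
    """Invoke `call`, retrying on JSON parse errors up to `max_attempts` times (sketch)."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return json.loads(call())
        except json.JSONDecodeError as err:
            last_err = err  # malformed output; try again
    raise last_err
```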

API keys

Set environment variables for the providers you use:

  • OPENAI_API_KEY
  • GOOGLE_API_KEY (or GEMINI_API_KEY)
  • ANTHROPIC_API_KEY
  • MISTRAL_API_KEY
  • OPENROUTER_API_KEY
  • XAI_API_KEY (for Grok)

A .env file in the working directory is loaded automatically.

Isolated clients

Use Covenance instances for separate API keys and call records per subsystem:

from covenance import Covenance
from pydantic import BaseModel

# Each client tracks its own usage
question_client = Covenance(label="questions")
review_client = Covenance(label="review")

answer = question_client.ask_llm("Who is David Blaine?", model="gpt-4.1-nano")

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

evaluation = review_client.llm_consensus(
    f"Is this accurate? '''{answer}'''",
    model="gemini-2.5-flash-lite",
    response_type=Evaluation,
)

question_client.print_usage()  # Shows only the question call
review_client.print_usage()    # Shows only the review call

How it works: dual backend

Covenance uses two backends for structured output and picks the better one per provider:

  • Native SDK — calls the provider's API directly (e.g., OpenAI Responses API with responses.parse)
  • pydantic-ai — uses pydantic-ai as a unified layer

The default routing:

Provider    Backend      Why
OpenAI      Native       Responses API with constrained decoding handles enums, recursive types, and large schemas more reliably
Grok        Native       OpenAI-compatible API, same benefits
Gemini      pydantic-ai  Native SDK hits RecursionError on self-referencing types (e.g., tree nodes)
Anthropic   pydantic-ai  No native client implemented
Mistral     pydantic-ai  Similar pass rates; pydantic-ai handles recursive types better
OpenRouter  pydantic-ai  No native client implemented

These defaults are based on a stress test suite that runs 14 test categories across providers with both backends. The results for the cheapest model per provider:

OpenAI  (gpt-4.1-nano):          native 14/14, pydantic-ai 10/14
Gemini  (gemini-2.5-flash-lite): native 11/14, pydantic-ai 13/14
Mistral (mistral-small-latest):  native  9/14, pydantic-ai  8/14

Where native beats pydantic-ai on OpenAI: enum adherence (strict values vs. hallucinated ones), recursive types (deeper trees), real-world schemas (fewer empty fields), and extreme schema limits (100+ fields with Literal types).

Where pydantic-ai beats native on Gemini: recursive/self-referencing types (native Google SDK crashes with RecursionError).

Overriding the backend

Each Covenance instance has a backends object with a field per provider. You can inspect and override them:

from covenance import Covenance

client = Covenance()
print(client.backends)
# Backends(native=[openai, grok], pydantic=[gemini, anthropic, mistral, openrouter])

# Override a specific provider
client.backends.anthropic = "native"

# Force all providers to one backend (useful for benchmarking)
client.backends.set_all("native")

Only "native" and "pydantic" are accepted — anything else raises ValueError.

Every call records which backend was used:

for record in client.get_records():
    print(f"{record.model}: {record.backend}")  # "native" or "pydantic"

The backend also shows in print_call_timeline() as (N) or (P):

print_call_timeline()
# LLM Call Timeline (2.1s total, 2 calls)
#                            |0s                                       2.1s|
#   gpt-4.1-nano(N)    0.8s  |█████████████████                            |
#   g2.5-flash-l(P)    1.1s  |                  ██████████████████████████  |

To see routing decisions in real time, enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)
# DEBUG:covenance:ask_llm: model=gpt-4.1-nano provider=openai backend=native
