api_llm Specification

Goal

Provide direct, transparent HTTP API bindings for major LLM providers without abstraction layers or automatic behaviors.

Vision

A collection of thin API clients that developers can confidently use, knowing exactly what HTTP calls are being made and having complete control over all operations.

Scope

In Scope

Direct API Bindings - HTTP clients for LLM provider APIs
Enterprise Features - Optional reliability features (retry, circuit breaker, rate limiting, caching, etc.)
Workspace Secrets - Local API key management for development
Comprehensive Testing - Real API integration tests under a zero-tolerance policy for mocks

Out of Scope

Provider Abstraction - No unified interface across providers
Provider Switching - No automatic fallback or routing logic
Service Layer - No proxy services or aggregation layers
Application Modules - No CLI tools or high-level applications

Architecture

Governing Principle: Thin Client, Rich API

All API bindings follow these principles:

API Transparency - Every method maps directly to an API endpoint
Zero Client Intelligence - No automatic decision-making
Explicit Control - Developers control all operations
Information vs Action - Clear separation of concerns

State Management Policy

Allowed: Runtime-Stateful, Process-Stateless

Connection pools, circuit breaker state, rate limiting buckets
Retry logic state, failover state, health check state
Runtime state that dies with the process
No persistent storage or cross-process state

Prohibited: Process-Persistent State

File storage, databases, configuration accumulation
State that survives process restarts
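
As an illustration of the allowed kind of state, here is a minimal sketch of a circuit breaker whose state lives only in process memory. The type name, fields, and thresholds are assumptions made for this example and are not taken from the crates.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory circuit breaker. The type, its fields, and its
/// thresholds are illustrative assumptions, not the crates' actual API.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    /// Set when the circuit opens; held in memory only, never written to disk.
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    /// Reports whether a request may be sent at `now`; the caller decides what to do.
    pub fn allows_request(&self, now: Instant) -> bool {
        match self.opened_at {
            // Open circuit: allow requests again once the cooldown has elapsed.
            Some(opened) => now.saturating_duration_since(opened) >= self.cooldown,
            None => true,
        }
    }

    /// Resets the failure count and closes the circuit.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// Counts a failure and opens the circuit once the threshold is reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

Nothing here touches the file system, so a restarted process begins with a closed circuit and a zero failure count, which is the behavior the policy asks for.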

Enterprise Features

All enterprise features must be:

Gated behind Cargo features
Explicitly configured (no automatic enabling)
Transparently named (e.g., execute_with_retries())
Zero overhead when disabled
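
The requirements above could look roughly like the following sketch. The function name execute_with_retries and the feature name retry come from this specification; the signatures and the plain execute function are assumptions for illustration.

```rust
/// Always compiled: performs exactly one attempt and returns its result.
/// Hypothetical signature, shown only to contrast with the gated function.
pub fn execute<T, E>(mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    op()
}

/// Compiled only when the `retry` Cargo feature is enabled, so builds without
/// the feature carry no retry code at all. The name states what it does, and
/// retries happen only because the caller chose this function.
#[cfg(feature = "retry")]
pub fn execute_with_retries<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Give up and surface the last error once the budget is spent.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

In this sketch a caller who wants retries must both enable the feature in Cargo.toml and call execute_with_retries explicitly; calling execute never retries.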

Available features:

retry - Exponential backoff retry logic
circuit_breaker - Failure threshold management
rate_limiting - Request throttling
request_caching - TTL-based response caching
failover - Multi-endpoint support
health_checks - Endpoint monitoring
streaming_control - Pause/resume/cancel streaming
count_tokens - Token counting before API calls
audio_processing - Speech-to-text and text-to-speech
batch_operations - Multiple request optimization
safety_settings - Content filtering and harm prevention
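
To make the retry entry concrete, here is a small sketch of how an exponential backoff delay might be computed. The function name, parameters, and the cap are assumptions for this example; the crates may compute delays differently (for instance with jitter).

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Hypothetical helper; saturating arithmetic avoids overflow for large attempts.
pub fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

With a base of 100 ms and a cap of 5 s, the delays run 100 ms, 200 ms, 400 ms, 800 ms and so on until they reach the cap.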

Crates

api_claude

Anthropic Claude API client with support for:

Chat completion with streaming
Prompt caching for system prompts and message history
Tool calling and function invocation
Vision support for image inputs
Token counting

Default Model: claude-sonnet-4-5-20250929

Features:

full - All features enabled
streaming - Streaming responses
tool_calling - Function calling support
vision_support - Image processing
cached_content - Prompt caching
count_tokens - Token counting
sync_api - Blocking API wrappers

api_gemini

Google Gemini API client with support for:

Chat completion with streaming
Content caching for system instructions
Function calling and tool use
Vision and multimodal inputs
File management and uploads
Code execution
Audio processing
Model tuning

Default Model: gemini-2.0-flash-exp

Features:

full - All features enabled
streaming - Streaming responses
tool_calling - Function calling support
vision_support - Multimodal inputs
cached_content - Content caching
count_tokens - Token counting
audio_processing - Speech-to-text/text-to-speech
batch_operations - Batch request optimization
sync_api - Blocking API wrappers

api_huggingface

Hugging Face Inference API client with support for:

Text generation
Chat completion
Embeddings
Token classification
Vision tasks
Audio processing
Streaming responses

Default Model: meta-llama/Llama-3.3-70B-Instruct

Features:

full - All features enabled
streaming - Streaming responses
embeddings - Embedding generation
vision_support - Image processing
audio_processing - Audio tasks
count_tokens - Token counting
sync_api - Blocking API wrappers

api_ollama

Ollama local LLM runtime API client with support for:

Chat completion
Text generation
Embeddings
Model management
Streaming responses
Vision support

Default Model: llama3.2:latest

Features:

full - All features enabled
streaming - Streaming responses
embeddings - Embedding generation
vision_support - Image processing
model_details - Enhanced model information
count_tokens - Token counting
cached_content - Response caching
sync_api - Blocking API wrappers

api_openai

OpenAI API client with support for:

Chat completion with streaming
Text generation
Embeddings
Vision inputs
Function calling
Audio processing (Whisper)
Image generation (DALL-E)

Default Model: gpt-4o

Features:

full - All features enabled
streaming - Streaming responses
tool_calling - Function calling support
vision_support - Image processing
audio_processing - Whisper integration
embeddings - Embedding generation
count_tokens - Token counting
sync_api - Blocking API wrappers

api_openai_compatible

Shared OpenAI wire-protocol HTTP layer for crates that talk to any OpenAI-compatible API endpoint. It was extracted from api_xai and is available for reuse by other crates targeting OpenAI-compatible providers (KIE.ai, xAI, etc.).

Provides:

Chat completion request/response wire types
SSE streaming wire types
Async HTTP client generic over environment
Synchronous blocking wrapper
Environment configuration trait and default implementation

Features:

enabled - Activates all public types and the HTTP client
streaming - Server-Sent Events streaming support
sync_api - Blocking wrappers around the async client
integration - Real API integration tests (requires live credentials)
full - Enables enabled, streaming, and sync_api
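
For readers unfamiliar with Server-Sent Events, the following sketch shows how a single line of an OpenAI-style stream could be classified. The enum and function are hypothetical and are not the crate's wire types; the sketch assumes the common convention that each chunk arrives as a "data:" line and that the stream ends with "data: [DONE]".

```rust
/// One parsed line of an SSE stream. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum SseEvent<'a> {
    /// A payload, typically a JSON chunk, left unparsed here.
    Data(&'a str),
    /// The end-of-stream sentinel used by OpenAI-style APIs.
    Done,
}

/// Classifies one line. Returns None for blank lines, comment lines
/// (starting with ':'), and any field other than `data`.
pub fn parse_sse_line(line: &str) -> Option<SseEvent<'_>> {
    let payload = line.strip_prefix("data:")?.trim();
    if payload == "[DONE]" {
        Some(SseEvent::Done)
    } else if payload.is_empty() {
        None
    } else {
        Some(SseEvent::Data(payload))
    }
}
```

The function only reports what it saw; deciding whether to keep reading, pause, or cancel stays with the caller.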

Architecture Notes:

Thin-client: every method maps to exactly one API endpoint
Generic over OpenAiCompatEnvironment to support multiple providers
api_openai wire types structurally differ (i32 vs u32, Role enum vs String, multimodal content) and are explicitly NOT consolidated; each crate retains its own type system
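
A rough sketch of what "generic over OpenAiCompatEnvironment" could mean in practice follows. Only the trait name comes from this specification; the methods base_url and api_key, the XaiEnvironment and Client types, and the Bearer authorization scheme are assumptions made for the example.

```rust
/// Trait name taken from the spec; the two methods are assumed for illustration.
pub trait OpenAiCompatEnvironment {
    fn base_url(&self) -> &str;
    fn api_key(&self) -> &str;
}

/// Hypothetical environment for one provider.
pub struct XaiEnvironment {
    api_key: String,
}

impl XaiEnvironment {
    pub fn new(api_key: &str) -> Self {
        Self { api_key: api_key.to_string() }
    }
}

impl OpenAiCompatEnvironment for XaiEnvironment {
    fn base_url(&self) -> &str {
        "https://api.x.ai/v1"
    }
    fn api_key(&self) -> &str {
        &self.api_key
    }
}

/// Hypothetical client: the provider-specific parts come only from `E`.
pub struct Client<E: OpenAiCompatEnvironment> {
    env: E,
}

impl<E: OpenAiCompatEnvironment> Client<E> {
    pub fn new(env: E) -> Self {
        Self { env }
    }

    /// URL of the single endpoint a chat completion call would hit.
    pub fn chat_completions_url(&self) -> String {
        format!("{}/chat/completions", self.env.base_url().trim_end_matches('/'))
    }

    /// Value of the Authorization header, assuming Bearer token auth.
    pub fn authorization_header(&self) -> String {
        format!("Bearer {}", self.env.api_key())
    }
}
```

Under these assumptions, supporting another OpenAI-compatible provider means writing a new environment type; the client code and wire types stay untouched.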

api_xai

xAI Grok API client with support for:

Chat completion with streaming
Function calling and tool use
Model listing
OpenAI-compatible REST interface

Default Model: grok-beta

Features:

full - All features enabled
streaming - Streaming responses via SSE
tool_calling - Function calling support
retry - Exponential backoff retry logic
circuit_breaker - Failure threshold management
rate_limiting - Request throttling
failover - Multi-endpoint support
health_checks - Endpoint health monitoring
integration - Real API integration tests

Architecture Notes:

OpenAI-compatible API (base URL: https://api.x.ai/v1)
Simplified feature set compared to full OpenAI API
Focus on core chat and tool calling capabilities
Enterprise reliability features available but optional

Testing

Zero-Tolerance Policy

No Mocking - All tests use real API implementations
Loud Failures - Tests fail clearly when APIs are unavailable
No Silent Passes - Integration tests never pass silently
Real Implementations Only - No stub/mock servers
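
One way the "Loud Failures" and "No Silent Passes" rules might be enforced is a test helper that panics when credentials are missing instead of skipping the test. The helper name and message below are hypothetical; only the secret/-secrets.sh path comes from this specification.

```rust
use std::env;

/// Hypothetical test helper: returns the API key or fails the test loudly.
/// It never returns a placeholder and never lets the test pass without a key.
pub fn require_api_key(var: &str) -> String {
    match env::var(var) {
        Ok(value) if !value.trim().is_empty() => value,
        _ => panic!(
            "{var} is not set; integration tests need real credentials \
             (try: source secret/-secrets.sh)"
        ),
    }
}
```

An integration test would call require_api_key("OPENAI_API_KEY") as its first line, so a missing key shows up as a failed test with a clear message and never as a green run.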

Test Organization

api/*/tests/
├── integration_tests.rs     # Real API integration tests
├── unit_tests.rs            # Unit tests for client logic
└── manual/
    └── readme.md            # Manual testing procedures

Running Tests

# Load API keys
source secret/-secrets.sh

# Run all tests (requires API keys)
cargo test --workspace

# Run specific crate tests
cargo test -p api_openai

# Run with all features
cargo test --workspace --all-features

Secret Management

Local Development

API keys are stored in secret/-secrets.sh:

#!/bin/bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export HUGGINGFACE_API_KEY="hf_..."
export XAI_API_KEY="xai-..."

The file is gitignored and must never be committed.

CI/CD

API keys are provided via environment variables in the CI configuration.

Success Metrics

Compilation - All crates compile with zero warnings
Test Coverage - More than 90% code coverage across all crates
Integration Tests - All integration tests pass with real APIs
Documentation - All public APIs documented
Zero Panics - No unwrap() or expect() in production code paths
Feature Isolation - All features compile independently
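
The "Zero Panics" metric can be illustrated with a small sketch that returns a typed error where unwrap() or expect() would otherwise be tempting. The error type and function are hypothetical, and the example handles only the delay-in-seconds form of the Retry-After header.

```rust
use std::fmt;

/// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Missing,
    NotANumber(String),
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::Missing => write!(f, "retry-after header is missing"),
            HeaderError::NotANumber(raw) => {
                write!(f, "retry-after is not a number of seconds: {raw}")
            }
        }
    }
}

impl std::error::Error for HeaderError {}

/// Parses a Retry-After value given in seconds. Every failure is returned
/// to the caller as an error; nothing in this path can panic.
pub fn parse_retry_after_secs(header: Option<&str>) -> Result<u64, HeaderError> {
    let raw = header.ok_or(HeaderError::Missing)?;
    raw.trim()
        .parse::<u64>()
        .map_err(|_| HeaderError::NotANumber(raw.to_string()))
}
```

The caller receives either the delay or a specific reason it could not be read, and chooses how to react.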

Future Enhancements

Potential future additions (not currently in scope):

Additional provider APIs (Cohere, AI21, etc.)
Async runtime abstraction (support for different executors)
Custom HTTP client support
WebSocket streaming for real-time bidirectional communication
Enhanced observability (tracing, metrics)

Non-Goals

Explicitly not goals for this workspace:

Provider abstraction layer
Unified interface across providers
Provider routing or fallback logic
Service orchestration
Application frameworks
CLI tools
