Skip to content

Latest commit

 

History

History
320 lines (247 loc) · 9.3 KB

File metadata and controls

320 lines (247 loc) · 9.3 KB

api_llm Specification

Goal

Provide direct, transparent HTTP API bindings for major LLM providers without abstraction layers or automatic behaviors.

Vision

A collection of thin API clients that developers can confidently use, knowing exactly what HTTP calls are being made and having complete control over all operations.

Scope

In Scope

  1. Direct API Bindings - HTTP clients for LLM provider APIs
  2. Enterprise Features - Optional reliability features (retry, circuit breaker, rate limiting, caching, etc.)
  3. Workspace Secrets - Local API key management for development
  4. Comprehensive Testing - Real API integration tests with zero-tolerance policy

Out of Scope

  1. Provider Abstraction - No unified interface across providers
  2. Provider Switching - No automatic fallback or routing logic
  3. Service Layer - No proxy services or aggregation layers
  4. Application Modules - No CLI tools or high-level applications

Architecture

Governing Principle: Thin Client, Rich API

All API bindings follow these principles:

  1. API Transparency - Every method maps directly to an API endpoint
  2. Zero Client Intelligence - No automatic decision-making
  3. Explicit Control - Developers control all operations
  4. Information vs Action - Clear separation of concerns

State Management Policy

Allowed: Runtime-Stateful, Process-Stateless

  • Connection pools, circuit breaker state, rate limiting buckets
  • Retry logic state, failover state, health check state
  • Runtime state that dies with the process
  • No persistent storage or cross-process state

Prohibited: Process-Persistent State

  • File storage, databases, configuration accumulation
  • State that survives process restarts

Enterprise Features

All enterprise features must be:

  • Feature-gated behind cargo features
  • Explicitly configured (no automatic enabling)
  • Transparently named (e.g., execute_with_retries())
  • Zero overhead when disabled

Available features:

  • retry - Exponential backoff retry logic
  • circuit_breaker - Failure threshold management
  • rate_limiting - Request throttling
  • request_caching - TTL-based response caching
  • failover - Multi-endpoint support
  • health_checks - Endpoint monitoring
  • streaming_control - Pause/resume/cancel streaming
  • count_tokens - Token counting before API calls
  • audio_processing - Speech-to-text and text-to-speech
  • batch_operations - Multiple request optimization
  • safety_settings - Content filtering and harm prevention

Crates

api_claude

Anthropic Claude API client with support for:

  • Chat completion with streaming
  • Prompt caching for system prompts and message history
  • Tool calling and function invocation
  • Vision support for image inputs
  • Token counting

Default Model: claude-sonnet-4-5-20250929

Features:

  • full - All features enabled
  • streaming - Streaming responses
  • tool_calling - Function calling support
  • vision_support - Image processing
  • cached_content - Prompt caching
  • count_tokens - Token counting
  • sync_api - Blocking API wrappers

api_gemini

Google Gemini API client with support for:

  • Chat completion with streaming
  • Content caching for system instructions
  • Function calling and tool use
  • Vision and multimodal inputs
  • File management and uploads
  • Code execution
  • Audio processing
  • Model tuning

Default Model: gemini-2.0-flash-exp

Features:

  • full - All features enabled
  • streaming - Streaming responses
  • tool_calling - Function calling support
  • vision_support - Multimodal inputs
  • cached_content - Content caching
  • count_tokens - Token counting
  • audio_processing - Speech-to-text/text-to-speech
  • batch_operations - Batch request optimization
  • sync_api - Blocking API wrappers

api_huggingface

Hugging Face Inference API client with support for:

  • Text generation
  • Chat completion
  • Embeddings
  • Token classification
  • Vision tasks
  • Audio processing
  • Streaming responses

Default Model: meta-llama/Llama-3.3-70B-Instruct

Features:

  • full - All features enabled
  • streaming - Streaming responses
  • embeddings - Embedding generation
  • vision_support - Image processing
  • audio_processing - Audio tasks
  • count_tokens - Token counting
  • sync_api - Blocking API wrappers

api_ollama

Ollama local LLM runtime API client with support for:

  • Chat completion
  • Text generation
  • Embeddings
  • Model management
  • Streaming responses
  • Vision support

Default Model: llama3.2:latest

Features:

  • full - All features enabled
  • streaming - Streaming responses
  • embeddings - Embedding generation
  • vision_support - Image processing
  • model_details - Enhanced model information
  • count_tokens - Token counting
  • cached_content - Response caching
  • sync_api - Blocking API wrappers

api_openai

OpenAI API client with support for:

  • Chat completion with streaming
  • Text generation
  • Embeddings
  • Vision inputs
  • Function calling
  • Audio processing (Whisper)
  • Image generation (DALL-E)

Default Model: gpt-4o

Features:

  • full - All features enabled
  • streaming - Streaming responses
  • tool_calling - Function calling support
  • vision_support - Image processing
  • audio_processing - Whisper integration
  • embeddings - Embedding generation
  • count_tokens - Token counting
  • sync_api - Blocking API wrappers

api_openai_compatible

Shared OpenAI wire-protocol HTTP layer consumed by any OpenAI-compatible API endpoint. Extracted from api_xai and available for reuse by other crates targeting OpenAI-compatible providers (KIE.ai, xAI, etc.).

Provides:

  • Chat completion request/response wire types
  • SSE streaming wire types
  • Async HTTP client generic over environment
  • Synchronous blocking wrapper
  • Environment configuration trait and default implementation

Features:

  • enabled — activates all public types and the HTTP client
  • streaming — Server-Sent Events streaming support
  • sync_api — blocking wrappers around the async client
  • integration — real-API integration tests (requires live credentials)
  • full — enables enabled, streaming, and sync_api

Architecture Notes:

  • Thin-client: every method maps to exactly one API endpoint
  • Generic over OpenAiCompatEnvironment to support multiple providers
  • api_openai wire types structurally differ (i32 vs u32, Role enum vs String, multimodal content) and are explicitly NOT consolidated; each crate retains its own type system

api_xai

X.AI Grok API client with support for:

  • Chat completion with streaming
  • Function calling and tool use
  • Model listing
  • OpenAI-compatible REST interface

Default Model: grok-beta

Features:

  • full - All features enabled
  • streaming - Streaming responses via SSE
  • tool_calling - Function calling support
  • retry - Exponential backoff retry logic
  • circuit_breaker - Failure threshold management
  • rate_limiting - Request throttling
  • failover - Multi-endpoint support
  • health_checks - Endpoint health monitoring
  • integration - Real API integration tests

Architecture Notes:

  • OpenAI-compatible API (base URL: https://api.x.ai/v1)
  • Simplified feature set compared to full OpenAI API
  • Focus on core chat and tool calling capabilities
  • Enterprise reliability features available but optional

Testing

Zero-Tolerance Policy

  • No Mocking - All tests use real API implementations
  • Loud Failures - Tests fail clearly when APIs unavailable
  • No Silent Passes - Integration tests never pass silently
  • Real Implementations Only - No stub/mock servers

Test Organization

api/*/tests/
├── integration_tests.rs     # Real API integration tests
├── unit_tests.rs            # Unit tests for client logic
└── manual/
    └── readme.md            # Manual testing procedures

Running Tests

# Load API keys
source secret/-secrets.sh

# Run all tests (requires API keys)
cargo test --workspace

# Run specific crate tests
cargo test -p api_openai

# Run with all features
cargo test --workspace --all-features

Secret Management

Local Development

API keys stored in secret/-secrets.sh:

#!/bin/bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export HUGGINGFACE_API_KEY="hf_..."
export XAI_API_KEY="xai-..."

File is gitignored and never committed.

CI/CD

API keys provided via environment variables in CI configuration.

Success Metrics

  1. Compilation - All crates compile with zero warnings
  2. Test Coverage - >90% code coverage across all crates
  3. Integration Tests - All integration tests pass with real APIs
  4. Documentation - All public APIs documented
  5. Zero Panics - No unwrap() or expect() in production code paths
  6. Feature Isolation - All features compile independently

Future Enhancements

Potential future additions (not currently in scope):

  • Additional provider APIs (Cohere, AI21, etc.)
  • Async runtime abstraction (support for different executors)
  • Custom HTTP client support
  • WebSocket streaming for real-time bidirectional communication
  • Enhanced observability (tracing, metrics)

Non-Goals

Explicitly not goals for this workspace:

  • Provider abstraction layer
  • Unified interface across providers
  • Provider routing or fallback logic
  • Service orchestration
  • Application frameworks
  • CLI tools