This project was made by a student participating in Hack Club and Hack Club Midnight: https://midnight.hackclub.com & https://hackclub.com
PromptKit is a lightweight, configuration-driven prompt orchestration library that separates prompt definitions from application code. By storing templates, model specifications, and tool configurations in TOML files, PromptKit enables teams to share and version prompt definitions independently while swapping LLM providers through a unified interface.
- Configuration-first design: Define prompts, models, providers, and temperatures in TOML files separate from code
- Opt-in caching: No default cache behavior; provide a `PromptCacheProtocol` implementation to `PromptRunner` for caching
- Streaming support: Stream LLM responses token-by-token using `PromptRunner.run_stream(...)`
- Built-in LiteLLM integration: Ships with a `LiteLLMClient` that provides access to 100+ LLM providers (OpenAI, Anthropic, Google, Azure, and more) via the LiteLLM SDK
- Tool-calling: Define HTTP, stdio, or SSE tools in TOML; they are automatically passed to the LLM for execution
- Extensible hooks: Observe and react to prompt lifecycle events with custom `PromptHook` implementations
- Type-safe: Built with Pydantic models for runtime validation and type safety
PromptKit uses TOML files to define prompts and their execution parameters. Here's a minimal example:
```toml
# prompts.toml
[models]
welcome = "demo/unit-test"

[providers]
welcome = "demo"

[temperatures]
welcome = 0.1

[welcome]
template = "Hello {name}, welcome to {product}!"
```

Configuration sections:

- `[models]`: Maps prompt names to model identifiers (format varies by provider)
- `[providers]`: Maps prompt names to provider keys (registered with `PromptRunner`)
- `[temperatures]`: Maps prompt names to sampling temperatures (0.0–2.0)
- `[prompt_name]`: Individual prompt sections containing the template and optional tool configurations
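A minimal sketch of loading such a file and inspecting what was parsed, using the `PromptLoader` methods listed in the API reference later in this README:

```python
from py_promptkit import PromptLoader

loader = PromptLoader("prompts.toml")
loader.load()

# Inspect the prompt definitions parsed from the TOML file
print(list(loader.available_prompts))  # e.g. ["welcome"]
print(loader.get("welcome"))           # the PromptDefinition for [welcome]
```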
Here's a minimal example using a custom echo client:
```python
from py_promptkit import PromptLoader, PromptRunner
from py_promptkit.models.clients import LLMResponse, ToolSpecification


class EchoClient:
    """Simple echo client for testing.

    This implements the `LLMClient` protocol (no subclassing required).
    Clients are registered with `PromptRunner` as instances; the runner
    will pass the prompt's `model` and `temperature` to `generate`/
    `stream_generate` at call time.
    """

    supports_tools = False

    def generate(
        self,
        prompt: str,
        tools: list[ToolSpecification] | None = None,
        model: str | None = None,
        temperature: float | None = None,
    ) -> LLMResponse:
        return {"reasoning": "echo", "output": prompt}

    def stream_generate(
        self,
        prompt: str,
        tools: list[ToolSpecification] | None = None,
        model: str | None = None,
        temperature: float | None = None,
    ):
        yield prompt


# Load configuration
loader = PromptLoader("prompts.toml")
loader.load()

# Create runner (no caching by default)
runner = PromptRunner(loader)

# Register a client instance for the "demo" provider.
# Note: PromptRunner registers client *instances* directly (the older
# ClientFactory pattern was removed). The runner calls the instance
# methods and provides model/temperature at call time.
runner.register_client("demo", EchoClient())

# Execute prompt
result = runner.run("welcome", {"name": "Ada", "product": "PromptKit"})
print(result["output"])  # Output: Hello Ada, welcome to PromptKit!
```

Stream tokens as they arrive from the LLM:
```python
chunks = runner.run_stream("welcome", {"name": "Ada", "product": "PromptKit"})
for chunk in chunks:
    print(chunk, end="", flush=True)
```

PromptKit includes a `LiteLLMClient` adapter that wraps the LiteLLM SDK, providing unified access to 100+ LLM providers, including:
- OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, o1, o3-mini, and more
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Google: Gemini 2.0 Flash, Gemini 1.5 Pro
- Azure OpenAI: Enterprise-grade OpenAI models
- Open-source models: Via providers like Together AI, Replicate, Hugging Face, Ollama
- And many more: See the full provider list
The LiteLLM client requires the `litellm` package:

```bash
pip install litellm
```

```python
from py_promptkit import PromptLoader, PromptRunner
from py_promptkit.litellm.core import LiteLLMClient

# Load prompts
loader = PromptLoader("prompts.toml")
loader.load()

# Configure secrets for LiteLLM
secrets = {
    "OPENAI_API_KEY": "sk-...",
    # "ANTHROPIC_API_KEY": "...",
    # "GOOGLE_API_KEY": "...",
}

# Use the context manager for automatic cleanup.
# Assumes the "welcome" prompt's provider is set to "openai" in prompts.toml.
with PromptRunner(loader) as runner:
    runner.register_client("openai", LiteLLMClient(secrets=secrets))
    result = runner.run("welcome", {"name": "Ada", "product": "PromptKit"})
    print(result["output"])
```
#### Model Naming Conventions
LiteLLM uses provider-specific model identifiers. Here are some examples:
**OpenAI models:**
```python
"gpt-4o" # Latest GPT-4o
"gpt-4o-mini" # Smaller, faster GPT-4o variant
"gpt-4-turbo" # GPT-4 Turbo
"gpt-3.5-turbo" # GPT-3.5 Turbo
"o1" # OpenAI o1 reasoning model
"o3-mini" # OpenAI o3-mini
```

**Anthropic models:**

```python
"claude-3-5-sonnet-20241022"  # Claude 3.5 Sonnet (latest)
"claude-3-opus-20240229"      # Claude 3 Opus
"claude-3-haiku-20240307"     # Claude 3 Haiku
```

**Google models:**

```python
"gemini-2.0-flash-exp"  # Gemini 2.0 Flash (experimental)
"gemini-1.5-pro"        # Gemini 1.5 Pro
```

**Azure OpenAI:**

```python
"azure/gpt-4o"  # Azure-hosted GPT-4o
```

For a complete list of supported models and naming conventions, see the LiteLLM providers documentation.
```toml
# prompts.toml
[models]
chat = "gpt-4o-mini"                     # OpenAI model
code_gen = "claude-3-5-sonnet-20241022"  # Anthropic model

[providers]
chat = "openai"
code_gen = "anthropic"

[temperatures]
chat = 0.7
code_gen = 0.3

[chat]
template = "You are a helpful assistant. {user_message}"

[code_gen]
template = "Write clean, documented code for: {task}"
```

```python
with PromptRunner(loader) as runner:
    runner.register_client("openai", LiteLLMClient(secrets=secrets))
    for chunk in runner.run_stream("chat", {"user_message": "Explain quantum computing"}):
        print(chunk, end="", flush=True)
```

PromptKit supports automatic tool execution when tools are defined in your TOML configuration. The `PromptLoader` reads tool specifications and passes them to the LLM client during execution.
Tools are configured per-prompt in the TOML file:
```toml
[models]
assistant = "gpt-4o"

[providers]
assistant = "openai"

[temperatures]
assistant = 0.7

[assistant]
template = "Help the user with: {request}"

# Tool configuration for this prompt
[assistant.tool]
type = "http"
url = "https://api.example.com/tools/calculator"
name = "calculator"
description = "Performs mathematical calculations"
parameters = '{"type": "object", "properties": {"expression": {"type": "string", "description": "Mathematical expression to evaluate"}}, "required": ["expression"]}'
```

When you run this prompt, the tool specification is automatically passed to the client:
```python
from py_promptkit import PromptLoader, PromptRunner
from py_promptkit.litellm.core import LiteLLMClient

loader = PromptLoader("prompts.toml")
loader.load()

with PromptRunner(loader) as runner:
    runner.register_client("openai", LiteLLMClient(secrets={"OPENAI_API_KEY": "sk-..."}))

    # Tools from TOML are automatically used
    result = runner.run("assistant", {"request": "What is 25 * 4?"})
    print(result["output"])
```

**HTTP Tools** (`type = "http"`):
- Makes POST requests to the specified URL
- Sends tool arguments as JSON in request body
- Returns response text to the LLM
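For reference, a tool endpoint satisfying this contract could be as simple as the following sketch. It is illustrative only: the port, handler name, and use of the standard-library `http.server` are assumptions, and the `eval`-based calculator is a placeholder rather than production code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class CalculatorTool(BaseHTTPRequestHandler):
    """Hypothetical endpoint for the 'calculator' HTTP tool defined above."""

    def do_POST(self) -> None:
        # Tool arguments arrive as a JSON body matching the tool's parameter schema
        length = int(self.headers.get("Content-Length", 0))
        args = json.loads(self.rfile.read(length) or b"{}")

        # Evaluate the expression; a real server should use a safe expression parser
        result = str(eval(args.get("expression", "0"), {"__builtins__": {}}))

        # The response body text is what gets returned to the LLM
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(result.encode())


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CalculatorTool).serve_forever()
```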
**MCP Tools** (`type = "stdio"` or `type = "sse"`):

- Connects to MCP (Model Context Protocol) servers
- `stdio`: Uses standard I/O for local MCP server executables
- `sse`: Uses Server-Sent Events for remote MCP servers
For MCP tools, you need to initialize MCP clients when creating the LiteLLMClient:
```python
from py_promptkit.litellm.core import LiteLLMClient

# Initialize MCP client connections
mcp_tools = [
    {
        "name": "file_reader",
        "type": "stdio",
        "url": "./tools/file_reader",  # Path to MCP server executable
    }
]

with PromptRunner(loader) as runner:
    runner.register_client("openai", LiteLLMClient(mcp_tools=mcp_tools, secrets=secrets))

    # Use a prompt that references the MCP tool
    result = runner.run("file_assistant", {"filename": "data.json"})
```

Then in your TOML:
```toml
[file_assistant]
template = "Read and summarize the file: {filename}"

[file_assistant.tool]
type = "stdio"
name = "file_reader"
description = "Reads file contents"
parameters = '{"type": "object", "properties": {"path": {"type": "string"}}}'
```

Tool parameters should be a JSON schema (as a string or dict):
```toml
# As a JSON string
parameters = '{"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}'

# Or as a TOML table (will be converted)
[my_prompt.tool.parameters]
type = "object"

[my_prompt.tool.parameters.properties.query]
type = "string"
description = "Search query"
```

Note: Tool-calling support varies by model. The `LiteLLMClient` uses pattern matching to detect tool-capable models (e.g., GPT-4, GPT-4o, Claude 3+, Gemini Pro). See the LiteLLM documentation for model-specific support.
Implement the `LLMClient` protocol to integrate any LLM provider:

```python
from typing import Iterator

from py_promptkit.models.clients import LLMClient, LLMResponse, ToolSpecification


class MyCustomClient:
    """Example custom client implementing the `LLMClient` protocol.

    Note: implement the same method signatures as the `LLMClient` protocol.
    Register client instances directly with `PromptRunner`.
    """

    supports_tools = False  # Set to True if your client supports tool-calling

    def generate(
        self,
        prompt: str,
        tools: list[ToolSpecification] | None = None,
        model: str | None = None,
        temperature: float | None = None,
    ) -> LLMResponse:
        # Your custom LLM API logic here
        response_text = call_my_llm_api(prompt, model, temperature)
        return {"reasoning": "", "output": response_text}

    def stream_generate(
        self,
        prompt: str,
        tools: list[ToolSpecification] | None = None,
        model: str | None = None,
        temperature: float | None = None,
    ) -> Iterator[str]:
        # Stream tokens from your LLM
        for token in stream_my_llm_api(prompt, model, temperature):
            yield token


# Register your custom client instance
runner.register_client("my_provider", MyCustomClient())
```

Implement `PromptHook` to observe or modify prompt execution:
```python
from py_promptkit.models.hooks import PromptHook, HookContext
from py_promptkit.models.clients import LLMResponse


class LoggingHook(PromptHook):
    def before_run(self, context: HookContext) -> None:
        print(f"Running prompt: {context.prompt_name}")
        print(f"Model: {context.model.name}")
        print(f"Variables: {context.variables}")

    def after_run(self, context: HookContext, response: LLMResponse) -> None:
        print(f"Completed: {context.prompt_name}")
        print(f"Output length: {len(response['output'])} chars")

    def on_error(self, context: HookContext, error: Exception) -> None:
        print(f"Error in {context.prompt_name}: {error}")


# Register hooks when creating the runner
runner = PromptRunner(loader, hooks=[LoggingHook()])
```

Hook use cases:
- Logging and observability
- Cost tracking and usage metrics
- Custom validation or transformation
- Integration with monitoring systems
- A/B testing and experimentation
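For example, a metrics-oriented hook for the latency/usage case might look like the following sketch. It relies only on the `PromptHook` interface shown above; the attribute names are illustrative.

```python
import time

from py_promptkit.models.clients import LLMResponse
from py_promptkit.models.hooks import HookContext, PromptHook


class TimingHook(PromptHook):
    """Records wall-clock latency per prompt (illustrative sketch)."""

    def __init__(self) -> None:
        self._started: dict[str, float] = {}
        self.durations: dict[str, float] = {}

    def before_run(self, context: HookContext) -> None:
        self._started[context.prompt_name] = time.perf_counter()

    def after_run(self, context: HookContext, response: LLMResponse) -> None:
        start = self._started.pop(context.prompt_name, None)
        if start is not None:
            self.durations[context.prompt_name] = time.perf_counter() - start

    def on_error(self, context: HookContext, error: Exception) -> None:
        # Drop the pending timer so failed runs are not recorded
        self._started.pop(context.prompt_name, None)


timing = TimingHook()
runner = PromptRunner(loader, hooks=[LoggingHook(), timing])
```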
PromptKit does not cache responses by default. To enable caching, implement the `PromptCacheProtocol` and pass it to `PromptRunner`.

```python
from typing import Protocol, Mapping


class PromptCacheProtocol(Protocol):
    """Protocol for custom cache implementations."""

    def build_key(
        self,
        prompt: str,
        model_name: str,
        provider: str,
        temperature: float,
        variables: Mapping[str, str],
    ) -> str:
        """Generate a deterministic cache key."""
        ...

    def get(self, key: str) -> str | None:
        """Retrieve cached value if present."""
        ...

    def set(self, key: str, value: str) -> None:
        """Store a value in the cache."""
        ...
```

```python
import json
import hashlib
from typing import Mapping


class InMemoryCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def build_key(
        self,
        prompt: str,
        model_name: str,
        provider: str,
        temperature: float,
        variables: Mapping[str, str],
    ) -> str:
        # Create a deterministic hash from all inputs
        payload = {
            "prompt": prompt,
            "model": model_name,
            "provider": provider,
            "temperature": round(temperature, 3),
            "variables": dict(sorted(variables.items())),
        }
        content = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(content).hexdigest()

    def get(self, key: str) -> str | None:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


# Use the cache
cache = InMemoryCache()
runner = PromptRunner(loader, cache=cache)

# First call executes the LLM
result1 = runner.run("welcome", {"name": "Ada", "product": "PromptKit"})

# Second call with the same inputs returns the cached result
result2 = runner.run("welcome", {"name": "Ada", "product": "PromptKit"})
```

```python
# Bypass the cache for a specific request
result = runner.run("welcome", {"name": "Ada"}, use_cache=False)
```

Note: Streaming via `run_stream()` never uses caching, even when a cache is configured.
Configure prompts to expect structured JSON responses:
```toml
[models]
data_extractor = "gpt-4o"

[providers]
data_extractor = "openai"

[temperatures]
data_extractor = 0.0

[data_extractor]
template = "Extract structured data from: {text}"
structured = true
schema_path = "schemas/extraction.json"
```

The `schema_path` should point to a JSON schema file that defines the expected output structure.
Configure different prompts to use different providers:
```toml
[models]
chat = "gpt-4o-mini"
analysis = "claude-3-5-sonnet-20241022"
embedding = "text-embedding-3-small"

[providers]
chat = "openai"
analysis = "anthropic"
embedding = "openai"

[temperatures]
chat = 0.7
analysis = 0.3
embedding = 0.0

[chat]
template = "Chat with the user about: {topic}"

[analysis]
template = "Analyze this code:\n\n{code}\n\nProvide suggestions for improvement."

[embedding]
template = "{text}"
```

Register multiple clients:
```python
from py_promptkit.litellm.core import LiteLLMClient

openai_secrets = {"OPENAI_API_KEY": "sk-..."}
anthropic_secrets = {"ANTHROPIC_API_KEY": "sk-ant-..."}

with PromptRunner(loader) as runner:
    runner.register_client("openai", LiteLLMClient(secrets=openai_secrets))
    runner.register_client("anthropic", LiteLLMClient(secrets=anthropic_secrets))

    # Use different models for different tasks
    chat_result = runner.run("chat", {"topic": "AI safety"})
    analysis_result = runner.run("analysis", {"code": "def factorial(n): ..."})
```

**PromptLoader**

- `__init__(config_path: str | Path)`: Initialize with a path to a TOML configuration
- `load() -> Dict[str, PromptDefinition]`: Load and validate all prompt definitions
- `get(name: str) -> PromptDefinition`: Retrieve a specific prompt definition
- `available_prompts: Iterable[str]`: List of all available prompt names
**PromptRunner**

- `__init__(loader: PromptLoader, *, hooks: Sequence[PromptHook] | None = None, cache: PromptCacheProtocol | None = None)`: Initialize the runner
- `register_client(provider: str, client: LLMClient) -> None`: Register an LLM client instance for a provider (pass a ready-to-use client object)
- `run(prompt_name: str, variables: Mapping[str, object] | None = None, *, tools: Sequence[ToolSpecification] | None = None, use_cache: bool = True) -> LLMResponse`: Execute a prompt
- `run_stream(prompt_name: str, variables: Mapping[str, object] | None = None, *, tools: Sequence[ToolSpecification] | None = None) -> Iterator[str]`: Stream prompt execution
- `close() -> None`: Close all registered LLM clients
- Context manager support: Use `with PromptRunner(loader) as runner:` for automatic cleanup
**LiteLLMClient**

- `__init__(mcp_tools: list[dict[str, Any]] | None = None, secrets: dict[str, str | None] | None = None, verbose: bool = False)`: Initialize the LiteLLM client
- `close() -> None`: Close all MCP clients and clean up resources
- Context manager support: Use `with LiteLLMClient(...) as client:` for automatic cleanup
**LLMResponse** (TypedDict)

```python
{
    "reasoning": str,  # Model's reasoning or chain-of-thought (if available)
    "output": str      # Final response text
}
```

**ToolSpecification** (TypedDict)

```python
{
    "name": str,                   # Tool identifier
    "description": str,            # Tool description for the LLM
    "parameters": Dict[str, Any],  # JSON schema for tool parameters
    "type": str,                   # Transport type: "stdio", "sse", or "http"
    "url": str                     # Tool endpoint URL or path
}
```

**HookContext** (dataclass)
```python
@dataclass(frozen=True)
class HookContext:
    prompt_name: str                           # Name of the prompt being executed
    model: ModelConfig                         # Model configuration
    variables: Mapping[str, str]               # Rendered variables
    rendered_prompt: str                       # Final prompt after template rendering
    tools: Sequence[ToolSpecification] | None  # Tools available for this execution
```

PromptKit defines specific exception types for different failure modes:
```python
from py_promptkit.errors import (
    PromptKitError,         # Base exception
    PromptConfigError,      # Configuration/TOML parsing errors
    PromptValidationError,  # Variable validation errors
    PromptProviderError,    # Provider/client registration errors
    ModelRequestError,      # LLM API request failures (LiteLLM)
    MCPError,               # MCP tool execution errors
)

try:
    result = runner.run("my_prompt", {"var": "value"})
except PromptValidationError as e:
    print(f"Missing or invalid variable: {e}")
except PromptProviderError as e:
    print(f"Provider not registered: {e}")
except ModelRequestError as e:
    print(f"LLM request failed: {e}")
```

```bash
# Basic installation
pip install py_promptkit

# With LiteLLM support
pip install litellm

# Development installation
git clone https://github.com/yourusername/promptkit.git
cd py_promptkit
pip install -e ".[dev]"
```

- Python 3.10+
- Pydantic >= 1.10
- tomli >= 2.0.1 (Python < 3.11)
- typing-extensions >= 4.8.0
Optional dependencies:
- litellm (for LiteLLMClient and 100+ provider support)
- LiteLLM Documentation: https://docs.litellm.ai/
- LiteLLM Providers: https://docs.litellm.ai/docs/providers
- Model Context Protocol (MCP): https://modelcontextprotocol.io/
- OpenAI Models: https://platform.openai.com/docs/models
- Anthropic Models: https://docs.anthropic.com/
- Google AI Models: https://ai.google.dev/models
MIT License - see LICENSE file for details.
PromptKit — Configuration-driven prompt orchestration for LLM applications.