Note
COMING SOON! 🚀 Cache is not yet implemented in Python, but is available today in TypeScript
- Overview
- Core Concepts
- Basic Usage
- Cache Types
- Advanced Usage
- Creating a Custom Cache Provider
- Examples
Caching is a technique used to temporarily store copies of data or computation results to improve performance by reducing the need to repeatedly fetch or compute the same data from slower or more resource-intensive sources.
In the context of AI applications, caching provides several important benefits:
- 🚀 Performance improvement - Avoid repeating expensive operations like API calls or complex calculations
- 💰 Cost reduction - Minimize repeated calls to paid services (like external APIs or LLM providers)
- ⚡ Latency reduction - Deliver faster responses to users by serving cached results
- 🔄 Consistency - Ensure consistent responses for identical inputs
BeeAI framework provides a robust caching system with multiple implementations to suit different use cases.
BeeAI framework offers several cache implementations out of the box:
| Type | Description |
|---|---|
| UnconstrainedCache | Simple in-memory cache with no limits |
| SlidingCache | In-memory cache that maintains a maximum number of entries |
| FileCache | Persistent cache that stores data on disk |
| NullCache | Special implementation that performs no caching (useful for testing) |
Each cache type implements the `BaseCache` interface, making them interchangeable in your code.
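Because every implementation exposes the same async methods (`get`, `set`, `has`, `delete`, `clear`, `size`), code written against `BaseCache` does not need to know which concrete cache it was given. A minimal sketch (the `warm_up` helper is illustrative, not part of the framework):

```python
import asyncio

from beeai_framework.cache import BaseCache, SlidingCache, UnconstrainedCache


async def warm_up(cache: BaseCache[int]) -> None:
    # Written against the shared interface, so any implementation can be passed in
    await cache.set("answer", 42)
    print(await cache.get("answer"), await cache.size())


async def main() -> None:
    await warm_up(UnconstrainedCache())  # unlimited in-memory cache
    await warm_up(SlidingCache(size=10, ttl=60))  # bounded cache, entries expire after 60 seconds


if __name__ == "__main__":
    asyncio.run(main())
```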
BeeAI framework supports several caching patterns:
| Usage pattern | Description |
|---|---|
| Direct caching | Manually store and retrieve values |
| Function decoration | Automatically cache function returns |
| Tool integration | Cache tool execution results |
| LLM integration | Cache model responses |
The simplest way to use caching is to wrap a function that produces deterministic output:
```python
import asyncio
import sys
import traceback

from beeai_framework.cache import UnconstrainedCache
from beeai_framework.errors import FrameworkError


async def main() -> None:
    cache: UnconstrainedCache[int] = UnconstrainedCache()

    async def fibonacci(n: int) -> int:
        cache_key = str(n)
        cached = await cache.get(cache_key)
        if cached is not None:  # check for a cache hit explicitly so a cached 0 is not recomputed
            return int(cached)

        if n < 1:
            result = 0
        elif n <= 2:
            result = 1
        else:
            result = await fibonacci(n - 1) + await fibonacci(n - 2)

        await cache.set(cache_key, result)
        return result

    print(await fibonacci(10))  # 55
    print(await fibonacci(9))  # 34 (retrieved from cache)
    print(f"Cache size {await cache.size()}")  # 10


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
Source: examples/cache/unconstrained_cache_function.py
BeeAI framework's caching system seamlessly integrates with tools:
```python
import asyncio
import sys
import traceback

from beeai_framework.cache import SlidingCache
from beeai_framework.errors import FrameworkError
from beeai_framework.tools.search.wikipedia import (
    WikipediaTool,
    WikipediaToolInput,
)


async def main() -> None:
    wikipedia_client = WikipediaTool({"full_text": True, "cache": SlidingCache(size=100, ttl=5 * 60)})

    print(await wikipedia_client.cache.size())  # 0
    tool_input = WikipediaToolInput(query="United States")
    first = await wikipedia_client.run(tool_input)
    print(await wikipedia_client.cache.size())  # 1

    # a new request with EXACTLY the same input is served from the cache
    tool_input = WikipediaToolInput(query="United States")
    second = await wikipedia_client.run(tool_input)
    print(first.get_text_content() == second.get_text_content())  # True
    print(await wikipedia_client.cache.size())  # 1


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
Source: examples/cache/tool_cache.py
You can also cache LLM responses to save on API costs:
```python
import asyncio
import sys
import traceback

from beeai_framework.adapters.ollama import OllamaChatModel
from beeai_framework.backend import ChatModelParameters, UserMessage
from beeai_framework.cache import SlidingCache
from beeai_framework.errors import FrameworkError


async def main() -> None:
    llm = OllamaChatModel("llama3.1")
    llm.config(parameters=ChatModelParameters(max_tokens=25), cache=SlidingCache(size=50))

    print(await llm.cache.size())  # 0
    first = await llm.create(messages=[UserMessage("Who is Amilcar Cabral?")])
    print(await llm.cache.size())  # 1

    # a new request with EXACTLY the same input is served from the cache
    second = await llm.create(messages=[UserMessage("Who is Amilcar Cabral?")])
    print(first.get_text_content() == second.get_text_content())  # True
    print(await llm.cache.size())  # 1


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
Source: examples/cache/llm_cache.py
The simplest cache type with no constraints on size or entry lifetime. Good for development and smaller applications.
```python
import asyncio
import sys
import traceback

from beeai_framework.cache import UnconstrainedCache
from beeai_framework.errors import FrameworkError


async def main() -> None:
    cache: UnconstrainedCache[int] = UnconstrainedCache()

    # Save
    await cache.set("a", 1)
    await cache.set("b", 2)

    # Read
    result = await cache.has("a")
    print(result)  # True

    # Meta
    print(cache.enabled)  # True
    print(await cache.has("a"))  # True
    print(await cache.has("b"))  # True
    print(await cache.has("c"))  # False
    print(await cache.size())  # 2

    # Delete
    await cache.delete("a")
    print(await cache.has("a"))  # False

    # Clear
    await cache.clear()
    print(await cache.size())  # 0


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
Source: examples/cache/unconstrained_cache.py
Maintains a maximum number of entries, removing the oldest entries when the limit is reached.
```python
import asyncio
import sys
import traceback

from beeai_framework.cache import SlidingCache
from beeai_framework.errors import FrameworkError


async def main() -> None:
    cache: SlidingCache[int] = SlidingCache(
        size=3,  # (required) maximum number of items kept in the cache at any moment
        ttl=1,  # (optional, no expiration by default) time in seconds after which an entry is removed from the cache
    )

    await cache.set("a", 1)
    await cache.set("b", 2)
    await cache.set("c", 3)

    await cache.set("d", 4)  # overflow - cache internally removes the oldest entry (key "a")
    print(await cache.has("a"))  # False
    print(await cache.size())  # 3


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
```
Source: examples/cache/sliding_cache.py
Persists cache data to disk, allowing data to survive application restarts.
Coming soon
Source: examples/cache/fileCache.py
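Until the Python FileCache example is available, the same behavior can be approximated by extending `BaseCache` yourself (covered in more detail below). The sketch persists entries to a JSON file; the `JsonFileCache` name and file layout are illustrative, not the framework's FileCache API:

```python
import json
from pathlib import Path

from beeai_framework.cache import BaseCache


class JsonFileCache(BaseCache[str]):
    """Illustrative stand-in for FileCache: keeps entries in a JSON file on disk."""

    def __init__(self, path: str) -> None:
        super().__init__()  # assumes BaseCache has a no-argument initializer
        self._path = Path(path)
        self._data: dict[str, str] = (
            json.loads(self._path.read_text()) if self._path.exists() else {}
        )

    def _flush(self) -> None:
        # Rewrite the backing file after every mutation so data survives restarts
        self._path.write_text(json.dumps(self._data))

    async def size(self) -> int:
        return len(self._data)

    async def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._flush()

    async def get(self, key: str) -> str | None:
        return self._data.get(key)

    async def has(self, key: str) -> bool:
        return key in self._data

    async def delete(self, key: str) -> bool:
        existed = key in self._data
        self._data.pop(key, None)
        self._flush()
        return existed

    async def clear(self) -> None:
        self._data.clear()
        self._flush()
```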
You can customize how the FileCache stores data:
Coming soon
Source: examples/cache/fileCacheCustomProvider.py
A special cache that implements the `BaseCache` interface but performs no caching. Useful for testing or temporarily disabling caching.
It exists to support the Null Object pattern: callers can always be handed a cache without adding "is caching enabled?" checks.
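For example, caching can be switched off in a test by passing a NullCache wherever a cache is accepted, without touching the rest of the wiring (this assumes `NullCache` is exported from `beeai_framework.cache` alongside the other implementations):

```python
from beeai_framework.adapters.ollama import OllamaChatModel
from beeai_framework.backend import ChatModelParameters
from beeai_framework.cache import NullCache  # assumed export, mirroring the other cache types

llm = OllamaChatModel("llama3.1")
# Same wiring as the SlidingCache example above, but responses are never cached.
llm.config(parameters=ChatModelParameters(max_tokens=25), cache=NullCache())
```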
The framework provides a convenient decorator for automatically caching function results:
Coming soon
Source: examples/cache/decoratorCache.py
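While the framework's decorator example is pending, the pattern can be reproduced with a hand-rolled decorator built on the documented cache API. The `cached` and `slow_square` names below are illustrative, not the framework's decorator:

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable

from beeai_framework.cache import UnconstrainedCache


def cached(cache: UnconstrainedCache[int]) -> Callable:
    """Hand-rolled caching decorator; stores results under a key derived from the call arguments."""

    def decorator(fn: Callable[..., Awaitable[int]]) -> Callable[..., Awaitable[int]]:
        @functools.wraps(fn)
        async def wrapper(*args: int) -> int:
            key = f"{fn.__name__}:{args}"  # naive key built from the function name and arguments
            hit = await cache.get(key)
            if hit is not None:
                return hit
            result = await fn(*args)
            await cache.set(key, result)
            return result

        return wrapper

    return decorator


async def main() -> None:
    cache: UnconstrainedCache[int] = UnconstrainedCache()

    @cached(cache)
    async def slow_square(n: int) -> int:
        await asyncio.sleep(0.1)  # stand-in for an expensive call
        return n * n

    print(await slow_square(4))  # 16, computed
    print(await slow_square(4))  # 16, served from the cache
    print(await cache.size())  # 1


if __name__ == "__main__":
    asyncio.run(main())
```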
For more complex caching logic, you can customize the key generation:
Coming soon
Source: /examples/cache/decoratorCacheComplex.py
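A common approach is to derive the key from all call arguments, for example by hashing a canonical JSON serialization so that identical inputs always map to the same entry. The `cache_key` helper below is illustrative, not framework API:

```python
import hashlib
import json
from typing import Any


def cache_key(*args: Any, **kwargs: Any) -> str:
    """Build a stable cache key from JSON-serializable call arguments."""
    payload = json.dumps({"args": args, "kwargs": kwargs}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


# Identical inputs produce the same key regardless of keyword order
assert cache_key(1, mode="full", verbose=True) == cache_key(1, verbose=True, mode="full")
```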
For more dynamic caching needs, the `CacheFn` helper provides a functional approach:
Coming soon
Source: /examples/cache/cacheFn.py
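The Python CacheFn example is not available yet. The underlying idea, wrapping an already-defined async callable so repeated calls reuse a cached result, can be sketched with the documented cache API (`cache_fn` and `fetch_greeting` are illustrative names, not framework API):

```python
import asyncio
from collections.abc import Awaitable, Callable

from beeai_framework.cache import UnconstrainedCache


def cache_fn(fn: Callable[[str], Awaitable[str]]) -> Callable[[str], Awaitable[str]]:
    """Wrap an async function so repeated calls with the same argument hit the cache."""
    cache: UnconstrainedCache[str] = UnconstrainedCache()

    async def wrapped(arg: str) -> str:
        hit = await cache.get(arg)
        if hit is not None:
            return hit
        result = await fn(arg)
        await cache.set(arg, result)
        return result

    return wrapped


async def fetch_greeting(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a slow lookup
    return f"Hello, {name}!"


async def main() -> None:
    cached_greeting = cache_fn(fetch_greeting)
    print(await cached_greeting("Ada"))  # computed
    print(await cached_greeting("Ada"))  # returned from the cache


if __name__ == "__main__":
    asyncio.run(main())
```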
You can create your own cache implementation by extending the `BaseCache` class:
```python
from typing import TypeVar

from beeai_framework.cache import BaseCache

T = TypeVar("T")


class CustomCache(BaseCache[T]):
    async def size(self) -> int:
        raise NotImplementedError("CustomCache 'size' not yet implemented")

    async def set(self, _key: str, _value: T) -> None:
        raise NotImplementedError("CustomCache 'set' not yet implemented")

    async def get(self, key: str) -> T | None:
        raise NotImplementedError("CustomCache 'get' not yet implemented")

    async def has(self, key: str) -> bool:
        raise NotImplementedError("CustomCache 'has' not yet implemented")

    async def delete(self, key: str) -> bool:
        raise NotImplementedError("CustomCache 'delete' not yet implemented")

    async def clear(self) -> None:
        raise NotImplementedError("CustomCache 'clear' not yet implemented")
```
Source: examples/cache/custom.py
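Once the methods have real implementations, a custom cache can be dropped in anywhere the built-in types are accepted, for example as a tool's cache (mirroring the WikipediaTool example above):

```python
from beeai_framework.tools.search.wikipedia import WikipediaTool

# Assumes CustomCache from the snippet above has working method implementations
wikipedia_client = WikipediaTool({"full_text": True, "cache": CustomCache()})
```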
- All cache examples are coming soon in Python.