BioCage

BioCage is a fast, secure, and practical Python sandbox designed for safely executing code generated by Large Language Models (LLMs). It runs code in an isolated container while supporting state persistence across executions and file system integration with the host.

🎯 Why BioCage?

BioCage addresses critical needs in the AI era:

🔒 Security: Isolation through Docker, with no network access by default and a restricted file system
🧠 State Persistence: Variables, imports, and functions persist across multiple executions
📁 File Integration: Seamlessly expose files and directories to the sandbox
⚡ Performance: Fast startup times with optimized container management
🛡️ Reliability: Comprehensive error handling with detailed diagnostics
🔧 Flexibility: Support for both ephemeral and persistent execution modes

✨ Key Features

🔒 Security First

Docker containerization with no host system access beyond explicitly exposed files and directories
Network isolation (disabled by default)
Resource limits (memory, CPU, execution time)
Read-only filesystem with controlled write access
Safe execution of potentially malicious AI-generated code

🧠 Intelligent State Management

Variable persistence across executions
Import persistence - no need to re-import libraries
Function and class definitions remain available
DataFrame modifications persist between runs
Session-based workflows for complex data processing

📁 Advanced File System Integration

File exposure: Mount individual files from host to container
Directory exposure: Mount directories with read-only or read-write access
Temporary file creation: Create files accessible within the sandbox
Automatic cleanup: Files and mounts cleaned up after execution

⚡ Performance Optimized

Pre-built images with common data science libraries
Fast container startup and execution
Efficient resource usage with proper limits
Minimal overhead for code execution

🧠 Smart Dependency Management

Automatic dependency detection from Python code
Dynamic Docker image generation with UV package manager
Intelligent image caching for performance optimization
Auto-cleanup of dynamic images (enabled by default) to save disk space
Manual cleanup control for advanced use cases
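
To illustrate what automatic dependency detection can involve, the sketch below collects top-level module names from import statements using the standard-library ast module. It is not BioCage's actual detector: the names detect_imports and third_party_imports are hypothetical, and mapping import names to package names (for example sklearn to scikit-learn) is left out.

```python
import ast
import sys


def detect_imports(code: str) -> set[str]:
    """Return the top-level module names imported anywhere in `code`."""
    modules = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) do not name an installable package.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def third_party_imports(code: str) -> set[str]:
    """Drop standard-library modules where the interpreter can list them.

    sys.stdlib_module_names exists on Python 3.10+; on older versions
    nothing is filtered out.
    """
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return detect_imports(code) - set(stdlib)
```

Parsing with ast rather than regular expressions avoids false matches inside strings and comments, at the cost of rejecting code that does not parse.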

🚀 Quick Start

Installation

pip install biocage

Basic Usage

from biocage import BioCageOrchestrator

# Simple one-time execution
with BioCageOrchestrator() as sandbox:
    result = sandbox.run('print("Hello from BioCage!")')
    print(result.stdout)  # "Hello from BioCage!"

State Persistence Example

# Variables and imports persist across executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up your environment once
    sandbox.run("""
    import pandas as pd
    import numpy as np

    # Define helper functions
    def analyze_data(data):
        return {
            'mean': np.mean(data),
            'std': np.std(data),
            'count': len(data)
        }

    # Initialize data
    df = pd.DataFrame({
        'values': np.random.randn(1000),
        'category': np.random.choice(['A', 'B', 'C'], 1000)
    })
    """)

    # Use the environment in subsequent executions
    result = sandbox.run("""
    # Everything from previous execution is still available
    stats = analyze_data(df['values'])
    print(f"Data statistics: {stats}")

    # Modify the DataFrame
    df['squared'] = df['values'] ** 2
    print(f"DataFrame shape: {df.shape}")
    """)

    print(result.stdout)
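
The snippet above passes an indented triple-quoted string to sandbox.run(). Whether BioCage strips that indentation is not stated here, so a defensive option is to dedent the code before submitting it. The helper below is a minimal sketch; run_dedented is a hypothetical name, and it relies only on the run() method shown above.

```python
import textwrap


def run_dedented(sandbox, code, **kwargs):
    """Strip common leading indentation, then forward the code to sandbox.run().

    `sandbox` is any object with a run(code, **kwargs) method.
    """
    cleaned = textwrap.dedent(code).strip("\n") + "\n"
    return sandbox.run(cleaned, **kwargs)
```

With this helper, code blocks can stay indented to match the surrounding with statement without risking an IndentationError inside the sandbox.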

File and Directory Integration

# Expose files and directories to the sandbox
with BioCageOrchestrator() as sandbox:
    # Expose individual files
    sandbox.expose_file("/path/to/data.csv", "/app/data.csv")
    sandbox.expose_file("/path/to/config.json", "/app/config.json")

    # Expose entire directories
    sandbox.expose_directory("/path/to/models", "/app/models", readonly=True)

    result = sandbox.run("""
    import pandas as pd
    import json
    import os

    # Read exposed files
    df = pd.read_csv('/app/data.csv')
    with open('/app/config.json', 'r') as f:
        config = json.load(f)

    # List available models
    models = os.listdir('/app/models')
    print(f"Available models: {models}")

    print(f"Processed {len(df)} rows successfully")
    """)

    print(result.stdout)
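
When several files are exposed at once, it can help to check that they exist on the host before calling expose_file(). The sketch below assumes only the expose_file(host_path, container_path) call shown above; expose_inputs is a hypothetical helper, not part of BioCage.

```python
from pathlib import Path


def expose_inputs(sandbox, mapping):
    """Expose each host file in `mapping` (host path -> container path).

    All host paths are validated first, so a missing file raises
    FileNotFoundError before anything is exposed to the sandbox.
    """
    resolved = {}
    for host_path, container_path in mapping.items():
        host = Path(host_path).expanduser().resolve()
        if not host.is_file():
            raise FileNotFoundError(f"Host file not found: {host}")
        resolved[str(host)] = container_path

    for host, container_path in resolved.items():
        sandbox.expose_file(host, container_path)
    return resolved
```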

Execution Modes

# Ephemeral mode - maximum security, no state persistence
with BioCageOrchestrator(execution_mode="ephemeral") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # NameError: x not defined
    print("Ephemeral mode: variables don't persist")

# Persistent mode - state persists between executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # Output: 42
    print("Persistent mode: variables persist")

# Auto mode - intelligent selection based on usage
with BioCageOrchestrator(execution_mode="auto") as sandbox:
    result = sandbox.run("print('Auto mode adapts to your needs')")
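
The difference between ephemeral and persistent modes can be pictured as running each snippet against a fresh namespace versus a shared one. The toy class below illustrates only that idea. It is not how BioCage is implemented, MiniSession is a hypothetical name, and exec() on the host provides no isolation at all, so it must never be used on untrusted code.

```python
import contextlib
import io


class MiniSession:
    """Toy model of execution modes. NOT a sandbox: code runs on the host."""

    def __init__(self, persistent):
        self.persistent = persistent
        self._namespace = {}

    def run(self, code):
        """Run `code` and return (success, captured stdout or error text)."""
        # Persistent: reuse one namespace. Ephemeral: start from an empty one.
        namespace = self._namespace if self.persistent else {}
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
        except Exception as exc:
            return False, f"{type(exc).__name__}: {exc}"
        return True, buffer.getvalue()
```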

🔧 Advanced Features

Resource Management

# Configure container resources
with BioCageOrchestrator(
    memory_limit="2g",        # 2GB memory limit
    cpu_limit="2.0",          # 2 CPU cores
    network_access=False      # Disabled by default for security
) as sandbox:

    # Monitor container information
    info = sandbox.get_container_info()
    print(f"Container ID: {info['container_id'][:12]}")
    print(f"Memory limit: {info['memory_limit']}")
    print(f"CPU limit: {info['cpu_limit']}")

    result = sandbox.run("import numpy as np; print('Resources configured!')")
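
Values such as "2g" and "256m" follow Docker's memory notation (an integer followed by an optional b, k, m, or g suffix). Assuming BioCage forwards memory_limit to Docker unchanged, which is not confirmed here, a small validator can catch malformed values before a container is started. parse_memory_limit is a hypothetical helper.

```python
import re

# Docker-style suffixes and their size in bytes.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory_limit(limit: str) -> int:
    """Convert a Docker-style memory string such as '2g' to a byte count."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", limit.strip().lower())
    if not match:
        raise ValueError(f"Invalid memory limit: {limit!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "b"]
```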

Error Handling and Debugging

def safe_execute(code, description=""):
    """Execute code with comprehensive error handling."""
    with BioCageOrchestrator() as sandbox:
        result = sandbox.run(code, timeout=30)

        if result.success:
            return f"✅ {description}: {result.stdout.strip()}"
        else:
            return f"❌ {description} failed: {result.stderr.strip()}"

# Test different scenarios
print(safe_execute("print('Hello World!')", "Basic execution"))
print(safe_execute("print(undefined_var)", "Variable error"))
print(safe_execute("import nonexistent_module", "Import error"))

Smart Dependencies and Auto-Cleanup

# Automatic dependency detection with auto-cleanup (default)
with BioCageOrchestrator(auto_detect_dependencies=True) as sandbox:
    # BioCage automatically detects pandas and numpy
    result = sandbox.run("""
import pandas as pd
import numpy as np

df = pd.DataFrame({'x': np.random.rand(5)})
print(f"Created DataFrame with shape: {df.shape}")
""")
    print(result.stdout)
    # Dynamic image automatically removed when context exits

# Disable auto-cleanup to cache images for reuse
with BioCageOrchestrator(
    auto_detect_dependencies=True,
    auto_cleanup_dynamic_images=False
) as sandbox:
    # First execution builds image with matplotlib
    result1 = sandbox.run("""
import matplotlib.pyplot as plt
print("Matplotlib imported successfully")
""")
    
    # Subsequent executions reuse the same image
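    result2 = sandbox.run("""
# Matplotlib is still installed in the image; re-import it in case the session does not persist state
import matplotlib.pyplot as plt
plt.figure()
plt.plot([1, 2, 3, 4])
print("Plot created successfully")
""")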
    
    # Images preserved for future sessions

Container Restart with State Recovery

with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up initial state
    sandbox.run("x = 100; import math")

    # Get container info
    info = sandbox.get_container_info()
    old_container = info['container_id']

    # Restart container (state is automatically restored)
    new_container = sandbox.restart_container()

    # State persists after restart
    result = sandbox.run("print(f'x = {x}, sqrt(x) = {math.sqrt(x)}')")
    print(result.stdout)  # Output: x = 100, sqrt(x) = 10.0

📋 Execution Results

Every execution returns a SandboxExecutionResult object:

result = sandbox.run('print("Hello"); import sys; print(sys.version)')

print(f"Success: {result.success}")           # True/False
print(f"Output: {result.stdout}")             # "Hello\n3.11.0..."
print(f"Errors: {result.stderr}")             # Any error messages
print(f"Exit code: {result.exit_code}")       # 0 for success
print(f"Time: {result.execution_time:.3f}s")  # Execution duration

# Convert to dictionary for JSON serialization
result_dict = result.to_dict()
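
Since to_dict() is intended for JSON serialization, results can be appended to a log as one JSON object per line. The sketch below assumes that to_dict() returns only JSON-serializable values; result_to_json_line is a hypothetical helper.

```python
import json


def result_to_json_line(result, **extra):
    """Serialize an execution result, plus optional extra fields, to one JSON line."""
    record = dict(result.to_dict())
    record.update(extra)
    # json.dumps escapes embedded newlines, so the record stays on a single line.
    return json.dumps(record, sort_keys=True)
```

A typical use would be writing result_to_json_line(result, label="step-1") to a log file after each sandbox.run() call.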

🛡️ Security Features

BioCage is designed with security as a first-class concern:

Container Isolation: Complete separation from host system
No Network Access: Internet disabled by default during execution
Resource Limits: Memory, CPU, and execution time controls
Read-Only Filesystem: Prevents unauthorized file modifications
Timeout Controls: Automatic termination of long-running processes
Privilege Restrictions: Containers run with minimal privileges

Security Example

# Maximum security configuration
with BioCageOrchestrator(
    execution_mode="ephemeral",  # Fresh container each time
    memory_limit="256m",         # Limited memory
    cpu_limit="0.5",            # Half CPU core
    network_access=False        # No network access
) as sandbox:

    # This code runs in complete isolation
    result = sandbox.run("""
    try:
        import urllib.request
        urllib.request.urlopen('https://example.com')
    except Exception as e:
        print(f"Network blocked: {type(e).__name__}")
    """)
    print(result.stdout)  # Output: Network blocked: URLError

🎯 Use Cases

AI/LLM Integration

Code Generation: Safely execute AI-generated Python code
Interactive Assistants: Build chatbots that can run and debug code
Automated Testing: Validate generated code snippets
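
A common pattern for these use cases is a repair loop: run the generated code, and if it fails, hand the error text back to the model for another attempt. The sketch below assumes only the run(code, timeout=...) call and the success and stderr attributes shown earlier. run_with_repair and the generate_code callback are hypothetical; generate_code stands in for whatever LLM call produces the code.

```python
def run_with_repair(sandbox, generate_code, max_attempts=3, timeout=30):
    """Run generated code, feeding stderr back to the generator on failure.

    generate_code(feedback) must return a code string. `feedback` is None on
    the first attempt and the previous stderr on later attempts.
    Returns (last_result, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    feedback = None
    result = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(feedback)
        result = sandbox.run(code, timeout=timeout)
        if result.success:
            return result, attempt
        feedback = result.stderr
    return result, max_attempts
```

Capping the number of attempts keeps a model that never converges from looping indefinitely.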

Education & Training

Online Learning: Secure code execution for student submissions
Coding Challenges: Isolated environment for competitive programming
Tutorial Platforms: Interactive Python learning experiences

Research & Development

Experiment Automation: Reproducible research environments
Data Processing: Secure analysis of sensitive datasets
Algorithm Testing: Isolated testing of new algorithms

Development Workflows

CI/CD Pipelines: Safe testing of code changes
Code Review: Automated validation of pull requests
Prototyping: Quick testing of code concepts

📚 Documentation

Comprehensive documentation is available:

Getting Started - Installation, prerequisites, and first steps
User Guide - Core concepts, execution modes, and usage patterns
API Reference - Complete API documentation
Advanced Guide - Performance optimization and complex workflows
Examples - Practical examples and use cases
Security Guide - Security model and best practices
Troubleshooting - Common issues and solutions

🚀 Examples

The examples/ directory contains working examples:

# Run basic examples
python examples/basic/simple_demo.py
python examples/basic/persistent_mode.py
python examples/basic/ephemeral_mode.py

# Run API examples
python examples/api/orchestrator_examples.py
python examples/api/execution_result_examples.py

# Run advanced examples
python examples/advanced/container_pool.py

Each example demonstrates real-world usage patterns and includes detailed comments.

🐳 Docker Usage

You can also use BioCage directly with Docker:

# Execute code via stdin
echo 'print("Hello from BioCage!")' | docker run --rm -i biocage:latest

# Execute via environment variable
docker run --rm -e PYTHON_CODE="import numpy as np; print(np.__version__)" biocage:latest

# With resource limits
docker run --rm -i --memory="1g" --cpus="2.0" biocage:latest
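
The same invocations can be driven from Python with subprocess. The sketch below assumes, as in the stdin example above, that the biocage:latest image reads Python code from standard input. build_docker_command and run_code_in_docker are hypothetical helpers; the flags used (--rm, -i, --memory, --cpus, --network) are standard docker run options.

```python
import subprocess


def build_docker_command(image="biocage:latest", memory=None, cpus=None, network=None):
    """Build the argument list for `docker run`, adding limits only when given."""
    command = ["docker", "run", "--rm", "-i"]
    if memory:
        command += ["--memory", str(memory)]
    if cpus:
        command += ["--cpus", str(cpus)]
    if network:
        command += ["--network", str(network)]
    command.append(image)
    return command


def run_code_in_docker(code, timeout=60, **options):
    """Pipe `code` to the container's stdin and capture its output."""
    return subprocess.run(
        build_docker_command(**options),
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the code contains quotes or dollar signs.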

🏗️ Building Custom Images

Customize the Docker environment for your specific needs:

Edit dependencies in python_docker/pyproject.toml:

[project]
dependencies = [
    "numpy>=1.24.0",
    "pandas>=2.0.0",
    "scikit-learn>=1.3.0",
    "your-custom-package>=1.0.0",
]

Generate requirements and build:

cd python_docker
uv pip compile pyproject.toml -o requirements.txt
./build.sh

🔗 Links

GitHub Repository: https://github.com/biocypher/biocage
Documentation: Available in the docs/ directory
Issues & Support: https://github.com/biocypher/biocage/issues

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on how to get started.

BioCage: Safe, stateful, and powerful Python execution for the AI era.
