BioCage

BioCage is a Python sandbox for safely executing code generated by Large Language Models (LLMs). It runs code inside a Docker container that isolates execution from the host, while still supporting state persistence across runs and controlled access to host files.

🎯 Why BioCage?

BioCage addresses critical needs in the AI era:

  • 🔒 Security: Docker-based isolation with network access disabled by default and a restricted file system
  • 🧠 State Persistence: Variables, imports, and functions persist across multiple executions
  • 📁 File Integration: Seamlessly expose files and directories to the sandbox
  • ⚡ Performance: Fast startup times with optimized container management
  • 🛡️ Reliability: Comprehensive error handling with detailed diagnostics
  • 🔧 Flexibility: Support for both ephemeral and persistent execution modes

✨ Key Features

🔒 Security First

  • Docker containerization with no host access beyond explicitly exposed files and directories
  • Network isolation (disabled by default)
  • Resource limits (memory, CPU, execution time)
  • Read-only filesystem with controlled write access
  • Safe execution of potentially malicious AI-generated code

🧠 Intelligent State Management

  • Variable persistence across executions
  • Import persistence - no need to re-import libraries
  • Function and class definitions remain available
  • DataFrame modifications persist between runs
  • Session-based workflows for complex data processing

📁 Advanced File System Integration

  • File exposure: Mount individual files from host to container
  • Directory exposure: Mount directories with read-only or read-write access
  • Temporary file creation: Create files accessible within the sandbox
  • Automatic cleanup: Files and mounts cleaned up after execution

⚡ Performance Optimized

  • Pre-built images with common data science libraries
  • Fast container startup and execution
  • Efficient resource usage with proper limits
  • Minimal overhead for code execution

🧠 Smart Dependency Management

  • Automatic dependency detection from Python code
  • Dynamic Docker image generation with UV package manager
  • Intelligent image caching for performance optimization
  • Auto-cleanup of dynamic images (enabled by default) to save disk space
  • Manual cleanup control for advanced use cases

🚀 Quick Start

Installation

pip install biocage

Basic Usage

from biocage import BioCageOrchestrator

# Simple one-time execution
with BioCageOrchestrator() as sandbox:
    result = sandbox.run('print("Hello from BioCage!")')
    print(result.stdout)  # "Hello from BioCage!"

State Persistence Example

import textwrap

from biocage import BioCageOrchestrator

# Variables and imports persist across executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up your environment once. textwrap.dedent strips the common
    # indentation so each snippet is valid top-level Python.
    sandbox.run(textwrap.dedent("""
    import pandas as pd
    import numpy as np

    # Define helper functions
    def analyze_data(data):
        return {
            'mean': np.mean(data),
            'std': np.std(data),
            'count': len(data)
        }

    # Initialize data
    df = pd.DataFrame({
        'values': np.random.randn(1000),
        'category': np.random.choice(['A', 'B', 'C'], 1000)
    })
    """))

    # Use the environment in subsequent executions
    result = sandbox.run(textwrap.dedent("""
    # Everything from the previous execution is still available
    stats = analyze_data(df['values'])
    print(f"Data statistics: {stats}")

    # Modify the DataFrame
    df['squared'] = df['values'] ** 2
    print(f"DataFrame shape: {df.shape}")
    """))

    print(result.stdout)

File and Directory Integration

import textwrap

from biocage import BioCageOrchestrator

# Expose files and directories to the sandbox
with BioCageOrchestrator() as sandbox:
    # Expose individual files
    sandbox.expose_file("/path/to/data.csv", "/app/data.csv")
    sandbox.expose_file("/path/to/config.json", "/app/config.json")

    # Expose entire directories
    sandbox.expose_directory("/path/to/models", "/app/models", readonly=True)

    # textwrap.dedent strips the common indentation so the snippet
    # is valid top-level Python.
    result = sandbox.run(textwrap.dedent("""
    import pandas as pd
    import json
    import os

    # Read exposed files
    df = pd.read_csv('/app/data.csv')
    with open('/app/config.json', 'r') as f:
        config = json.load(f)

    # List available models
    models = os.listdir('/app/models')
    print(f"Available models: {models}")

    print(f"Processed {len(df)} rows successfully")
    """))

    print(result.stdout)
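
As a rough illustration of what exposing a path amounts to at the Docker level, the sketch below turns host-to-container mappings into standard `docker run -v` bind-mount flags, using the `:ro` suffix for read-only mounts. The helper `volume_args` is hypothetical and is not part of the BioCage API; how BioCage itself creates its mounts is not specified here.

```python
from pathlib import PurePosixPath


def volume_args(mounts):
    """Turn (host_path, container_path, readonly) triples into `docker run` -v flags."""
    args = []
    for host, container, readonly in mounts:
        # Docker requires an absolute path on the container side of a bind mount.
        if not PurePosixPath(container).is_absolute():
            raise ValueError(f"container path must be absolute: {container}")
        spec = f"{host}:{container}" + (":ro" if readonly else "")
        args += ["-v", spec]
    return args


flags = volume_args([
    ("/path/to/data.csv", "/app/data.csv", True),
    ("/path/to/models", "/app/models", False),
])
print(flags)
```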

Execution Modes

# Ephemeral mode - maximum security, no state persistence
with BioCageOrchestrator(execution_mode="ephemeral") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # NameError: x not defined
    print("Ephemeral mode: variables don't persist")

# Persistent mode - state persists between executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # Output: 42
    print("Persistent mode: variables persist")

# Auto mode - intelligent selection based on usage
with BioCageOrchestrator(execution_mode="auto") as sandbox:
    result = sandbox.run("print('Auto mode adapts to your needs')")
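
The difference between ephemeral and persistent mode can be modelled with a toy, in-process stand-in. This sketch only illustrates the namespace semantics: `ToySession` is a hypothetical class, it offers no isolation at all, and it says nothing about how BioCage implements either mode inside a container.

```python
import contextlib
import io


class ToySession:
    """In-process stand-in for a sandbox session; NOT isolated, illustration only."""

    def __init__(self, persistent: bool):
        self.persistent = persistent
        self._namespace = {}

    def run(self, code: str) -> str:
        # Persistent mode reuses one namespace; ephemeral mode starts fresh each call.
        namespace = self._namespace if self.persistent else {}
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            try:
                exec(code, namespace)
            except Exception as exc:
                print(f"{type(exc).__name__}: {exc}")
        return buffer.getvalue()


persistent = ToySession(persistent=True)
persistent.run("x = 42")
print(persistent.run("print(x)"), end="")  # 42

ephemeral = ToySession(persistent=False)
ephemeral.run("x = 42")
print(ephemeral.run("print(x)"), end="")  # NameError: name 'x' is not defined
```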

🔧 Advanced Features

Resource Management

# Configure container resources
with BioCageOrchestrator(
    memory_limit="2g",        # 2GB memory limit
    cpu_limit="2.0",          # 2 CPU cores
    network_access=False      # Disabled by default for security
) as sandbox:

    # Monitor container information
    info = sandbox.get_container_info()
    print(f"Container ID: {info['container_id'][:12]}")
    print(f"Memory limit: {info['memory_limit']}")
    print(f"CPU limit: {info['cpu_limit']}")

    result = sandbox.run("import numpy as np; print('Resources configured!')")

Error Handling and Debugging

def safe_execute(code, description=""):
    """Execute code with comprehensive error handling."""
    with BioCageOrchestrator() as sandbox:
        result = sandbox.run(code, timeout=30)

        if result.success:
            return f"✅ {description}: {result.stdout.strip()}"
        else:
            return f"❌ {description} failed: {result.stderr.strip()}"

# Test different scenarios
print(safe_execute("print('Hello World!')", "Basic execution"))
print(safe_execute("print(undefined_var)", "Variable error"))
print(safe_execute("import nonexistent_module", "Import error"))

Smart Dependencies and Auto-Cleanup

# Automatic dependency detection with auto-cleanup (default)
with BioCageOrchestrator(auto_detect_dependencies=True) as sandbox:
    # BioCage automatically detects pandas and numpy
    result = sandbox.run("""
import pandas as pd
import numpy as np

df = pd.DataFrame({'x': np.random.rand(5)})
print(f"Created DataFrame with shape: {df.shape}")
""")
    print(result.stdout)
    # Dynamic image automatically removed when context exits

# Disable auto-cleanup to cache images for reuse
with BioCageOrchestrator(
    execution_mode="persistent",  # the second run below relies on `plt` from the first
    auto_detect_dependencies=True,
    auto_cleanup_dynamic_images=False
) as sandbox:
    # First execution builds image with matplotlib
    result1 = sandbox.run("""
import matplotlib.pyplot as plt
print("Matplotlib imported successfully")
""")
    
    # Subsequent executions reuse the same image
    result2 = sandbox.run("""
# Matplotlib is still available
plt.figure()
plt.plot([1, 2, 3, 4])
print("Plot created successfully")
""")
    
    # Images preserved for future sessions

Container Restart with State Recovery

with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up initial state
    sandbox.run("x = 100; import math")

    # Get container info
    info = sandbox.get_container_info()
    old_container = info['container_id']

    # Restart container (state is automatically restored)
    new_container = sandbox.restart_container()

    # State persists after restart
    result = sandbox.run("print(f'x = {x}, sqrt(x) = {math.sqrt(x)}')")
    print(result.stdout)  # Output: x = 100, sqrt(x) = 10.0
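
One simple way state could be recovered after a restart is to replay the snippets that previously succeeded into a fresh namespace. The sketch below demonstrates that idea in-process. `ReplaySession` is hypothetical, and this README does not say which recovery mechanism BioCage actually uses.

```python
class ReplaySession:
    """Toy session that rebuilds its state by replaying successful snippets."""

    def __init__(self):
        self._history = []
        self._namespace = {}

    def run(self, code: str) -> None:
        exec(code, self._namespace)
        self._history.append(code)  # recorded only if exec did not raise

    def restart(self) -> None:
        # Simulate a fresh container: empty namespace, then replay history in order.
        self._namespace = {}
        for code in self._history:
            exec(code, self._namespace)

    def get(self, name: str):
        return self._namespace[name]


session = ReplaySession()
session.run("x = 100; import math")
session.restart()
session.run("root = math.sqrt(x)")
print(session.get("root"))  # 10.0
```

Replay re-executes side effects and regenerates any random values, so a production design would more likely snapshot serializable state instead. The sketch trades that fidelity for brevity.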

📋 Execution Results

All executions return a comprehensive SandboxExecutionResult object:

result = sandbox.run('print("Hello"); import sys; print(sys.version)')

print(f"Success: {result.success}")           # True/False
print(f"Output: {result.stdout}")             # "Hello\n3.11.0..."
print(f"Errors: {result.stderr}")             # Any error messages
print(f"Exit code: {result.exit_code}")       # 0 for success
print(f"Time: {result.execution_time:.3f}s")  # Execution duration

# Convert to dictionary for JSON serialization
result_dict = result.to_dict()
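
For readers who want to mock the result object in tests, a stand-in with the same five fields can be sketched as a dataclass. `ExecutionResult` here is a hypothetical mirror built from the attributes shown above. The real `SandboxExecutionResult` may carry additional fields or behave differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExecutionResult:
    """Hypothetical stand-in exposing the fields shown in this section."""

    success: bool
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float

    def to_dict(self) -> dict:
        return asdict(self)


result = ExecutionResult(
    success=True, stdout="Hello\n", stderr="", exit_code=0, execution_time=0.042
)
payload = json.dumps(result.to_dict())  # plain types only, so JSON-safe
print(json.loads(payload)["exit_code"])  # 0
```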

🛡️ Security Features

BioCage is designed with security as a first-class concern:

  • Container Isolation: Code runs in a Docker container, separated from the host except for explicitly exposed paths
  • No Network Access: Internet disabled by default during execution
  • Resource Limits: Memory, CPU, and execution time controls
  • Read-Only Filesystem: Prevents unauthorized file modifications
  • Timeout Controls: Automatic termination of long-running processes
  • Privilege Restrictions: Containers run with minimal privileges

Security Example

import textwrap

from biocage import BioCageOrchestrator

# Maximum security configuration
with BioCageOrchestrator(
    execution_mode="ephemeral",  # Fresh container each time
    memory_limit="256m",         # Limited memory
    cpu_limit="0.5",             # Half CPU core
    network_access=False         # No network access
) as sandbox:

    # This code runs inside the isolated container.
    # textwrap.dedent strips the common indentation so the snippet
    # is valid top-level Python.
    result = sandbox.run(textwrap.dedent("""
    try:
        import urllib.request
        urllib.request.urlopen('https://example.com')
    except Exception as e:
        print(f"Network blocked: {type(e).__name__}")
    """))
    print(result.stdout)  # Output: Network blocked: URLError
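
The restrictions listed in this section correspond roughly to standard `docker run` options. The sketch below assembles such a command line. The helper `docker_run_args` is hypothetical, the exact flags BioCage passes are not documented here, and the image name is taken from the Docker Usage section.

```python
def docker_run_args(image, memory="256m", cpus="0.5", network=False):
    """Build a locked-down `docker run` argument list (illustrative only)."""
    args = [
        "docker", "run", "--rm", "-i",
        "--memory", memory,                     # hard memory cap
        "--cpus", cpus,                         # CPU quota in cores
        "--read-only",                          # root filesystem mounted read-only
        "--tmpfs", "/tmp",                      # small writable scratch area
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
    ]
    if not network:
        args += ["--network", "none"]           # no network interfaces except loopback
    return args + [image]


command = docker_run_args("biocage:latest")
print(" ".join(command))
```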

🎯 Use Cases

AI/LLM Integration

  • Code Generation: Safely execute AI-generated Python code
  • Interactive Assistants: Build chatbots that can run and debug code
  • Automated Testing: Validate generated code snippets

Education & Training

  • Online Learning: Secure code execution for student submissions
  • Coding Challenges: Isolated environment for competitive programming
  • Tutorial Platforms: Interactive Python learning experiences

Research & Development

  • Experiment Automation: Reproducible research environments
  • Data Processing: Secure analysis of sensitive datasets
  • Algorithm Testing: Isolated testing of new algorithms

Development Workflows

  • CI/CD Pipelines: Safe testing of code changes
  • Code Review: Automated validation of pull requests
  • Prototyping: Quick testing of code concepts

📚 Documentation

Comprehensive documentation is available:

🚀 Examples

The examples/ directory contains working examples:

# Run basic examples
python examples/basic/simple_demo.py
python examples/basic/persistent_mode.py
python examples/basic/ephemeral_mode.py

# Run API examples
python examples/api/orchestrator_examples.py
python examples/api/execution_result_examples.py

# Run advanced examples
python examples/advanced/container_pool.py

Each example demonstrates real-world usage patterns and includes detailed comments.

🐳 Docker Usage

You can also use BioCage directly with Docker:

# Execute code via stdin
echo 'print("Hello from BioCage!")' | docker run --rm -i biocage:latest

# Execute via environment variable
docker run --rm -e PYTHON_CODE="import numpy as np; print(np.__version__)" biocage:latest

# With resource limits (code is still supplied via stdin)
echo 'print("Hello from BioCage!")' | docker run --rm -i --memory="1g" --cpus="2.0" biocage:latest

🏗️ Building Custom Images

Customize the Docker environment for your specific needs:

  1. Edit dependencies in python_docker/pyproject.toml:
[project]
dependencies = [
    "numpy>=1.24.0",
    "pandas>=2.0.0",
    "scikit-learn>=1.3.0",
    "your-custom-package>=1.0.0",
]
  2. Generate requirements and build:
cd python_docker
uv pip compile pyproject.toml -o requirements.txt
./build.sh

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on how to get started.


BioCage: Safe, stateful, and powerful Python execution for the AI era.