biocypher/biocage

BioCage

BioCage is a fast, secure, and practical Python sandbox designed specifically for safely executing code generated by Large Language Models (LLMs). It provides a robust containerized environment that isolates code execution while maintaining state persistence and file system integration.

🎯 Why BioCage?

BioCage addresses critical needs in the AI era:

  • 🔒 Security: Complete isolation through Docker with no network access and restricted file system
  • 🧠 State Persistence: Variables, imports, and functions persist across multiple executions
  • 📁 File Integration: Seamlessly expose files and directories to the sandbox
  • ⚡ Performance: Fast startup times with optimized container management
  • 🛡️ Reliability: Comprehensive error handling with detailed diagnostics
  • 🔧 Flexibility: Support for both ephemeral and persistent execution modes

✨ Key Features

🔒 Security First

  • Complete Docker containerization with no host system access
  • Network isolation (disabled by default)
  • Resource limits (memory, CPU, execution time)
  • Read-only filesystem with controlled write access
  • Safe execution of potentially malicious AI-generated code

🧠 Intelligent State Management

  • Variable persistence across executions
  • Import persistence - no need to re-import libraries
  • Function and class definitions remain available
  • DataFrame modifications persist between runs
  • Session-based workflows for complex data processing
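
To make the idea of state persistence concrete, here is a minimal sketch of the general technique: every snippet is executed against the same namespace, so names defined by one run are visible to the next. This is not BioCage's implementation, which the README does not describe; `NamespaceSession` is a hypothetical name, and plain `exec` provides no isolation at all, so treat it purely as an illustration of the concept.

```python
import contextlib
import io


class NamespaceSession:
    """Toy stand-in for a persistent session: one namespace shared by every run.

    Hypothetical illustration only. exec() offers no sandboxing whatsoever.
    """

    def __init__(self):
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = NamespaceSession()
session.run("import math\nx = 100")                     # import and variable
session.run("def root(v):\n    return math.sqrt(v)")    # function definition
output = session.run("print(root(x))")                  # all three are still available
print(output)  # 10.0
```

An ephemeral mode corresponds to creating a fresh `NamespaceSession` for every snippet, which is why variables do not carry over there.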

πŸ“ Advanced File System Integration

  • File exposure: Mount individual files from host to container
  • Directory exposure: Mount directories with read-only or read-write access
  • Temporary file creation: Create files accessible within the sandbox
  • Automatic cleanup: Files and mounts cleaned up after execution

⚡ Performance Optimized

  • Pre-built images with common data science libraries
  • Fast container startup and execution
  • Efficient resource usage with proper limits
  • Minimal overhead for code execution

🧠 Smart Dependency Management

  • Automatic dependency detection from Python code
  • Dynamic Docker image generation with UV package manager
  • Intelligent image caching for performance optimization
  • Auto-cleanup of dynamic images (enabled by default) to save disk space
  • Manual cleanup control for advanced use cases

🚀 Quick Start

Installation

pip install biocage

Basic Usage

from biocage import BioCageOrchestrator

# Simple one-time execution
with BioCageOrchestrator() as sandbox:
    result = sandbox.run('print("Hello from BioCage!")')
    print(result.stdout)  # "Hello from BioCage!"

State Persistence Example

# Variables and imports persist across executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up your environment once
    sandbox.run("""
    import pandas as pd
    import numpy as np

    # Define helper functions
    def analyze_data(data):
        return {
            'mean': np.mean(data),
            'std': np.std(data),
            'count': len(data)
        }

    # Initialize data
    df = pd.DataFrame({
        'values': np.random.randn(1000),
        'category': np.random.choice(['A', 'B', 'C'], 1000)
    })
    """)

    # Use the environment in subsequent executions
    result = sandbox.run("""
    # Everything from previous execution is still available
    stats = analyze_data(df['values'])
    print(f"Data statistics: {stats}")

    # Modify the DataFrame
    df['squared'] = df['values'] ** 2
    print(f"DataFrame shape: {df.shape}")
    """)

    print(result.stdout)

File and Directory Integration

# Expose files and directories to the sandbox
with BioCageOrchestrator() as sandbox:
    # Expose individual files
    sandbox.expose_file("/path/to/data.csv", "/app/data.csv")
    sandbox.expose_file("/path/to/config.json", "/app/config.json")

    # Expose entire directories
    sandbox.expose_directory("/path/to/models", "/app/models", readonly=True)

    result = sandbox.run("""
    import pandas as pd
    import json
    import os

    # Read exposed files
    df = pd.read_csv('/app/data.csv')
    with open('/app/config.json', 'r') as f:
        config = json.load(f)

    # List available models
    models = os.listdir('/app/models')
    print(f"Available models: {models}")

    print(f"Processed {len(df)} rows successfully")
    """)

    print(result.stdout)

Execution Modes

# Ephemeral mode - maximum security, no state persistence
with BioCageOrchestrator(execution_mode="ephemeral") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # NameError: x not defined
    print("Ephemeral mode: variables don't persist")

# Persistent mode - state persists between executions
with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    sandbox.run("x = 42")
    result = sandbox.run("print(x)")  # Output: 42
    print("Persistent mode: variables persist")

# Auto mode - intelligent selection based on usage
with BioCageOrchestrator(execution_mode="auto") as sandbox:
    result = sandbox.run("print('Auto mode adapts to your needs')")

🔧 Advanced Features

Resource Management

# Configure container resources
with BioCageOrchestrator(
    memory_limit="2g",        # 2GB memory limit
    cpu_limit="2.0",          # 2 CPU cores
    network_access=False      # Disabled by default for security
) as sandbox:

    # Monitor container information
    info = sandbox.get_container_info()
    print(f"Container ID: {info['container_id'][:12]}")
    print(f"Memory limit: {info['memory_limit']}")
    print(f"CPU limit: {info['cpu_limit']}")

    result = sandbox.run("import numpy as np; print('Resources configured!')")

Error Handling and Debugging

def safe_execute(code, description=""):
    """Execute code with comprehensive error handling."""
    with BioCageOrchestrator() as sandbox:
        result = sandbox.run(code, timeout=30)

        if result.success:
            return f"✅ {description}: {result.stdout.strip()}"
        else:
            return f"❌ {description} failed: {result.stderr.strip()}"

# Test different scenarios
print(safe_execute("print('Hello World!')", "Basic execution"))
print(safe_execute("print(undefined_var)", "Variable error"))
print(safe_execute("import nonexistent_module", "Import error"))

Smart Dependencies and Auto-Cleanup

# Automatic dependency detection with auto-cleanup (default)
with BioCageOrchestrator(auto_detect_dependencies=True) as sandbox:
    # BioCage automatically detects pandas and numpy
    result = sandbox.run("""
import pandas as pd
import numpy as np

df = pd.DataFrame({'x': np.random.rand(5)})
print(f"Created DataFrame with shape: {df.shape}")
""")
    print(result.stdout)
    # Dynamic image automatically removed when context exits

# Disable auto-cleanup to cache images for reuse
with BioCageOrchestrator(
    auto_detect_dependencies=True,
    auto_cleanup_dynamic_images=False
) as sandbox:
    # First execution builds image with matplotlib
    result1 = sandbox.run("""
import matplotlib.pyplot as plt
print("Matplotlib imported successfully")
""")
    
    # Subsequent executions reuse the same image
    result2 = sandbox.run("""
# Matplotlib is still available
plt.figure()
plt.plot([1, 2, 3, 4])
print("Plot created successfully")
""")
    
    # Images preserved for future sessions

Container Restart with State Recovery

with BioCageOrchestrator(execution_mode="persistent") as sandbox:
    # Set up initial state
    sandbox.run("x = 100; import math")

    # Get container info
    info = sandbox.get_container_info()
    old_container = info['container_id']

    # Restart container (state is automatically restored)
    new_container = sandbox.restart_container()

    # State persists after restart
    result = sandbox.run("print(f'x = {x}, sqrt(x) = {math.sqrt(x)}')")
    print(result.stdout)  # Output: x = 100, sqrt(x) = 10.0

📋 Execution Results

All executions return a comprehensive SandboxExecutionResult object:

result = sandbox.run('print("Hello"); import sys; print(sys.version)')

print(f"Success: {result.success}")           # True/False
print(f"Output: {result.stdout}")             # "Hello\n3.11.0..."
print(f"Errors: {result.stderr}")             # Any error messages
print(f"Exit code: {result.exit_code}")       # 0 for success
print(f"Time: {result.execution_time:.3f}s")  # Execution duration

# Convert to dictionary for JSON serialization
result_dict = result.to_dict()
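
As a sketch of what can be done with the dictionary form, the example below condenses a result into a compact JSON record, for instance to log it or hand it back to an LLM. Since the real class needs a running sandbox, `ResultStandIn` is a hypothetical stand-in carrying the five fields shown above, and the example assumes that `to_dict()` uses the attribute names as keys, which the README does not confirm.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultStandIn:
    """Hypothetical stand-in with the five documented result fields."""
    success: bool
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float


def summarize(result_dict: dict) -> str:
    """Build a compact JSON record: stdout on success, stderr on failure."""
    record = {
        "ok": result_dict["success"],
        "exit_code": result_dict["exit_code"],
        "output": result_dict["stdout"] if result_dict["success"] else result_dict["stderr"],
        "seconds": round(result_dict["execution_time"], 3),
    }
    return json.dumps(record)


# asdict() plays the role that to_dict() is assumed to play for the real object
failed = ResultStandIn(False, "", "NameError: name 'x' is not defined", 1, 0.0421)
summary = summarize(asdict(failed))
print(summary)
```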

πŸ›‘οΈ Security Features

BioCage is designed with security as a first-class concern:

  • Container Isolation: Complete separation from host system
  • No Network Access: Internet disabled by default during execution
  • Resource Limits: Memory, CPU, and execution time controls
  • Read-Only Filesystem: Prevents unauthorized file modifications
  • Timeout Controls: Automatic termination of long-running processes
  • Privilege Restrictions: Containers run with minimal privileges
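
The timeout control in the list above can be illustrated with a small standard-library sketch. It shows the general technique of killing a process that runs too long, not how BioCage enforces timeouts inside its containers (the README does not say). `run_with_timeout` is a hypothetical helper, and a plain child process provides none of the isolation listed above.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_seconds: float) -> dict:
    """Run `code` in a child Python process and kill it if it exceeds the timeout."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired
        return {"success": False, "timed_out": True, "stdout": "", "exit_code": None}
    return {
        "success": completed.returncode == 0,
        "timed_out": False,
        "stdout": completed.stdout,
        "exit_code": completed.returncode,
    }


quick = run_with_timeout("print('done')", timeout_seconds=10)
stuck = run_with_timeout("while True: pass", timeout_seconds=1)
print(quick["stdout"].strip(), stuck["timed_out"])
```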

Security Example

# Maximum security configuration
with BioCageOrchestrator(
    execution_mode="ephemeral",  # Fresh container each time
    memory_limit="256m",         # Limited memory
    cpu_limit="0.5",            # Half CPU core
    network_access=False        # No network access
) as sandbox:

    # This code runs in complete isolation
    result = sandbox.run("""
    try:
        import urllib.request
        urllib.request.urlopen('https://example.com')
    except Exception as e:
        print(f"Network blocked: {type(e).__name__}")
    """)
    print(result.stdout)  # Output: Network blocked: URLError

🎯 Use Cases

AI/LLM Integration

  • Code Generation: Safely execute AI-generated Python code
  • Interactive Assistants: Build chatbots that can run and debug code
  • Automated Testing: Validate generated code snippets

Education & Training

  • Online Learning: Secure code execution for student submissions
  • Coding Challenges: Isolated environment for competitive programming
  • Tutorial Platforms: Interactive Python learning experiences

Research & Development

  • Experiment Automation: Reproducible research environments
  • Data Processing: Secure analysis of sensitive datasets
  • Algorithm Testing: Isolated testing of new algorithms

Development Workflows

  • CI/CD Pipelines: Safe testing of code changes
  • Code Review: Automated validation of pull requests
  • Prototyping: Quick testing of code concepts

📚 Documentation

Comprehensive documentation is available:

🚀 Examples

The examples/ directory contains working examples:

# Run basic examples
python examples/basic/simple_demo.py
python examples/basic/persistent_mode.py
python examples/basic/ephemeral_mode.py

# Run API examples
python examples/api/orchestrator_examples.py
python examples/api/execution_result_examples.py

# Run advanced examples
python examples/advanced/container_pool.py

Each example demonstrates real-world usage patterns and includes detailed comments.

🐳 Docker Usage

You can also use BioCage directly with Docker:

# Execute code via stdin
echo 'print("Hello from BioCage!")' | docker run --rm -i biocage:latest

# Execute via environment variable
docker run --rm -e PYTHON_CODE="import numpy as np; print(np.__version__)" biocage:latest

# With resource limits
docker run --rm -i --memory="1g" --cpus="2.0" biocage:latest
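
If you drive the image from Python rather than a shell, a thin wrapper can assemble the same command line. The sketch below assumes the `biocage:latest` image reads code from stdin as shown above; `build_docker_command` is a hypothetical helper, not part of the BioCage API, and `--network=none` is a standard Docker flag added here for illustration even though the commands above do not use it.

```python
import shlex


def build_docker_command(image="biocage:latest", memory=None, cpus=None, network_disabled=True):
    """Assemble a `docker run` argument list mirroring the shell commands above."""
    args = ["docker", "run", "--rm", "-i"]
    if memory:
        args.append(f"--memory={memory}")
    if cpus:
        args.append(f"--cpus={cpus}")
    if network_disabled:
        args.append("--network=none")  # standard Docker flag; an addition to the examples above
    args.append(image)
    return args


command = build_docker_command(memory="1g", cpus="2.0")
print(shlex.join(command))
# To execute, pass the code on stdin, for example:
#   subprocess.run(command, input='print("Hello")', text=True, capture_output=True)
```

Building an argument list instead of a shell string avoids quoting problems when the code being executed contains quotes or newlines.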

πŸ—οΈ Building Custom Images

Customize the Docker environment for your specific needs:

  1. Edit dependencies in python_docker/pyproject.toml:
[project]
dependencies = [
    "numpy>=1.24.0",
    "pandas>=2.0.0",
    "scikit-learn>=1.3.0",
    "your-custom-package>=1.0.0",
]
  2. Generate requirements and build:
cd python_docker
uv pip compile pyproject.toml -o requirements.txt
./build.sh

🔗 Links

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on how to get started.


BioCage: Safe, stateful, and powerful Python execution for the AI era.
