Getting Started with BioCage

BioCage is a secure Python sandbox designed for safely executing code generated by Large Language Models (LLMs) or any untrusted Python code. This guide will help you get started quickly.

Prerequisites

Docker Installation

BioCage requires Docker to be installed and running on your system.

macOS:

# Install Docker Desktop for Mac
# Download from: https://docs.docker.com/desktop/mac/install/

Linux:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install docker.io docker-compose

# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker

# Add your user to the docker group (optional, to avoid sudo).
# Log out and back in (or run `newgrp docker`) for the change to take effect.
sudo usermod -aG docker $USER

Windows:

# Install Docker Desktop for Windows
# Download from: https://docs.docker.com/desktop/windows/install/

Verify Docker Installation

docker --version
docker info
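
If you want a script to check for Docker before it creates a sandbox, a small pre-flight test with the standard library can wrap the same two commands. This is a sketch: docker_is_available is a hypothetical helper, not part of BioCage, and it assumes that a non-zero exit from `docker info` means the daemon is unreachable.

```python
import shutil
import subprocess


def docker_is_available(timeout: float = 10.0) -> bool:
    """Return True if the docker CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False  # the docker CLI is not installed or not on PATH
    try:
        completed = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not launch the CLI, or the daemon did not answer in time
    # Assumption: `docker info` exits non-zero when it cannot reach the daemon
    return completed.returncode == 0


if __name__ == "__main__":
    if docker_is_available():
        print("Docker is ready")
    else:
        print("Docker is not available; start Docker before using BioCage")
```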

Installation

Install BioCage using pip:

pip install biocage

Or for development:

git clone https://github.com/biocypher/biocage
cd biocage
pip install -e .

Your First BioCage Script

Let's start with a simple "Hello World" example:

from biocage import BioCageOrchestrator

# Create a sandbox instance
with BioCageOrchestrator() as sandbox:
    # Execute Python code safely
    result = sandbox.run("print('Hello, BioCage!')")

    # Check if execution was successful
    if result.success:
        print(f"Output: {result.stdout}")
    else:
        print(f"Error: {result.stderr}")

Expected Output:

Output: Hello, BioCage!

💡 Tip: The context manager (with statement) automatically handles container lifecycle, ensuring proper cleanup.
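
The tip relies on standard Python semantics: __exit__ runs whether the with block finishes normally or raises. The toy class below makes that visible. ToySandbox is hypothetical and is not how BioCage is implemented; the comments only mark where a real sandbox would start and remove its container.

```python
class ToySandbox:
    """Hypothetical stand-in used only to illustrate context-manager cleanup."""

    def __init__(self) -> None:
        self.running = False
        self.cleaned_up = False

    def __enter__(self) -> "ToySandbox":
        self.running = True  # a real sandbox would start its container here
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.running = False
        self.cleaned_up = True  # a real sandbox would remove its container here
        return False  # returning False lets any exception propagate


box = ToySandbox()
try:
    with box:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(box.cleaned_up)  # True: __exit__ ran even though the block raised
```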

Understanding the Basics

1. The BioCageOrchestrator

The BioCageOrchestrator is your main interface to BioCage. It manages Docker containers, executes code, and handles resources.

from biocage import BioCageOrchestrator

# Default configuration
sandbox = BioCageOrchestrator()

# Custom configuration
sandbox = BioCageOrchestrator(
    memory_limit="1g",      # 1GB memory limit
    cpu_limit="2.0",        # 2 CPU cores
    execution_mode="persistent"  # Keep state between runs
)
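
The memory_limit and cpu_limit strings look like the values Docker's own --memory and --cpus options accept. Assuming BioCage passes them to Docker unchanged (not confirmed here), the sketch below shows what they mean numerically. memory_to_bytes and cpus_to_nano_cpus are hypothetical helpers, and only single-letter memory suffixes are handled.

```python
import re

# Binary multipliers, as the Docker CLI uses for --memory values
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_to_bytes(limit: str) -> int:
    """Convert a Docker-style memory string such as '512m' or '1g' to bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", limit.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised memory limit: {limit!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "b"]


def cpus_to_nano_cpus(limit: str) -> int:
    """Convert a --cpus style value such as '2.0' to Docker's NanoCPUs units."""
    return int(round(float(limit) * 1_000_000_000))


print(memory_to_bytes("1g"))     # 1073741824
print(memory_to_bytes("512m"))   # 536870912
print(cpus_to_nano_cpus("2.0"))  # 2000000000
```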

2. Execution Results

Every code execution returns a SandboxExecutionResult object:

result = sandbox.run("x = 42; print(x)")

print(f"Success: {result.success}")      # True if no errors
print(f"Exit code: {result.exit_code}")  # 0 for success
print(f"Output: {result.stdout}")        # Standard output
print(f"Errors: {result.stderr}")        # Error messages
print(f"Time: {result.execution_time}")  # Execution time in seconds
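
To see where each field comes from, the sketch below builds an equivalent record by running code in an ordinary child process. LocalResult and run_locally are hypothetical and provide no isolation at all; they exist only to illustrate the fields. It also assumes, as the comments above suggest, that success corresponds to a zero exit code.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class LocalResult:
    """Hypothetical mirror of the result fields; not BioCage's class."""
    success: bool
    exit_code: int
    stdout: str
    stderr: str
    execution_time: float


def run_locally(code: str, timeout: float = 30.0) -> LocalResult:
    """Run `code` in a separate Python process. WARNING: no sandboxing."""
    start = time.perf_counter()
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    elapsed = time.perf_counter() - start
    return LocalResult(
        success=completed.returncode == 0,  # success derived from the exit code
        exit_code=completed.returncode,
        stdout=completed.stdout,
        stderr=completed.stderr,
        execution_time=elapsed,
    )


ok = run_locally("x = 42; print(x)")
print(ok.success, ok.exit_code, repr(ok.stdout))  # True 0 '42\n'

bad = run_locally("undefined_variable")
print(bad.success, bad.exit_code)  # False 1
```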

3. Execution Modes

BioCage supports two execution modes:

Ephemeral Mode (default for simple operations):

  • Fresh container for each execution
  • No state persistence
  • Maximum isolation
  • Ideal for one-off executions

Persistent Mode (default for context managers):

  • Same container across executions
  • Variables and imports persist
  • Better performance for multiple executions
  • Ideal for interactive workflows

# Ephemeral mode - no state persistence
sandbox = BioCageOrchestrator(execution_mode="ephemeral")
sandbox.run("x = 42")
result = sandbox.run("print(x)")  # Error: x is not defined

# Persistent mode - state persists
sandbox = BioCageOrchestrator(execution_mode="persistent")
sandbox.run("x = 42")
result = sandbox.run("print(x)")  # Output: 42
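
The difference between the two modes is whether interpreter state survives between calls. An in-process analogy using exec with namespace dictionaries shows the idea. run_ephemeral and PersistentSession are hypothetical; exec offers no isolation and is not how BioCage runs code.

```python
def run_ephemeral(code: str) -> dict:
    """Each call starts from an empty namespace, so nothing carries over."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace


class PersistentSession:
    """Every call shares one namespace, like a long-lived interpreter."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, code: str) -> None:
        exec(code, self.namespace)


run_ephemeral("x = 42")
try:
    run_ephemeral("print(x)")  # x was defined in a different, discarded namespace
except NameError as err:
    print(f"ephemeral: {err}")

session = PersistentSession()
session.run("x = 42")
session.run("print(x)")  # prints 42
```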

Common Usage Patterns

1. Context Manager (Recommended)

Prefer the context manager; it cleans up the container automatically:

with BioCageOrchestrator() as sandbox:
    result1 = sandbox.run("import numpy as np")
    result2 = sandbox.run("arr = np.array([1, 2, 3])")
    result3 = sandbox.run("print(arr.mean())")
# Container automatically cleaned up

2. Manual Lifecycle Management

For more control over container lifecycle:

sandbox = BioCageOrchestrator()
try:
    sandbox.start_container()

    result1 = sandbox.run("x = 10")
    result2 = sandbox.run("y = 20")
    result3 = sandbox.run("print(x + y)")

finally:
    sandbox.cleanup()  # Always cleanup

3. Error Handling

Handle execution errors gracefully:

with BioCageOrchestrator() as sandbox:
    result = sandbox.run("undefined_variable")

    if not result.success:
        print(f"Error occurred: {result.stderr}")
        print(f"Exit code: {result.exit_code}")
        # Handle error appropriately
    else:
        print(f"Success: {result.stdout}")
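
When stderr is logged or fed back to an LLM, the whole traceback is often more than you need. Assuming result.stderr holds a standard CPython traceback, the exception line is its last non-empty line. summarize_error is a hypothetical helper, and the sample string is hand-written for illustration rather than captured from BioCage.

```python
def summarize_error(stderr: str) -> str:
    """Return the final non-empty line of a traceback, e.g. 'NameError: ...'."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    return lines[-1].strip() if lines else ""


sample_stderr = (
    "Traceback (most recent call last):\n"
    '  File "<string>", line 1, in <module>\n'
    "NameError: name 'undefined_variable' is not defined\n"
)
print(summarize_error(sample_stderr))
# NameError: name 'undefined_variable' is not defined
```

Inside the error branch you would call summarize_error(result.stderr).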

Security Features

BioCage provides several layers of security:

  • Container Isolation: Code runs in isolated Docker containers
  • No Network Access: Network is disabled by default (configurable)
  • Resource Limits: Memory and CPU usage are capped
  • Read-only Filesystem: Root filesystem is read-only
  • Execution Timeouts: Prevents infinite loops

# Configure security settings
with BioCageOrchestrator(
    memory_limit="512m",     # Limit memory to 512MB
    cpu_limit="1.0",         # Limit to 1 CPU core
    network_access=False     # Disable network (default)
) as sandbox:
    # This code is safely isolated
    result = sandbox.run("potentially_dangerous_code()")
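
For intuition about what these layers mean at the Docker level, the sketch below assembles the docker run options that roughly correspond to the settings above. The mapping is an assumption: BioCage may configure containers through the Docker API with different or additional options, and execution timeouts are not represented here. docker_run_flags is a hypothetical helper.

```python
def docker_run_flags(memory_limit: str = "512m", cpu_limit: str = "1.0",
                     network_access: bool = False) -> list[str]:
    """Build `docker run` options that roughly match the security settings."""
    flags = [
        "--rm",                    # remove the container when it exits
        "--read-only",             # mount the root filesystem read-only
        "--memory", memory_limit,  # hard memory cap
        "--cpus", cpu_limit,       # CPU quota
    ]
    if not network_access:
        flags += ["--network", "none"]  # only a loopback interface inside the container
    return flags


command = ["docker", "run", *docker_run_flags(), "python:3.12-slim",
           "python", "-c", "print('hi')"]
print(" ".join(command))
```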

What's Next?

Now that you understand the basics, explore these guides:

Quick Examples

Here are some common tasks to get you started:

Data Analysis

with BioCageOrchestrator() as sandbox:
    # Create sample data
    sandbox.run("""
import pandas as pd
data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
df = pd.DataFrame(data)
""")

    # Analyze data
    result = sandbox.run("print(df.describe())")
    print(result.stdout)
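
If you would rather keep a snippet indented to match the surrounding with block, dedent it before passing it to sandbox.run. This assumes sandbox.run executes the string as top-level code without dedenting it (not confirmed), in which case an indented first statement would fail with IndentationError. textwrap.dedent from the standard library strips the common indentation.

```python
import textwrap

snippet = textwrap.dedent("""
    import pandas as pd
    data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]}
    df = pd.DataFrame(data)
""")

# Compiling (without running) confirms the dedented text is valid top-level code.
compile(snippet, "<snippet>", "exec")
print(snippet)
```

You would then call sandbox.run(snippet).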

Working with Files

with BioCageOrchestrator() as sandbox:
    # Expose a file to the sandbox
    sandbox.expose_file("/path/to/data.csv", "/app/data.csv")

    # Process the file
    result = sandbox.run("""
import pandas as pd
df = pd.read_csv('/app/data.csv')
print(f"Dataset has {len(df)} rows")
""")
    print(result.stdout)
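
/path/to/data.csv is a placeholder. To try the example end to end, you can generate a small CSV with the standard library and use its path as the host-side argument to expose_file. The data mirrors the Data Analysis example, and the local row count is what the sandboxed script should report, assuming the file is exposed as shown.

```python
import csv
import tempfile
from pathlib import Path

# Write a small CSV to use as the host path in expose_file().
rows = [("name", "age"), ("Alice", 25), ("Bob", 30), ("Charlie", 35)]
host_path = Path(tempfile.mkdtemp()) / "data.csv"
with host_path.open("w", newline="") as fh:
    csv.writer(fh).writerows(rows)

# Count data rows locally so you know what the sandboxed script should print.
with host_path.open(newline="") as fh:
    data_rows = sum(1 for _ in csv.reader(fh)) - 1  # minus the header row

print(host_path)
print(f"Dataset has {data_rows} rows")  # Dataset has 3 rows
```

Pass it as sandbox.expose_file(str(host_path), "/app/data.csv").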

Error Recovery

with BioCageOrchestrator() as sandbox:
    # Try potentially failing code
    result = sandbox.run("risky_operation()")

    if not result.success:
        # Try alternative approach
        result = sandbox.run("safe_alternative()")
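
The same pattern generalizes to an ordered list of fallbacks. The sketch takes the runner as a parameter, so you could pass sandbox.run; here a fake runner stands in so the example runs without Docker. run_first_success, FakeResult, and fake_run are hypothetical. In persistent mode, a failed attempt may leave partial state behind for the next one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Protocol


class ResultLike(Protocol):
    """The only attribute the helper needs; BioCage's result has more fields."""
    success: bool


def run_first_success(run: Callable[[str], ResultLike],
                      snippets: Iterable[str]) -> Optional[ResultLike]:
    """Try each snippet in order and return the first successful result.

    Returns the last failed result if none succeed, or None if no snippets.
    """
    last = None
    for code in snippets:
        last = run(code)
        if last.success:
            return last
    return last


@dataclass
class FakeResult:
    success: bool
    stdout: str = ""


def fake_run(code: str) -> FakeResult:
    # Stand-in for sandbox.run: only "safe_alternative()" succeeds here.
    return FakeResult(success=(code == "safe_alternative()"), stdout=code)


result = run_first_success(fake_run, ["risky_operation()", "safe_alternative()"])
print(result.stdout)  # safe_alternative()
```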

Smart Dependency Detection

BioCage can automatically detect third-party dependencies and build custom Docker images:

# Enable automatic dependency detection
with BioCageOrchestrator(auto_detect_dependencies=True) as sandbox:
    # BioCage detects pandas, numpy, matplotlib and builds image
    result = sandbox.run("""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create and analyze data
data = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100)
})
print(f"Generated {len(data)} data points")
""")
    print(result.stdout)

Features:

  • 🎯 Automatic import detection from Python code
  • 🐳 Dynamic Docker image generation with the uv package manager
  • ⚡ Intelligent caching for performance
  • 📊 Support for data science libraries (pandas, numpy, matplotlib, etc.)
  • 💾 Compatible with both persistent and ephemeral modes

💡 Tip: Smart dependency detection eliminates manual Docker image management while maintaining security and performance.

Happy coding with BioCage! 🚀