# Sandbox: A Petri Dish for Code Evolution
This guide describes the vision and roadmap for using Sandbox as the runtime substrate for evolving BEAM systems. It covers both current capabilities and planned features.
- Vision
- What Evolution Substrate Means
- Current Capabilities
- Planned Features
- Integration Patterns
- Example: Conceptual Evolution Cycle
- Roadmap
- Contributing
## Vision

Sandbox provides a foundational capability that most mainstream runtimes do not offer: hot-reloading code into supervised, isolated processes without a restart. This is the primitive that makes genetic programming of live systems possible.

> "Your `sandbox` library isn't a plugin system. It's a Petri dish for code evolution."
Traditional plugin systems load code once and run it. Sandbox goes further: it creates isolated execution contexts where code can be continuously loaded, evaluated, replaced, and garbage collected. This makes it uniquely suited for:
- Genetic Programming: Evolve functions through mutation and selection
- LLM-Driven Code Generation: Safely evaluate AI-generated code variants
- Self-Modifying Systems: Build systems that improve themselves at runtime
- Experimental Workloads: Test code variations without affecting production
The BEAM VM provides unique properties that make evolution workloads practical:
- Hot Code Loading: Replace running code without process restart
- Process Isolation: Failures in one process cannot corrupt another's state
- Supervision Trees: Automatic recovery from crashes
- Lightweight Processes: Spawn thousands of individuals efficiently
- Message Passing: Clean communication between isolated components
Sandbox builds on these primitives to create a managed environment for code that is expected to fail, crash, and eventually improve.
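The hot code loading property can be seen with nothing but the standard library. The sketch below is an illustration of the BEAM primitive, not how Sandbox implements `hot_reload_source/3`: it compiles a module from a source string, then compiles a new version of the same module over it. `HotLoadSketch` and `Candidate` are hypothetical names.

```elixir
defmodule HotLoadSketch do
  # Compile `source`, which must define exactly `module`, and load it in
  # place of whatever version of `module` is currently loaded.
  def load(module, source) when is_atom(module) and is_binary(source) do
    # Skip the "redefining module" warning (note: a global compiler option)
    Code.put_compiler_option(:ignore_module_conflict, true)

    # The BEAM keeps at most two versions of a module (current and old).
    # Purging the old one lets the current version become old on reload.
    :code.purge(module)

    # Code.compile_string/1 compiles and loads every module in the string
    [{^module, _bytecode}] = Code.compile_string(source)
    module
  end
end

HotLoadSketch.load(Candidate, "defmodule Candidate do\n  def f(x), do: x\nend")
IO.inspect(apply(Candidate, :f, [3]))  # first version

HotLoadSketch.load(Candidate, "defmodule Candidate do\n  def f(x), do: x * x\nend")
IO.inspect(apply(Candidate, :f, [3]))  # new code, same module name, no restart
```

Calls go through `apply/3` because `Candidate` does not exist until runtime.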
## What Evolution Substrate Means

An evolution substrate must support the full lifecycle of evolved code. Sandbox addresses seven core requirements:
Create isolated execution contexts for code variants. Each sandbox represents one "individual" in an evolutionary population.
```elixir
# Current capability
{:ok, _} = Sandbox.create_sandbox("evo-gen1-001", MySupervisor)
{:ok, _} = Sandbox.create_sandbox("evo-gen1-002", MySupervisor)
```

Status: Available today. Sandbox creation is fast (<50ms) and reliable.
Inject mutated code into running sandboxes without restart. This is the core evolution primitive.
```elixir
# Load a compiled BEAM binary
{:ok, :hot_reloaded} = Sandbox.hot_reload("evo-gen1-001", beam_binary)

# Load source code directly
{:ok, :hot_reloaded} = Sandbox.hot_reload_source("evo-gen1-001", """
defmodule Candidate do
  def fitness_target(x), do: x * x + 2 * x - 1
end
""")
```

Status: Available today. Version management tracks up to 10 versions per module, with rollback support.
Run tests, benchmarks, or fitness functions safely within the sandbox context.
```elixir
# Execute with a timeout (milliseconds)
{:ok, result} = Sandbox.run("evo-gen1-001", fn ->
  Candidate.fitness_target(42)
end, timeout: 5_000)
```

Status: Available today. `Sandbox.run/3` provides bounded execution with timeout enforcement.
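Bounded execution of this kind can be pictured with the standard `Task` idiom below. This is a hypothetical sketch (`BoundedRun` is not a Sandbox module) and makes no claim about Sandbox's actual code path: the function runs in a separate process, and if it has not replied within the timeout, that process is killed.

```elixir
defmodule BoundedRun do
  # Run `fun` in a separate process and wait at most `timeout` ms.
  # Returns {:ok, result}, {:error, :timeout}, or {:error, {:exit, reason}}.
  def run(fun, timeout) when is_function(fun, 0) do
    task = Task.async(fun)

    # Task.yield/2 returns nil on timeout; Task.shutdown/2 then kills the
    # task, still returning a reply if one arrived in the meantime
    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end

BoundedRun.run(fn -> 42 * 42 end, 1_000)
BoundedRun.run(fn -> Process.sleep(:infinity) end, 50)
```

`Task.async/1` links the task to the caller, so this sketch bounds time but does not contain crashes; containment needs an unlinked process.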
One bad mutation cannot crash the host. This is fundamental for evolution, where most variants fail.

```elixir
# Crashes are contained
{:error, {:crashed, _reason}} = Sandbox.run("evo-001", fn ->
  raise "Evolved code crashed"
end)

# The host keeps running
{:ok, _} = Sandbox.get_sandbox_info("evo-001")  # still accessible
```

Status: Partial. Process isolation via supervision trees works. Resource enforcement (memory limits, runaway processes) is planned.
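Containment rests on running candidate code in a process that is not linked to the caller. A minimal stdlib sketch, assuming a plain `Task.Supervisor` in place of Sandbox's own supervision tree (`ContainedRun` is a hypothetical name; the `{:crashed, reason}` shape only mirrors the example above):

```elixir
defmodule ContainedRun do
  # Run `fun` under `task_sup` without linking it to the caller, so a crash
  # in `fun` becomes an error tuple instead of taking the caller down.
  def run(task_sup, fun, timeout \\ 5_000) when is_function(fun, 0) do
    task = Task.Supervisor.async_nolink(task_sup, fun)

    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # For a raised exception, reason is {exception, stacktrace}
      {:exit, reason} -> {:error, {:crashed, reason}}
      nil -> {:error, :timeout}
    end
  end
end

{:ok, sup} = Task.Supervisor.start_link()
ContainedRun.run(sup, fn -> raise "Evolved code crashed" end)
# The caller is still alive and can keep evaluating other candidates
ContainedRun.run(sup, fn -> :still_here end)
```

The crashing task still produces an error report in the logs, which is usually what you want when most variants are expected to fail.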
Know which version came from where. Essential for understanding what mutations improved fitness.
```elixir
# Version management provides lineage
{:ok, version} = Sandbox.get_module_version("evo-001", Candidate)
history = Sandbox.get_version_history("evo-001", Candidate)
# => %{current_version: 3, total_versions: 3, versions: [...]}
```

Status: Available for module versions. Generation and parent tracking (which sandbox spawned from which) is planned.
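The version history is bounded (up to 10 versions per module) and supports rollback. That idea can be sketched as plain data; `VersionLog` and its functions are hypothetical and do not describe Sandbox's internal representation.

```elixir
defmodule VersionLog do
  @max_versions 10

  # `entries` holds {version, payload} pairs, newest (current) first.
  # Version numbers keep increasing even after a rollback.
  defstruct next_version: 1, entries: []

  # Record a new version (e.g. a BEAM binary), dropping the oldest beyond the limit
  def push(%__MODULE__{next_version: v, entries: entries} = log, payload) do
    %{log | next_version: v + 1, entries: Enum.take([{v, payload} | entries], @max_versions)}
  end

  # The version currently in effect, or nil if nothing has been recorded
  def current(%__MODULE__{entries: [{v, _} | _]}), do: v
  def current(%__MODULE__{}), do: nil

  # Discard the current version and fall back to the one before it
  def rollback(%__MODULE__{entries: [_current, _previous | _] = entries} = log),
    do: {:ok, %{log | entries: tl(entries)}}

  def rollback(%__MODULE__{}), do: {:error, :no_previous_version}
end

log = Enum.reduce(1..12, %VersionLog{}, fn i, acc -> VersionLog.push(acc, "beam-#{i}") end)
VersionLog.current(log)
{:ok, rolled_back} = VersionLog.rollback(log)
VersionLog.current(rolled_back)
```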
Clean up dead individuals efficiently. Evolution produces many short-lived variants.
```elixir
# Manual cleanup today
:ok = Sandbox.destroy_sandbox("evo-gen1-001")

# Batch operations for populations
results = Sandbox.batch_destroy(["evo-001", "evo-002", "evo-003"])
```

Status: Manual destruction works. Automatic garbage collection of abandoned sandboxes is planned.
Distribute populations across BEAM nodes for larger experiments.
Status: Planned. Each node can run its own Sandbox application. Coordination via external registries (Horde, pg) is the target architecture.
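Distribution is planned, with `pg` named as one target registry. As a rough illustration only, assuming one `pg` scope per node and one group per generation (both hypothetical conventions, as is the `PopulationGroups` module), processes belonging to a population could be grouped like this:

```elixir
defmodule PopulationGroups do
  # Register a local `pid` as a member of population `generation`
  def join(generation, pid), do: :pg.join({:population, generation}, pid)

  # Members of `generation` across all connected nodes sharing the scope
  def members(generation), do: :pg.get_members({:population, generation})
end

# Start the default :pg scope (normally once, under your supervision tree)
{:ok, _scope} = :pg.start_link()

individual = spawn(fn -> Process.sleep(:infinity) end)
:ok = PopulationGroups.join("gen-1", individual)
PopulationGroups.members("gen-1")
```

`pg` monitors members and drops them from their groups when they exit, which suits populations where most individuals are short-lived.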
## Current Capabilities

Sandbox provides solid foundations for evolution workloads today.

Lifecycle:

| Operation | API | Performance |
|---|---|---|
| Create sandbox | `Sandbox.create_sandbox/3` | <50ms |
| Destroy sandbox | `Sandbox.destroy_sandbox/1` | <20ms |
| Restart sandbox | `Sandbox.restart_sandbox/1` | <100ms |
| Get info | `Sandbox.get_sandbox_info/1` | <1ms |
| List all | `Sandbox.list_sandboxes/0` | <10ms |
Hot reload and versioning:

| Operation | API | Notes |
|---|---|---|
| Reload BEAM | `Sandbox.hot_reload/3` | Binary bytecode |
| Reload source | `Sandbox.hot_reload_source/3` | Elixir source string |
| Get version | `Sandbox.get_module_version/2` | Current version number |
| List versions | `Sandbox.list_module_versions/2` | All tracked versions |
| Rollback | `Sandbox.rollback_module/3` | Return to previous version |
| Version history | `Sandbox.get_version_history/2` | Statistics and metadata |
Execution:

| Operation | API | Notes |
|---|---|---|
| Run function | `Sandbox.run/3` | Zero-arity function |
| With timeout | `timeout: ms` option | Kills the run when exceeded |
Batch operations:

| Operation | API | Notes |
|---|---|---|
| Batch create | `Sandbox.batch_create/2` | Parallel sandbox creation |
| Batch destroy | `Sandbox.batch_destroy/2` | Parallel cleanup |
| Batch run | `Sandbox.batch_run/3` | Parallel evaluation |
| Batch reload | `Sandbox.batch_hot_reload/3` | Parallel code loading |
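Each batch helper fans per-sandbox work out in parallel with bounded concurrency. A generic stdlib sketch of that shape, where `BatchSketch.run/3` is hypothetical (not Sandbox's code) and each result is paired with its input so failures stay attributable:

```elixir
defmodule BatchSketch do
  # Apply `fun` to every item, with at most `:max_concurrency` at once.
  # Results keep input order and are tagged with the item they came from.
  def run(items, fun, opts \\ []) do
    items
    |> Task.async_stream(fun,
      max_concurrency: Keyword.get(opts, :max_concurrency, System.schedulers_online()),
      timeout: Keyword.get(opts, :timeout, 5_000),
      # A slow item yields {:exit, :timeout} instead of crashing the caller
      on_timeout: :kill_task
    )
    |> Enum.zip(items)
    |> Enum.map(fn
      {{:ok, result}, item} -> {item, {:ok, result}}
      {{:exit, reason}, item} -> {item, {:error, reason}}
    end)
  end
end

BatchSketch.run([1, 2, 3], fn x -> x * x end, max_concurrency: 2)
```

`Task.async_stream/3` links its tasks to the caller, so an exception inside `fun` still propagates; `Task.Supervisor.async_stream_nolink/4` is the unlinked variant when crashes are expected.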
Batch operations use `Task.async_stream` with configurable concurrency:

```elixir
# Create a population of 100 sandboxes, 10 at a time
configs =
  for i <- 1..100 do
    {"evo-#{i}", MySupervisor, []}
  end

results = Sandbox.batch_create(configs, max_concurrency: 10)
```

Sandbox automatically namespaces modules to prevent collisions:
```elixir
# Original module name
MyModule

# Transformed to (example)
Sandbox_evo001_MyModule
```

This allows multiple sandboxes to load different versions of the same module name.
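One way to picture the namespacing step is an AST rewrite that renames chosen module aliases before compilation. The sketch below shows the general technique under that assumption, not Sandbox's transformer; `NamespaceSketch` is hypothetical and the `Sandbox_<id>_` prefix only mimics the example above.

```elixir
defmodule NamespaceSketch do
  # Rewrite every alias whose first segment is in `modules` (e.g. [:MyModule])
  # to a sandbox-specific name such as Sandbox_evo001_MyModule. Definitions
  # and call sites are rewritten alike, so the code stays self-consistent.
  def namespace(source, sandbox_id, modules) do
    prefix = "Sandbox_" <> String.replace(sandbox_id, ~r/[^A-Za-z0-9]/, "") <> "_"

    source
    |> Code.string_to_quoted!()
    |> Macro.prewalk(fn
      {:__aliases__, meta, [name | rest]} = node ->
        if name in modules do
          # String.to_atom/1 creates atoms; acceptable for a sketch only
          {:__aliases__, meta, [String.to_atom(prefix <> Atom.to_string(name)) | rest]}
        else
          node
        end

      other ->
        other
    end)
  end
end

ast = NamespaceSketch.namespace("defmodule MyModule do\n  def hello, do: :world\nend", "evo-001", [:MyModule])
# Compiling `ast` defines Sandbox_evo001_MyModule instead of MyModule
[{_module, _bytecode}] = Code.compile_quoted(ast)
```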
Hot reload can preserve and migrate process state:

```elixir
{:ok, :hot_reloaded} = Sandbox.hot_reload("evo-001", beam_data,
  state_handler: fn old_state, _old_version, new_version ->
    # Migrate state between versions
    # (assumes the state map already has a :schema_version key)
    %{old_state | schema_version: new_version}
  end
)
```

Track resource usage across sandboxes:
```elixir
{:ok, usage} = Sandbox.resource_usage("evo-001")
# => %{memory: 1234567, process_count: 5, ...}
```

Note: Resource tracking is available; enforcement of limits is planned.
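Usage for a sandbox has to be summed over every process in its tree. A hypothetical sketch of that aggregation with `Supervisor.which_children/1` and `Process.info/2`; the `TreeUsage` name and the map keys are assumptions chosen to match the example above.

```elixir
defmodule TreeUsage do
  # Sum memory (bytes) and count processes for a supervisor and its descendants
  def usage(sup) when is_pid(sup) do
    pids = collect(sup)

    memory =
      pids
      |> Enum.map(fn pid ->
        case Process.info(pid, :memory) do
          {:memory, bytes} -> bytes
          # The process exited while the tree was being walked
          nil -> 0
        end
      end)
      |> Enum.sum()

    %{memory: memory, process_count: length(pids)}
  end

  # Depth-first walk: the supervisor plus every live child, recursing into
  # child supervisors. Processes spawned but not supervised are not seen.
  defp collect(sup) do
    children =
      for {_id, child, type, _modules} <- Supervisor.which_children(sup), is_pid(child) do
        if type == :supervisor, do: collect(child), else: [child]
      end

    [sup | List.flatten(children)]
  end
end

children = [
  %{id: :a, start: {Agent, :start_link, [fn -> [] end]}},
  %{id: :b, start: {Agent, :start_link, [fn -> %{} end]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
TreeUsage.usage(sup)
```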
## Planned Features

The following features are on the roadmap to complete the evolution substrate:
Priority: High

- Execution timeout enforcement: Ensure `max_execution_time` limits are enforced in all code paths
- Process isolation hardening: Apply spawn options consistently to all sandbox processes
- Module loading consistency: Reliable path from transformed source to executing code
Priority: High

- Resource enforcement: Kill sandboxes that exceed memory or process limits

  ```elixir
  # Target API
  {:ok, _} = Sandbox.create_sandbox("evo-001", MySupervisor,
    resource_limits: %{
      max_memory: 64 * 1024 * 1024,  # 64MB - enforced
      max_processes: 50,             # enforced
      max_message_queue: 1000        # enforced
    }
  )
  ```

- Background garbage collection: Automatic cleanup of crashed or abandoned sandboxes

  ```elixir
  # Target configuration
  config :sandbox,
    gc_interval_ms: 60_000,
    gc_retain_generations: 3
  ```

- Accurate resource aggregation: Track memory and processes across the entire sandbox tree
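Per-process memory caps already exist on the BEAM as the `:max_heap_size` spawn option, which kills a process whose heap grows past a limit given in words. This is one plausible building block for the planned enforcement, not a statement of how Sandbox will implement it; `HeapLimit` is hypothetical, and a sandbox-wide `max_memory` would still need aggregation across processes.

```elixir
defmodule HeapLimit do
  # Spawn `fun` with a hard heap cap of `max_words` machine words and wait
  # for it to finish. On a 64-bit VM a word is 8 bytes, so 64 MB is
  # 8_388_608 words. Returns :ok, {:error, :killed}, or {:error, reason}.
  def run(fun, max_words) when is_function(fun, 0) do
    {pid, ref} =
      :erlang.spawn_opt(fun, [
        :monitor,
        {:max_heap_size, %{size: max_words, kill: true, error_logger: false}}
      ])

    receive do
      {:DOWN, ^ref, :process, ^pid, :normal} -> :ok
      # The heap check runs at garbage collection; exceeding it kills the process
      {:DOWN, ^ref, :process, ^pid, :killed} -> {:error, :killed}
      {:DOWN, ^ref, :process, ^pid, other} -> {:error, other}
    end
  end
end

HeapLimit.run(fn -> Enum.sum(1..1_000) end, 100_000)
HeapLimit.run(fn -> length(Enum.to_list(1..1_000_000)) end, 10_000)
```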
Priority: Medium

- Population registry: Track groups of related sandboxes

  ```elixir
  # Target API
  {:ok, _} = Sandbox.PopulationRegistry.create_population("gen-1", sandbox_ids)
  {:ok, members} = Sandbox.PopulationRegistry.get_members("gen-1")
  ```

- Generation tracking: Know which generation each sandbox belongs to

  ```elixir
  # Target extended state
  %Sandbox.Models.SandboxState{
    generation: 5,
    parent_id: "evo-gen4-012",
    mutation_type: :crossover,
    lineage_depth: 5
  }
  ```

- Fitness recording: Store evaluation results per sandbox

  ```elixir
  # Target API
  Sandbox.FitnessRecorder.record("evo-001", %{
    score: 0.85,
    components: %{correctness: 0.9, performance: 0.8},
    evaluated_at: DateTime.utc_now()
  })
  ```
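A per-sandbox fitness record that keeps only the latest evaluation (history being the job of external stores, per the scope notes in this guide) fits naturally in ETS. `FitnessTable` and its functions are hypothetical; the planned `Sandbox.FitnessRecorder` may look quite different.

```elixir
defmodule FitnessTable do
  # Create a named, public ETS set: one row per sandbox ID
  def new(name \\ :fitness_table) do
    :ets.new(name, [:set, :public, :named_table, read_concurrency: true])
  end

  # Overwrites any previous row, so only the latest evaluation is kept
  def record(table, sandbox_id, %{score: score} = result) when is_number(score) do
    true = :ets.insert(table, {sandbox_id, result})
    :ok
  end

  def fetch(table, sandbox_id) do
    case :ets.lookup(table, sandbox_id) do
      [{^sandbox_id, result}] -> {:ok, result}
      [] -> :error
    end
  end

  # Sandbox IDs ordered from highest to lowest score
  def ranking(table) do
    table
    |> :ets.tab2list()
    |> Enum.sort_by(fn {_id, %{score: score}} -> score end, :desc)
    |> Enum.map(fn {id, _result} -> id end)
  end
end

table = FitnessTable.new()
:ok = FitnessTable.record(table, "evo-001", %{score: 0.85, evaluated_at: DateTime.utc_now()})
:ok = FitnessTable.record(table, "evo-002", %{score: 0.40})
FitnessTable.ranking(table)
```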
Priority: Medium
Extended telemetry events for evolution workloads:
```elixir
# Planned events
[:sandbox, :evaluate, :start]
[:sandbox, :evaluate, :stop]
[:sandbox, :evaluate, :exception]
[:sandbox, :batch, :start]
[:sandbox, :batch, :stop]
[:sandbox, :batch, :exception]
[:sandbox, :fitness, :computed]
[:sandbox, :killed, :resource_limit]
[:sandbox, :killed, :timeout]
[:sandbox, :gc, :collected]
```

Priority: Lower (for trusted evolution)
- AST scanning: Detect dangerous operations before loading
- Module allowlists: Restrict what modules evolved code can call
- Audit logging: Track all security-relevant events
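AST scanning can start from something as small as walking quoted code for remote calls into a denylist of modules before anything is compiled. This is a crude sketch of the idea, not the planned implementation: `AstScan` and its denylist are hypothetical, and a real scanner must also handle `apply/3`, dynamically built module names, and aliases, none of which this one catches.

```elixir
defmodule AstScan do
  # Illustrative denylist; a real policy would be far more thorough
  @denied [System, File, Port, Code, :os, :erlang]

  # Parse `source` without compiling it and collect calls into denied modules.
  # Returns :ok or {:error, [{module, function}, ...]} in source order.
  def scan(source) when is_binary(source) do
    {_ast, hits} =
      source
      |> Code.string_to_quoted!()
      |> Macro.prewalk([], &collect/2)

    if hits == [], do: :ok, else: {:error, Enum.reverse(hits)}
  end

  # Elixir alias call, e.g. System.cmd("ls", [])
  defp collect({{:., _, [{:__aliases__, _, parts}, fun]}, _, args} = node, acc)
       when is_list(args) do
    mod = if Enum.all?(parts, &is_atom/1), do: Module.concat(parts)
    {node, flag(mod, fun, acc)}
  end

  # Erlang module call, e.g. :os.getenv()
  defp collect({{:., _, [mod, fun]}, _, args} = node, acc)
       when is_atom(mod) and is_list(args) do
    {node, flag(mod, fun, acc)}
  end

  defp collect(node, acc), do: {node, acc}

  defp flag(mod, fun, acc), do: if(mod in @denied, do: [{mod, fun} | acc], else: acc)
end

AstScan.scan("defmodule Candidate do\n  def f(x), do: x * x + 1\nend")
AstScan.scan("System.halt()")
```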
## Integration Patterns

Sandbox is the execution substrate, not the evolution brain. External systems handle mutation, selection, and population management.

Architecture (diagram): an external Evolution Engine (mutation operators, population, selection) and an external Fitness Evaluator (test runners, benchmarks, monitoring) drive Sandbox (this package), whose pipeline is Spawn → Load → Execute → Contain → Track → Cleanup.
The evolution engine calls Sandbox APIs:

```elixir
defmodule Evolution.Engine do
  def run_generation(population, fitness_fn) do
    # 1. Create sandboxes for new individuals
    configs =
      Enum.map(population.individuals, fn ind ->
        {ind.id, EvolutionSupervisor, []}
      end)

    Sandbox.batch_create(configs)

    # 2. Load evolved code into each sandbox
    Enum.each(population.individuals, fn ind ->
      Sandbox.hot_reload_source(ind.id, ind.code)
    end)

    # 3. Evaluate fitness in parallel
    sandbox_ids = Enum.map(population.individuals, & &1.id)
    results = Sandbox.batch_run(sandbox_ids, fitness_fn, timeout: 10_000)

    # 4. Process results (external responsibility: select_survivors/1 and
    #    get_unfit_ids/1 are the engine's own helpers, not Sandbox APIs)
    survivors = select_survivors(results)

    # 5. Clean up dead individuals
    dead_ids = get_unfit_ids(results)
    Sandbox.batch_destroy(dead_ids)

    survivors
  end
end
```

An LLM can act as the mutation operator, with each candidate evaluated in a short-lived sandbox:

```elixir
defmodule LLMEvolution do
  def evolve_function(current_code, fitness_fn, llm_client) do
    sandbox_id = "llm-#{:erlang.unique_integer([:positive])}"

    # Create a sandbox for evaluation
    {:ok, _} = Sandbox.create_sandbox(sandbox_id, LLMSupervisor)

    try do
      # Generate a variation via the LLM (llm_client is your own module)
      {:ok, new_code} = llm_client.mutate(current_code)

      # Load and evaluate
      case Sandbox.hot_reload_source(sandbox_id, new_code) do
        {:ok, :hot_reloaded} ->
          case Sandbox.run(sandbox_id, fitness_fn, timeout: 5_000) do
            {:ok, score} -> {:ok, new_code, score}
            {:error, reason} -> {:error, {:evaluation_failed, reason}}
          end

        {:error, reason} ->
          {:error, {:load_failed, reason}}
      end
    after
      # Always destroy the sandbox, even if mutation or evaluation fails
      Sandbox.destroy_sandbox(sandbox_id)
    end
  end
end
```

Monitoring tools can provide fitness signals:

```elixir
defmodule Beamlens.SandboxSkill do
  def anomaly_handler(anomaly) do
    # Extract the sandbox ID from anomaly metadata
    # (extract_sandbox_id/1 is a helper you supply)
    case extract_sandbox_id(anomaly) do
      {:ok, sandbox_id} ->
        # Anomaly detected = fitness penalty
        # (external fitness recorder, not Sandbox's responsibility)
        FitnessTracker.penalize(sandbox_id, 0.1, :anomaly_detected)

      :error ->
        :ignore
    end
  end
end
```

Build dashboards and alerting:

```elixir
defmodule EvolutionDashboard do
  # A module function handler avoids the performance penalty :telemetry
  # reports for anonymous (local) function handlers
  def handle_event([:sandbox, :run, :stop], measurements, metadata, _config) do
    Dashboard.record_evaluation(
      metadata.sandbox_id,
      measurements.duration,
      measurements.result
    )
  end
end

# Attach to sandbox telemetry
:telemetry.attach(
  "evolution-dashboard",
  [:sandbox, :run, :stop],
  &EvolutionDashboard.handle_event/4,
  nil
)
```

## Example: Conceptual Evolution Cycle

This example demonstrates a complete evolution cycle using Sandbox.
```elixir
defmodule EvolutionDemo do
  @population_size 10
  @generations 5

  # Target to evolve toward: x^2. A private function is used because an
  # anonymous function cannot be stored in a module attribute.
  defp target(x), do: x * x

  def run do
    # Initialize population with random candidates
    population = initialize_population()

    # Run evolution
    final_population =
      Enum.reduce(1..@generations, population, fn gen, pop ->
        IO.puts("Generation #{gen}")
        evolve_generation(pop, gen)
      end)

    # Children from the last generation have not been scored yet
    evaluated = Enum.map(final_population, &evaluate_fitness/1)

    # Report the best individual, then release every sandbox
    best = Enum.max_by(evaluated, & &1.fitness)
    IO.puts("Best fitness: #{best.fitness}")
    cleanup(evaluated)
    best
  end

  defp initialize_population do
    for i <- 1..@population_size do
      sandbox_id = "evo-#{i}"

      # Create sandbox
      {:ok, _} = Sandbox.create_sandbox(sandbox_id, Task.Supervisor)

      # Generate random initial code
      code = generate_random_function()

      # Load into sandbox
      case Sandbox.hot_reload_source(sandbox_id, code) do
        {:ok, :hot_reloaded} ->
          %{id: sandbox_id, code: code, fitness: 0.0}

        {:error, _} ->
          # Bad initial code, use fallback
          fallback = "defmodule Candidate do\n  def f(x), do: x\nend"
          {:ok, :hot_reloaded} = Sandbox.hot_reload_source(sandbox_id, fallback)
          %{id: sandbox_id, code: fallback, fitness: 0.0}
      end
    end
  end

  defp evaluate_fitness(individual) do
    fitness_fn = fn ->
      # Test candidate against target function
      test_cases = [-5, -2, 0, 1, 3, 7, 10]
      errors = Enum.map(test_cases, fn x -> abs(target(x) - Candidate.f(x)) end)

      # Fitness is inverse of error: higher is better, 1.0 is an exact match
      1.0 / (1.0 + Enum.sum(errors))
    end

    case Sandbox.run(individual.id, fitness_fn, timeout: 1_000) do
      {:ok, score} -> %{individual | fitness: score}
      {:error, _} -> %{individual | fitness: 0.0}
    end
  end

  defp evolve_generation(population, gen) do
    # Evaluate all individuals and rank them, best first
    ranked =
      population
      |> Enum.map(&evaluate_fitness/1)
      |> Enum.sort_by(& &1.fitness, :desc)

    # Keep the top half and destroy the sandboxes of the rest
    {survivors, dead} = Enum.split(ranked, div(@population_size, 2))
    cleanup(dead)

    # Each survivor produces one mutated child in its own sandbox
    children =
      survivors
      |> Enum.map(&mutate(&1, gen))
      |> Enum.reject(&is_nil/1)

    survivors ++ children
  end

  defp mutate(parent, gen) do
    new_code = mutate_code(parent.code)

    # A fresh sandbox per child keeps the parent's loaded code intact
    child_id = "#{parent.id}-g#{gen}"
    {:ok, _} = Sandbox.create_sandbox(child_id, Task.Supervisor)

    case Sandbox.hot_reload_source(child_id, new_code) do
      {:ok, :hot_reloaded} ->
        %{id: child_id, code: new_code, fitness: 0.0}

      {:error, _} ->
        # Mutation produced invalid code: discard the child
        Sandbox.destroy_sandbox(child_id)
        nil
    end
  end

  defp cleanup(population) do
    ids = Enum.map(population, & &1.id)
    Sandbox.batch_destroy(ids)
  end

  # Hypothetical placeholders: swap in your own code generator and
  # mutation operator (neither is provided by Sandbox)
  defp generate_random_function, do: "defmodule Candidate do\n  def f(x), do: x\nend"
  defp mutate_code(code), do: code
end
```

```elixir
# Start Sandbox in your application
children = [Sandbox]
Supervisor.start_link(children, strategy: :one_for_one)

# Run evolution
EvolutionDemo.run()
```

## Roadmap

Before MVP evolution features:
- Update dependencies for Elixir 1.19.4
- Ensure all tests pass with latest supertester
- Remove stubs with minimal working implementations
- Zero warnings, Dialyzer clean, Credo clean
- Config override hygiene (keyword lists and maps)
- State reset on startup by default
Core features for controlled experiments:
- Execution timeout enforcement in all paths
- ProcessIsolator applies spawn_opt consistently
- Reliable module loading path
- Per-runtime StatePreservation overrides
Production-ready evolution:
- Resource enforcement (memory, processes, message queues)
- Accurate resource usage aggregation across sandbox trees
- Background garbage collection
- Performance benchmarks for 100+ concurrent sandboxes
Enhanced evolution support:
- Population registry and generation tracking
- Parent-child lineage tracking
- Fitness recording per sandbox
- Extended telemetry events
For untrusted code evolution:
- SecurityController enforcement
- AST scanning for dangerous operations
- Module allowlists/blocklists
- Comprehensive audit logging
## Contributing

Sandbox is open to contributions, especially in these areas:
- Resource enforcement: Implementing memory and process limits
- Test coverage: Population-scale benchmarks and stress tests
- Documentation: Example evolution engines and patterns
- Telemetry: Dashboard templates for evolution monitoring
- Fork the repository
- Create a feature branch (`git checkout -b feature/resource-enforcement`)
- Ensure tests pass (`mix test`)
- Run quality checks (`mix dialyzer && mix credo --strict`)
- Submit a pull request
When contributing, keep these principles in mind:
- Sandbox is the substrate, not the brain: Mutation, selection, and population management belong in external systems
- Failure is expected: Design for crashed sandboxes, not against them
- Observability over control: Track and report; don't over-restrict
- BEAM-native: Use OTP patterns, not foreign abstractions
These are explicitly out of scope:
- Generate mutations (use LLMs, genetic operators externally)
- Select survivors (evolution engine responsibility)
- Store fitness history (record current only; external stores history)
- Distributed coordination (use Horde, pg externally)
- Persistence (use external storage for checkpoints)
Related documentation:

- Getting Started - Installation and basic usage
- Architecture - System design and components
- `Sandbox` - API reference (run `mix docs` to generate)
- BEAM Hot Code Loading - Erlang/OTP documentation
- OTP Design Principles - Supervision and fault tolerance
This document describes the vision and roadmap for Sandbox as an evolution substrate. Features marked as "planned" are subject to change based on implementation experience and community feedback.