diff --git a/README.md b/README.md
index 55f40d262..4404c24d9 100644
--- a/README.md
+++ b/README.md
@@ -7,16 +7,16 @@
-
+
-`ShinkaEvolve` is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.
+[`ShinkaEvolve`](https://arxiv.org/abs/2509.19349) is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.
The framework supports **parallel evaluation of candidates** locally or on a Slurm cluster. It maintains an archive of successful solutions, enabling knowledge transfer between different evolutionary islands. `ShinkaEvolve` is particularly well-suited for scientific tasks where there is a verifier available and the goal is to optimize performance metrics while maintaining code correctness and readability.
-
+
## Documentation π
@@ -26,6 +26,7 @@ The framework supports **parallel evaluation of candidates** locally or on a Slu
| π **[Tutorial Notebook](examples/shinka_tutorial.ipynb)** | Interactive walkthrough of Shinka features | Hands-on examples, configuration, best practices |
| βοΈ **[Configuration](docs/configuration.md)** | Comprehensive configuration reference | All config options, optimization settings, advanced features |
| π¨ **[WebUI](docs/webui.md)** | Interactive visualization and monitoring | Real-time tracking, result analysis, debugging tools |
+| πΉοΈ **[Local LLM Support](https://github.com/SakanaAI/ShinkaEvolve/blob/main/docs/support_local_llm.md)** | Instructions for local LLMs | How to set up local LLMs on your machine |
## Installation & Quick Start π
@@ -52,9 +53,9 @@ For detailed installation instructions and usage examples, see the [Getting Star
| Example | Description | Environment Setup |
|---------|-------------|-------------------|
| β [Circle Packing](examples/circle_packing) | Optimize circle packing to maximize radii. | `LocalJobConfig` |
-| π€ [Agent Design](examples/agent_design) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
+| π€ [Agent Design](examples/adas_aime) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
| π― [ALE-Bench](examples/ale_bench) | Code optimization for ALE-Bench tasks. | `LocalJobConfig` |
-| β¨ [Novelty Generator](examples/novelty_generator_bck) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |
+| β¨ [Novelty Generator](examples/novelty_generator) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |
## `shinka` Run with Python API π
@@ -308,9 +309,9 @@ If you use `ShinkaEvolve` in your research, please cite it as follows:
```
@article{lange2025shinka,
- title={ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution},
+ title={ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution},
author={Lange, Robert Tjarko and Imajuku, Yuki and Cetin, Edoardo},
- journal={arXiv preprint},
+ journal={arXiv preprint arXiv:2509.19349},
year={2025}
}
-```
\ No newline at end of file
+```
diff --git a/configs/config.yaml b/configs/config.yaml
index 9702c6617..577e1dfe2 100644
--- a/configs/config.yaml
+++ b/configs/config.yaml
@@ -2,9 +2,9 @@ defaults:
- _self_
- database@_global_: island_small
- evolution@_global_: small_budget
- - task@_global_: mad_tf
+ - task@_global_: circle_packing
- cluster@_global_: local
- - variant@_global_: mad_tf_example
+ - variant@_global_: circle_packing_example
verbose: false
results_dir: results
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 234158839..03bc54c80 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -2,6 +2,8 @@
Shinka is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. This guide will help you get started with installing, configuring, and running your first evolutionary experiments.
+
+
## Table of Contents
1. [What is Shinka?](#what-is-shinka)
@@ -53,7 +55,7 @@ pip install uv
```bash
git clone
-cd shinka
+cd ShinkaEvolve
# Create virtual environment with Python 3.11
uv venv --python 3.11
@@ -79,7 +81,7 @@ conda activate shinka
```bash
git clone
-cd shinka
+cd ShinkaEvolve
pip install -e .
```
@@ -249,7 +251,7 @@ from shinka.core import run_shinka_eval
def main(program_path: str, results_dir: str):
"""Main evaluation function called by Shinka"""
-
+
metrics, correct, error_msg = run_shinka_eval(
program_path=program_path,
results_dir=results_dir,
@@ -268,11 +270,11 @@ def main(program_path: str, results_dir: str):
def validate_packing(run_output):
"""Returns (is_valid: bool, error_msg: str or None)"""
centers, radii, reported_sum = run_output
-
+
# Check constraints (bounds, overlaps, etc.)
if constraint_violated:
return False, "Specific error description"
-
+
return True, None # Valid solution
```
@@ -280,10 +282,10 @@ def validate_packing(run_output):
```python
def aggregate_metrics(results, results_dir):
"""Returns metrics dictionary with required structure"""
-
+
# Extract data from results
centers, radii, reported_sum = results[0]
-
+
return {
"combined_score": float(reported_sum), # PRIMARY FITNESS (higher = better)
"public": { # Visible in WebUI/logs
@@ -331,6 +333,75 @@ The `run_shinka_eval` function returns three values:
## Advanced Usage
+### Resuming Experiments
+
+If you need to pause and resume an evolutionary run, or extend a completed run with more generations, Shinka supports seamless resumption from existing results.
+
+#### How Resuming Works
+
+When you specify an existing `results_dir` that contains a database, Shinka will:
+- Detect the previous run automatically
+- Restore the population database and all program history
+- Resume meta-recommendations from the last checkpoint
+- Continue from the last completed generation
+
+#### Using the CLI (Hydra)
+
+```bash
+# Resume an existing run and extend to 50 generations
+shinka_launch \
+ variant=circle_packing_example \
+ evo_config.results_dir=results_20250101_120000 \
+ evo_config.num_generations=50
+
+# Or with a custom task
+shinka_launch \
+ task=circle_packing \
+ database=island_small \
+ evolution=small_budget \
+ cluster=local \
+ evo_config.results_dir=path/to/previous/results \
+ evo_config.num_generations=100
+```
+
+#### Using the Python API
+
+```python
+from shinka.core import EvolutionRunner, EvolutionConfig
+from shinka.database import DatabaseConfig
+from shinka.launch import LocalJobConfig
+
+# Point to existing results directory
+evo_config = EvolutionConfig(
+ num_generations=50, # Extend to 50 total generations
+ results_dir="results_20250101_120000", # Existing results
+ # ... other config parameters ...
+)
+
+job_config = LocalJobConfig(
+ eval_program_path="examples/circle_packing/evaluate.py",
+)
+
+db_config = DatabaseConfig(
+ archive_size=20,
+ num_islands=2,
+)
+
+# Run will automatically detect and resume
+runner = EvolutionRunner(
+ evo_config=evo_config,
+ job_config=job_config,
+ db_config=db_config,
+)
+runner.run()
+```
+
+**Important Notes:**
+- The `num_generations` parameter should be set to the **total** number of generations you want (not additional generations)
+- For example, if you completed 20 generations and want 30 more, set `num_generations=50`
+- The database configuration (number of islands, archive size, etc.) should match the original run
+- All previous progress, including the best solutions and meta-recommendations, will be preserved
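A minimal sketch of the total-vs-additional rule above (plain Python, nothing Shinka-specific):

```python
def total_generations(completed: int, additional: int) -> int:
    """`num_generations` for a resumed run is the TOTAL, not the increment."""
    return completed + additional

# Completed 20 generations and want 30 more:
print(total_generations(20, 30))  # 50
```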
+
### Environment Management for Local Jobs
When running jobs locally, you have several options for managing Python environments:
diff --git a/docs/support_local_llm.md b/docs/support_local_llm.md
new file mode 100644
index 000000000..5f406e7b9
--- /dev/null
+++ b/docs/support_local_llm.md
@@ -0,0 +1,232 @@
+
+# π§© Integrating Local LLMs into **ShinkaEvolve**
+
+## π§ Overview
+
+The original **ShinkaEvolve** code does **not** include built-in support for running **local LLMs**.
+This guide describes the code changes needed to integrate locally hosted models.
+
+---
+
+## ποΈ Code Organization
+
+**ShinkaEvolve** uses a **modular architecture** that supports multiple **LLM providers**.
+The relevant code for LLM interaction is located in the **`LLM/`** folder, which manages all model communications.
+ShinkaEvolve distinguishes between two LLM types:
+
+* **Regular LLMs**
+* **Embedding LLMs**
+
+---
+
+## βοΈ Adding a Regular LLM
+
+To add support for a **regular LLM**, follow the steps below. As a running example, they add support for gpt-oss models served with Unsloth, which exposes an OpenAI-compatible API (`v1/completions`).
+This LLM can then be specified in the configuration variables:
+
+```yaml
+llm_models: ["local-gptoss-unsloth-http://localhost:8000/v1"]  # hypothetical local endpoint
+meta_llm_models: ["local-gptoss-unsloth-http://localhost:8000/v1"]
+```
+
+---
+
+### π§ Step 1: Modify the Client
+
+The file **`client.py`** is responsible for creating clients that interact with LLMs.
+Each client instance is later used to query a specific model.
+
+To add a local model, introduce a new client configuration.
+The API URL is extracted from the model name, which follows this format:
+
+```
+local-gptoss-unsloth-url
+```
+
+#### Example
+
+```python
+elif "local-gptoss-unsloth" in model_name:
+ # Extract URL from model name
+ pattern = r"https?://"
+ match = re.search(pattern, model_name)
+ if match:
+ start_index = match.start()
+ url = model_name[start_index:]
+ else:
+ raise ValueError(f"Invalid URL in model name: {model_name}")
+
+ # Create OpenAI-compatible client
+ client = openai.OpenAI(
+ api_key="filler",
+ base_url=url
+ )
+
+ # Structured output mode (if required)
+ if structured_output:
+ client = instructor.from_openai(
+ client,
+ mode=instructor.Mode.JSON,
+ )
+```
+
+---
+
+### π Step 2: Create the Local Query Function
+
+Inside the **`models/`** folder, create a new subfolder to store the query functions for your local models:
+
+```
+LLM/models/local/
+```
+
+> Don't forget to include an empty `__init__.py` file.
+
+This folder should contain a **custom query function** for the local model; I called my file `local_gptoss_unsloth.py`.
+It should follow the same structure as other functions in `LLM/models/`, but with small adjustments.
+
+#### My Key Adjustments
+
+* Replace `max_output_tokens` with **`max_tokens`** to match the local API.
+* Extract additional response metadata such as:
+
+ * `total_tokens`
+ * `thinking_tokens` (if your model includes reasoning traces)
+
+This function is later imported and registered in **`query.py`**.
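The exact code depends on your server, but the two adjustments above can be sketched roughly like this (all names below are mine, not part of the upstream codebase):

```python
# Hypothetical sketch of helpers inside LLM/models/local/local_gptoss_unsloth.py.

def adapt_local_kwargs(kwargs: dict) -> dict:
    """Rename OpenAI-style `max_output_tokens` to the `max_tokens`
    field expected by the local v1/completions endpoint."""
    adapted = dict(kwargs)
    if "max_output_tokens" in adapted:
        adapted["max_tokens"] = adapted.pop("max_output_tokens")
    return adapted

def extract_usage(response: dict) -> dict:
    """Collect token metadata from the response; `thinking_tokens` is
    only present for models that emit reasoning traces."""
    usage = response.get("usage", {})
    return {
        "total_tokens": usage.get("total_tokens", 0),
        "thinking_tokens": usage.get("thinking_tokens", 0),
    }
```

The real query function would call the client created in Step 1 and wrap the output in `QueryResult`, mirroring the existing provider modules in `LLM/models/`.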
+
+---
+
+### π§© Step 3: Update `__init__.py`
+
+Configure **`__init__.py`** to include and expose the new local query function, so it can be imported elsewhere.
+
+```python
+from .local.local_gptoss_unsloth import query_local_gptoss_unsloth # ADDED THIS LINE
+from .result import QueryResult
+
+__all__ = [
+ "query_anthropic",
+ "query_openai",
+ "query_deepseek",
+ "query_gemini",
+ "query_local_gptoss_unsloth", # ADDED THIS LINE
+ "QueryResult",
+]
+```
+
+---
+
+### π¬ Step 4: Update `query.py`
+
+Import and register the new local query function in **`query.py`**.
+
+#### Imports
+
+```python
+from .models import (
+ query_anthropic,
+ query_openai,
+ query_deepseek,
+ query_gemini,
+ query_local_gptoss_unsloth, # ADDED THIS LINE
+ QueryResult,
+)
+```
+
+#### Model Selection Logic
+
+```python
+elif "local-gptoss-unsloth" in model_name: # ADDED THIS LINE
+ query_fn = query_local_gptoss_unsloth
+```
+
+---
+
+### π§ Step 5: Other Observations
+
+The file **`query.py`** also defines functions such as:
+
+* `sample_model_kwargs`
+* `sample_batch_kwargs`
+
+However, these are **not referenced anywhere else** in the repository, so no modifications are required here for now.
+
+---
+
+### β
 Summary
+
+| Step | File | Change | Description |
+| ---- | -------------------------------------------- | -------------------- | -------------------------------------------------------- |
+| 1 | `client.py` | Add new client block | Create OpenAI-compatible client for local LLM |
+| 2    | `models/local/local_gptoss_unsloth.py`       | New function         | Query local model, adjust tokens, extract reasoning info  |
+| 3 | `__init__.py` | Add import | Expose new query function |
+| 4 | `query.py` | Register model | Add conditional for local LLM |
+| 5    | (none)                                       | Review only          | Reviewed unused functions; no changes needed              |
+
+---
+
+## 𧬠Adding a Local Embedding Model
+
+For embedding models, you can use **Ollama**, which exposes an OpenAI-compatible API.
+The only file that needs changes is **`embedding.py`**.
+
+### Code Addition
+
+```python
+elif model_name.startswith("local-"):
+ # Pattern: local-(model-name)-(http or https url)
+ match = re.match(r"local-(.+?)-(https?://.+)", model_name)
+ if match:
+ model_to_use = match.group(1)
+ url = match.group(2)
+ else:
+ raise ValueError(f"Invalid local model format: {model_name}")
+
+ client = openai.OpenAI(
+ base_url=url,
+ api_key="filler"
+ )
+```
+
+#### Notes
+
+* Compatible with **any Ollama model**.
+* The model name must follow this convention:
+
+ ```
+ local-model-name-url
+ ```
+* The code extracts both `model-name` and `url`, and uses them to query Ollama.
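A standalone sketch of the parsing logic behind this naming convention (the model name below is just an example):

```python
import re

def parse_local_embedding_name(model_name: str):
    """Split `local-<model-name>-<url>` into its model and URL parts."""
    match = re.match(r"local-(.+?)-(https?://.+)", model_name)
    if not match:
        raise ValueError(f"Invalid local model format: {model_name}")
    return match.group(1), match.group(2)

model, url = parse_local_embedding_name(
    "local-nomic-embed-text-http://localhost:11434/v1"
)
print(model, url)  # nomic-embed-text http://localhost:11434/v1
```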
+
+---
+
+### Query Logic
+
+The existing line in **`embedding.py`** remains unchanged:
+
+```python
+response = self.client.embeddings.create(
+ model=self.model,
+ input=code,
+ encoding_format="float"
+)
+```
+
+For local embedding models, `self.model` corresponds to the extracted model name.
+The only addition needed in the embedding client class is to report zero cost for local models:
+
+```python
+elif self.model_name.startswith("local-"):
+ cost = 0.0
+```
+
+---
+
+## π Result
+
+ShinkaEvolve can now connect to **locally hosted LLMs** and **embedding models** through **OpenAI-compatible APIs**.
+This setup supports **Ollama** as well as other OpenAI-compatible servers, such as **gpt-oss** models served with **Unsloth**.
+
+If your model has different requirements, follow the same pattern with a distinct model identifier and your own custom logic.
+
diff --git a/examples/shinka_tutorial.ipynb b/examples/shinka_tutorial.ipynb
index 66a71a073..c6d818994 100644
--- a/examples/shinka_tutorial.ipynb
+++ b/examples/shinka_tutorial.ipynb
@@ -237,6 +237,17 @@
"if not llm_models:\n",
" llm_models = [\"gpt-5-mini\"] # fallback if no keys detected\n",
"\n",
+ "# pick embedding model based on available keys\n",
+ "embedding_model_name = \"\"\n",
+ "if os.getenv(\"GEMINI_API_KEY\"):\n",
+ " embedding_model_name = \"gemini-embedding-001\"\n",
+ "elif os.getenv(\"OPENAI_API_KEY\"):\n",
+ " embedding_model_name = \"text-embedding-3-small\"\n",
+ "else:\n",
+ " embedding_model_name = \"text-embedding-3-small\"\n",
+    "print(f\"β
 Embedding model selected: {embedding_model_name}\")\n",
+ "\n",
+ "\n",
"# unique experiment directory\n",
"timestamp = dt.datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
"run_tag = f\"{timestamp}_weighted_fast\"\n",
@@ -271,6 +282,8 @@
" max_novelty_attempts=3,\n",
" # ensemble llm selection among candidates based on past performance\n",
" llm_dynamic_selection=None, # e.g. \"ucb1\"\n",
+ " # set embedding model\n",
+ " embedding_model=embedding_model_name,\n",
")\n",
"\n",
"db_config = DatabaseConfig(\n",
@@ -286,11 +299,13 @@
" enforce_island_separation=True,\n",
" parent_selection_strategy=\"weighted\",\n",
" parent_selection_lambda=10.0,\n",
")\n",
"\n",
"job_config = LocalJobConfig(eval_program_path=\"evaluate.py\")\n",
"\n",
"print(\"llm_models:\", llm_models)\n",
+ "print(\"embedding_model:\", embedding_model_name)\n",
"print(\"results_dir:\", evo_config.results_dir)"
]
},
diff --git a/pyproject.toml b/pyproject.toml
index e3ec455af..5802a1522 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -45,17 +45,20 @@ dependencies = [
"adjustText",
"markdown",
"aiofiles",
+ "google-generativeai",
]
[tool.setuptools]
-packages = ["shinka"]
script-files = ["shinka/shinka_launch", "shinka/shinka_visualize"]
+[tool.setuptools.packages.find]
+include = ["shinka", "shinka.*"]
+
[tool.setuptools.package-data]
"*" = ["*"]
-[tool.uv]
-dev-dependencies = [
+[dependency-groups]
+dev = [
"pytest>=6.0",
"black",
"isort",
diff --git a/shinka/core/runner.py b/shinka/core/runner.py
index 3c818742c..a0dd5f81d 100644
--- a/shinka/core/runner.py
+++ b/shinka/core/runner.py
@@ -158,7 +158,12 @@ def __init__(
# Initialize database and scheduler
db_config.db_path = str(db_path)
- self.db = ProgramDatabase(config=db_config)
+ embedding_model_to_use = (
+ evo_config.embedding_model or "text-embedding-3-small"
+ )
+ self.db = ProgramDatabase(
+ config=db_config, embedding_model=embedding_model_to_use
+ )
self.scheduler = JobScheduler(
job_type=evo_config.job_type,
config=job_config, # type: ignore
@@ -231,6 +236,12 @@ def __init__(
self.lang_ext = "cpp"
elif self.evo_config.language == "python":
self.lang_ext = "py"
+ elif self.evo_config.language == "rust":
+ self.lang_ext = "rs"
+ elif self.evo_config.language == "swift":
+ self.lang_ext = "swift"
+ elif self.evo_config.language in ["json", "json5"]:
+ self.lang_ext = "json"
else:
msg = f"Language {self.evo_config.language} not supported"
raise ValueError(msg)
@@ -1096,9 +1107,10 @@ def run_patch(
# error_attempt is already set from apply_patch or default
pass
- # Only consider the diff summary for the original.py file!!!
- if "original.py" in diff_summary:
- diff_summary = diff_summary["original.py"]
+ # Only consider the diff summary for the original source file
+ original_filename = f"original.{self.lang_ext}"
+ if original_filename in diff_summary:
+ diff_summary = diff_summary[original_filename]
meta_edit_data = {
"patch_type": patch_type,
diff --git a/shinka/core/wrap_eval.py b/shinka/core/wrap_eval.py
index 7e1d1e5d3..bf2cf92eb 100644
--- a/shinka/core/wrap_eval.py
+++ b/shinka/core/wrap_eval.py
@@ -96,6 +96,9 @@ def run_shinka_eval(
num_valid_runs = 0
num_invalid_runs = 0
+ all_run_results: List[Any] = []
+ execution_times: List[float] = []
+
try:
module = load_program(program_path)
if not hasattr(module, experiment_fn_name):
@@ -105,9 +108,6 @@ def run_shinka_eval(
)
experiment_fn = getattr(module, experiment_fn_name)
- all_run_results: List[Any] = []
- execution_times: List[float] = []
-
for i in range(num_runs):
kwargs: Dict[str, Any] = {}
if get_experiment_kwargs:
diff --git a/shinka/database/complexity.py b/shinka/database/complexity.py
index 4116567e9..70cd5d3a1 100644
--- a/shinka/database/complexity.py
+++ b/shinka/database/complexity.py
@@ -259,8 +259,8 @@ def analyze_code_metrics(code_string, language="python"):
# If Python parsing fails, fall back to C++ analysis
return analyze_cpp_complexity(code_string)
- # For C/C++/CUDA and other languages, use regex-based analysis
- elif language in ["cpp", "c", "cuda", "c++"]:
+ # For C/C++/CUDA/Rust/Swift/JSON and other languages, use regex-based analysis
+ elif language in ["cpp", "c", "cuda", "c++", "rust", "swift", "json", "json5"]:
return analyze_cpp_complexity(code_string)
# For unknown languages, use simple line-based complexity
diff --git a/shinka/database/dbase.py b/shinka/database/dbase.py
index 69fdf5432..2118763c4 100644
--- a/shinka/database/dbase.py
+++ b/shinka/database/dbase.py
@@ -50,7 +50,7 @@ def clean_nan_values(obj: Any) -> Any:
@dataclass
class DatabaseConfig:
- db_path: Optional[str] = None
+ db_path: str = "evolution_db.sqlite"
num_islands: int = 4
archive_size: int = 100
@@ -82,6 +82,9 @@ class DatabaseConfig:
# Beam search parent selection parameters
num_beams: int = 5
+ # Embedding model name
+ embedding_model: str = "text-embedding-3-small"
+
def db_retry(max_retries=5, initial_delay=0.1, backoff_factor=2):
"""
@@ -248,12 +251,22 @@ class ProgramDatabase:
populations, and an archive of elites.
"""
- def __init__(self, config: DatabaseConfig, read_only: bool = False):
+ def __init__(
+ self,
+ config: DatabaseConfig,
+ embedding_model: str = "text-embedding-3-small",
+ read_only: bool = False,
+ ):
self.config = config
self.conn: Optional[sqlite3.Connection] = None
self.cursor: Optional[sqlite3.Cursor] = None
self.read_only = read_only
- self.embedding_client = EmbeddingClient()
+ # Only create embedding client if not in read-only mode
+ # (e.g., WebUI doesn't need it for visualization)
+ if not read_only:
+ self.embedding_client = EmbeddingClient(model_name=embedding_model)
+ else:
+ self.embedding_client = None
self.last_iteration: int = 0
self.best_program_id: Optional[str] = None
diff --git a/shinka/database/display.py b/shinka/database/display.py
index 4c34d3445..3e55439bf 100644
--- a/shinka/database/display.py
+++ b/shinka/database/display.py
@@ -122,6 +122,18 @@ def print_program_summary(self, program, console: Optional[RichConsole] = None):
else:
time_display = f"{time_val:.1f}s"
+ # Safely extract metadata fields for display
+ metadata = program.metadata or {}
+ patch_name_raw = metadata.get("patch_name", "[dim]N/A[/dim]")
+ if patch_name_raw is None:
+ patch_name_raw = "[dim]N/A[/dim]"
+ patch_name = str(patch_name_raw)[:30]
+
+ patch_type_raw = metadata.get("patch_type", "[dim]N/A[/dim]")
+ if patch_type_raw is None:
+ patch_type_raw = "[dim]N/A[/dim]"
+ patch_type = str(patch_type_raw)
+
# Add the data row
island_display = (
f"I-{program.island_idx}" if program.island_idx is not None else "N/A"
@@ -131,8 +143,8 @@ def print_program_summary(self, program, console: Optional[RichConsole] = None):
island_display,
status_display,
score_display,
- program.metadata.get("patch_name", "[dim]N/A[/dim]")[:30],
- program.metadata.get("patch_type", "[dim]N/A[/dim]"),
+ patch_name,
+ patch_type,
f"{program.complexity:.1f}",
cost_display,
time_display,
diff --git a/shinka/database/inspirations.py b/shinka/database/inspirations.py
index ee564dfa1..42c3859d8 100644
--- a/shinka/database/inspirations.py
+++ b/shinka/database/inspirations.py
@@ -72,6 +72,7 @@ def sample_context(self, parent: Any, n: int) -> List[Any]:
self.cursor.execute(
"""
SELECT p.id FROM programs p
+ JOIN archive a ON p.id = a.program_id
WHERE p.island_idx = ? AND p.correct = 1
ORDER BY p.combined_score DESC
LIMIT ?
@@ -93,7 +94,8 @@ def sample_context(self, parent: Any, n: int) -> List[Any]:
placeholders_rand = ",".join("?" * len(insp_ids))
sql_rand = f"""
SELECT p.id FROM programs p
- WHERE p.island_idx = ? AND p.correct = 1
+ JOIN archive a ON p.id = a.program_id
+ WHERE p.island_idx = ? AND p.correct = 1
AND p.id NOT IN ({placeholders_rand})
ORDER BY RANDOM() LIMIT ?
"""
@@ -111,9 +113,10 @@ def sample_context(self, parent: Any, n: int) -> List[Any]:
needed = n - len(inspirations)
if needed > 0:
placeholders_rand = ",".join("?" * len(insp_ids))
- sql_rand = f"""SELECT id FROM programs
- WHERE correct = 1
- AND id NOT IN ({placeholders_rand})
+ sql_rand = f"""SELECT p.id FROM programs p
+ JOIN archive a ON p.id = a.program_id
+ WHERE p.correct = 1
+ AND p.id NOT IN ({placeholders_rand})
ORDER BY RANDOM() LIMIT ?
"""
params_rand = list(insp_ids) + [needed]
diff --git a/shinka/database/islands.py b/shinka/database/islands.py
index 9975eac3b..341dea79c 100644
--- a/shinka/database/islands.py
+++ b/shinka/database/islands.py
@@ -682,6 +682,16 @@ def copy_program_to_islands(self, program: Any) -> List[str]:
f"Created copy {new_id[:8]}... of program {program.id[:8]}... "
f"for island {island_idx}"
)
+
+ # Add the copied program to the archive if it's correct
+ # This ensures it can be used as inspiration for that island
+ if program.correct:
+ self.cursor.execute(
+ "INSERT OR IGNORE INTO archive (program_id) VALUES (?)",
+ (new_id,),
+ )
+ logger.debug(f"Added copy {new_id[:8]}... to archive (correct program)")
+
self.conn.commit()
logger.info(
f"Created {len(created_ids)} copies of program "
diff --git a/shinka/edit/apply_diff.py b/shinka/edit/apply_diff.py
index ead28e231..d33f58042 100644
--- a/shinka/edit/apply_diff.py
+++ b/shinka/edit/apply_diff.py
@@ -698,7 +698,7 @@ def apply_diff_patch(
patch_str = _strip_trailing_whitespace(patch_str)
# Remove the EVOLVE-BLOCK START and EVOLVE-BLOCK END markers
- if language in ["cuda", "cpp"]:
+ if language in ["cuda", "cpp", "rust", "swift", "json", "json5"]:
patch_str = re.sub(r"// EVOLVE-BLOCK START\\n", "", patch_str)
patch_str = re.sub(r"// EVOLVE-BLOCK END\\n", "", patch_str)
elif language == "python":
@@ -730,6 +730,12 @@ def apply_diff_patch(
suffix = ".cpp"
elif language == "cuda":
suffix = ".cu"
+ elif language == "rust":
+ suffix = ".rs"
+ elif language == "swift":
+ suffix = ".swift"
+ elif language in ["json", "json5"]:
+ suffix = ".json"
else:
raise ValueError(f"Language {language} not supported")
diff --git a/shinka/edit/apply_full.py b/shinka/edit/apply_full.py
index b7e2e2b37..ac6288128 100644
--- a/shinka/edit/apply_full.py
+++ b/shinka/edit/apply_full.py
@@ -1,6 +1,6 @@
from pathlib import Path
from typing import Optional, Union
-from .apply_diff import write_git_diff, _mutable_ranges
+from .apply_diff import write_git_diff, _mutable_ranges, EVOLVE_START, EVOLVE_END
from shinka.llm import extract_between
import logging
@@ -72,10 +72,15 @@ def apply_full_patch(
updated_content = ""
last_end = 0
- # Check if patch_code contains EVOLVE-BLOCK markers
- patch_mutable_ranges = _mutable_ranges(patch_code)
+ # Detect EVOLVE markers presence in the patch content
+ patch_has_start = EVOLVE_START.search(patch_code) is not None
+ patch_has_end = EVOLVE_END.search(patch_code) is not None
+ patch_has_both = patch_has_start and patch_has_end
+ patch_has_none = not patch_has_start and not patch_has_end
- if patch_mutable_ranges:
+ if patch_has_both:
+ # Patch contains both EVOLVE-BLOCK markers, extract from them
+ patch_mutable_ranges = _mutable_ranges(patch_code)
# Patch contains EVOLVE-BLOCK markers, extract from them
for i, (start, end) in enumerate(mutable_ranges):
# Add immutable part before this mutable range
@@ -91,47 +96,158 @@ def apply_full_patch(
updated_content += replacement_content
last_end = end
- else:
+ elif patch_has_none:
# Patch doesn't contain EVOLVE-BLOCK markers
# Assume entire patch content should replace all mutable regions
if len(mutable_ranges) == 1:
- # Single mutable region, replace with entire patch content
+ # Single mutable region. If the patch appears to be a full-file
+ # rewrite that omitted EVOLVE markers, safely extract only the
+ # content intended for the evolve block by matching immutable
+ # prefix/suffix from the original file.
start, end = mutable_ranges[0]
- # The mutable range ends before "EVOLVE-BLOCK-END" text
- # We need to find the actual start of the comment line
- if language == "python":
- end_marker = "# EVOLVE-BLOCK-END"
- elif language in ["cuda", "cpp"]:
- end_marker = "// EVOLVE-BLOCK-END"
- else:
- end_marker = "# EVOLVE-BLOCK-END" # Default fallback
-
- end_marker_pos = original.find(end_marker, end - 5)
- if end_marker_pos == -1:
- # Fallback: use the original end position
- end_marker_pos = end
+ # Immutable portions that remain outside the evolve block
+ immutable_prefix = original[:start]
+ immutable_suffix = original[end:]
- # Ensure proper newline handling around the patch content
- if patch_code and not patch_code.startswith("\n"):
- patch_code = "\n" + patch_code
+ # Also compute the portions strictly outside the marker lines
+ # to detect full-file patches that omitted EVOLVE markers.
+ # Find the start and end marker line boundaries.
+ start_match = None
+ end_match = None
+ for m in EVOLVE_START.finditer(original):
+ if m.end() == start:
+ start_match = m
+ break
+ for m in EVOLVE_END.finditer(original):
+ if m.start() == end:
+ end_match = m
+ break
- if patch_code and not patch_code.endswith("\n"):
- patch_code = patch_code + "\n"
-
- updated_content = (
- original[:start] + patch_code + original[end_marker_pos:]
+ prefix_outside = (
+ original[: start_match.start()] if start_match else immutable_prefix
+ )
+ suffix_outside = (
+ original[end_match.end() :] if end_match else immutable_suffix
)
+
+ # Heuristic: if patch includes the same immutable prefix/suffix
+ # outside the markers, treat the middle part as the evolve-block
+ # replacement. Be tolerant to a missing trailing newline in the
+ # footer by checking both versions.
+ suffix_opts = (suffix_outside, suffix_outside.rstrip("\r\n"))
+ if patch_code.startswith(prefix_outside) and any(
+ patch_code.endswith(sfx) for sfx in suffix_opts
+ ):
+ mid_start = len(prefix_outside)
+ # choose the matching suffix option to compute end
+ sfx = next(sfx for sfx in suffix_opts if patch_code.endswith(sfx))
+ mid_end = len(patch_code) - len(sfx)
+ replacement_content = patch_code[mid_start:mid_end]
+ # Ensure marker boundaries stay on their own lines.
+ # Add a leading newline only if there is a START marker.
+ if (
+ start_match is not None
+ and replacement_content
+ and not replacement_content.startswith("\n")
+ ):
+ replacement_content = "\n" + replacement_content
+ # Add a trailing newline only if there is an END marker.
+ if (
+ end_match is not None
+ and replacement_content
+ and not replacement_content.endswith("\n")
+ ):
+ replacement_content = replacement_content + "\n"
+ updated_content = (
+ immutable_prefix + replacement_content + immutable_suffix
+ )
+ else:
+ # Otherwise, assume the patch_code represents only the
+ # evolve-block payload and insert it directly between markers.
+ # Ensure proper newline handling around the patch content.
+ payload = patch_code
+ if (
+ start_match is not None
+ and payload
+ and not payload.startswith("\n")
+ ):
+ payload = "\n" + payload
+ if end_match is not None and payload and not payload.endswith("\n"):
+ payload = payload + "\n"
+ updated_content = immutable_prefix + payload + immutable_suffix
else:
- # Multiple mutable regions, this is ambiguous
+ # Multiple EVOLVE-BLOCK regions found, ambiguous without markers
error_message = (
"Multiple EVOLVE-BLOCK regions found but patch "
"doesn't specify which to replace"
)
return original, 0, None, error_message, None, None
+ else:
+ # Patch contains exactly one marker (START xor END).
+ # Only safe to apply when original has a single evolve region.
+ if len(mutable_ranges) != 1:
+ error_message = (
+ "Patch contains only one EVOLVE-BLOCK marker, but the original "
+ f"has {len(mutable_ranges)} editable regions; cannot determine target"
+ )
+ return original, 0, None, error_message, None, None
+
+ # Single target region in original
+ start, end = mutable_ranges[0]
+ immutable_prefix = original[:start]
+ immutable_suffix = original[end:]
+
+ # Find exact marker locations in original for newline policy
+ start_match = None
+ end_match = None
+ for m in EVOLVE_START.finditer(original):
+ if m.end() == start:
+ start_match = m
+ break
+ for m in EVOLVE_END.finditer(original):
+ if m.start() == end:
+ end_match = m
+ break
+
+ # Compute outside-of-markers prefix/suffix from original
+ prefix_outside = (
+ original[: start_match.start()] if start_match else immutable_prefix
+ )
+ suffix_outside = (
+ original[end_match.end() :] if end_match else immutable_suffix
+ )
+
+ # Extract payload based on which single marker is present in patch
+ if patch_has_start and not patch_has_end:
+ m = EVOLVE_START.search(patch_code)
+ payload = patch_code[m.end() :] if m else patch_code
+ # Trim footer if the patch included it
+ for sfx in (suffix_outside, suffix_outside.rstrip("\r\n")):
+ if sfx and payload.endswith(sfx):
+ payload = payload[: -len(sfx)]
+ break
+ elif patch_has_end and not patch_has_start:
+ m = EVOLVE_END.search(patch_code)
+ payload = patch_code[: m.start()] if m else patch_code
+ # Trim header if the patch included it
+ for pfx in (prefix_outside, prefix_outside.rstrip("\r\n")):
+ if pfx and payload.startswith(pfx):
+ payload = payload[len(pfx) :]
+ break
+ else:
+ payload = patch_code
+
+ # Normalize newlines so markers remain on their own lines
+ if start_match is not None and payload and not payload.startswith("\n"):
+ payload = "\n" + payload
+ if end_match is not None and payload and not payload.endswith("\n"):
+ payload = payload + "\n"
+
+ updated_content = immutable_prefix + payload + immutable_suffix
# Add remaining immutable content after last mutable range
- if patch_mutable_ranges and mutable_ranges:
+ if patch_has_both and mutable_ranges:
updated_content += original[mutable_ranges[-1][1] :]
num_applied = 1
@@ -146,6 +262,12 @@ def apply_full_patch(
suffix = ".cpp"
elif language == "cuda":
suffix = ".cu"
+ elif language == "rust":
+ suffix = ".rs"
+ elif language == "swift":
+ suffix = ".swift"
+ elif language in ["json", "json5"]:
+ suffix = ".json"
else:
raise ValueError(f"Language {language} not supported")
diff --git a/shinka/edit/async_apply.py b/shinka/edit/async_apply.py
index 8e542c565..bf10b5b51 100644
--- a/shinka/edit/async_apply.py
+++ b/shinka/edit/async_apply.py
@@ -78,6 +78,30 @@ async def apply_patch_async(
return None, 0, None, str(e), None, None
+async def exec_language_tool(
+ *args: str, timeout: int
+) -> Tuple[bool, Optional[str]]:
+    """Run a language tool as a subprocess; return (success, error_message)."""
+ proc = await asyncio.create_subprocess_exec(
+ *args,
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.PIPE,
+ )
+
+ try:
+ stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+ except asyncio.TimeoutError:
+ proc.kill()
+ await proc.wait()
+ return False, f"Validation timeout after {timeout}s"
+
+ if proc.returncode == 0:
+ return True, None
+ else:
+ error_msg = stderr.decode() if stderr else "Unknown compilation error"
+ return False, error_msg
+
+
async def validate_code_async(
code_path: str, language: str = "python", timeout: int = 30
) -> Tuple[bool, Optional[str]]:
@@ -94,54 +118,39 @@ async def validate_code_async(
try:
if language == "python":
# Use python -m py_compile for syntax checking
- proc = await asyncio.create_subprocess_exec(
+ return await exec_language_tool(
"python",
"-m",
"py_compile",
code_path,
- stdout=asyncio.subprocess.PIPE,
- stderr=asyncio.subprocess.PIPE,
+ timeout=timeout,
+ )
+ elif language == "rust":
+            # Use rustc parse check (-Zparse-only requires a nightly toolchain)
+ return await exec_language_tool(
+ "rustc",
+ "--crate-type=lib",
+ "-Zparse-only",
+ code_path,
+ timeout=timeout,
)
-
- try:
- stdout, stderr = await asyncio.wait_for(
- proc.communicate(), timeout=timeout
- )
- except asyncio.TimeoutError:
- proc.kill()
- await proc.wait()
- return False, f"Validation timeout after {timeout}s"
-
- if proc.returncode == 0:
- return True, None
- else:
- error_msg = stderr.decode() if stderr else "Unknown compilation error"
- return False, error_msg
-
elif language == "cpp":
# Use g++ for C++ compilation check
- proc = await asyncio.create_subprocess_exec(
+ return await exec_language_tool(
"g++",
"-fsyntax-only",
code_path,
- stdout=asyncio.subprocess.PIPE,
- stderr=asyncio.subprocess.PIPE,
+ timeout=timeout,
+ )
+ elif language == "swift":
+ # Use swiftc for Swift syntax checking
+ return await exec_language_tool(
+ "swiftc",
+ "-typecheck",
+ "-parse-as-library",
+ code_path,
+ timeout=timeout,
)
-
- try:
- stdout, stderr = await asyncio.wait_for(
- proc.communicate(), timeout=timeout
- )
- except asyncio.TimeoutError:
- proc.kill()
- await proc.wait()
- return False, f"Validation timeout after {timeout}s"
-
- if proc.returncode == 0:
- return True, None
- else:
- error_msg = stderr.decode() if stderr else "Unknown compilation error"
- return False, error_msg
else:
# For other languages, just check if file exists and is readable
try:
diff --git a/shinka/launch/scheduler.py b/shinka/launch/scheduler.py
index 5782613ee..4e824c3ff 100644
--- a/shinka/launch/scheduler.py
+++ b/shinka/launch/scheduler.py
@@ -138,7 +138,13 @@ def _build_command(self, exec_fname_t: str, results_dir_t: str) -> List[str]:
]
if self.config.extra_cmd_args:
for k, v in self.config.extra_cmd_args.items():
- cmd.extend([f"--{k}", str(v)])
+ # Handle boolean flags
+ if isinstance(v, bool):
+ if v: # Only append flag if True
+ cmd.append(f"--{k}")
+ else:
+ # For non-boolean values, append both flag and value
+ cmd.extend([f"--{k}", str(v)])
return cmd
def run(
diff --git a/shinka/llm/dynamic_sampling.py b/shinka/llm/dynamic_sampling.py
index 6c038d9fa..eb0cd8cb3 100644
--- a/shinka/llm/dynamic_sampling.py
+++ b/shinka/llm/dynamic_sampling.py
@@ -28,7 +28,8 @@ def _logdiffexp(a_log, b_log):
def _logexpm1(z):
z = np.asarray(z, dtype=float)
- return np.where(z > 50.0, z, np.log(np.expm1(z)))
+ with np.errstate(divide='ignore', invalid='ignore'):
+ return np.where(z > 50.0, z, np.log(np.expm1(z)))
class BanditBase(ABC):
@@ -433,12 +434,13 @@ def decay(self, factor: float) -> None:
if self.use_exponential_scaling and self.asymmetric_scaling:
# shrink in exp space to match original score scale
s = self.s
- log1p_term = np.where(
- s > 0.0,
- s + np.log(one_minus_factor + np.exp(-s)),
- np.log1p(one_minus_factor * np.exp(s)),
- )
- self.s = s + np.log(factor) - log1p_term
+ with np.errstate(divide='ignore', invalid='ignore'):
+ log1p_term = np.where(
+ s > 0.0,
+ s + np.log(one_minus_factor + np.exp(-s)),
+ np.log1p(one_minus_factor * np.exp(s)),
+ )
+ self.s = s + np.log(factor) - log1p_term
if self.adaptive_scale and np.isfinite(self._obs_max):
means_log = self._mean()
diff --git a/shinka/llm/embedding.py b/shinka/llm/embedding.py
index a5c6b07cc..4082ad58b 100644
--- a/shinka/llm/embedding.py
+++ b/shinka/llm/embedding.py
@@ -1,5 +1,6 @@
import os
import openai
+import google.generativeai as genai
import pandas as pd
from typing import Union, List, Optional, Tuple
import numpy as np
@@ -20,13 +21,23 @@
"azure-text-embedding-3-large",
]
+GEMINI_EMBEDDING_MODELS = [
+ "gemini-embedding-exp-03-07",
+ "gemini-embedding-001",
+]
+
OPENAI_EMBEDDING_COSTS = {
"text-embedding-3-small": 0.02 / M,
"text-embedding-3-large": 0.13 / M,
}
+# Gemini embedding costs (approximate; check current pricing)
+GEMINI_EMBEDDING_COSTS = {
+ "gemini-embedding-exp-03-07": 0.0 / M, # Experimental model, often free
+ "gemini-embedding-001": 0.0 / M, # Check current pricing
+}
-def get_client_model(model_name: str) -> tuple[openai.OpenAI, str]:
+def get_client_model(model_name: str) -> tuple[Union[openai.OpenAI, str], str]:
if model_name in OPENAI_EMBEDDING_MODELS:
client = openai.OpenAI()
model_to_use = model_name
@@ -38,6 +49,14 @@ def get_client_model(model_name: str) -> tuple[openai.OpenAI, str]:
api_version=os.getenv("AZURE_API_VERSION"),
azure_endpoint=os.getenv("AZURE_API_ENDPOINT"),
)
+ elif model_name in GEMINI_EMBEDDING_MODELS:
+ # Configure Gemini API
+ api_key = os.getenv("GEMINI_API_KEY")
+ if not api_key:
+ raise ValueError("GEMINI_API_KEY environment variable not set for Gemini models")
+ genai.configure(api_key=api_key)
+ client = "gemini" # Use string identifier for Gemini
+ model_to_use = model_name
else:
raise ValueError(f"Invalid embedding model: {model_name}")
@@ -52,9 +71,10 @@ def __init__(
Initialize the EmbeddingClient.
Args:
- model (str): The OpenAI embedding model name to use.
+ model (str): The OpenAI, Azure, or Gemini embedding model name to use.
"""
self.client, self.model = get_client_model(model_name)
+ self.model_name = model_name
self.verbose = verbose
def get_embedding(
@@ -76,6 +96,34 @@ def get_embedding(
single_code = True
else:
single_code = False
+ # Handle Gemini models
+ if self.model_name in GEMINI_EMBEDDING_MODELS:
+ try:
+ embeddings = []
+ total_tokens = 0
+
+ for text in code:
+ result = genai.embed_content(
+ model=f"models/{self.model}",
+ content=text,
+ task_type="retrieval_document"
+ )
+ embeddings.append(result['embedding'])
+                    total_tokens += len(text.split())  # approx: whitespace words
+
+ cost = total_tokens * GEMINI_EMBEDDING_COSTS.get(self.model, 0.0)
+
+ if single_code:
+                    return (embeddings[0] if embeddings else []), cost
+ else:
+ return embeddings, cost
+ except Exception as e:
+ logger.error(f"Error getting Gemini embedding: {e}")
+ if single_code:
+ return [], 0.0
+ else:
+ return [[]], 0.0
+ # Handle OpenAI and Azure models (same interface)
try:
response = self.client.embeddings.create(
model=self.model, input=code, encoding_format="float"
diff --git a/shinka/llm/models/pricing.py b/shinka/llm/models/pricing.py
index c9c101a2c..91e965c75 100644
--- a/shinka/llm/models/pricing.py
+++ b/shinka/llm/models/pricing.py
@@ -35,6 +35,10 @@
"input_price": 3.0 / M,
"output_price": 15.0 / M,
},
+ "claude-sonnet-4-5-20250929": {
+ "input_price": 3.0 / M,
+ "output_price": 15.0 / M,
+ },
}
OPENAI_MODELS = {
@@ -114,6 +118,10 @@
"input_price": 0.05 / M,
"output_price": 0.4 / M,
},
+ "gpt-5.1": {
+ "input_price": 1.25 / M,
+ "output_price": 10.0 / M,
+ },
}
@@ -141,6 +149,10 @@
"input_price": 0.1 / M,
"output_price": 0.4 / M,
},
+    "gemini-3-pro-preview": {
+ "input_price": 2.0 / M,
+ "output_price": 12.0 / M,
+ },
}
BEDROCK_MODELS = {
@@ -176,6 +188,7 @@
REASONING_CLAUDE_MODELS = [
"claude-3-7-sonnet-20250219",
"claude-4-sonnet-20250514",
+ "claude-sonnet-4-5-20250929",
]
REASONING_DEEPSEEK_MODELS = [
@@ -186,6 +199,7 @@
"gemini-2.5-pro",
"gemini-2.5-flash",
"gemini-2.5-flash-lite-preview-06-17",
+ "gemini-3-pro-preview",
]
REASONING_AZURE_MODELS = [
diff --git a/shinka/llm/query.py b/shinka/llm/query.py
index a7288df8e..c88c7d7c3 100644
--- a/shinka/llm/query.py
+++ b/shinka/llm/query.py
@@ -137,16 +137,13 @@ def sample_model_kwargs(
r_effort = random.choice(reasoning_efforts)
think_bool = r_effort != "auto"
if think_bool:
- thinking_tokens = [
- t
- for t in THINKING_TOKENS.values()
- if t < kwargs_dict["max_tokens"] and t >= 1024
- ]
+ t = THINKING_TOKENS[r_effort]
+ thinking_tokens = t if t < kwargs_dict["max_tokens"] else 1024
kwargs_dict["extra_body"] = {
"extra_body": {
"google": {
"thinking_config": {
- "thinking_budget": random.choice(thinking_tokens),
+ "thinking_budget": thinking_tokens,
"include_thoughts": True,
}
}
@@ -157,19 +154,17 @@ def sample_model_kwargs(
REASONING_CLAUDE_MODELS + REASONING_BEDROCK_MODELS
):
kwargs_dict["max_tokens"] = min(random.choice(max_tokens), 16384)
- think_bool = random.choice(reasoning_efforts) != "auto"
+ r_effort = random.choice(reasoning_efforts)
+ think_bool = r_effort != "auto"
if think_bool:
# filter thinking tokens to be smaller than max_tokens
# not auto THINKING_TOKENS
- thinking_tokens = [
- t
- for t in THINKING_TOKENS.values()
- if t < kwargs_dict["max_tokens"] and t >= 1024
- ]
+ t = THINKING_TOKENS[r_effort]
+ thinking_tokens = t if t < kwargs_dict["max_tokens"] else 1024
# sample only from thinking tokens that are valid
kwargs_dict["thinking"] = {
"type": "enabled",
- "budget_tokens": random.choice(thinking_tokens),
+ "budget_tokens": thinking_tokens,
}
else:
diff --git a/shinka/webui/__init__.py b/shinka/webui/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/tests/test_edit_base.py b/tests/test_edit_base.py
index edc0e1178..67c6f2e20 100644
--- a/tests/test_edit_base.py
+++ b/tests/test_edit_base.py
@@ -161,6 +161,110 @@ def new_func2():
# Should have replaced both evolve blocks with new content
+def test_apply_full_patch_full_file_without_markers_extracts_block_only():
+ """Full-file patch without EVOLVE markers should not copy immutable code
+ into the evolve block; only the block payload is replaced."""
+ original_content = """# Header line\n# EVOLVE-BLOCK-START\nold_line()\n# EVOLVE-BLOCK-END\n# Footer line\n"""
+
+ # Patch is the entire file content but with the EVOLVE markers omitted.
+ patch_content = """```python
+new_line()
+another_new_line()
+```"""
+
+ expected = """# Header line
+# EVOLVE-BLOCK-START
+new_line()
+another_new_line()
+# EVOLVE-BLOCK-END
+# Footer line
+"""
+
+ result = apply_full_patch(
+ patch_str=patch_content,
+ original_str=original_content,
+ language="python",
+ verbose=False,
+ )
+ updated_content, num_applied, output_path, error, patch_txt, diff_path = result
+
+ assert error is None
+ assert num_applied == 1
+ assert updated_content == expected
+
+
+def test_apply_full_patch_patch_with_start_marker_only():
+ """Patch has only START marker; original has both markers."""
+ original_content = """# Header line
+# EVOLVE-BLOCK-START
+old_line()
+# EVOLVE-BLOCK-END
+# Footer line
+"""
+
+ patch_content = """```python
+# Header line
+# EVOLVE-BLOCK-START
+new_line()
+# Footer line
+```"""
+
+ expected = """# Header line
+# EVOLVE-BLOCK-START
+new_line()
+# EVOLVE-BLOCK-END
+# Footer line
+"""
+
+ result = apply_full_patch(
+ patch_str=patch_content,
+ original_str=original_content,
+ language="python",
+ verbose=False,
+ )
+ updated_content, num_applied, output_path, error, patch_txt, diff_path = result
+
+ assert error is None
+ assert num_applied == 1
+ assert updated_content == expected
+
+
+def test_apply_full_patch_patch_with_end_marker_only():
+ """Patch has only END marker; original has both markers."""
+ original_content = """# Header line
+# EVOLVE-BLOCK-START
+old_line()
+# EVOLVE-BLOCK-END
+# Footer line
+"""
+
+ patch_content = """```python
+# Header line
+new_line()
+# EVOLVE-BLOCK-END
+# Footer line
+```"""
+
+ expected = """# Header line
+# EVOLVE-BLOCK-START
+new_line()
+# EVOLVE-BLOCK-END
+# Footer line
+"""
+
+ result = apply_full_patch(
+ patch_str=patch_content,
+ original_str=original_content,
+ language="python",
+ verbose=False,
+ )
+ updated_content, num_applied, output_path, error, patch_txt, diff_path = result
+
+ assert error is None
+ assert num_applied == 1
+ assert updated_content == expected
+
+
def test_apply_full_patch_no_evolve_blocks():
"""Test apply_full_patch with no EVOLVE-BLOCK regions - should error."""
original_content = """# Just regular code
@@ -221,6 +325,41 @@ def new_function():
assert updated_content == original_content # Should return original content
+def test_apply_full_patch_patch_with_single_marker_ambiguous_multiple_regions():
+ """Single marker in patch is ambiguous when original has multiple regions."""
+ original_content = """# Header
+# EVOLVE-BLOCK-START
+func1()
+# EVOLVE-BLOCK-END
+
+# EVOLVE-BLOCK-START
+func2()
+# EVOLVE-BLOCK-END
+# Footer
+"""
+
+ # Patch includes only START marker
+ patch_content = """```python
+# Header
+# EVOLVE-BLOCK-START
+new_code()
+# Footer
+```"""
+
+ updated_content, num_applied, output_path, error, patch_txt, diff_path = (
+ apply_full_patch(
+ patch_str=patch_content,
+ original_str=original_content,
+ language="python",
+ verbose=False,
+ )
+ )
+
+ assert num_applied == 0
+ assert error is not None
+ assert "only one EVOLVE-BLOCK marker" in error
+
+
def test_apply_full_patch_invalid_extraction():
"""Test apply_full_patch with invalid code extraction."""
original_content = """# EVOLVE-BLOCK-START