diff --git a/README.md b/README.md index 55f40d262..4404c24d9 100644 --- a/README.md +++ b/README.md @@ -7,16 +7,16 @@ - +

-`ShinkaEvolve` is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements. +[`ShinkaEvolve`](https://arxiv.org/abs/2509.19349) is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements. The framework supports **parallel evaluation of candidates** locally or on a Slurm cluster. It maintains an archive of successful solutions, enabling knowledge transfer between different evolutionary islands. `ShinkaEvolve` is particularly well-suited for scientific tasks where there is a verifier available and the goal is to optimize performance metrics while maintaining code correctness and readability. 
-![](docs/conceptual.png) +![evolution](https://github.com/user-attachments/assets/22cf3468-17fe-4995-9e13-d602b490a54e) ## Documentation πŸ“ @@ -26,6 +26,7 @@ The framework supports **parallel evaluation of candidates** locally or on a Slu | πŸ““ **[Tutorial Notebook](examples/shinka_tutorial.ipynb)** | Interactive walkthrough of Shinka features | Hands-on examples, configuration, best practices | | βš™οΈ **[Configuration](docs/configuration.md)** | Comprehensive configuration reference | All config options, optimization settings, advanced features | | 🎨 **[WebUI](docs/webui.md)** | Interactive visualization and monitoring | Real-time tracking, result analysis, debugging tools | +| πŸ•ΉοΈ **[Local LLM Support](https://github.com/SakanaAI/ShinkaEvolve/blob/main/docs/support_local_llm.md)** | Instructions for local LLMs | How to set up local LLMs on your machine | ## Installation & Quick Start πŸš€ @@ -52,9 +53,9 @@ For detailed installation instructions and usage examples, see the [Getting Star | Example | Description | Environment Setup | |---------|-------------|-------------------| | β­• [Circle Packing](examples/circle_packing) | Optimize circle packing to maximize radii. | `LocalJobConfig` | -| πŸ€– [Agent Design](examples/agent_design) | Design agent scaffolds for math tasks. | `LocalJobConfig` | +| πŸ€– [Agent Design](examples/adas_aime) | Design agent scaffolds for math tasks. | `LocalJobConfig` | | 🎯 [ALE-Bench](examples/ale_bench) | Code optimization for ALE-Bench tasks. | `LocalJobConfig` | -| ✨ [Novelty Generator](examples/novelty_generator_bck) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` | +| ✨ [Novelty Generator](examples/novelty_generator) | Generate creative, surprising outputs (e.g., ASCII art). 
| `LocalJobConfig` | ## `shinka` Run with Python API 🐍 @@ -308,9 +309,9 @@ If you use `ShinkaEvolve` in your research, please cite it as follows: ``` @article{lange2025shinka, - title={ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution}, + title={ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution}, author={Lange, Robert Tjarko and Imajuku, Yuki and Cetin, Edoardo}, - journal={arXiv preprint}, + journal={arXiv preprint arXiv:2509.19349}, year={2025} } -``` \ No newline at end of file +``` diff --git a/configs/config.yaml b/configs/config.yaml index 9702c6617..577e1dfe2 100644 --- a/configs/config.yaml +++ b/configs/config.yaml @@ -2,9 +2,9 @@ defaults: - _self_ - database@_global_: island_small - evolution@_global_: small_budget - - task@_global_: mad_tf + - task@_global_: circle_packing - cluster@_global_: local - - variant@_global_: mad_tf_example + - variant@_global_: circle_packing_example verbose: false results_dir: results diff --git a/docs/getting_started.md b/docs/getting_started.md index 234158839..03bc54c80 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -2,6 +2,8 @@ Shinka is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. This guide will help you get started with installing, configuring, and running your first evolutionary experiments. +![](../docs/conceptual.png) + ## Table of Contents 1. [What is Shinka?](#what-is-shinka) @@ -53,7 +55,7 @@ pip install uv ```bash git clone -cd shinka +cd ShinkaEvolve # Create virtual environment with Python 3.11 uv venv --python 3.11 @@ -79,7 +81,7 @@ conda activate shinka ```bash git clone -cd shinka +cd ShinkaEvolve pip install -e . 
``` @@ -249,7 +251,7 @@ from shinka.core import run_shinka_eval def main(program_path: str, results_dir: str): """Main evaluation function called by Shinka""" - + metrics, correct, error_msg = run_shinka_eval( program_path=program_path, results_dir=results_dir, @@ -268,11 +270,11 @@ def main(program_path: str, results_dir: str): def validate_packing(run_output): """Returns (is_valid: bool, error_msg: str or None)""" centers, radii, reported_sum = run_output - + # Check constraints (bounds, overlaps, etc.) if constraint_violated: return False, "Specific error description" - + return True, None # Valid solution ``` @@ -280,10 +282,10 @@ def validate_packing(run_output): ```python def aggregate_metrics(results, results_dir): """Returns metrics dictionary with required structure""" - + # Extract data from results centers, radii, reported_sum = results[0] - + return { "combined_score": float(reported_sum), # PRIMARY FITNESS (higher = better) "public": { # Visible in WebUI/logs @@ -331,6 +333,75 @@ The `run_shinka_eval` function returns three values: ## Advanced Usage +### Resuming Experiments + +If you need to pause and resume an evolutionary run, or extend a completed run with more generations, Shinka supports seamless resumption from existing results. 
+ +#### How Resuming Works + +When you specify an existing `results_dir` that contains a database, Shinka will: +- Detect the previous run automatically +- Restore the population database and all program history +- Resume meta-recommendations from the last checkpoint +- Continue from the last completed generation + +#### Using the CLI (Hydra) + +```bash +# Resume an existing run and extend to 50 generations +shinka_launch \ + variant=circle_packing_example \ + evo_config.results_dir=results_20250101_120000 \ + evo_config.num_generations=50 + +# Or with a custom task +shinka_launch \ + task=circle_packing \ + database=island_small \ + evolution=small_budget \ + cluster=local \ + evo_config.results_dir=path/to/previous/results \ + evo_config.num_generations=100 +``` + +#### Using the Python API + +```python +from shinka.core import EvolutionRunner, EvolutionConfig +from shinka.database import DatabaseConfig +from shinka.launch import LocalJobConfig + +# Point to existing results directory +evo_config = EvolutionConfig( + num_generations=50, # Extend to 50 total generations + results_dir="results_20250101_120000", # Existing results + # ... other config parameters ... +) + +job_config = LocalJobConfig( + eval_program_path="examples/circle_packing/evaluate.py", +) + +db_config = DatabaseConfig( + archive_size=20, + num_islands=2, +) + +# Run will automatically detect and resume +runner = EvolutionRunner( + evo_config=evo_config, + job_config=job_config, + db_config=db_config, +) +runner.run() +``` + +**Important Notes:** +- The `num_generations` parameter should be set to the **total** number of generations you want (not additional generations) +- For example, if you completed 20 generations and want 30 more, set `num_generations=50` +- The database configuration (number of islands, archive size, etc.) 
should match the original run +- All previous progress, including the best solutions and meta-recommendations, will be preserved + ### Environment Management for Local Jobs When running jobs locally, you have several options for managing Python environments: diff --git a/docs/support_local_llm.md b/docs/support_local_llm.md new file mode 100644 index 000000000..5f406e7b9 --- /dev/null +++ b/docs/support_local_llm.md @@ -0,0 +1,232 @@ + +# 🧩 Integrating Local LLMs into **ShinkaEvolve** + +## 🧠 Overview + +The original **ShinkaEvolve** code does **not** include built-in support for running **local LLMs**. +To enable this functionality, parts of the codebase can be modified to integrate locally hosted models. + +--- + +## πŸ—οΈ Code Organization + +**ShinkaEvolve** uses a **modular architecture** that supports multiple **LLM providers**. +The relevant code for LLM interaction is located in the **`LLM/`** folder, which manages all model communications. +ShinkaEvolve distinguishes between two LLM types: + +* **Regular LLMs** +* **Embedding LLMs** + +--- + +## βš™οΈ Adding a Regular LLM + +To add support for a **regular LLM**, follow the steps below. They walk through an example of adding support for gpt-oss models served with **Unsloth**, which exposes an OpenAI-compatible API (`v1/completions`). +This LLM can then be specified in the configuration variables: + +```yaml +llm_models: +meta_llm_models: +``` + +--- + +### πŸ”§ Step 1: Modify the Client + +The file **`client.py`** is responsible for creating clients that interact with LLMs. +Each client instance is later used to query a specific model. + +To add a local model, introduce a new client configuration. 
+The API URL is extracted from the model name, which follows this format: + +``` +local-gptoss-unsloth-url +``` + +#### Example + +```python +elif "local-gptoss-unsloth" in model_name: + # Extract URL from model name + pattern = r"https?://" + match = re.search(pattern, model_name) + if match: + start_index = match.start() + url = model_name[start_index:] + else: + raise ValueError(f"Invalid URL in model name: {model_name}") + + # Create OpenAI-compatible client + client = openai.OpenAI( + api_key="filler", + base_url=url + ) + + # Structured output mode (if required) + if structured_output: + client = instructor.from_openai( + client, + mode=instructor.Mode.JSON, + ) +``` + +--- + +### πŸ“ Step 2: Create the Local Query Function + +Inside the **`models/`** folder, create a new subfolder to store the query functions for your local models: + +``` +LLM/models/local/ +``` + +> Don't forget to include an empty `__init__.py` file. + +This folder should contain a **custom query function** for the local model. I called my file `local_gptoss_unsloth.py`. +It should follow the same structure as other functions in `LLM/models/`, but with small adjustments. + +#### My Key Adjustments + +* Replace `max_output_tokens` with **`max_tokens`** to match the local API. +* Extract additional response metadata such as: + + * `total_tokens` + * `thinking_tokens` (if your model includes reasoning traces) + +This function is later imported and registered in **`query.py`**. + +--- + +### 🧩 Step 3: Update `__init__.py` + +Configure **`__init__.py`** to include and expose the new local query function, so it can be imported elsewhere. 
+ +``` +from .local.local_gptoss_unsloth import query_local_gptoss_unsloth # ADDED THIS LINE +from .result import QueryResult + +__all__ = [ + "query_anthropic", + "query_openai", + "query_deepseek", + "query_gemini", + "query_local_gptoss_unsloth", # ADDED THIS LINE + "QueryResult", +] +``` + +--- + +### πŸ“¬ Step 4: Update `query.py` + +Import and register the new local query function in **`query.py`**. + +#### Imports + +```python +from .models import ( + query_anthropic, + query_openai, + query_deepseek, + query_gemini, + query_local_gptoss_unsloth, # ADDED THIS LINE + QueryResult, +) +``` + +#### Model Selection Logic + +```python +elif "local-gptoss-unsloth" in model_name: # ADDED THIS LINE + query_fn = query_local_gptoss_unsloth +``` + +--- + +### 🧠 Step 5: Other Observations + +The file **`query.py`** also defines functions such as: + +* `sample_model_kwargs` +* `sample_batch_kwargs` + +However, these are **not referenced anywhere else** in the repository, so no modifications are required here for now. + +--- + +### βœ… Summary + +| Step | File | Change | Description | +| ---- | -------------------------------------------- | -------------------- | -------------------------------------------------------- | +| 1 | `client.py` | Add new client block | Create OpenAI-compatible client for local LLM | +| 2 | `models/local/local_gptoss_unsloth.py` | New function | Query local model, adjust tokens, extract reasoning info | +| 3 | `__init__.py` | Add import | Expose new query function | +| 4 | `query.py` | Register model | Add conditional for local LLM | +| 5 | N/A | Review only | Unused functions need no changes | + +--- + +## 🧬 Adding a Local Embedding Model + +For embedding models, you can use **Ollama**, which follows the **OpenAI API** format. +The only relevant file is **`embedding.py`**. 
+ +### Code Addition + +```python +elif model_name.startswith("local-"): + # Pattern: local-(model-name)-(http or https url) + match = re.match(r"local-(.+?)-(https?://.+)", model_name) + if match: + model_to_use = match.group(1) + url = match.group(2) + else: + raise ValueError(f"Invalid local model format: {model_name}") + + client = openai.OpenAI( + base_url=url, + api_key="filler" + ) +``` + +#### Notes + +* Compatible with **any Ollama model**. +* The model name must follow this convention: + + ``` + local-model-name-url + ``` +* The code extracts both `model-name` and `url`, and uses them to query Ollama. + +--- + +### Query Logic + +The existing line in **`embedding.py`** remains unchanged: + +```python +response = self.client.embeddings.create( + model=self.model, + input=code, + encoding_format="float" +) +``` + +For local embedding models, `self.model` corresponds to the extracted model name. +The only addition needed in the **`EmbeddingClient`** class is: + +```python +elif self.model_name.startswith("local-"): + cost = 0.0 +``` + +--- + +## πŸš€ Result + +ShinkaEvolve can now connect to **locally hosted LLMs** and **embedding models** through **OpenAI-compatible APIs**. +This setup supports **Ollama** and other frameworks such as **gpt-oss** under **Unsloth**. 
+ diff --git a/examples/shinka_tutorial.ipynb b/examples/shinka_tutorial.ipynb index 66a71a073..c6d818994 100644 --- a/examples/shinka_tutorial.ipynb +++ b/examples/shinka_tutorial.ipynb @@ -237,6 +237,17 @@ "if not llm_models:\n", " llm_models = [\"gpt-5-mini\"] # fallback if no keys detected\n", "\n", + "# pick embedding model based on available keys\n", + "embedding_model_name = \"\"\n", + "if os.getenv(\"GEMINI_API_KEY\"):\n", + " embedding_model_name = \"gemini-embedding-001\"\n", + "elif os.getenv(\"OPENAI_API_KEY\"):\n", + " embedding_model_name = \"text-embedding-3-small\"\n", + "else:\n", + " embedding_model_name = \"text-embedding-3-small\"\n", + "print(f\"βœ… Embedding model selected: {embedding_model_name}\")\n", + "\n", + "\n", "# unique experiment directory\n", "timestamp = dt.datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n", "run_tag = f\"{timestamp}_weighted_fast\"\n", @@ -271,6 +282,8 @@ " max_novelty_attempts=3,\n", " # ensemble llm selection among candidates based on past performance\n", " llm_dynamic_selection=None, # e.g. 
\"ucb1\"\n", + " # set embedding model\n", + " embedding_model=embedding_model_name,\n", ")\n", "\n", "db_config = DatabaseConfig(\n", @@ -286,11 +299,13 @@ " enforce_island_separation=True,\n", " parent_selection_strategy=\"weighted\",\n", " parent_selection_lambda=10.0,\n", + " \n", ")\n", "\n", "job_config = LocalJobConfig(eval_program_path=\"evaluate.py\")\n", "\n", "print(\"llm_models:\", llm_models)\n", + "print(\"embedding_model:\", embedding_model_name)\n", "print(\"results_dir:\", evo_config.results_dir)" ] }, diff --git a/pyproject.toml b/pyproject.toml index e3ec455af..5802a1522 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -45,17 +45,20 @@ dependencies = [ "adjustText", "markdown", "aiofiles", + "google-generativeai", ] [tool.setuptools] -packages = ["shinka"] script-files = ["shinka/shinka_launch", "shinka/shinka_visualize"] +[tool.setuptools.packages.find] +include = ["shinka", "shinka.*"] + [tool.setuptools.package-data] "*" = ["*"] -[tool.uv] -dev-dependencies = [ +[dependency-groups] +dev = [ "pytest>=6.0", "black", "isort", diff --git a/shinka/core/runner.py b/shinka/core/runner.py index 3c818742c..a0dd5f81d 100644 --- a/shinka/core/runner.py +++ b/shinka/core/runner.py @@ -158,7 +158,12 @@ def __init__( # Initialize database and scheduler db_config.db_path = str(db_path) - self.db = ProgramDatabase(config=db_config) + embedding_model_to_use = ( + evo_config.embedding_model or "text-embedding-3-small" + ) + self.db = ProgramDatabase( + config=db_config, embedding_model=embedding_model_to_use + ) self.scheduler = JobScheduler( job_type=evo_config.job_type, config=job_config, # type: ignore @@ -231,6 +236,12 @@ def __init__( self.lang_ext = "cpp" elif self.evo_config.language == "python": self.lang_ext = "py" + elif self.evo_config.language == "rust": + self.lang_ext = "rs" + elif self.evo_config.language == "swift": + self.lang_ext = "swift" + elif self.evo_config.language in ["json", "json5"]: + self.lang_ext = "json" else: msg = f"Language 
{self.evo_config.language} not supported" raise ValueError(msg) @@ -1096,9 +1107,10 @@ def run_patch( # error_attempt is already set from apply_patch or default pass - # Only consider the diff summary for the original.py file!!! - if "original.py" in diff_summary: - diff_summary = diff_summary["original.py"] + # Only consider the diff summary for the original source file + original_filename = f"original.{self.lang_ext}" + if original_filename in diff_summary: + diff_summary = diff_summary[original_filename] meta_edit_data = { "patch_type": patch_type, diff --git a/shinka/core/wrap_eval.py b/shinka/core/wrap_eval.py index 7e1d1e5d3..bf2cf92eb 100644 --- a/shinka/core/wrap_eval.py +++ b/shinka/core/wrap_eval.py @@ -96,6 +96,9 @@ def run_shinka_eval( num_valid_runs = 0 num_invalid_runs = 0 + all_run_results: List[Any] = [] + execution_times: List[float] = [] + try: module = load_program(program_path) if not hasattr(module, experiment_fn_name): @@ -105,9 +108,6 @@ def run_shinka_eval( ) experiment_fn = getattr(module, experiment_fn_name) - all_run_results: List[Any] = [] - execution_times: List[float] = [] - for i in range(num_runs): kwargs: Dict[str, Any] = {} if get_experiment_kwargs: diff --git a/shinka/database/complexity.py b/shinka/database/complexity.py index 4116567e9..70cd5d3a1 100644 --- a/shinka/database/complexity.py +++ b/shinka/database/complexity.py @@ -259,8 +259,8 @@ def analyze_code_metrics(code_string, language="python"): # If Python parsing fails, fall back to C++ analysis return analyze_cpp_complexity(code_string) - # For C/C++/CUDA and other languages, use regex-based analysis - elif language in ["cpp", "c", "cuda", "c++"]: + # For C/C++/CUDA/Rust/Swift/JSON and other languages, use regex-based analysis + elif language in ["cpp", "c", "cuda", "c++", "rust", "swift", "json", "json5"]: return analyze_cpp_complexity(code_string) # For unknown languages, use simple line-based complexity diff --git a/shinka/database/dbase.py b/shinka/database/dbase.py 
index 69fdf5432..2118763c4 100644 --- a/shinka/database/dbase.py +++ b/shinka/database/dbase.py @@ -50,7 +50,7 @@ def clean_nan_values(obj: Any) -> Any: @dataclass class DatabaseConfig: - db_path: Optional[str] = None + db_path: str = "evolution_db.sqlite" num_islands: int = 4 archive_size: int = 100 @@ -82,6 +82,9 @@ class DatabaseConfig: # Beam search parent selection parameters num_beams: int = 5 + # Embedding model name + embedding_model: str = "text-embedding-3-small" + def db_retry(max_retries=5, initial_delay=0.1, backoff_factor=2): """ @@ -248,12 +251,22 @@ class ProgramDatabase: populations, and an archive of elites. """ - def __init__(self, config: DatabaseConfig, read_only: bool = False): + def __init__( + self, + config: DatabaseConfig, + embedding_model: str = "text-embedding-3-small", + read_only: bool = False, + ): self.config = config self.conn: Optional[sqlite3.Connection] = None self.cursor: Optional[sqlite3.Cursor] = None self.read_only = read_only - self.embedding_client = EmbeddingClient() + # Only create embedding client if not in read-only mode + # (e.g., WebUI doesn't need it for visualization) + if not read_only: + self.embedding_client = EmbeddingClient(model_name=embedding_model) + else: + self.embedding_client = None self.last_iteration: int = 0 self.best_program_id: Optional[str] = None diff --git a/shinka/database/display.py b/shinka/database/display.py index 4c34d3445..3e55439bf 100644 --- a/shinka/database/display.py +++ b/shinka/database/display.py @@ -122,6 +122,18 @@ def print_program_summary(self, program, console: Optional[RichConsole] = None): else: time_display = f"{time_val:.1f}s" + # Safely extract metadata fields for display + metadata = program.metadata or {} + patch_name_raw = metadata.get("patch_name", "[dim]N/A[/dim]") + if patch_name_raw is None: + patch_name_raw = "[dim]N/A[/dim]" + patch_name = str(patch_name_raw)[:30] + + patch_type_raw = metadata.get("patch_type", "[dim]N/A[/dim]") + if patch_type_raw is None: + 
patch_type_raw = "[dim]N/A[/dim]" + patch_type = str(patch_type_raw) + # Add the data row island_display = ( f"I-{program.island_idx}" if program.island_idx is not None else "N/A" @@ -131,8 +143,8 @@ def print_program_summary(self, program, console: Optional[RichConsole] = None): island_display, status_display, score_display, - program.metadata.get("patch_name", "[dim]N/A[/dim]")[:30], - program.metadata.get("patch_type", "[dim]N/A[/dim]"), + patch_name, + patch_type, f"{program.complexity:.1f}", cost_display, time_display, diff --git a/shinka/database/inspirations.py b/shinka/database/inspirations.py index ee564dfa1..42c3859d8 100644 --- a/shinka/database/inspirations.py +++ b/shinka/database/inspirations.py @@ -72,6 +72,7 @@ def sample_context(self, parent: Any, n: int) -> List[Any]: self.cursor.execute( """ SELECT p.id FROM programs p + JOIN archive a ON p.id = a.program_id WHERE p.island_idx = ? AND p.correct = 1 ORDER BY p.combined_score DESC LIMIT ? @@ -93,7 +94,8 @@ def sample_context(self, parent: Any, n: int) -> List[Any]: placeholders_rand = ",".join("?" * len(insp_ids)) sql_rand = f""" SELECT p.id FROM programs p - WHERE p.island_idx = ? AND p.correct = 1 + JOIN archive a ON p.id = a.program_id + WHERE p.island_idx = ? AND p.correct = 1 AND p.id NOT IN ({placeholders_rand}) ORDER BY RANDOM() LIMIT ? """ @@ -111,9 +113,10 @@ def sample_context(self, parent: Any, n: int) -> List[Any]: needed = n - len(inspirations) if needed > 0: placeholders_rand = ",".join("?" * len(insp_ids)) - sql_rand = f"""SELECT id FROM programs - WHERE correct = 1 - AND id NOT IN ({placeholders_rand}) + sql_rand = f"""SELECT p.id FROM programs p + JOIN archive a ON p.id = a.program_id + WHERE p.correct = 1 + AND p.id NOT IN ({placeholders_rand}) ORDER BY RANDOM() LIMIT ? 
""" params_rand = list(insp_ids) + [needed] diff --git a/shinka/database/islands.py b/shinka/database/islands.py index 9975eac3b..341dea79c 100644 --- a/shinka/database/islands.py +++ b/shinka/database/islands.py @@ -682,6 +682,16 @@ def copy_program_to_islands(self, program: Any) -> List[str]: f"Created copy {new_id[:8]}... of program {program.id[:8]}... " f"for island {island_idx}" ) + + # Add the copied program to the archive if it's correct + # This ensures it can be used as inspiration for that island + if program.correct: + self.cursor.execute( + "INSERT OR IGNORE INTO archive (program_id) VALUES (?)", + (new_id,), + ) + logger.debug(f"Added copy {new_id[:8]}... to archive (correct program)") + self.conn.commit() logger.info( f"Created {len(created_ids)} copies of program " diff --git a/shinka/edit/apply_diff.py b/shinka/edit/apply_diff.py index ead28e231..d33f58042 100644 --- a/shinka/edit/apply_diff.py +++ b/shinka/edit/apply_diff.py @@ -698,7 +698,7 @@ def apply_diff_patch( patch_str = _strip_trailing_whitespace(patch_str) # Remove the EVOLVE-BLOCK START and EVOLVE-BLOCK END markers - if language in ["cuda", "cpp"]: + if language in ["cuda", "cpp", "rust", "swift", "json", "json5"]: patch_str = re.sub(r"// EVOLVE-BLOCK START\\n", "", patch_str) patch_str = re.sub(r"// EVOLVE-BLOCK END\\n", "", patch_str) elif language == "python": @@ -730,6 +730,12 @@ def apply_diff_patch( suffix = ".cpp" elif language == "cuda": suffix = ".cu" + elif language == "rust": + suffix = ".rs" + elif language == "swift": + suffix = ".swift" + elif language in ["json", "json5"]: + suffix = ".json" else: raise ValueError(f"Language {language} not supported") diff --git a/shinka/edit/apply_full.py b/shinka/edit/apply_full.py index b7e2e2b37..ac6288128 100644 --- a/shinka/edit/apply_full.py +++ b/shinka/edit/apply_full.py @@ -1,6 +1,6 @@ from pathlib import Path from typing import Optional, Union -from .apply_diff import write_git_diff, _mutable_ranges +from .apply_diff import 
write_git_diff, _mutable_ranges, EVOLVE_START, EVOLVE_END from shinka.llm import extract_between import logging @@ -72,10 +72,15 @@ def apply_full_patch( updated_content = "" last_end = 0 - # Check if patch_code contains EVOLVE-BLOCK markers - patch_mutable_ranges = _mutable_ranges(patch_code) + # Detect EVOLVE markers presence in the patch content + patch_has_start = EVOLVE_START.search(patch_code) is not None + patch_has_end = EVOLVE_END.search(patch_code) is not None + patch_has_both = patch_has_start and patch_has_end + patch_has_none = not patch_has_start and not patch_has_end - if patch_mutable_ranges: + if patch_has_both: + # Patch contains both EVOLVE-BLOCK markers, extract from them + patch_mutable_ranges = _mutable_ranges(patch_code) # Patch contains EVOLVE-BLOCK markers, extract from them for i, (start, end) in enumerate(mutable_ranges): # Add immutable part before this mutable range @@ -91,47 +96,158 @@ def apply_full_patch( updated_content += replacement_content last_end = end - else: + elif patch_has_none: # Patch doesn't contain EVOLVE-BLOCK markers # Assume entire patch content should replace all mutable regions if len(mutable_ranges) == 1: - # Single mutable region, replace with entire patch content + # Single mutable region. If the patch appears to be a full-file + # rewrite that omitted EVOLVE markers, safely extract only the + # content intended for the evolve block by matching immutable + # prefix/suffix from the original file. 
start, end = mutable_ranges[0] - # The mutable range ends before "EVOLVE-BLOCK-END" text - # We need to find the actual start of the comment line - if language == "python": - end_marker = "# EVOLVE-BLOCK-END" - elif language in ["cuda", "cpp"]: - end_marker = "// EVOLVE-BLOCK-END" - else: - end_marker = "# EVOLVE-BLOCK-END" # Default fallback - - end_marker_pos = original.find(end_marker, end - 5) - if end_marker_pos == -1: - # Fallback: use the original end position - end_marker_pos = end + # Immutable portions that remain outside the evolve block + immutable_prefix = original[:start] + immutable_suffix = original[end:] - # Ensure proper newline handling around the patch content - if patch_code and not patch_code.startswith("\n"): - patch_code = "\n" + patch_code + # Also compute the portions strictly outside the marker lines + # to detect full-file patches that omitted EVOLVE markers. + # Find the start and end marker line boundaries. + start_match = None + end_match = None + for m in EVOLVE_START.finditer(original): + if m.end() == start: + start_match = m + break + for m in EVOLVE_END.finditer(original): + if m.start() == end: + end_match = m + break - if patch_code and not patch_code.endswith("\n"): - patch_code = patch_code + "\n" - - updated_content = ( - original[:start] + patch_code + original[end_marker_pos:] + prefix_outside = ( + original[: start_match.start()] if start_match else immutable_prefix + ) + suffix_outside = ( + original[end_match.end() :] if end_match else immutable_suffix ) + + # Heuristic: if patch includes the same immutable prefix/suffix + # outside the markers, treat the middle part as the evolve-block + # replacement. Be tolerant to a missing trailing newline in the + # footer by checking both versions. 
+ suffix_opts = (suffix_outside, suffix_outside.rstrip("\r\n")) + if patch_code.startswith(prefix_outside) and any( + patch_code.endswith(sfx) for sfx in suffix_opts + ): + mid_start = len(prefix_outside) + # choose the matching suffix option to compute end + sfx = next(sfx for sfx in suffix_opts if patch_code.endswith(sfx)) + mid_end = len(patch_code) - len(sfx) + replacement_content = patch_code[mid_start:mid_end] + # Ensure marker boundaries stay on their own lines. + # Add a leading newline only if there is a START marker. + if ( + start_match is not None + and replacement_content + and not replacement_content.startswith("\n") + ): + replacement_content = "\n" + replacement_content + # Add a trailing newline only if there is an END marker. + if ( + end_match is not None + and replacement_content + and not replacement_content.endswith("\n") + ): + replacement_content = replacement_content + "\n" + updated_content = ( + immutable_prefix + replacement_content + immutable_suffix + ) + else: + # Otherwise, assume the patch_code represents only the + # evolve-block payload and insert it directly between markers. + # Ensure proper newline handling around the patch content. + payload = patch_code + if ( + start_match is not None + and payload + and not payload.startswith("\n") + ): + payload = "\n" + payload + if end_match is not None and payload and not payload.endswith("\n"): + payload = payload + "\n" + updated_content = immutable_prefix + payload + immutable_suffix else: - # Multiple mutable regions, this is ambiguous + # Multiple EVOLVE-BLOCK regions found, ambiguous without markers error_message = ( "Multiple EVOLVE-BLOCK regions found but patch " "doesn't specify which to replace" ) return original, 0, None, error_message, None, None + else: + # Patch contains exactly one marker (START xor END). + # Only safe to apply when original has a single evolve region. 
+        if len(mutable_ranges) != 1:
+            error_message = (
+                "Patch contains only one EVOLVE-BLOCK marker, but the original "
+                f"has {len(mutable_ranges)} editable regions; cannot determine target"
+            )
+            return original, 0, None, error_message, None, None
+
+        # Single target region in original
+        start, end = mutable_ranges[0]
+        immutable_prefix = original[:start]
+        immutable_suffix = original[end:]
+
+        # Find exact marker locations in original for newline policy
+        start_match = None
+        end_match = None
+        for m in EVOLVE_START.finditer(original):
+            if m.end() == start:
+                start_match = m
+                break
+        for m in EVOLVE_END.finditer(original):
+            if m.start() == end:
+                end_match = m
+                break
+
+        # Compute outside-of-markers prefix/suffix from original
+        prefix_outside = (
+            original[: start_match.start()] if start_match else immutable_prefix
+        )
+        suffix_outside = (
+            original[end_match.end() :] if end_match else immutable_suffix
+        )
+
+        # Extract payload based on which single marker is present in patch
+        if patch_has_start and not patch_has_end:
+            m = EVOLVE_START.search(patch_code)
+            payload = patch_code[m.end() :] if m else patch_code
+            # Trim footer if the patch included it
+            for sfx in (suffix_outside, suffix_outside.rstrip("\r\n")):
+                if sfx and payload.endswith(sfx):
+                    payload = payload[: -len(sfx)]
+                    break
+        elif patch_has_end and not patch_has_start:
+            m = EVOLVE_END.search(patch_code)
+            payload = patch_code[: m.start()] if m else patch_code
+            # Trim header if the patch included it
+            for pfx in (prefix_outside, prefix_outside.rstrip("\r\n")):
+                if pfx and payload.startswith(pfx):
+                    payload = payload[len(pfx) :]
+                    break
+        else:
+            payload = patch_code
+
+        # Normalize newlines so markers remain on their own lines
+        if start_match is not None and payload and not payload.startswith("\n"):
+            payload = "\n" + payload
+        if end_match is not None and payload and not payload.endswith("\n"):
+            payload = payload + "\n"
+
+        updated_content = immutable_prefix + payload + immutable_suffix
 
     # Add remaining immutable content after last mutable range
-    if patch_mutable_ranges and mutable_ranges:
+    if patch_has_both and mutable_ranges:
         updated_content += original[mutable_ranges[-1][1] :]
 
     num_applied = 1
@@ -146,6 +262,12 @@ def apply_full_patch(
         suffix = ".cpp"
     elif language == "cuda":
         suffix = ".cu"
+    elif language == "rust":
+        suffix = ".rs"
+    elif language == "swift":
+        suffix = ".swift"
+    elif language in ["json", "json5"]:
+        suffix = ".json"
     else:
         raise ValueError(f"Language {language} not supported")
diff --git a/shinka/edit/async_apply.py b/shinka/edit/async_apply.py
index 8e542c565..bf10b5b51 100644
--- a/shinka/edit/async_apply.py
+++ b/shinka/edit/async_apply.py
@@ -78,6 +78,30 @@ async def apply_patch_async(
         return None, 0, None, str(e), None, None
 
 
+async def exec_language_tool(
+    *args: str, timeout: int
+) -> Tuple[bool, Optional[str]]:
+    """Execute a language tool and return the result."""
+    proc = await asyncio.create_subprocess_exec(
+        *args,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()
+        await proc.wait()
+        return False, f"Validation timeout after {timeout}s"
+
+    if proc.returncode == 0:
+        return True, None
+    else:
+        error_msg = stderr.decode() if stderr else "Unknown compilation error"
+        return False, error_msg
+
+
 async def validate_code_async(
     code_path: str, language: str = "python", timeout: int = 30
 ) -> Tuple[bool, Optional[str]]:
@@ -94,54 +118,40 @@
     try:
         if language == "python":
             # Use python -m py_compile for syntax checking
-            proc = await asyncio.create_subprocess_exec(
+            return await exec_language_tool(
                 "python",
                 "-m",
                 "py_compile",
                 code_path,
-                stdout=asyncio.subprocess.PIPE,
-                stderr=asyncio.subprocess.PIPE,
+                timeout=timeout,
+            )
+        elif language == "rust":
+            # Use rustc for Rust syntax checking
+            # NOTE: -Zparse-only requires a nightly toolchain; stable rustc rejects -Z flags
+            return await exec_language_tool(
+                "rustc",
+                "--crate-type=lib",
+                "-Zparse-only",
+                code_path,
+                timeout=timeout,
             )
-
-            try:
-                stdout, stderr = await asyncio.wait_for(
-                    proc.communicate(), timeout=timeout
-                )
-            except asyncio.TimeoutError:
-                proc.kill()
-                await proc.wait()
-                return False, f"Validation timeout after {timeout}s"
-
-            if proc.returncode == 0:
-                return True, None
-            else:
-                error_msg = stderr.decode() if stderr else "Unknown compilation error"
-                return False, error_msg
-
         elif language == "cpp":
             # Use g++ for C++ compilation check
-            proc = await asyncio.create_subprocess_exec(
+            return await exec_language_tool(
                 "g++",
                 "-fsyntax-only",
                 code_path,
-                stdout=asyncio.subprocess.PIPE,
-                stderr=asyncio.subprocess.PIPE,
+                timeout=timeout,
+            )
+        elif language == "swift":
+            # Use swiftc for Swift syntax checking
+            return await exec_language_tool(
+                "swiftc",
+                "-typecheck",
+                "-parse-as-library",
+                code_path,
+                timeout=timeout,
             )
-
-            try:
-                stdout, stderr = await asyncio.wait_for(
-                    proc.communicate(), timeout=timeout
-                )
-            except asyncio.TimeoutError:
-                proc.kill()
-                await proc.wait()
-                return False, f"Validation timeout after {timeout}s"
-
-            if proc.returncode == 0:
-                return True, None
-            else:
-                error_msg = stderr.decode() if stderr else "Unknown compilation error"
-                return False, error_msg
         else:
            # For other languages, just check if file exists and is readable
            try:
diff --git a/shinka/launch/scheduler.py b/shinka/launch/scheduler.py
index 5782613ee..4e824c3ff 100644
--- a/shinka/launch/scheduler.py
+++ b/shinka/launch/scheduler.py
@@ -138,7 +138,13 @@ def _build_command(self, exec_fname_t: str, results_dir_t: str) -> List[str]:
         ]
         if self.config.extra_cmd_args:
             for k, v in self.config.extra_cmd_args.items():
-                cmd.extend([f"--{k}", str(v)])
+                # Handle boolean flags
+                if isinstance(v, bool):
+                    if v:  # Only append flag if True
+                        cmd.append(f"--{k}")
+                else:
+                    # For non-boolean values, append both flag and value
+                    cmd.extend([f"--{k}", str(v)])
         return cmd
 
     def run(
diff --git a/shinka/llm/dynamic_sampling.py b/shinka/llm/dynamic_sampling.py
index 6c038d9fa..eb0cd8cb3 100644
--- a/shinka/llm/dynamic_sampling.py
+++ b/shinka/llm/dynamic_sampling.py
@@ -28,7 +28,8 @@ def _logdiffexp(a_log, b_log):
 
 def _logexpm1(z):
     z = np.asarray(z, dtype=float)
-    return np.where(z > 50.0, z, np.log(np.expm1(z)))
+    with np.errstate(divide='ignore', invalid='ignore'):
+        return np.where(z > 50.0, z, np.log(np.expm1(z)))
 
 
 class BanditBase(ABC):
@@ -433,12 +434,13 @@ def decay(self, factor: float) -> None:
         if self.use_exponential_scaling and self.asymmetric_scaling:
             # shrink in exp space to match original score scale
             s = self.s
-            log1p_term = np.where(
-                s > 0.0,
-                s + np.log(one_minus_factor + np.exp(-s)),
-                np.log1p(one_minus_factor * np.exp(s)),
-            )
-            self.s = s + np.log(factor) - log1p_term
+            with np.errstate(divide='ignore', invalid='ignore'):
+                log1p_term = np.where(
+                    s > 0.0,
+                    s + np.log(one_minus_factor + np.exp(-s)),
+                    np.log1p(one_minus_factor * np.exp(s)),
+                )
+                self.s = s + np.log(factor) - log1p_term
 
         if self.adaptive_scale and np.isfinite(self._obs_max):
             means_log = self._mean()
diff --git a/shinka/llm/embedding.py b/shinka/llm/embedding.py
index a5c6b07cc..4082ad58b 100644
--- a/shinka/llm/embedding.py
+++ b/shinka/llm/embedding.py
@@ -1,5 +1,6 @@
 import os
 import openai
+import google.generativeai as genai
 import pandas as pd
 from typing import Union, List, Optional, Tuple
 import numpy as np
@@ -20,13 +21,23 @@
     "azure-text-embedding-3-large",
 ]
 
+GEMINI_EMBEDDING_MODELS = [
+    "gemini-embedding-exp-03-07",
+    "gemini-embedding-001",
+]
+
 OPENAI_EMBEDDING_COSTS = {
     "text-embedding-3-small": 0.02 / M,
     "text-embedding-3-large": 0.13 / M,
 }
 
+# Gemini embedding costs (approximate - check current pricing)
+GEMINI_EMBEDDING_COSTS = {
+    "gemini-embedding-exp-03-07": 0.0 / M,  # Experimental model, often free
+    "gemini-embedding-001": 0.0 / M,  # Check current pricing
+}
 
-def get_client_model(model_name: str) -> tuple[openai.OpenAI, str]:
+def get_client_model(model_name: str) -> tuple[Union[openai.OpenAI, str], str]:
     if model_name in OPENAI_EMBEDDING_MODELS:
         client = openai.OpenAI()
         model_to_use = model_name
@@ -38,6 +49,14 @@ def get_client_model(model_name: str) -> tuple[openai.OpenAI, str]:
             api_version=os.getenv("AZURE_API_VERSION"),
             azure_endpoint=os.getenv("AZURE_API_ENDPOINT"),
         )
+    elif model_name in GEMINI_EMBEDDING_MODELS:
+        # Configure Gemini API
+        api_key = os.getenv("GEMINI_API_KEY")
+        if not api_key:
+            raise ValueError("GEMINI_API_KEY environment variable not set for Gemini models")
+        genai.configure(api_key=api_key)
+        client = "gemini"  # Use string identifier for Gemini
+        model_to_use = model_name
     else:
         raise ValueError(f"Invalid embedding model: {model_name}")
 
@@ -52,9 +71,10 @@
         Initialize the EmbeddingClient.
 
         Args:
-            model (str): The OpenAI embedding model name to use.
+            model (str): The OpenAI, Azure, or Gemini embedding model name to use.
         """
         self.client, self.model = get_client_model(model_name)
+        self.model_name = model_name
         self.verbose = verbose
 
     def get_embedding(
@@ -76,6 +96,34 @@
             single_code = True
         else:
             single_code = False
+        # Handle Gemini models
+        if self.model_name in GEMINI_EMBEDDING_MODELS:
+            try:
+                embeddings = []
+                total_tokens = 0
+
+                for text in code:
+                    result = genai.embed_content(
+                        model=f"models/{self.model}",
+                        content=text,
+                        task_type="retrieval_document"
+                    )
+                    embeddings.append(result['embedding'])
+                    total_tokens += len(text.split())  # rough token estimate
+
+                cost = total_tokens * GEMINI_EMBEDDING_COSTS.get(self.model, 0.0)
+
+                if single_code:
+                    return embeddings[0] if embeddings else [], cost
+                else:
+                    return embeddings, cost
+            except Exception as e:
+                logger.error(f"Error getting Gemini embedding: {e}")
+                if single_code:
+                    return [], 0.0
+                else:
+                    return [[] for _ in code], 0.0
+        # Handle OpenAI and Azure models (same interface)
         try:
             response = self.client.embeddings.create(
                 model=self.model, input=code, encoding_format="float"
diff --git a/shinka/llm/models/pricing.py b/shinka/llm/models/pricing.py
index c9c101a2c..91e965c75 100644
--- a/shinka/llm/models/pricing.py
+++ b/shinka/llm/models/pricing.py
@@ -35,6 +35,10 @@
         "input_price": 3.0 / M,
         "output_price": 15.0 / M,
     },
+    "claude-sonnet-4-5-20250929": {
+        "input_price": 3.0 / M,
+        "output_price": 15.0 / M,
+    },
 }
 
 OPENAI_MODELS = {
@@ -114,6 +118,10 @@
         "input_price": 0.05 / M,
         "output_price": 0.4 / M,
     },
+    "gpt-5.1": {
+        "input_price": 1.25 / M,
+        "output_price": 10.0 / M,
+    },
 }
 
 
@@ -141,6 +149,10 @@
         "input_price": 0.1 / M,
         "output_price": 0.4 / M,
     },
+    "gemini-3-pro-preview": {
+        "input_price": 2.0 / M,
+        "output_price": 12.0 / M,
+    },
 }
 
 BEDROCK_MODELS = {
@@ -176,6 +188,7 @@
 REASONING_CLAUDE_MODELS = [
     "claude-3-7-sonnet-20250219",
     "claude-4-sonnet-20250514",
+    "claude-sonnet-4-5-20250929",
 ]
 
 REASONING_DEEPSEEK_MODELS = [
@@ -186,6 +199,7 @@
     "gemini-2.5-pro",
     "gemini-2.5-flash",
     "gemini-2.5-flash-lite-preview-06-17",
+    "gemini-3-pro-preview",
 ]
 
 REASONING_AZURE_MODELS = [
diff --git a/shinka/llm/query.py b/shinka/llm/query.py
index a7288df8e..c88c7d7c3 100644
--- a/shinka/llm/query.py
+++ b/shinka/llm/query.py
@@ -137,16 +137,13 @@ def sample_model_kwargs(
         r_effort = random.choice(reasoning_efforts)
         think_bool = r_effort != "auto"
         if think_bool:
-            thinking_tokens = [
-                t
-                for t in THINKING_TOKENS.values()
-                if t < kwargs_dict["max_tokens"] and t >= 1024
-            ]
+            t = THINKING_TOKENS[r_effort]
+            thinking_tokens = t if t < kwargs_dict["max_tokens"] else 1024
             kwargs_dict["extra_body"] = {
                 "extra_body": {
                     "google": {
                         "thinking_config": {
-                            "thinking_budget": random.choice(thinking_tokens),
+                            "thinking_budget": thinking_tokens,
                             "include_thoughts": True,
                         }
                     }
                 }
@@ -157,19 +154,17 @@
     elif model_name in (
        REASONING_CLAUDE_MODELS + REASONING_BEDROCK_MODELS
     ):
         kwargs_dict["max_tokens"] = min(random.choice(max_tokens), 16384)
-        think_bool = random.choice(reasoning_efforts) != "auto"
+        r_effort = random.choice(reasoning_efforts)
+        think_bool = r_effort != "auto"
         if think_bool:
             # filter thinking tokens to be smaller than max_tokens
             # not auto THINKING_TOKENS
-            thinking_tokens = [
-                t
-                for t in THINKING_TOKENS.values()
-                if t < kwargs_dict["max_tokens"] and t >= 1024
-            ]
+            t = THINKING_TOKENS[r_effort]
+            thinking_tokens = t if t < kwargs_dict["max_tokens"] else 1024
             # sample only from thinking tokens that are valid
             kwargs_dict["thinking"] = {
                 "type": "enabled",
-                "budget_tokens": random.choice(thinking_tokens),
+                "budget_tokens": thinking_tokens,
             }
         else:
diff --git a/shinka/webui/__init__.py b/shinka/webui/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/tests/test_edit_base.py b/tests/test_edit_base.py
index edc0e1178..67c6f2e20 100644
--- a/tests/test_edit_base.py
+++ b/tests/test_edit_base.py
@@ -161,6 +161,110 @@ def new_func2():
     # Should have replaced both evolve blocks with new content
 
 
+def test_apply_full_patch_full_file_without_markers_extracts_block_only():
+    """Patch without EVOLVE markers should not copy immutable code
+    into the evolve block; only the block payload is replaced."""
+    original_content = """# Header line\n# EVOLVE-BLOCK-START\nold_line()\n# EVOLVE-BLOCK-END\n# Footer line\n"""
+
+    # Patch omits the EVOLVE markers and supplies only the block payload.
+ patch_content = """```python +new_line() +another_new_line() +```""" + + expected = """# Header line +# EVOLVE-BLOCK-START +new_line() +another_new_line() +# EVOLVE-BLOCK-END +# Footer line +""" + + result = apply_full_patch( + patch_str=patch_content, + original_str=original_content, + language="python", + verbose=False, + ) + updated_content, num_applied, output_path, error, patch_txt, diff_path = result + + assert error is None + assert num_applied == 1 + assert updated_content == expected + + +def test_apply_full_patch_patch_with_start_marker_only(): + """Patch has only START marker; original has both markers.""" + original_content = """# Header line +# EVOLVE-BLOCK-START +old_line() +# EVOLVE-BLOCK-END +# Footer line +""" + + patch_content = """```python +# Header line +# EVOLVE-BLOCK-START +new_line() +# Footer line +```""" + + expected = """# Header line +# EVOLVE-BLOCK-START +new_line() +# EVOLVE-BLOCK-END +# Footer line +""" + + result = apply_full_patch( + patch_str=patch_content, + original_str=original_content, + language="python", + verbose=False, + ) + updated_content, num_applied, output_path, error, patch_txt, diff_path = result + + assert error is None + assert num_applied == 1 + assert updated_content == expected + + +def test_apply_full_patch_patch_with_end_marker_only(): + """Patch has only END marker; original has both markers.""" + original_content = """# Header line +# EVOLVE-BLOCK-START +old_line() +# EVOLVE-BLOCK-END +# Footer line +""" + + patch_content = """```python +# Header line +new_line() +# EVOLVE-BLOCK-END +# Footer line +```""" + + expected = """# Header line +# EVOLVE-BLOCK-START +new_line() +# EVOLVE-BLOCK-END +# Footer line +""" + + result = apply_full_patch( + patch_str=patch_content, + original_str=original_content, + language="python", + verbose=False, + ) + updated_content, num_applied, output_path, error, patch_txt, diff_path = result + + assert error is None + assert num_applied == 1 + assert updated_content == expected 
+
+
 def test_apply_full_patch_no_evolve_blocks():
     """Test apply_full_patch with no EVOLVE-BLOCK regions - should error."""
     original_content = """# Just regular code
@@ -221,6 +325,41 @@ def new_function():
     assert updated_content == original_content  # Should return original content
 
 
+def test_apply_full_patch_patch_with_single_marker_ambiguous_multiple_regions():
+    """Single marker in patch is ambiguous when original has multiple regions."""
+    original_content = """# Header
+# EVOLVE-BLOCK-START
+func1()
+# EVOLVE-BLOCK-END
+
+# EVOLVE-BLOCK-START
+func2()
+# EVOLVE-BLOCK-END
+# Footer
+"""
+
+    # Patch includes only START marker
+    patch_content = """```python
+# Header
+# EVOLVE-BLOCK-START
+new_code()
+# Footer
+```"""
+
+    updated_content, num_applied, output_path, error, patch_txt, diff_path = (
+        apply_full_patch(
+            patch_str=patch_content,
+            original_str=original_content,
+            language="python",
+            verbose=False,
+        )
+    )
+
+    assert num_applied == 0
+    assert error is not None
+    assert "only one EVOLVE-BLOCK marker" in error
+
+
 def test_apply_full_patch_invalid_extraction():
     """Test apply_full_patch with invalid code extraction."""
     original_content = """# EVOLVE-BLOCK-START