2 changes: 2 additions & 0 deletions openevolve/database.py
@@ -861,6 +861,7 @@ def _calculate_feature_coords(self, program: Program) -> List[int]:
# Use code length as complexity measure
complexity = len(program.code)
bin_idx = self._calculate_complexity_bin(complexity)
+ program.complexity = bin_idx # Store complexity bin in program
coords.append(bin_idx)
Comment on lines 861 to 865
Copilot AI Feb 9, 2026

Assigning the bin index into Program.complexity/Program.diversity is semantically ambiguous (the dataclass defines these as derived feature values, currently typed as float). Consider either casting to float for consistency, or introducing explicit fields like complexity_bin/diversity_bin to avoid confusing bins with raw feature values.

Copilot uses AI. Check for mistakes.
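
One way to act on the suggestion above is to keep the raw feature value and the bin index in separate fields. The following is only a sketch: the `Program` class here is a simplified stand-in rather than the real openevolve dataclass, and `complexity_bin`, `diversity_bin`, and `record_complexity` are hypothetical names taken from or inspired by the review comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Program:
    """Simplified stand-in for the real Program dataclass (assumption)."""

    id: str
    code: str
    complexity: float = 0.0  # raw feature value, kept as a float
    diversity: float = 0.0  # raw feature value, kept as a float
    complexity_bin: Optional[int] = None  # hypothetical explicit bin field
    diversity_bin: Optional[int] = None  # hypothetical explicit bin field


def record_complexity(program: Program, bin_idx: int) -> None:
    """Store the raw complexity and its bin separately (hypothetical helper)."""
    program.complexity = float(len(program.code))  # code length, as in the diff
    program.complexity_bin = bin_idx


p = Program(id="p1", code="abc")
record_complexity(p, bin_idx=2)
print(p.complexity, p.complexity_bin)
```

With separate fields, a reader of a saved program can tell a bin index apart from a raw measurement without knowing how the database computed it.
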
Author

@copilot open a new pull request to apply changes based on this feedback

elif dim == "diversity":
# Use cached diversity calculation with reference set
@@ -869,6 +870,7 @@ def _calculate_feature_coords(self, program: Program) -> List[int]:
else:
diversity = self._get_cached_diversity(program)
bin_idx = self._calculate_diversity_bin(diversity)
+ program.diversity = bin_idx # Store diversity bin in program
coords.append(bin_idx)
Comment on lines 867 to 874
Copilot AI Feb 9, 2026

In the cold-start branch (len(self.programs) < 2) diversity’s bin_idx is forced to 0 but program.diversity is not updated, so saved programs may still show the default value rather than the computed bin. Set program.diversity in this branch as well for consistency with the complexity handling.
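
A minimal sketch of the fix described above: assign `program.diversity` after the branch so that both the cold-start path and the normal path update it. The function `diversity_coord` and its parameters are hypothetical; the callables stand in for `_get_cached_diversity` and `_calculate_diversity_bin`, whose real signatures are not shown in this diff.

```python
from types import SimpleNamespace
from typing import Callable


def diversity_coord(
    program,
    population_size: int,
    compute_diversity: Callable[[object], float],
    to_bin: Callable[[float], int],
) -> int:
    """Return the diversity bin and store it on the program in every branch."""
    if population_size < 2:
        # Cold start: too few programs to compare against, so use bin 0.
        bin_idx = 0
    else:
        bin_idx = to_bin(compute_diversity(program))
    # Assigned once, after the branch, so neither path can skip it.
    program.diversity = bin_idx
    return bin_idx


first = SimpleNamespace(diversity=None)
diversity_coord(first, population_size=1, compute_diversity=lambda p: 0.9, to_bin=lambda d: 7)
print(first.diversity)
```
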

Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 860 to 874
Copilot AI Feb 9, 2026

Test coverage: this change is intended to persist built-in feature bins into the saved Program, but there’s no test asserting that Program.complexity/diversity are updated after coordinate calculation/add(). Add a unit test that loads/saves a program and verifies these fields are non-default when built-in dimensions are used.
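
The shape of such a test might look like the sketch below. It does not use the real openevolve database or its save/load methods, whose signatures are not shown here; `Program`, `store_bins`, and `roundtrip` are hypothetical stand-ins, and a JSON round trip stands in for saving and loading.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Program:
    """Simplified stand-in for the real Program dataclass (assumption)."""

    id: str
    code: str
    complexity: float = 0.0
    diversity: float = 0.0


def store_bins(program: Program, complexity_bin: int, diversity_bin: int) -> None:
    """Mimic what the diff does: write the bin indices onto the program."""
    program.complexity = complexity_bin
    program.diversity = diversity_bin


def roundtrip(program: Program) -> Program:
    """Serialize to JSON and back, standing in for save and load."""
    return Program(**json.loads(json.dumps(asdict(program))))


def test_bins_survive_save_and_load() -> None:
    program = Program(id="p1", code="print('hi')")
    store_bins(program, complexity_bin=3, diversity_bin=1)
    loaded = roundtrip(program)
    # The loaded program should carry the stored bins, not the defaults.
    assert loaded.complexity == 3
    assert loaded.diversity == 1


test_bins_survive_save_and_load()
```

A real test would call the database's own add and save/load paths instead of `store_bins` and `roundtrip`.
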

elif dim == "score":
# Use average of numeric metrics
2 changes: 1 addition & 1 deletion openevolve/evaluator.py
@@ -210,7 +210,7 @@ async def evaluate_program(
accuracy = eval_result.metrics["combined_score"]
# Combine with LLM average (70% accuracy, 30% LLM quality)
eval_result.metrics["combined_score"] = (
- accuracy * 0.7 + llm_average * 0.3
+ accuracy * (1-self.config.llm_feedback_weight) + llm_average * self.config.llm_feedback_weight
)
Comment on lines 213 to 215
Copilot AI Feb 9, 2026

combined_score now depends on llm_feedback_weight, but there’s no guard ensuring the weight is within [0.0, 1.0]. If a user misconfigures this, the score can become negative or exceed expected bounds; consider clamping or raising a clear config error before using it here.
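
The guard could take the "raise a clear config error" form, sketched below. `blend_scores` is a hypothetical free function, not the actual `evaluate_program` code; in the real evaluator the check might live in config validation instead.

```python
def blend_scores(accuracy: float, llm_average: float, llm_feedback_weight: float) -> float:
    """Blend accuracy with the LLM average, rejecting out-of-range weights."""
    if not 0.0 <= llm_feedback_weight <= 1.0:
        raise ValueError(
            f"llm_feedback_weight must be in [0.0, 1.0], got {llm_feedback_weight!r}"
        )
    return accuracy * (1 - llm_feedback_weight) + llm_average * llm_feedback_weight


print(blend_scores(0.8, 0.4, 0.3))
```

Raising keeps a misconfiguration visible; clamping with `min(1.0, max(0.0, w))` is the quieter alternative, at the cost of hiding the bad value from the user.
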

Comment on lines +214 to 215
Copilot AI Feb 9, 2026

This line exceeds the configured Black line length (100) and is missing spaces around operators (e.g., 1 - weight). Reformatting will improve readability and avoid formatting/lint churn in future diffs.

Author

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 208 to 215
Copilot AI Feb 9, 2026

Test coverage: the combined_score weighting behavior changed to depend on llm_feedback_weight, but there doesn’t appear to be a unit test asserting the new formula. Adding a focused test (including edge weights like 0.0/1.0) would prevent regressions.
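
A focused test along these lines might look like the following sketch. `combined_score` is a hypothetical stand-in that copies the expression from the diff; a real test would exercise `evaluate_program` with a configured `llm_feedback_weight` instead.

```python
import math


def combined_score(accuracy: float, llm_average: float, weight: float) -> float:
    """Stand-in for the weighted expression in the diff (assumption)."""
    return accuracy * (1 - weight) + llm_average * weight


def test_weight_zero_ignores_llm_average() -> None:
    assert math.isclose(combined_score(0.8, 0.2, 0.0), 0.8)


def test_weight_one_uses_only_llm_average() -> None:
    assert math.isclose(combined_score(0.8, 0.2, 1.0), 0.2)


def test_weight_point_three_matches_old_constants() -> None:
    # A weight of 0.3 should reproduce the previous hard-coded 70/30 split.
    assert math.isclose(combined_score(0.8, 0.2, 0.3), 0.8 * 0.7 + 0.2 * 0.3)


test_weight_zero_ignores_llm_average()
test_weight_one_uses_only_llm_average()
test_weight_point_three_matches_old_constants()
```
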


# Store artifacts if enabled and present