
Candidate evaluation issue #402

@Harshith35

Description

I'm using OpenEvolve for a custom Java program-synthesis task: mutating a JUnit test file (TestFile.java) so that it compiles and its tests pass against a fixed set of implementation files (multiple CodeFile.java files).

Major issue:
The evolution trace shows that the LLM generates new candidate code on every iteration, but the metrics never change, even after the LLM fixes the compilation errors and test failures. While debugging the source code, I found that every candidate passed to the evaluate function has exactly the same content as the initial Java program file. I'm not sure why the updated candidate code generated by the LLM isn't being passed to the evaluate function. As a result, the best program is always the initial program itself.
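One way to narrow this down is to log a hash of the file the evaluator receives and to make sure that file, not the original TestFile.java, is what actually gets compiled. The sketch below is a minimal Python illustration under stated assumptions: it assumes the evaluator is called with the candidate's file path (as in OpenEvolve's example evaluators, `evaluate(program_path)`), and `stage_candidate`, `fingerprint`, `project_dir` and `test_rel_path` are hypothetical names. Because javac requires a public class `TestFile` to live in a file named TestFile.java, a candidate saved under a temporary name has to be copied into a scratch copy of the project under that exact name; if the build instead runs in the original project directory, it will keep compiling the initial TestFile.java no matter what the LLM produced.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def stage_candidate(program_path: str, project_dir: str, test_rel_path: str) -> Path:
    """Hypothetical helper: copy the fixed project into a fresh scratch
    directory and overwrite the test file with the candidate's content.

    project_dir holds the unchanged CodeFile.java sources; test_rel_path is
    where TestFile.java lives inside that project (both are assumptions).
    """
    workdir = Path(tempfile.mkdtemp(prefix="candidate_"))
    # Copy the fixed implementation files into the scratch workspace
    # (dirs_exist_ok is needed because mkdtemp already created workdir).
    shutil.copytree(project_dir, workdir, dirs_exist_ok=True)
    # A public class TestFile must be in TestFile.java, so write the
    # candidate under that exact name, replacing the copied original.
    target = workdir / test_rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(Path(program_path).read_text())
    return workdir


def fingerprint(path: Path) -> str:
    # Short content hash to log on every evaluation; if it never changes
    # across iterations, the evaluator is not receiving new candidates.
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]
```

Inside the evaluator you would call `stage_candidate(program_path, ...)`, run the build (for example `javac` or Maven via `subprocess.run(..., cwd=workdir)`), and record `fingerprint(Path(program_path))` as an artifact. If the fingerprint is identical on every iteration, the problem lies in what the evaluator is given rather than in the build step.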

A few minor issues/obstacles:

  1. The LLM does not respect the evolve start/end block constraint: it also modifies code outside those blocks.
  2. Sometimes the LLM doesn't address the compilation errors, even though the error messages are sent to it as artifacts. It ignores them completely and makes other changes instead.
  3. Because my use case involves multiple Java files, I think the LLM can only see the contents of TestFile.java, which I pass as the initial program. Since it has no visibility into how the methods used in TestFile.java are implemented, it cannot tell which methods exist and which do not.
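For the first point, prompting alone may not be enough; one option is to have the evaluator check whether a candidate changed anything outside the evolve blocks and penalize it (or report it as an artifact) if it did. This is a minimal sketch, assuming the markers sit on their own comment lines containing `EVOLVE-BLOCK-START` and `EVOLVE-BLOCK-END` (e.g. `// EVOLVE-BLOCK-START` in a Java file); the function names are hypothetical, and the comparison is plain line-by-line text matching that ignores trailing whitespace.

```python
# Assumption: markers are lines containing these strings,
# e.g. "// EVOLVE-BLOCK-START" and "// EVOLVE-BLOCK-END" in Java.
START = "EVOLVE-BLOCK-START"
END = "EVOLVE-BLOCK-END"


def outside_blocks(source: str) -> list:
    """Return the lines outside every evolve block, keeping the marker
    lines themselves so that moving or deleting a marker is detected."""
    kept, inside = [], False
    for line in source.splitlines():
        if START in line:
            inside = True
            kept.append(line.strip())
        elif END in line:
            inside = False
            kept.append(line.strip())
        elif not inside:
            # Ignore trailing whitespace differences outside the blocks.
            kept.append(line.rstrip())
    return kept


def edits_stay_in_blocks(parent: str, child: str) -> bool:
    # True only if the child's code outside the evolve blocks is
    # identical to the parent's.
    return outside_blocks(parent) == outside_blocks(child)
```

In an evaluator you might compare each candidate against the initial program and return a zero score when `edits_stay_in_blocks` is false. This does not stop the LLM from editing outside the blocks, but it keeps such candidates from being ranked as improvements.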

Information:
I'm using the Qwen 2.5 32B Instruct model and a standard config file with artifacts enabled, cascade evaluation and LLM feedback disabled, 3 top programs, 2 diverse programs, and 3 islands; I ran it for 3 iterations.

Please let me know if there are any other preferred ways of using OpenEvolve for my use case.

Thank you!
