feat: add `EvolInstruct` alike methods to `camel/datagen` #1747

ZIYU-DEEP · 2025-03-08T22:56:58Z

Description

Fixes #1737. Changes made in:

./examples/datagen/evol_instruct
./camel/datagen/evol_instruct

Checklist

I have read the CONTRIBUTION guide (required)
I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
I have checked if any dependencies need to be added or updated in pyproject.toml and poetry.lock
I have updated the tests accordingly (required for a bug fix or a new feature)
I have updated the documentation if needed:
I have added examples if this is a new feature

Notes for Reviewers

EvolInstruct and SelfInstruct currently handle data differently, which could be improved. Let's discuss how to better align them, for example through a shared base class.

review-notebook-app · 2025-03-08T22:57:03Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ZIYU-DEEP · 2025-03-08T23:54:11Z

camel/datagen/evol_instruct/evol_instruct.py

+                # simulate random scores in range (1, 10) for now
+                scores = [random.randint(1, 10) for _ in batch_results[1:]] if keep_original else [random.randint(1, 10) for _ in batch_results]
+            else:
+                # TODO: implement instruction scoring module, e.g., complexity/quality scorer or by reward advantage


Left a TODO for a future scorer that evaluates instructions; it can be rule-based or driven by a generative agent. Some references:

https://arxiv.org/pdf/2312.15685 uses instruction complexity (rated by an LLM judge) as the score

https://arxiv.org/pdf/2411.00062 uses reward advantage as the score

Other metrics for data selection/sampling: perplexity, reward variance, ...
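To make the scorer idea concrete, here is a minimal sketch of what a pluggable scorer could look like. `BaseScorer`, `LengthScorer`, and `select_best` are hypothetical names that do not appear in this PR, and the word-count heuristic is only a stand-in for a real complexity, quality, or reward-advantage score.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseScorer(ABC):
    """Hypothetical interface: maps each candidate prompt to a score."""

    @abstractmethod
    def score(self, prompts: List[str]) -> List[float]:
        ...


class LengthScorer(BaseScorer):
    """Toy rule-based scorer: prompts with more words score higher."""

    def score(self, prompts: List[str]) -> List[float]:
        return [float(len(p.split())) for p in prompts]


def select_best(prompts: List[str], scorer: BaseScorer) -> str:
    """Return the highest-scoring prompt (the first one wins ties)."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    scores = scorer.score(prompts)
    return prompts[scores.index(max(scores))]
```

An LLM-judge or reward-based scorer could implement the same `score` method, so the selection step would not need to change when the scoring strategy does.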

ZIYU-DEEP · 2025-03-09T00:31:17Z

camel/datagen/evol_instruct/templates.py

+    IN_BREADTH_KEYS = ['persona', 'shift-in', 'shift-out', 'mix', 'abstract'] 
+    IN_DEPTH_KEYS = ['constraints', 'deepening', 'concretizing', 'reasoning', 'expansion']
+
+    EVOL_METHODS = {


Notes: we can define more domain-specific templates (e.g., for math/coding/...).

Also, the evolving currently happens independently for each prompt (x' ~ LLM(· | x, ins)); we should improve this later so that it becomes multi-prompt / group-based (x' ~ LLM(· | a cluster of x, ins)), where the LLM can cross over and mutate within a group.

Regarding the prompt groups -- some time ago, @lightaime mentioned message-passing-based sampling. We can also include support for this in our pipeline.

zjrwtx

Great, thanks for your work @ZIYU-DEEP, but some docstrings need to be polished.

zjrwtx · 2025-03-09T04:27:36Z

camel/datagen/evol_instruct/evol_instruct.py

+        self, 
+        agent: ChatAgent,
+    ):
+        """


Suggested change

-        """
+        r"""

zjrwtx

Great, thanks for your work @ZIYU-DEEP!

Zhangzeyu97 · 2025-03-16T02:28:24Z

Ziyu @ZIYU-DEEP and I had a discussion about the current evol-instruct and identified the following areas for improvement:

Support for user-defined EvolInstructTemplates: The current templates are designed for general instructions. However, applying evol-instruct to specific domains requires corresponding modifications to the meta prompt and evol methods.
Provide a template example based on the advanced math domain.
Implement an LLM-based scorer to evaluate aspects such as complexity and diversity.

We will collaborate to improve these aspects. If you have any ideas, feel free to discuss them with us!

Wendong-Fan

Thanks @ZIYU-DEEP for the contribution, and sorry for the late review. I left some comments below. We also need to add unit tests for this feature. Please run `pre-commit run --all-files` locally in your terminal to check the pre-commit errors that currently exist.

Wendong-Fan · 2025-03-22T09:50:02Z

camel/datagen/evol_instruct/evol_instruct.py

+
+    def _set_method(
+        self, 
+        method: Optional[Union[str, List[str]]] = "uniform",


A plain string could be too general. How about using Literal?

method: Optional[Union[Literal["uniform", "in-breadth", "in-depth",....]
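A minimal sketch of the Literal idea, pairing the static annotation with a runtime check. `Strategy` and `check_strategy` are hypothetical names, and the alias lists only the three values named in this comment; the remaining values elided above are deliberately left out.

```python
from typing import Literal, get_args

# Hypothetical alias: only the values named in this thread are listed.
Strategy = Literal["uniform", "in-breadth", "in-depth"]


def check_strategy(method: str) -> Strategy:
    """Runtime check that complements the static Literal annotation."""
    allowed = get_args(Strategy)  # tuple of the literal values
    if method not in allowed:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {allowed}"
        )
    return method  # type: ignore[return-value]
```

A type checker would then flag misspelled strategy names at call sites, while `check_strategy` would catch values that only arrive at runtime (for example from a config file).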

Wendong-Fan · 2025-03-22T09:52:40Z

camel/datagen/evol_instruct/evol_instruct.py

+    def _generate_single(
+        self, 
+        prompt: str,  # for a single prompt
+        method: str = "uniform",


use Literal?

Wendong-Fan · 2025-03-22T09:59:18Z

camel/datagen/evol_instruct/evol_instruct.py

+            else:
+                # TODO: implement instruction scoring module, e.g., complexity/quality scorer or by reward advantage
+                raise NotImplementedError(f"Scorer '{scorer}' is not implemented.")
+
+            # select the prompt with the highest score
+            best_index = scores.index(max(scores))
+            current_prompt = batch_results[best_index + 1][0] if keep_original else batch_results[best_index][0]


It seems that if best_index is the last index in scores (e.g., if the last generated prompt has the highest score), then best_index + 1 would point beyond the bounds of batch_results?

keep_original adds one more prompt to the list before this point.
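The indexing in question can be sketched as follows. This assumes, consistent with the earlier hunk that scores `batch_results[1:]` when `keep_original` is set, that the original prompt sits at index 0 and is not scored. `pick_best` is a hypothetical helper, and the `(prompt, method)` tuple layout is an assumption for illustration.

```python
from typing import List, Tuple


def pick_best(
    batch_results: List[Tuple[str, str]],
    scores: List[int],
    keep_original: bool,
) -> str:
    """Return the prompt text of the highest-scoring candidate."""
    # With keep_original, index 0 holds the unscored original prompt,
    # so scores[i] corresponds to batch_results[i + 1].
    offset = 1 if keep_original else 0
    if len(scores) != len(batch_results) - offset:
        raise ValueError("scores and batch_results are misaligned")
    best_index = scores.index(max(scores))
    return batch_results[best_index + offset][0]
```

Under that assumption, `scores` has one element fewer than `batch_results`, so the largest possible `best_index + 1` is `len(batch_results) - 1` and the lookup stays in range. The explicit length check would surface a real misalignment if the two lists ever drifted apart.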

Wendong-Fan · 2025-03-22T10:03:20Z

examples/datagen/evol_instruct/evol_instruct.ipynb

could we move the .ipynb under docs/cookbooks/data_generation?

ZIYU-DEEP · 2025-03-24T15:53:14Z

Thanks a lot @Wendong-Fan! Just resolved some minor issues. Converting this PR to a draft and handing it over to @Zhangzeyu97 to work on the feat/Intergrate-Evol-Instruct branch under the camel repo with new features!

ZIYU-DEEP added 9 commits March 7, 2025 23:08

start on evolinstruct

b9fe3a9

init dataclass

41ddee3

init evolinstruct

dc74b6b

fix chunk

b289db1

fix chunk

dc79534

fix evol_instruct

ed08f1b

update examples

d3b79fc

Merge branch 'camel-ai:master' into evol-draft

44eac3e

notes on evolinstruct

117bbcf

ZIYU-DEEP added the New Feature label Mar 8, 2025

ZIYU-DEEP requested review from Wendong-Fan, lightaime and old-hallerite March 8, 2025 22:56

ZIYU-DEEP self-assigned this Mar 8, 2025

ZIYU-DEEP commented Mar 8, 2025

ZIYU-DEEP commented Mar 9, 2025

zjrwtx reviewed Mar 9, 2025

Merge branch 'master' into evol-draft

70dec07

Wendong-Fan added this to Project Camel Mar 10, 2025

Wendong-Fan added this to the Sprint 25 milestone Mar 10, 2025

ZIYU-DEEP added 3 commits March 10, 2025 11:44

fix docstrings

2f08fe7

Merge branch 'master' into evol-draft

78fa4d8

Merge branch 'camel-ai:master' into evol-draft

a5c3be5

zjrwtx approved these changes Mar 11, 2025

Wendong-Fan changed the title from "[feat] add EvolInstruct alike methods to camel/datagen" to "feat: add EvolInstruct alike methods to camel/datagen" Mar 11, 2025

Wendong-Fan reviewed Mar 22, 2025

zjrwtx self-requested a review March 23, 2025 06:29

fix: evol_instruct.py formats

9f9374d

ZIYU-DEEP marked this pull request as draft March 24, 2025 15:37

Zhangzeyu97 mentioned this pull request Mar 25, 2025

feat: add Evol-Instruct alike methods to camel/datagen #1990

Merged

6 tasks
