
Commit ab458fc

Merge pull request #53 from finitearth/feature/RewardTask
Feature/reward task
2 parents 1014ccf + 48dca49 commit ab458fc

61 files changed

Lines changed: 3313 additions & 1068 deletions


.coverage

16 KB
Binary file not shown.

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion
@@ -34,7 +34,6 @@ jobs:
 - name: Run tests with coverage
 run: |
 poetry run python -m pytest --junitxml=pytest.xml --cov-report=term-missing:skip-covered --cov=. tests/ > pytest-coverage.txt
-cat pytest-coverage.txt
 
 - name: Generate coverage report & comment on PR
 id: coverageComment

.github/workflows/docs.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ jobs:
 
 - name: Generate notebook examples
 run: |
-poetry run jupyter nbconvert --to markdown --allow-errors --output-dir docs/examples notebooks/*.ipynb
+poetry run jupyter nbconvert --to markdown --allow-errors --output-dir docs/examples tutorials/*.ipynb
 
 - name: Deploy docs
 run: |

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ results/
 poetry.lock
 CLAUDE.md
 **/CLAUDE.local.md
+.mypy_cache/

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions
@@ -18,6 +18,16 @@ repos:
 rev: 5.12.0
 hooks:
 - id: isort
+- repo: https://github.com/pre-commit/mirrors-mypy
+rev: v1.8.0
+hooks:
+- id: mypy
+files: ^promptolution/
+additional_dependencies:
+- types-requests
+- pandas-stubs
+- numpy
+args: [--explicit-package-bases, --config-file=pyproject.toml]
 - repo: https://github.com/pycqa/pydocstyle
 rev: 6.3.0
 hooks:

README.md

Lines changed: 3 additions & 4 deletions
@@ -1,12 +1,11 @@
 ![promptolution](https://github.com/user-attachments/assets/84c050bd-61a1-4f2e-bc4e-874d9b4a69af)
 
-![Coverage](https://img.shields.io/badge/Coverage-89%25-green)
+![Coverage](https://img.shields.io/badge/Coverage-91%25-brightgreen)
 [![CI](https://github.com/finitearth/promptolution/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/finitearth/promptolution/actions/workflows/ci.yml)
 [![Docs](https://github.com/finitearth/promptolution/actions/workflows/docs.yml/badge.svg?branch=main)](https://github.com/finitearth/promptolution/actions/workflows/docs.yml)
 ![Code Style](https://img.shields.io/badge/Code%20Style-black-black)
 ![Python Versions](https://img.shields.io/badge/Python%20Versions-≥3.10-blue)
-
-
+[![Getting Started](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb)
 
 Promptolution is a library that provides a modular and extensible framework for implementing prompt tuning for single tasks and larger experiments. It offers a user-friendly interface to assemble the core components for various prompt optimization tasks.
 
@@ -36,7 +35,7 @@ to install the necessary dependencies. You might need to install [pipx](https://
 
 ## Usage
 
-To get started right away, take a look at our [getting started notebook](https://github.com/finitearth/promptolution/blob/main/notebooks/getting_started.ipynb).
+To get started right away, take a look at our [getting started notebook](https://github.com/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb) and our [other demos and tutorials](https://github.com/finitearth/promptolution/blob/main/tutorials).
 For more details, a comprehensive **documentation** with API reference is available at https://finitearth.github.io/promptolution/.
 
 ### Featured Optimizers

docs/examples/getting_started.md

Lines changed: 156 additions & 309 deletions
Large diffs are not rendered by default.
Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
1+
# Getting Started: LLM as a Judge with Promptolution
2+
3+
## Welcome to Promptolution!
4+
5+
Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to one of Promptolution's most advanced features: LLM as a Judge.
6+
7+
While the standard getting_started notebook shows how to optimize for classification tasks, this guide will focus on something different. We'll optimize prompts for a creative task where there's no single "correct" answer: *Finding an optimal argument for a statement*!
8+
9+
## Intro
10+
In traditional machine learning and prompt optimization, we often rely on labeled data. For a classification task, you need an input (x) and a corresponding ground-truth label (y). The goal is to find a prompt that helps the model predict y correctly.
11+
But what if your task is more subjective? How do you "label" things like:
12+
13+
- The quality of a generated argument?
14+
- The creativity of a story?
15+
- The helpfulness of a summary?
16+
- The persuasiveness of an essay?
17+
18+
This is where LLM as a Judge comes in. Instead of relying on a pre-defined dataset of labels, we use another powerful Language Model (the "judge") to score the output of our prompts. The process looks like this:
19+
20+
A candidate prompt is used to generate a response (e.g., an argument).
21+
A "judge" LLM then evaluates this response based on the task provided and assigns a score.
22+
Promptolution's optimizer uses these scores to identify which prompts are best and evolves them to generate even better responses.
23+
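The three steps above can be sketched as a small scoring loop. This is only an illustration, not Promptolution's implementation: `generate` and `judge` are hypothetical stand-ins for real LLM calls, and the 0-to-1 score scale is an assumption.

```python
from statistics import mean
from typing import Callable


def score_prompt(
    prompt: str,
    topics: list[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> float:
    """Average judge score of one candidate prompt over a batch of topics."""
    scores = []
    for topic in topics:
        response = generate(f"{prompt}\n{topic}")  # step 1: the prompt produces a response
        scores.append(judge(topic, response))      # step 2: the judge assigns a score
    return mean(scores)


# Hypothetical stand-ins so the sketch runs without an API key.
def fake_generate(text: str) -> str:
    return text.upper()


def fake_judge(topic: str, response: str) -> float:
    # Toy rule (not a real quality measure): longer responses score higher, capped at 1.0.
    return min(len(response.split()) / 10, 1.0)


topics = ["Payday loans should be banned"]
candidates = ["Argue convincingly for this position:", "Write a haiku:"]

# Step 3: the optimizer would use these scores to decide which prompts to keep and evolve.
ranked = sorted(
    candidates,
    key=lambda p: score_prompt(p, topics, fake_generate, fake_judge),
    reverse=True,
)
```

In a real run both callables would wrap LLM requests, and the judge would receive the task description along with the response.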
24+
The beauty of this approach is its flexibility. If a correct answer exists, you can provide ground truths and let the judge LLM decide whether the prediction and the correct answer are equivalent, but you don't need to.
25+
26+
*New to Promptolution? If you haven't seen our classification tutorial yet, check out `getting_started.ipynb` first! It covers the basics of prompt optimization with simpler tasks like text classification. This notebook builds on those concepts but tackles more complex, subjective tasks.*
27+
28+
## Installation
29+
Install Promptolution with a single command:
30+
31+
32+
```python
33+
! pip install promptolution[api]
34+
```
35+
36+
## Imports
37+
38+
39+
```python
40+
import pandas as pd
41+
from promptolution.utils import ExperimentConfig
42+
from promptolution.helpers import run_experiment
43+
import nest_asyncio
44+
45+
nest_asyncio.apply() # Required for notebook environments
46+
```
47+
48+
## Setting Up Your Experiment
49+
50+
### Prepare the data
51+
52+
For this tutorial, we're using IBM's Argument Quality Ranking dataset - a collection of crowd-sourced arguments on controversial topics like capital punishment, abortion rights, and climate change.
53+
54+
Unlike classification tasks where you have clear input-output pairs, here we're working with debate topics that we want to generate compelling arguments for.
55+
56+
57+
```python
58+
df = pd.read_csv("hf://datasets/ibm-research/argument_quality_ranking_30k/dev.csv").sample(300)
59+
```
60+
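Note that `.sample(300)` draws a different subset on every run, so scores may differ between runs. If reproducibility matters, one option is pandas' `random_state` argument. The sketch below uses a made-up frame in place of the real dataset; only the `topic` column name is taken from this tutorial.

```python
import pandas as pd

# Made-up stand-in for the dataset so the sketch runs offline.
toy = pd.DataFrame({"topic": [f"Statement {i}" for i in range(1000)]})

# The same seed yields the same 300 rows on every call.
first = toy.sample(300, random_state=42)
second = toy.sample(300, random_state=42)
```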
61+
62+
```python
63+
print("\nSample topics:")
64+
for topic in df["topic"].unique()[:3]:
65+
print(f"- {topic}")
66+
```
67+
68+
69+
Sample topics:
70+
- We should adopt a zero-tolerance policy in schools
71+
- Payday loans should be banned
72+
- Intelligence tests bring more harm than good
73+
74+
75+
Our task: **Given a controversial statement, generate the strongest possible argument supporting that position.**
76+
77+
Let's look at what we're working with:
78+
79+
### Creating Initial Prompts
80+
81+
Here are some starter prompts for generating compelling arguments. Feel free to experiment with your own!
82+
83+
84+
```python
85+
init_prompts = [
86+
"Create a strong argument for this position with clear reasoning and examples:",
87+
"Write a persuasive argument supporting this statement. Include evidence and address counterarguments:",
88+
"Make a compelling case for this viewpoint using logical reasoning and real examples:",
89+
"Argue convincingly for this position. Provide supporting points and evidence:",
90+
"Build a strong argument for this statement with clear structure and solid reasoning:",
91+
"Generate a persuasive argument supporting this position. Use facts and logical flow:",
92+
"Create a well-reasoned argument for this viewpoint with supporting evidence:",
93+
"Write a convincing argument for this position. Include examples and counter opposing views:",
94+
"Develop a strong case supporting this statement using clear logic and evidence:",
95+
"Construct a persuasive argument for this position with solid reasoning and examples:",
96+
]
97+
```
98+
99+
### Configure Your LLM
100+
101+
For this demonstration, we will again use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI by simply changing the `api_url` and `model_id`.
102+
103+
104+
```python
105+
api_key = "YOUR_API_KEY"  # Replace with the API key for your LLM provider (DeepInfra in this example)
106+
```
107+
108+
Here are the key parameters for LLM-as-a-Judge tasks:
109+
110+
111+
```python
112+
config = ExperimentConfig(
113+
optimizer="evopromptga",
114+
task_description="Given a statement, find the best argument supporting it.",
115+
x_column="topic",
116+
prompts=init_prompts,
117+
n_steps=3,
118+
n_subsamples=10,
119+
subsample_strategy="random_subsample",
120+
api_url="https://api.deepinfra.com/v1/openai",
121+
model_id="meta-llama/Meta-Llama-3-8B-Instruct",
122+
api_key=api_key,
123+
task_type="judge",
124+
)
125+
```
126+
127+
- `task_type="judge"` - This tells Promptolution to use LLM evaluation instead of accuracy metrics
128+
- `x_column="topic"` - We specify which column contains our input (debate topics)
129+
- `optimizer="evopromptga"` - In the classification tutorial we showcased CAPO; here we use EvoPrompt, a strong evolutionary prompt optimizer.
130+
- No y column needed - the judge will evaluate quality without ground truth labels!
131+
132+
## Run Your Experiment
133+
134+
With everything configured, you're ready to optimize your prompts! The `run_experiment` function will:
135+
136+
1. Evaluate your initial prompts by generating arguments and having the judge LLM score them
137+
1. Use evolutionary operators (mutation, crossover) to create new prompt variations from the best-performing ones
138+
1. Test these new prompt candidates and select the fittest ones for the next generation
139+
1. Repeat this evolutionary process for the specified number of steps, gradually improving prompt quality
140+
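The four steps listed above follow the usual shape of a genetic algorithm. Here is a toy sketch of that loop, assuming plain string operations for crossover and mutation and a keyword-counting fitness function; all names are hypothetical, and none of this is Promptolution's actual code, where the judge LLM supplies the scores.

```python
import random
import re
from typing import Callable


def evolve(
    population: list[str],
    score: Callable[[str], float],
    crossover: Callable[[str, str], str],
    mutate: Callable[[str], str],
    n_steps: int,
    rng: random.Random,
) -> list[str]:
    """Toy genetic loop: breed children from the fittest prompts, keep the best."""
    size = len(population)
    for _ in range(n_steps):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: max(2, size // 2)]  # the fittest prompts become parents
        children = [mutate(crossover(*rng.sample(parents, 2))) for _ in range(size)]
        pool = list(dict.fromkeys(population + children))  # drop duplicates, keep order
        population = sorted(pool, key=score, reverse=True)[:size]  # select survivors
    return population


rng = random.Random(0)
TARGETS = ["evidence", "examples", "reasoning"]


def toy_score(prompt: str) -> float:
    # Hypothetical fitness: fraction of target keywords that appear in the prompt.
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return len(words & set(TARGETS)) / len(TARGETS)


def toy_crossover(a: str, b: str) -> str:
    # First half of one parent's words joined with the second half of the other's.
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2 :])


def toy_mutate(prompt: str) -> str:
    # Append one randomly chosen keyword.
    return f"{prompt} {rng.choice(TARGETS)}"


start = [
    "Argue convincingly for this position.",
    "Make a compelling case using real examples:",
    "Build a strong argument with solid reasoning:",
]
final = evolve(start, toy_score, toy_crossover, toy_mutate, n_steps=3, rng=rng)
```

Because survivors are chosen from the old and new candidates together, the best score in this sketch never decreases from one step to the next.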
141+
142+
```python
143+
prompts = run_experiment(df, config)
144+
```
145+
146+
🔥 Starting optimization...
147+
148+
149+
You can expect this to take several minutes as the optimizer generates arguments, evaluates them with the judge, and evolves the prompts.
150+
151+
152+
```python
153+
prompts
154+
```
155+
156+
157+
158+
159+
<div>
160+
<style scoped>
161+
.dataframe tbody tr th:only-of-type {
162+
vertical-align: middle;
163+
}
164+
165+
.dataframe tbody tr th {
166+
vertical-align: top;
167+
}
168+
169+
.dataframe thead th {
170+
text-align: right;
171+
}
172+
</style>
173+
<table border="1" class="dataframe">
174+
<thead>
175+
<tr style="text-align: right;">
176+
<th></th>
177+
<th>prompt</th>
178+
<th>score</th>
179+
</tr>
180+
</thead>
181+
<tbody>
182+
<tr>
183+
<th>0</th>
184+
<td>Construct a persuasive argument supporting the given statement, relying on logical coherence and evidence-based reasoning.</td>
185+
<td>0.931500</td>
186+
</tr>
187+
<tr>
188+
<th>1</th>
189+
<td>Develop a strong case supporting this statement using clear logic and evidence:</td>
190+
<td>0.924167</td>
191+
</tr>
192+
<tr>
193+
<th>2</th>
194+
<td>Construct a convincing case supporting the stated argument, providing evidence and responding to potential objections.</td>
195+
<td>0.915833</td>
196+
</tr>
197+
<tr>
198+
<th>3</th>
199+
<td>Develop a well-reasoned argument in favor of the given statement, incorporating reliable examples and addressing potential counterpoints.</td>
200+
<td>0.913333</td>
201+
</tr>
202+
<tr>
203+
<th>4</th>
204+
<td>Write a persuasive argument supporting this statement. Include evidence and address counterarguments:</td>
205+
<td>0.907500</td>
206+
</tr>
207+
<tr>
208+
<th>5</th>
209+
<td>Present a convincing case for this assertion, incorporating logical premises and applicable examples.</td>
210+
<td>0.903333</td>
211+
</tr>
212+
<tr>
213+
<th>6</th>
214+
<td>Fortify the provided statement with a robust and well-reasoned argument, underscoring logical relationships and leveraging empirical support to build a compelling case, while also anticipating and addressing potential counterpoints.</td>
215+
<td>0.902500</td>
216+
</tr>
217+
<tr>
218+
<th>7</th>
219+
<td>Construct a strong claim in support of this statement, employing a logical framework and relevant examples to make a convincing case.</td>
220+
<td>0.891667</td>
221+
</tr>
222+
<tr>
223+
<th>8</th>
224+
<td>Create a well-reasoned argument for this viewpoint with supporting evidence:</td>
225+
<td>0.888333</td>
226+
</tr>
227+
<tr>
228+
<th>9</th>
229+
<td>Extract the most compelling supporting argument for this statement, grounding it in logical reasoning and bolstered by relevant evidence and examples.</td>
230+
<td>0.697500</td>
231+
</tr>
232+
</tbody>
233+
</table>
234+
</div>
235+
236+
237+
238+
The best prompts aren't always the most obvious ones - let the optimizer surprise you with what works!
239+
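To reuse the winning prompt, you can pick the top-scoring row from the results. The sketch below assumes `run_experiment` returns a pandas DataFrame with `prompt` and `score` columns, as the rendered table above suggests; the two rows are copied from that table so the example runs on its own.

```python
import pandas as pd

# Two rows copied from the results table; in practice, use the frame returned by run_experiment.
results = pd.DataFrame(
    {
        "prompt": [
            "Develop a strong case supporting this statement using clear logic and evidence:",
            "Construct a persuasive argument supporting the given statement, "
            "relying on logical coherence and evidence-based reasoning.",
        ],
        "score": [0.924167, 0.931500],
    }
)

# Sort explicitly rather than relying on the row order of the returned frame.
best = results.sort_values("score", ascending=False).iloc[0]
best_prompt = best["prompt"]
```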
240+
241+
Happy prompt optimizing! 🚀✨ We can't wait to see what you build with Promptolution! 🤖💡
242+
243+
244+
```python
245+
246+
```
