
Conversation

@msaelices

Changes

  • New is_correct() evaluation function, which asks an LLM to judge whether a response is correct (see the sketch below)
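
For illustration, here is a minimal sketch of what such an LLM-as-judge evaluator could look like, using the pre-1.0 openai client. The signature, prompt wording, and 0.0/1.0 scoring convention are assumptions for this sketch, not the PR's actual code:

```python
# Hypothetical sketch -- the actual implementation in this PR may differ.
import openai


def is_correct(expected: str, model: str = "gpt-4"):
    """Build an eval function that asks an LLM whether a response
    matches the expected answer, scoring 1.0 (yes) or 0.0 (no)."""

    def evaluate(response: str) -> float:
        prompt = (
            "Answer with a single word, YES or NO.\n"
            f"Expected answer: {expected}\n"
            f"Actual response: {response}\n"
            "Is the actual response correct?"
        )
        completion = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic judgment
        )
        verdict = completion.choices[0].message.content.strip().upper()
        return 1.0 if verdict.startswith("YES") else 0.0

    return evaluate
```

Returning a float keeps the evaluator composable with score-based evals.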

Proof-of-life

[Screenshot: 2023-05-24_12-35]

@edwardmfho

Test-ran the example cases using gpt-3.5-turbo instead of the default gpt-4 (still waiting for API access). It is a good function to add.


@edwardmfho left a comment


Looks good.

@mistercrunch
Member

mistercrunch commented Jun 19, 2023

I like the idea; the only thing is that the implementation is very OpenAI-specific, which is probably fine as long as we make that clear. How about we break evals.py down into evals/__init__.py and evals/openai.py?

The goal would be to import is_correct from promptimize.evals.openai.
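
Concretely, the proposed layout might look like this (a sketch; only the module names come from the comment, the split of contents is assumed):

```python
# Proposed package layout:
#   promptimize/evals/__init__.py  -> provider-agnostic evals (word/score checks)
#   promptimize/evals/openai.py    -> OpenAI-specific evals such as is_correct
from promptimize.evals.openai import is_correct
```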

@edwardmfho

Should we begin with some sort of base eval function/class that could be used for other LLMs?
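
Purely as an illustration of that idea, one possible shape for such a base class; every name here is hypothetical and nothing like it exists in the PR:

```python
# Illustrative only -- a provider-agnostic base for LLM-graded evals.
from abc import ABC, abstractmethod

import openai


class BaseModelGradedEval(ABC):
    """Each provider subclass implements the actual completion call;
    the grading logic itself stays provider-agnostic."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send the judgment prompt to the underlying LLM."""
        ...

    def is_correct(self, expected: str, response: str) -> float:
        prompt = (
            "Answer with a single word, YES or NO.\n"
            f"Expected answer: {expected}\n"
            f"Actual response: {response}\n"
            "Is the actual response correct?"
        )
        verdict = self.complete(prompt).strip().upper()
        return 1.0 if verdict.startswith("YES") else 0.0


class OpenAIEval(BaseModelGradedEval):
    def __init__(self, model: str = "gpt-4"):
        self.model = model

    def complete(self, prompt: str) -> str:
        resp = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content
```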
