Add an Eval harness #1070

@methedude

Pre-checks

  • I searched existing issues and discussions

What problem are you trying to solve?

A killer feature would be an eval harness that runs different models and shows how they compare to each other across the most popular evals.

What would you like NexaSDK to do?

- Add an eval harness that lets the user select from the most popular evals, as well as run custom evals defined in .json files.
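For illustration, here is a minimal sketch of how a custom .json eval could be loaded and used to compare models. It assumes a hypothetical schema (a JSON list of objects with `prompt` and `expected` fields) and exact-match scoring. The functions `load_eval`, `exact_match_accuracy`, and `compare_models` are hypothetical names, not part of NexaSDK, and the "models" are plain Python callables standing in for real model calls.

```python
import json
from typing import Callable, Dict, List

# Hypothetical custom-eval format (an assumption, not an existing NexaSDK schema):
# a JSON list of {"prompt": ..., "expected": ...} items.
SAMPLE_EVAL = """
[
  {"prompt": "What is 2 + 2?", "expected": "4"},
  {"prompt": "Capital of France?", "expected": "Paris"}
]
"""


def load_eval(text: str) -> List[Dict[str, str]]:
    """Parse a custom eval from JSON text and check each item has the required fields."""
    items = json.loads(text)
    if not items:
        raise ValueError("eval file contains no items")
    for item in items:
        if "prompt" not in item or "expected" not in item:
            raise ValueError(f"eval item missing 'prompt' or 'expected': {item}")
    return items


def exact_match_accuracy(model: Callable[[str], str], items: List[Dict[str, str]]) -> float:
    """Fraction of items where the model's whitespace-stripped answer equals the expected one."""
    correct = sum(model(it["prompt"]).strip() == it["expected"] for it in items)
    return correct / len(items)


def compare_models(
    models: Dict[str, Callable[[str], str]], items: List[Dict[str, str]]
) -> Dict[str, float]:
    """Run the same eval over several models and return a name -> accuracy mapping."""
    return {name: exact_match_accuracy(fn, items) for name, fn in models.items()}


if __name__ == "__main__":
    items = load_eval(SAMPLE_EVAL)
    # Stub "models" (hypothetical) standing in for real model inference.
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    models = {
        "oracle": lambda p: answers[p],
        "always-4": lambda p: "4",
    }
    for name, score in compare_models(models, items).items():
        print(f"{name}: {score:.2f}")
```

Running the same item list through every model is what makes the scores directly comparable; a real harness would likely also need per-eval scoring rules (multiple choice, pass@k, etc.) rather than exact match alone.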

Alternatives you've considered


Who does this help, and how much?


Additional context
