
feat: Declarative eval #7315


Open

wants to merge 12 commits into main

Conversation

CakeCrusher

Description

Adds declarative_eval and transform_field_mappings_for_explanation to arize-phoenix-evals.

declarative_eval is an offline evaluator function modeled after llm_classify. It lets the user describe an evaluation declaratively through a Pydantic BaseModel subclass.

This enables free-form, extensive evaluations tailored to the user's needs. It is useful when llm_classify or other specialized evaluator functions are insufficient, or when the user needs multiple evaluations but wants to minimize overhead.

Note: This PR is 45% me learning the repo and dev flow, 45% because I find it useful, and 10% speculation that others may find it useful.

Example use

(copied from docstring)

from typing import Literal

import pandas as pd
from pydantic import BaseModel, Field

from phoenix.evals import declarative_eval

# Define a schema with nested models
class Conciseness(BaseModel):
    is_concise: bool = Field(..., description="Whether the output is concise")

class Formatting(BaseModel):
    language: Literal["High", "Average", "Low"] = Field(
        ..., description="The complexity of the formatting used in the output"
    )

class Schema(BaseModel):
    conciseness: Conciseness = Field(..., description="A custom evaluation of the output")
    formatting: Formatting = Field(..., description="A custom evaluation of the output")

# Prepare sample data
data = pd.DataFrame({
    "attributes.llm.input_messages": [
        [{"role": "user", "content": "What is 2+2?"}],
        [{"role": "user", "content": "Who was the first president?"}],
    ],
    "attributes.llm.output_messages": [
        [{"role": "assistant", "content": "Whenever you add those two numbers, you get 4"}],
        [{"role": "assistant", "content": "George Washington"}],
    ],
})

# Define field mappings
field_mappings = {
    "conciseness.label": "conciseness.is_concise",
    "formatting.label": "formatting.language",
}

# Run the evaluation. `openai_client` is assumed to be a configured model
# (e.g. a phoenix.evals.OpenAIModel instance). declarative_eval is async,
# so it is awaited here from an async context such as a notebook.
result = await declarative_eval(
    data=data,
    model=openai_client,
    schema=Schema,
    field_mappings=field_mappings,
)

# Result will be a DataFrame with columns:
# - conciseness.label (containing boolean values)
# - formatting.label (containing "High", "Average", or "Low")
# - execution_seconds (execution time)
# - exceptions (any errors encountered)
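
For illustration, here is a small standalone sketch of how the dotted field mappings can be resolved against the structured output the model returns. This is a simplified re-implementation for clarity, not the PR's actual code:

# Hypothetical helper: walk a dotted path through a nested model dump.
def resolve_field(output: dict, dotted_path: str):
    value = output
    for key in dotted_path.split("."):
        value = value[key]
    return value

# Suppose the model returned this parsed Schema instance for one row.
parsed = Schema(
    conciseness=Conciseness(is_concise=True),
    formatting=Formatting(language="Low"),
)

# Each output column is filled from the schema field named on the
# right-hand side of the mapping.
row = {
    out_col: resolve_field(parsed.model_dump(), schema_path)
    for out_col, schema_path in field_mappings.items()
}
# row == {"conciseness.label": True, "formatting.label": "Low"}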

Test coverage

  • Success with base fields
  • Success with provide_explanation
  • Failure with incorrect field_mappings

TODO

  • Enable synchronous evaluation of rows (could use a generator, but that would require updating OpenAIModel)
  • Integrate it for use in online "Tasks"
  • Abstraction to upsert the eval into a project (potentially using Client(...).log_evaluations_sync(...)); a sketch follows this list
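
For the last item, a rough sketch of what the upsert step might look like, using the existing phoenix Client.log_evaluations API as a stand-in for the proposed log_evaluations_sync helper. The eval name, column selection, and span-id indexing below are illustrative assumptions, not part of this PR:

import phoenix as px
from phoenix.trace import SpanEvaluations

# Assume `result` is the DataFrame returned by declarative_eval, re-indexed by
# the span id of the row each eval was computed for (index named "span_id").
conciseness_df = result[["conciseness.label"]].rename(
    columns={"conciseness.label": "label"}
)

# Upsert the eval into the Phoenix project under a chosen eval name.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Conciseness", dataframe=conciseness_df)
)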

@github-project-automation bot moved this to 📘 Todo in phoenix on Apr 28, 2025
@dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Apr 28, 2025

@CakeCrusher changed the title from Declarative eval to feat: Declarative eval on Apr 28, 2025
@CakeCrusher
Author

I see I'm getting errors when importing the new functions; I'm not sure how concerned I should be:

=================================== ERRORS ====================================
_________ ERROR collecting tests/unit/evals/test_declarative_eval.py __________
ImportError while importing test module 'D:\a\phoenix\phoenix\tests\unit\evals\test_declarative_eval.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
unit\evals\test_declarative_eval.py:12: in <module>
    from phoenix.evals import declarative_eval, transform_field_mappings_for_explanation
E   ImportError: cannot import name 'declarative_eval' from 'phoenix.evals' (D:\a\phoenix\phoenix\.tox\unit_tests\Lib\site-packages\phoenix\evals\__init__.py)

As discussed with @cephalization in https://arize-ai.slack.com/archives/C018252LE1E/p1745724863399289?thread_ts=1745694183.495699&cid=C018252LE1E, the import was working as long as arize-phoenix-evals was installed from the local source.
The tests pass with tox run -e unit_tests_local_evals -- PATH_TO/test_declarative_eval.py or tox run -e unit_tests_local_evals
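
For reference, a minimal sketch of the re-export the failing test expects to find in phoenix/evals/__init__.py; the submodule name declarative_eval is an assumption based on the test's import:

# src/phoenix/evals/__init__.py (sketch)
from .declarative_eval import (
    declarative_eval,
    transform_field_mappings_for_explanation,
)

__all__ = [
    # ...existing exports...
    "declarative_eval",
    "transform_field_mappings_for_explanation",
]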
