
feat: Declarative eval #7315


Open

wants to merge 12 commits into main

Conversation

CakeCrusher

Description

Adds declarative_eval and transform_field_mappings_for_explanation to arize-phoenix-evals.

declarative_eval is an offline evaluator function modeled after llm_classify. It lets the user describe an evaluation declaratively through a Pydantic BaseModel subclass.

This enables free-form, extensive evaluations tailored to the user's needs. It is useful when llm_classify or other specialized evaluator functions are insufficient, or when the user needs multiple evaluations but wants to minimize overhead.

Note: This PR is 45% me learning the repo and dev flow, 45% because I find it useful, and 10% speculation that others may find it useful.

Example use

(copied from docstring)

from typing import Literal

import pandas as pd
from pydantic import BaseModel, Field

from phoenix.evals import declarative_eval

# Define a schema with nested models
class Conciseness(BaseModel):
    is_concise: bool = Field(..., description="Whether the output is concise")

class Formatting(BaseModel):
    language: Literal["High", "Average", "Low"] = Field(
        ..., description="The complexity of the formatting used in the output"
    )

class Schema(BaseModel):
    conciseness: Conciseness = Field(..., description="A custom evaluation of the output")
    formatting: Formatting = Field(..., description="A custom evaluation of the output")

# Prepare sample data
data = pd.DataFrame({
    "attributes.llm.input_messages": [
        [{"role": "user", "content": "What is 2+2?"}],
        [{"role": "user", "content": "Who was the first president?"}],
    ],
    "attributes.llm.output_messages": [
        [{"role": "assistant", "content": "Whenever you add those two numbers, you get 4"}],
        [{"role": "assistant", "content": "George Washington"}],
    ],
})

# Define field mappings
field_mappings = {
    "conciseness.label": "conciseness.is_concise",
    "formatting.label": "formatting.language",
}

# Run the evaluation. `openai_client` is assumed to be a configured model
# (e.g. a phoenix.evals.OpenAIModel instance). declarative_eval is async,
# so it is awaited here from an async context such as a notebook.
result = await declarative_eval(
    data=data,
    model=openai_client,
    schema=Schema,
    field_mappings=field_mappings,
)

# Result will be a DataFrame with columns:
# - conciseness.label (containing boolean values)
# - formatting.label (containing "High", "Average", or "Low")
# - execution_seconds (execution time)
# - exceptions (any errors encountered)
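
For illustration, here is a small standalone sketch of how the dotted field mappings can be resolved against the structured output the model returns. This is a simplified re-implementation for clarity, not the PR's actual code:

# Hypothetical helper: walk a dotted path through a nested model dump.
def resolve_field(output: dict, dotted_path: str):
    value = output
    for key in dotted_path.split("."):
        value = value[key]
    return value

# Suppose the model returned this parsed Schema instance for one row.
parsed = Schema(
    conciseness=Conciseness(is_concise=True),
    formatting=Formatting(language="Low"),
)

# Each output column is filled from the schema field named on the
# right-hand side of the mapping.
row = {
    out_col: resolve_field(parsed.model_dump(), schema_path)
    for out_col, schema_path in field_mappings.items()
}
# row == {"conciseness.label": True, "formatting.label": "Low"}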

Test coverage

  • Success with base fields
  • Success with provide_explanation
  • Failure with incorrect field_mappings

TODO

  • Enable synchronous evaluation of rows (could use a generator, but that would require updating OpenAIModel)
  • Integrate it for use in online "Tasks"
  • Abstraction to upsert the eval into a project (potentially using Client(...).log_evaluations_sync(...)); a sketch follows this list
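
For the last item, a rough sketch of what the upsert step might look like, using the existing phoenix Client.log_evaluations API as a stand-in for the proposed log_evaluations_sync helper. The eval name, column selection, and span-id indexing below are illustrative assumptions, not part of this PR:

import phoenix as px
from phoenix.trace import SpanEvaluations

# Assume `result` is the DataFrame returned by declarative_eval, re-indexed by
# the span id of the row each eval was computed for (index named "span_id").
conciseness_df = result[["conciseness.label"]].rename(
    columns={"conciseness.label": "label"}
)

# Upsert the eval into the Phoenix project under a chosen eval name.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Conciseness", dataframe=conciseness_df)
)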

@github-project-automation bot moved this to 📘 Todo in phoenix on Apr 28, 2025
@dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Apr 28, 2025

@CakeCrusher changed the title from Declarative eval to feat: Declarative eval on Apr 28, 2025
@CakeCrusher
Author

I see I'm getting errors when importing the new functions; I'm not sure how concerned I should be:

=================================== ERRORS ====================================
_________ ERROR collecting tests/unit/evals/test_declarative_eval.py __________
ImportError while importing test module 'D:\a\phoenix\phoenix\tests\unit\evals\test_declarative_eval.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
unit\evals\test_declarative_eval.py:12: in <module>
    from phoenix.evals import declarative_eval, transform_field_mappings_for_explanation
E   ImportError: cannot import name 'declarative_eval' from 'phoenix.evals' (D:\a\phoenix\phoenix\.tox\unit_tests\Lib\site-packages\phoenix\evals\__init__.py)

As discussed with @cephalization in https://arize-ai.slack.com/archives/C018252LE1E/p1745724863399289?thread_ts=1745694183.495699&cid=C018252LE1E, the import was working as long as arize-phoenix-evals was installed from the local source.
The tests pass with tox run -e unit_tests_local_evals -- PATH_TO/test_declarative_eval.py or tox run -e unit_tests_local_evals
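
For reference, a minimal sketch of the re-export the failing test expects to find in phoenix/evals/__init__.py; the submodule name declarative_eval is an assumption based on the test's import:

# src/phoenix/evals/__init__.py (sketch)
from .declarative_eval import (
    declarative_eval,
    transform_field_mappings_for_explanation,
)

__all__ = [
    # ...existing exports...
    "declarative_eval",
    "transform_field_mappings_for_explanation",
]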
