feat: Declarative eval #7315

Open
wants to merge 12 commits into
base: main
from
2 changes: 1 addition & 1 deletion .release-please-manifest.json
@@ -1 +1 @@
{".":"8.27.0","packages/phoenix-evals":"0.20.6","packages/phoenix-otel":"0.9.2","packages/phoenix-client":"1.3.0"}
{".":"8.27.1","packages/phoenix-evals":"0.20.6","packages/phoenix-otel":"0.9.2","packages/phoenix-client":"1.3.0"}
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
# Changelog

## [8.27.1](https://github.com/Arize-ai/phoenix/compare/arize-phoenix-v8.27.0...arize-phoenix-v8.27.1) (2025-04-25)


### Bug Fixes

* Allow scroll on settings pages ([#7284](https://github.com/Arize-ai/phoenix/issues/7284)) ([c25b071](https://github.com/Arize-ai/phoenix/commit/c25b07143b9c714b75e3d9655ca9db161542acb0))

## [8.27.0](https://github.com/Arize-ai/phoenix/compare/arize-phoenix-v8.26.3...arize-phoenix-v8.27.0) (2025-04-24)


1 change: 0 additions & 1 deletion app/src/pages/settings/SettingsPage.tsx
@@ -27,7 +27,6 @@ const settingsPageInnerCSS = css`
width: 100%;
margin-left: auto;
margin-right: auto;
height: 100%;
`;

export function SettingsPage() {
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -5,6 +5,7 @@
* [User Guide](user-guide.md)
* [Deployment](deployment.md)
* [Environments](environments.md)
* [Phoenix Demo](https://phoenix-demo.arize.com/projects)

## 🔭 Tracing

2 changes: 1 addition & 1 deletion docs/evaluation/llm-evals/agent-evaluation.md
@@ -111,7 +111,7 @@ See our Agent Reflection evaluation template for a more specific example.

See our [Agent Reflection evaluation template](../how-to-evals/running-pre-tested-evals/agent-reflection.md) for a specific example.

## Putting it all together
## Putting it all Together

Through a combination of the evaluations above, you can get a far more accurate picture of how your agent is performing.

@@ -4,7 +4,7 @@ description: Version and track changes made to prompt templates

# Prompt Management

<figure><img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/prompt_management.gif" alt=""><figcaption><p>Iterate on prampts, ship prompts when they are tested</p></figcaption></figure>
<figure><img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/prompt_management.gif" alt=""><figcaption><p>Iterate on prompts, ship prompts when they are tested</p></figcaption></figure>



2 changes: 1 addition & 1 deletion kustomize/base/phoenix.yaml
@@ -28,7 +28,7 @@ spec:
value: /mnt/data
- name: PHOENIX_PORT
value: "6006"
image: arizephoenix/phoenix:version-8.27.0
image: arizephoenix/phoenix:version-8.27.1
name: phoenix
ports:
- containerPort: 6006
67 changes: 67 additions & 0 deletions packages/phoenix-client/.cursor/rules/general.mdc
@@ -0,0 +1,67 @@
---
description:
globs:
alwaysApply: true
---
# General Client Design Guidelines

## Dependencies

The client should be as lightweight as possible, as it is meant to be integrated into applications directly with no impact on the runtime. This means it should never depend on the core `phoenix` package and should only depend on things under the `phoenix.client` sub-module. The client must never depend on modules that are related to a server such as `starlette`, `sqlalchemy`, `pg` and so on. For libraries like `pandas`, implement lazy importing (importing within the specific function that requires it) rather than importing at the top level.
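
A minimal sketch of the lazy-import pattern described above. The function names `spans_to_records` and `spans_to_dataframe` are hypothetical and are not part of the Phoenix client; they only illustrate keeping `pandas` out of the top-level imports.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime.
    import pandas as pd


def spans_to_records(spans: "list[dict[str, Any]]") -> "list[dict[str, Any]]":
    """Pure-Python path (hypothetical helper): works without pandas installed."""
    return [dict(span) for span in spans]


def spans_to_dataframe(spans: "list[dict[str, Any]]") -> "pd.DataFrame":
    """Hypothetical helper: pandas is imported only when this function is called."""
    try:
        import pandas as pd  # lazy import, deferred until a DataFrame is needed
    except ImportError as exc:
        raise ImportError(
            "pandas is required for spans_to_dataframe; install it with `pip install pandas`"
        ) from exc
    return pd.DataFrame(spans)
```

With this layout, importing the module never pulls in `pandas`; only callers of the DataFrame helper pay that cost.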

## Syntax

All methods that interact with the server should be namespaced via `projects`, `prompts` and so on.

All arguments to the methods MUST use `kwargs` so as to make the signature as self-evident as possible.

Do not do:

```python
client.prompts.get("prompt_version_id")
```

Prefer:

```python
client.prompts.get(prompt_version_id="prompt_version_id")
```
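
One way to enforce this rule is Python's keyword-only parameter marker (`*`). The sketch below is illustrative only: `ExamplePrompts` and `ExampleClient` are hypothetical stand-ins with an in-memory store, not the real client classes.

```python
from typing import Any, Dict


class ExamplePrompts:
    """Hypothetical stand-in for a `prompts` namespace."""

    def __init__(self) -> None:
        # In-memory data used only for this illustration.
        self._store: Dict[str, Dict[str, Any]] = {
            "v1": {"id": "v1", "template": "Hello {name}"}
        }

    def get(self, *, prompt_version_id: str) -> Dict[str, Any]:
        # The bare `*` makes every following parameter keyword-only.
        return self._store[prompt_version_id]


class ExampleClient:
    """Hypothetical client exposing methods through a namespace attribute."""

    def __init__(self) -> None:
        self.prompts = ExamplePrompts()


client = ExampleClient()
prompt = client.prompts.get(prompt_version_id="v1")  # accepted

try:
    client.prompts.get("v1")  # positional call is rejected
except TypeError as exc:
    positional_error = str(exc)
```

Because the positional call raises `TypeError`, the discouraged form fails immediately instead of silently working.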

Methods should be prefixed with an action:

- `get` - gets the entity. Correlates to an HTTP `GET` of a specific entity, e.g. `/projects/1`
- `create` - makes a new entity. Correlates to HTTP `POST`
- `list` - gets a paginated list of an entity, e.g. `GET` a list from `/projects`
- `add` - attaches an entity to another, e.g. `add_annotation` would be used to attach an annotation to a `span` or `trace`
- `delete` - permanently deletes an entity

In addition, things can be sent to the platform in bulk.

- `log` - associates a list of entities with something, e.g. `log_annotations` will send a list of annotations to a particular target such as a `span` or a `project`

## Pandas

The client should make affordances to push and pull data from the Phoenix server via `pandas` DataFrames. For all bulk operations, the method should be suffixed with `dataframe` to make it clear that the input and output are DataFrames.

For example:

```python
client.log_annotations_dataframe(dataframe=dataframe)
df = client.get_spans_dataframe(project_name="default")
```

## Transport

For all IO to the Phoenix server, JSON or JSONL over HTTP should be preferred. This is so that clients in other languages can be created (e.g. `TypeScript`), LLMs can easily interpret the data (fine-tuning), and non-homogeneous data can be sent over the wire (e.g. `metadata` dictionaries).

In the case that a different format is needed (e.g. `DataFrame` or `CSV`), the client should perform the translation (i.e. be a fat client) unless there is a more specific endpoint that supports that MIME type.

For example:

```python
client.log_annotations(annotations=annotations)

# Syntactic sugar to log annotations as a dataframe
# Annotations are still sent over the wire as JSON
client.log_annotations_dataframe(dataframe=df)
```
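
The "fat client" translation can be sketched with the standard library alone. This is an assumption-laden illustration: `csv_to_records` and `records_to_jsonl` are hypothetical helpers, and the `span_id`/`label` columns are made up for the example; the real client may shape its payloads differently.

```python
import csv
import io
import json
from typing import Dict, List


def csv_to_records(csv_text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: parse CSV text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def records_to_jsonl(records: List[Dict[str, str]]) -> str:
    """Hypothetical helper: serialize records as JSONL, one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Example input with made-up columns; the client converts it before sending.
csv_text = "span_id,label\nabc,correct\n"
records = csv_to_records(csv_text)
payload = records_to_jsonl(records)
```

The server only ever sees JSONL; the CSV handling stays entirely on the client side.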
3 changes: 3 additions & 0 deletions packages/phoenix-evals/src/phoenix/evals/__init__.py
@@ -1,4 +1,5 @@
from .classify import llm_classify, run_evals
from .declarative import declarative_eval, transform_field_mappings_for_explanation
from .default_templates import (
CODE_FUNCTIONALITY_PROMPT_RAILS_MAP,
CODE_FUNCTIONALITY_PROMPT_TEMPLATE,
@@ -92,6 +93,8 @@
"TOOL_CALLING_PROMPT_RAILS_MAP",
"NOT_PARSABLE",
"run_evals",
"declarative_eval",
"transform_field_mappings_for_explanation",
"LLMEvaluator",
"HallucinationEvaluator",
"QAEvaluator",