Closed
Labels: documentation, enhancement, python, size:M
Description
Hey Instructor team!
I've submitted a PR to MLflow that integrates Instructor/Pydantic schema validation as first-class scorers in MLflow's GenAI evaluation framework: mlflow/mlflow#20628
What it does
It lets MLflow users validate LLM structured outputs against Pydantic schemas. All checks are deterministic; no extra LLM calls are needed:
```python
from pydantic import BaseModel, Field

from mlflow.genai.scorers.instructor import (
    SchemaCompliance,
    FieldCompleteness,
    TypeValidation,
    ConstraintValidation,
    ExtractionAccuracy,
)


class UserInfo(BaseModel):
    name: str
    email: str
    age: int


class UserInfoWithConstraints(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)
    email: str = Field(pattern=r"^[\w.-]+@[\w.-]+\.\w+$")


# Validate schema compliance
scorer = SchemaCompliance()
result = scorer(
    outputs={"name": "John", "email": "john@example.com", "age": 30},
    expectations={"schema": UserInfo},
)
print(result.value)  # "yes"

# Check field completeness (returns 0.0-1.0)
scorer = FieldCompleteness()
result = scorer(
    outputs={"name": "John", "email": None, "age": 30},
    expectations={"schema": UserInfo},
)
print(result.value)  # 0.67 (2/3 fields complete)

# Validate constraints
scorer = ConstraintValidation()
result = scorer(
    outputs={"name": "", "age": 200, "email": "invalid"},
    expectations={"schema": UserInfoWithConstraints},
)
print(result.value)  # "no"
print(result.rationale)  # "name: String should have at least 1 character; age: Input should be <= 150..."
```

Use with mlflow.genai.evaluate()
```python
import mlflow
from mlflow.genai.scorers.instructor import SchemaCompliance, FieldCompleteness

eval_data = [
    {
        "inputs": {"query": "Extract user info"},
        "outputs": {"name": "John Doe", "email": "john@example.com", "age": 30},
    },
    {
        "inputs": {"query": "Extract user info"},
        "outputs": {"name": "Jane"},  # Missing fields
    },
]

results = mlflow.genai.evaluate(
    data=eval_data,
    scorers=[
        SchemaCompliance(),
        FieldCompleteness(),
    ],
    expectations={"schema": UserInfo},
)
```

Available Scorers
| Scorer | Purpose | Return Value |
|---|---|---|
| `SchemaCompliance` | Validates output matches Pydantic schema | `yes` / `no` |
| `FieldCompleteness` | Checks required fields are present and non-null | 0.0 - 1.0 |
| `TypeValidation` | Verifies field types match schema definitions | `yes` / `no` |
| `ConstraintValidation` | Checks Field validators/constraints pass | `yes` / `no` |
| `ExtractionAccuracy` | Compares extracted fields against ground truth | 0.0 - 1.0 |
Why this is useful
A lot of folks use Instructor to get structured outputs from LLMs, but then need to evaluate those outputs at scale. This integration lets them:
- Validate that LLM outputs actually match expected schemas
- Track validation metrics over time in MLflow
- Compare different prompts/models on extraction quality
- Catch schema drift in production
Why I'm posting here
A couple of things:
- Would love any feedback on the approach if you have time to look at the PR
- Once merged, would you be interested in adding MLflow to your integrations/ecosystem docs?
- I'd also be happy to collaborate on a blog post about using Instructor + MLflow together for structured output evaluation
Thanks for building Instructor - the Pydantic-first approach to structured outputs has been valuable for the LLM community!