feat(eval): implement deterministic code-based scorers (exact_string_match, numeric_match)

## Summary

I’d like to contribute a focused improvement to the evaluation module by implementing the deterministic code-based scorers that are currently placeholders:

- `exact_string_match`
- `numeric_match`

This would improve reproducibility for scenarios where deterministic checks are more appropriate than LLM-only judging.

## Current State

At the moment:

- `src/evaluation/scorers/code_based.py` raises `NotImplementedError` in both functions.
- `docs/evaluation.md` marks these scorers as skeletons/placeholders.
- `src/evaluation/tests/test_scorers.py` currently verifies the not-implemented behavior.

## Why this is useful

- Reduces reliance on `llm_judge` for objective scenario types.
- Improves repeatability and debugging of eval results.
- Enables clearer per-scenario scorer routing via `scoring_method`.

## Proposed MVP Scope

1. Implement `exact_string_match` with clear normalization rules (minimal and explicit).
2. Implement `numeric_match` with robust numeric parsing and tolerance support.
3. Register both scorers in the scorer registry.
4. Update tests from placeholder checks to behavior checks.
5. Update `docs/evaluation.md` to reflect availability and usage.
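
To make items 1 and 2 concrete, here is a minimal sketch of what the two scorers could look like. The signatures, the keyword options (`strip`, `case_sensitive`, `rel_tol`, `abs_tol`), the `_parse_number` helper, and the 1.0/0.0 return convention are all my assumptions for discussion, not the existing interface in `code_based.py`.

```python
import math
import re
from typing import Optional

# Assumed signature: the real one in code_based.py may differ.
def exact_string_match(output: str, expected: str, *,
                       strip: bool = True,
                       case_sensitive: bool = True) -> float:
    """Return 1.0 if the strings are equal after the explicit normalization."""
    if strip:
        # Only surrounding whitespace is removed; inner whitespace is kept.
        output, expected = output.strip(), expected.strip()
    if not case_sensitive:
        output, expected = output.casefold(), expected.casefold()
    return 1.0 if output == expected else 0.0


# Matches an optionally signed decimal with optional exponent, e.g. -2e3 or .5
_NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?")


def _parse_number(text: str) -> Optional[float]:
    """Hypothetical helper: parse the first number found in `text`.

    Commas are dropped first, on the assumption that they are thousands
    separators (so "1,234.5" parses as 1234.5).
    """
    match = _NUMBER_RE.search(text.replace(",", ""))
    return float(match.group()) if match else None


# Assumed signature: the real one in code_based.py may differ.
def numeric_match(output: str, expected: str, *,
                  rel_tol: float = 0.0,
                  abs_tol: float = 1e-9) -> float:
    """Return 1.0 if both values parse and agree within the tolerance."""
    got = _parse_number(output)
    want = _parse_number(expected)
    if got is None or want is None:
        # An unparseable value is scored as a mismatch rather than an error.
        return 0.0
    return 1.0 if math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol) else 0.0
```

Taking the first number found in the output is the most debatable choice here. A stricter variant could require the whole output to be numeric. I am happy to go with whichever behavior maintainers prefer.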

## Out of Scope (for this first PR)

- No multi-judge panel logic.
- No changes to `llm_judge` rubric behavior.
- No large evaluator architecture refactors.

## Acceptance Criteria

- `exact_string_match` and `numeric_match` execute without `NotImplementedError`.
- Scenarios whose `scoring_method` names one of these scorers are routed to that scorer.
- `uv run pytest src/evaluation/tests` passes.
- Documentation reflects the implemented scorer status and expected input fields.
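
The routing criterion above could be checked with something like the following sketch. The registry shape (`SCORER_REGISTRY`, `register_scorer`), the `score_scenario` helper, and the `expected` scenario field are hypothetical names I am using for illustration. The project's actual registry and scenario schema may look different, and the scorer bodies are simplified stand-ins.

```python
import math
from typing import Callable, Dict

Scorer = Callable[[str, str], float]

# Hypothetical registry, keyed by the value of `scoring_method`.
SCORER_REGISTRY: Dict[str, Scorer] = {}


def register_scorer(name: str) -> Callable[[Scorer], Scorer]:
    """Decorator that registers a scorer under `name` (assumed API)."""
    def decorator(fn: Scorer) -> Scorer:
        if name in SCORER_REGISTRY:
            raise ValueError(f"scorer already registered: {name!r}")
        SCORER_REGISTRY[name] = fn
        return fn
    return decorator


@register_scorer("exact_string_match")
def exact_string_match(output: str, expected: str) -> float:
    # Simplified stand-in: compare after trimming surrounding whitespace.
    return 1.0 if output.strip() == expected.strip() else 0.0


@register_scorer("numeric_match")
def numeric_match(output: str, expected: str) -> float:
    # Simplified stand-in: strict float parsing with math.isclose defaults.
    try:
        return 1.0 if math.isclose(float(output), float(expected)) else 0.0
    except ValueError:
        return 0.0


def score_scenario(scenario: dict, output: str) -> float:
    """Look up the scorer named by `scoring_method` and apply it."""
    method = scenario["scoring_method"]
    if method not in SCORER_REGISTRY:
        raise ValueError(f"unknown scoring_method: {method!r}")
    # `expected` is an assumed field name for the reference answer.
    return SCORER_REGISTRY[method](output, scenario["expected"])
```

A behavior test would then call `score_scenario` with `scoring_method` set to `numeric_match` or `exact_string_match` and assert on the returned score, instead of asserting that `NotImplementedError` is raised.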

## Notes

I can keep this first PR intentionally small and test-focused for quick review.
If maintainers agree with this scope, I can open a draft PR right away.

