
Conversation

@aprilk-ms (Member)

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes, or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

- Add test_samples_evaluations.py with a custom preparer and LLM instructions
- Add a UTF-8 encoding fix to sample_executor.py for Windows compatibility (see the sketch after this list)
- Add azure_ai_agent_name to servicePreparer in test_base.py
- Update assets.json with the new recording tag
- Samples covered:
  - sample_model_evaluation.py
  - sample_agent_response_evaluation.py
  - sample_evaluations_builtin_with_dataset_id.py
  - sample_evaluations_builtin_with_inline_data.py
- Add a sanitizer for the eval dataset timestamp (eval-data-YYYY-MM-DD_HHMMSS_UTC)
- Remove sample_evaluations_builtin_with_dataset_id from the tests (it requires a blob storage upload, which does not work in playback)
- All 4 evaluation samples now pass in playback mode
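
The encoding fix matters because `open()` without an explicit `encoding` falls back to the platform default (often cp1252 on Windows), so writing sample output that contains non-ASCII characters raises `UnicodeEncodeError`. A minimal sketch of the idea, with an illustrative file name and log text rather than the actual sample_executor.py code:

```python
# Pinning UTF-8 makes the log writing portable; without it, the platform
# default codec is used and non-ASCII output can fail on Windows.
# "sample_run.log" and the written text are illustrative placeholders.
with open("sample_run.log", "w", encoding="utf-8") as log_file:
    log_file.write("evaluation output may contain non-ASCII text: ✓\n")
```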
Copilot AI left a comment (Contributor)

Pull request overview

This pull request adds recording tests for evaluation samples, enabling automated testing of evaluation-related sample code. It introduces infrastructure for testing evaluation samples that require agent configuration and adds necessary sanitization patterns for evaluation-specific data in test recordings.

Changes:

  • Added a new test class, TestSamplesEvaluations, to run the evaluation samples with recording support (see the sketch after this list)
  • Enhanced the sample executor with UTF-8 encoding for log file writing
  • Extended the test infrastructure with an azure_ai_agent_name parameter for the evaluation samples
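
For context, a minimal sketch of the recorded-test pattern, assuming the devtools_testutils helpers used across azure-sdk-for-python; the environment variable names, fake values, and test body below are assumptions, not the actual test_samples_evaluations.py code:

```python
import functools

from devtools_testutils import (
    AzureRecordedTestCase,
    EnvironmentVariableLoader,
    recorded_by_proxy,
)

# The preparer injects (sanitized) environment variables into each test;
# azure_ai_agent_name is the parameter this pull request adds.
servicePreparer = functools.partial(
    EnvironmentVariableLoader,
    "azure_ai_projects",
    azure_ai_projects_endpoint="https://fake-resource.services.ai.azure.com",  # assumed name/value
    azure_ai_agent_name="fake-agent-name",  # assumed fake value
)

class TestSamplesEvaluations(AzureRecordedTestCase):
    @servicePreparer()
    @recorded_by_proxy
    def test_sample_model_evaluation(self, **kwargs):
        agent_name = kwargs.pop("azure_ai_agent_name")
        # ...execute the sample under the test proxy and assert on its output...
```

In playback mode the preparer supplies the fake values shown above instead of reading real environment variables, which is what lets the tests run without live resources.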

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| sdk/ai/azure-ai-projects/tests/samples/test_samples_evaluations.py | New test file for the evaluation samples, with custom validation instructions and test configuration for 4 evaluation samples |
| sdk/ai/azure-ai-projects/tests/samples/sample_executor.py | Added explicit UTF-8 encoding to the file-writing operation for better cross-platform compatibility |
| sdk/ai/azure-ai-projects/tests/test_base.py | Added the azure_ai_agent_name parameter to servicePreparer for the evaluation sample tests |
| sdk/ai/azure-ai-projects/tests/conftest.py | Added a regex sanitizer for eval dataset names with timestamps to keep test recordings consistent (sketched after this table) |
| sdk/ai/azure-ai-projects/assets.json | Updated the test recording asset tag to reference the new recordings |
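
A minimal sketch of what the conftest.py sanitizer could look like, assuming the devtools_testutils sanitizer API and deriving the regex from the eval-data-YYYY-MM-DD_HHMMSS_UTC pattern named in the description; the replacement value is an assumption:

```python
import pytest
from devtools_testutils import add_general_regex_sanitizer, test_proxy

@pytest.fixture(scope="session", autouse=True)
def add_sanitizers(test_proxy):
    # Replace the timestamped dataset name (e.g. eval-data-2025-01-15_093000_UTC)
    # with a fixed value so playback matches recordings made at any time.
    add_general_regex_sanitizer(
        regex=r"eval-data-\d{4}-\d{2}-\d{2}_\d{6}_UTC",
        value="eval-data-2024-01-01_000000_UTC",
    )
```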

Added:
- sample_eval_catalog.py
- sample_eval_catalog_code_based_evaluators.py
- sample_eval_catalog_prompt_based_evaluators.py
- sample_evaluation_compare_insight.py
- sample_agent_response_evaluation_with_function_tool.py

Skipped sample_evaluations_builtin_with_inline_data_oai.py: it uses the OpenAI client directly with get_bearer_token_provider, which does not work with mock credentials (see the sketch below).
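
A minimal sketch of the pattern that makes the skipped sample incompatible with playback; the endpoint, scope, and API version below are assumptions. get_bearer_token_provider wraps a credential in a plain callable that the OpenAI client invokes on every request, outside the Azure SDK pipeline, so this credential flow cannot be satisfied by the test framework's mock credentials:

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# The provider fetches a fresh Entra ID token for each request.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",  # assumed scope
)

client = AzureOpenAI(
    azure_endpoint="https://fake-resource.openai.azure.com",  # assumed endpoint
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21",  # assumed API version
)
```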