Merged
- enhance CAT AI statistical reporting with pass count and total count
- move helper functions to helpers.py
- …les and clarify report generation details
- Fix: CI missing variable
Pull Request Overview
This PR improves the testing framework and error handling by introducing a modular script for generating statistical reports, refactoring test functions for clarity, and consolidating helper functions.
- Extracts the CAT AI statistical report script for workflow modularity
- Adds a new function for JSON schema validation and refactors tests to use helper functions
- Improves test configuration and organization by refactoring redundant functions
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| examples/team_recommender/src/response_matches_json_schema.py | Adds a function to validate JSON responses against a schema |
| examples/team_recommender/tests/helpers.py | Introduces new helper functions (e.g., natural sort, success rate assertion) |
| examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py | Refactors threshold measurement tests and error messages |
| examples/team_recommender/tests/example_7_schema_validators/test_response_has_valid_schema.py | Updates schema validation test with the new helper function |
| .github/workflows/cat-test-examples.yml | Updates workflow step to use the new statistical report script |
| examples/team_recommender/tests/conftest.py | Refactors fixtures and example discovery logic |
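The body of `response_matches_json_schema` isn't included in this summary; a minimal, self-contained sketch of what such a validator might do (the real implementation likely delegates to a full JSON Schema library such as `jsonschema`, and the sample schema below is a hypothetical illustration):

```python
def response_matches_json_schema(response, schema) -> bool:
    # Minimal sketch: check required keys and per-property types for an
    # object schema. A real validator handles the full JSON Schema spec.
    type_map = {"object": dict, "array": list, "string": str,
                "number": (int, float), "integer": int, "boolean": bool}
    if not isinstance(response, dict):
        return False
    if any(key not in response for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key in response and not isinstance(response[key], type_map[spec["type"]]):
            return False
    return True

# Hypothetical schema for illustration only.
schema = {
    "type": "object",
    "required": ["team"],
    "properties": {"team": {"type": "array"}, "confidence": {"type": "number"}},
}
assert response_matches_json_schema({"team": ["alice"], "confidence": 0.9}, schema)
assert not response_matches_json_schema({"confidence": 0.9}, schema)
```

Returning a boolean rather than raising keeps the helper easy to use inside test assertions.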
Comments suppressed due to low confidence (2)
examples/team_recommender/tests/helpers.py:189
- The expected order in this assertion relies on lexicographical sorting, which does not reflect natural number ordering; consider using the natural_sort_key in the sort or updating the expected order accordingly.
```python
assert [
    "example_10_threshold",
    "example_1_text_response",
    "example_2_unit",
    "example_8_retry_network",
    "example_9_retry_with_open_telemetry",
] == sorted(unsorted), "The list should be sorted by the number in the name"
```
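The suggested fix, sketched with a hypothetical `natural_sort_key` implementation (the PR's actual helper isn't shown in this excerpt):

```python
import re

def natural_sort_key(name: str):
    # Split into alternating text/number chunks so numeric parts compare
    # as integers: "example_9..." sorts before "example_10...".
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

unsorted = [
    "example_10_threshold",
    "example_1_text_response",
    "example_9_retry_with_open_telemetry",
]
assert sorted(unsorted, key=natural_sort_key) == [
    "example_1_text_response",
    "example_9_retry_with_open_telemetry",
    "example_10_threshold",
], "Numeric parts should sort naturally, not lexicographically"
```

Without the key, `sorted` compares strings character by character, which is why `example_10_threshold` lands before `example_1_text_response` in the expected list above.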
examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py:85
- [nitpick] The error message 'Expected {expected_success_rate_measured} to be within of the success rate' is unclear; consider revising it to clearly indicate that the success rate is expected to lie within a specific confidence interval.
```python
assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
```
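One possible rewording of that message, shown with a hypothetical stand-in for `is_within_expected` (the helper's real implementation and the sample numbers below are assumptions, not taken from the PR):

```python
import math

def is_within_expected(expected_rate: float, failure_count: int,
                       sample_size: int) -> bool:
    # Hypothetical check: does the expected rate fall inside a 95%
    # normal-approximation confidence interval around the measured rate?
    measured = (sample_size - failure_count) / sample_size
    margin = 1.96 * math.sqrt(measured * (1 - measured) / sample_size)
    return abs(measured - expected_rate) <= margin

expected_success_rate_measured = 0.9
failure_count, sample_size = 12, 100

assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
    f"Expected success rate {expected_success_rate_measured} to lie within the "
    f"confidence interval implied by {failure_count} failures "
    f"out of {sample_size} samples"
)
```

Naming the interval and the inputs in the message tells a reader of a failed CI run exactly which quantity fell outside which bound.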
- …t_sample and update related references
- …ability with constants
The issue is that the step's condition uses `success()`, which means the step will run only if all previous steps in the job were successful.
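A common fix (the PR's exact change isn't shown here, so the step name and script path below are assumptions) is to relax the condition so the report runs even after failed test steps:

```yaml
- name: Show CAT AI statistical report
  # always() runs the step regardless of earlier failures, unlike the
  # implicit success() condition, which skips it when any step failed.
  if: always()
  run: .github/workflows/show-statistical-report.sh
```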
- …people_is_allocated for clarity
- …n response handling
- …tistical analysis tests
- correct data type annotation (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
This pull request includes several changes aimed at improving the testing framework, refactoring code for better organization, and enhancing error handling and reporting. The most important changes include the extraction of a script for generating statistical reports, the addition of new helper functions and fixtures, and the refactoring of existing test functions for better clarity and structure.
Improvements to testing framework and error handling:
- `.github/workflows/cat-test-examples.yml`: Refactored the step that shows the CAT AI statistical report to use a separate script (`show-statistical-report.sh`) for better modularity.
- `.github/workflows/show-statistical-report.sh`: Created a new script to generate a statistical report of test results, improving the readability and maintainability of the workflow.

Refactoring and code organization:

- `examples/team_recommender/src/response_matches_json_schema.py`: Added a new function `response_matches_json_schema` for validating JSON responses against a schema, improving code reuse and separation of concerns.
- `examples/team_recommender/tests/conftest.py`: Added new fixtures and refactored existing functions to improve test configuration and setup. Removed redundant sorting functions and integrated them into the helpers module.

Enhancements to existing tests:

- `examples/team_recommender/tests/example_7_schema_validators/test_response_has_valid_schema.py`: Refactored the JSON schema validation test to use the newly added `response_matches_json_schema` function and added retry logic for handling API connection errors.
- `examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py`: Removed redundant validation functions and integrated them into the helpers module. Refactored the success-rate measurement test to improve clarity and maintainability.

Addition of helper functions:

- `examples/team_recommender/tests/helpers.py`: Added new helper functions for sorting, success rate assertion, and generating test examples. These functions improve code reuse and simplify test logic across multiple test files.
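The contents of `show-statistical-report.sh` aren't included in this summary; a rough sketch of a pass-count/total-count report (the results-file format here is an assumption for the demo) could look like:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo input: one outcome per line. In CI this file would be produced
# by the test run rather than generated inline.
results_file="$(mktemp)"
printf 'passed\nfailed\npassed\npassed\n' > "$results_file"

total=$(wc -l < "$results_file")
passed=$(grep -c '^passed$' "$results_file")

echo "CAT AI statistical report"
echo "Passed: ${passed} / ${total}"

rm -f "$results_file"
```

Keeping the report in its own script, as the PR does, lets both the workflow and local runs invoke the same logic.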