Merged
- enhance CAT AI statistical reporting with pass count and total count
- move helper functions to helpers.py
- …les and clarify report generation details
- Fix: CI missing variable
Pull Request Overview
This PR improves the testing framework and error handling by introducing a modular script for generating statistical reports, refactoring test functions for clarity, and consolidating helper functions.
- Extracts the CAT AI statistical report script for workflow modularity
- Adds a new function for JSON schema validation and refactors tests to use helper functions
- Improves test configuration and organization by refactoring redundant functions
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| examples/team_recommender/src/response_matches_json_schema.py | Adds a function to validate JSON responses against a schema |
| examples/team_recommender/tests/helpers.py | Introduces new helper functions (e.g., natural sort, success rate assertion) |
| examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py | Refactors threshold measurement tests and error messages |
| examples/team_recommender/tests/example_7_schema_validators/test_response_has_valid_schema.py | Updates schema validation test with the new helper function |
| .github/workflows/cat-test-examples.yml | Updates workflow step to use the new statistical report script |
| examples/team_recommender/tests/conftest.py | Refactors fixtures and example discovery logic |
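The body of `response_matches_json_schema` isn't included in this summary; a minimal, self-contained sketch of what such a validator might do (the real implementation likely delegates to a full JSON Schema library such as `jsonschema`, and the sample schema below is a hypothetical illustration):

```python
def response_matches_json_schema(response, schema) -> bool:
    # Minimal sketch: check required keys and per-property types for an
    # object schema. A real validator handles the full JSON Schema spec.
    type_map = {"object": dict, "array": list, "string": str,
                "number": (int, float), "integer": int, "boolean": bool}
    if not isinstance(response, dict):
        return False
    if any(key not in response for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key in response and not isinstance(response[key], type_map[spec["type"]]):
            return False
    return True

# Hypothetical schema for illustration only.
schema = {
    "type": "object",
    "required": ["team"],
    "properties": {"team": {"type": "array"}, "confidence": {"type": "number"}},
}
assert response_matches_json_schema({"team": ["alice"], "confidence": 0.9}, schema)
assert not response_matches_json_schema({"confidence": 0.9}, schema)
```

Returning a boolean rather than raising keeps the helper easy to use inside test assertions.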
Comments suppressed due to low confidence (2)
examples/team_recommender/tests/helpers.py:189
- The expected order in this assertion relies on lexicographical sorting, which does not reflect natural number ordering; consider using the natural_sort_key in the sort or updating the expected order accordingly.
```python
assert [
    "example_10_threshold",
    "example_1_text_response",
    "example_2_unit",
    "example_8_retry_network",
    "example_9_retry_with_open_telemetry",
] == sorted(unsorted), "The list should be sorted by the number in the name"
```
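The suggested fix, sketched with a hypothetical `natural_sort_key` implementation (the PR's actual helper isn't shown in this excerpt):

```python
import re

def natural_sort_key(name: str):
    # Split into alternating text/number chunks so numeric parts compare
    # as integers: "example_9..." sorts before "example_10...".
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

unsorted = [
    "example_10_threshold",
    "example_1_text_response",
    "example_9_retry_with_open_telemetry",
]
assert sorted(unsorted, key=natural_sort_key) == [
    "example_1_text_response",
    "example_9_retry_with_open_telemetry",
    "example_10_threshold",
], "Numeric parts should sort naturally, not lexicographically"
```

Without the key, `sorted` compares strings character by character, which is why `example_10_threshold` lands before `example_1_text_response` in the expected list above.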
examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py:85
- [nitpick] The error message 'Expected {expected_success_rate_measured} to be within of the success rate' is unclear; consider revising it to clearly indicate that the success rate is expected to lie within a specific confidence interval.
```python
assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
```
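One possible rewording of that message, shown with a hypothetical stand-in for `is_within_expected` (the helper's real implementation and the sample numbers below are assumptions, not taken from the PR):

```python
import math

def is_within_expected(expected_rate: float, failure_count: int,
                       sample_size: int) -> bool:
    # Hypothetical check: does the expected rate fall inside a 95%
    # normal-approximation confidence interval around the measured rate?
    measured = (sample_size - failure_count) / sample_size
    margin = 1.96 * math.sqrt(measured * (1 - measured) / sample_size)
    return abs(measured - expected_rate) <= margin

expected_success_rate_measured = 0.9
failure_count, sample_size = 12, 100

assert is_within_expected(expected_success_rate_measured, failure_count, sample_size), (
    f"Expected success rate {expected_success_rate_measured} to lie within the "
    f"confidence interval implied by {failure_count} failures "
    f"out of {sample_size} samples"
)
```

Naming the interval and the inputs in the message tells a reader of a failed CI run exactly which quantity fell outside which bound.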
- …t_sample and update related references
- …ability with constants
The issue is that the step's condition uses `success()`, which means the step will run only if all previous steps in the job were successful.
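A common fix (the PR's exact change isn't shown here, so the step name and script path below are assumptions) is to relax the condition so the report runs even after failed test steps:

```yaml
- name: Show CAT AI statistical report
  # always() runs the step regardless of earlier failures, unlike the
  # implicit success() condition, which skips it when any step failed.
  if: always()
  run: .github/workflows/show-statistical-report.sh
```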
- …people_is_allocated for clarity
- …n response handling
- …tistical analysis tests
- correct data type annotation (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
This pull request includes several changes aimed at improving the testing framework, refactoring code for better organization, and enhancing error handling and reporting. The most important changes include the extraction of a script for generating statistical reports, the addition of new helper functions and fixtures, and the refactoring of existing test functions for better clarity and structure.
Improvements to testing framework and error handling:
- `.github/workflows/cat-test-examples.yml`: Refactored the step that shows the CAT AI statistical report to use a separate script (`show-statistical-report.sh`) for better modularity.
- `.github/workflows/show-statistical-report.sh`: Created a new script to generate a statistical report of test results, improving the readability and maintainability of the workflow.

Refactoring and code organization:

- `examples/team_recommender/src/response_matches_json_schema.py`: Added a new function `response_matches_json_schema` for validating JSON responses against a schema, improving code reuse and separation of concerns.
- `examples/team_recommender/tests/conftest.py`: Added new fixtures and refactored existing functions to improve test configuration and setup. Removed redundant sorting functions and integrated them into the helpers module.

Enhancements to existing tests:

- `examples/team_recommender/tests/example_7_schema_validators/test_response_has_valid_schema.py`: Refactored the JSON schema validation test to use the newly added `response_matches_json_schema` function and added retry logic for handling API connection errors.
- `examples/team_recommender/tests/example_9_threshold/test_measurement_is_within_threshold.py`: Removed redundant validation functions and integrated them into the helpers module. Refactored the success-rate measurement test to improve clarity and maintainability.

Addition of helper functions:

- `examples/team_recommender/tests/helpers.py`: Added new helper functions for sorting, success rate assertion, and generating test examples. These functions improve code reuse and simplify test logic across multiple test files.
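The contents of `show-statistical-report.sh` aren't included in this summary; a rough sketch of a pass-count/total-count report (the results-file format here is an assumption for the demo) could look like:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo input: one outcome per line. In CI this file would be produced
# by the test run rather than generated inline.
results_file="$(mktemp)"
printf 'passed\nfailed\npassed\npassed\n' > "$results_file"

total=$(wc -l < "$results_file")
passed=$(grep -c '^passed$' "$results_file")

echo "CAT AI statistical report"
echo "Passed: ${passed} / ${total}"

rm -f "$results_file"
```

Keeping the report in its own script, as the PR does, lets both the workflow and local runs invoke the same logic.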