WIP: fix: updates tests for bigframes package by chalmerlowe · Pull Request #16525 · googleapis/google-cloud-python

chalmerlowe · 2026-04-01T22:52:10Z

WIP: First crack at resolving a number of the concerns in ISSUE #16489

gemini-code-assist

Code Review

This pull request updates system tests for K-Means, PCA, and ARIMA models. Key changes include implementing helper functions to sort categorical lists to ensure test stability, increasing relative tolerance (rtol) for PCA tests to accommodate numerical drift, and adding missing metrics to ARIMA score tests. Feedback highlights a potential logic error in the ARIMA test where the expected Mean Absolute Scaled Error is set to 0.0 despite non-zero RMSE, and identifies code duplication between the K-Means and PCA test files, suggesting that the categorical sorting logic should be centralized in a shared utility module.
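The ARIMA concern can be illustrated with a small sketch. It assumes the common definition of MASE (forecast MAE divided by the in-sample MAE of a one-step naive forecast); whether BigQuery ML computes the metric exactly this way is not confirmed here, and all numbers are made up. Under that definition, MASE is exactly 0.0 only when every forecast error is zero, which would also force RMSE to 0.0.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error of a forecast."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, forecast):
    """Mean absolute error of a forecast."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, training):
    """MAE scaled by the in-sample MAE of a one-step naive forecast (assumed definition)."""
    naive_errors = [abs(training[i] - training[i - 1]) for i in range(1, len(training))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / naive_mae


# Illustrative data only, not taken from the tests in this PR.
training = [10.0, 12.0, 11.0, 13.0]
actual = [14.0, 15.0]
imperfect_forecast = [13.0, 16.0]

# A forecast with non-zero errors yields non-zero RMSE and non-zero MASE together.
print(rmse(actual, imperfect_forecast), mase(actual, imperfect_forecast, training))
```

If the service reports a rounded value or uses a different scaling, an expected value of 0.0 could still be legitimate, so the expectation is worth confirming against actual output.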

packages/bigframes/tests/system/small/ml/test_forecasting.py

gemini-code-assist · 2026-04-01T22:55:32Z

packages/bigframes/tests/system/small/ml/test_cluster.py

+    def sort_categorical_lists(val):
+        if isinstance(val, list) and len(val) > 0:
+            return sorted(val, key=lambda x: x["category"])
+        return val


This helper function for sorting categorical lists is duplicated in packages/bigframes/tests/system/small/ml/test_decomposition.py (where it is named sort_categorical). To improve maintainability and ensure consistency across the test suite, consider moving this logic to a shared utility module, such as bigframes.testing.utils.

References

To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary before returning it (e.g., using dict(sorted(metadata.items()))) instead of relying on manual ordering in the code.
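A minimal sketch of what a centralized helper might look like. The name `sort_categorical_values` is hypothetical and not an existing bigframes function; only the sorting logic is taken from the diff above, and accepting tuples as well as lists is an added assumption.

```python
def sort_categorical_values(val):
    """Sort a non-empty list of {"category": ...} entries by category.

    Hypothetical shared helper; any other value is returned unchanged.
    """
    if isinstance(val, (list, tuple)) and len(val) > 0:
        return sorted(val, key=lambda entry: entry["category"])
    return val


# Illustrative input: the same entries in a different order compare equal after sorting.
first = [{"category": "b", "value": 2.0}, {"category": "a", "value": 1.0}]
second = [{"category": "a", "value": 1.0}, {"category": "b", "value": 2.0}]
print(sort_categorical_values(first) == sort_categorical_values(second))
```

Both test modules could then import the one helper instead of defining `sort_categorical_lists` and `sort_categorical` separately.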

gemini-code-assist · 2026-04-01T22:55:33Z

packages/bigframes/tests/system/small/ml/test_decomposition.py

+    def sort_categorical(val):
+        if isinstance(val, list) and len(val) > 0:
+            return sorted(val, key=lambda x: x["category"])
+        return val


This helper function is duplicated in packages/bigframes/tests/system/small/ml/test_cluster.py (where it is named sort_categorical_lists). Consider centralizing this logic in a shared utility module like bigframes.testing.utils.

References

To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary before returning it (e.g., using dict(sorted(metadata.items()))) instead of relying on manual ordering in the code.

gemini-code-assist · 2026-04-01T22:55:33Z

packages/bigframes/tests/system/small/ml/test_decomposition.py

        expected,
        check_exact=False,
-        rtol=0.1,
+        rtol=0.2,  # FIX: Slightly increased rtol for numerical drift (from 0.1)


Increasing the relative tolerance (rtol) from 0.1 to 0.2 doubles it. While numerical drift is common in ML system tests, a 20% relative error tolerance is quite high and might mask regressions. This change is applied throughout this file; consider whether a tighter tolerance can be maintained or whether the source of the drift can be addressed.
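To make the effect concrete, here is a small sketch of a relative-tolerance check in the form used by `numpy.testing.assert_allclose`, `|actual - expected| <= atol + rtol * |expected|`. The helper `within_rtol` and the numbers are illustrative assumptions, not values from the test file.

```python
def within_rtol(actual, expected, rtol, atol=0.0):
    """Return True when actual is within the combined tolerance of expected."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected = 1.00
drifted = 1.15  # a 15% deviation from the expected value

# The same deviation fails at the old tolerance and passes at the new one.
print(within_rtol(drifted, expected, rtol=0.1))
print(within_rtol(drifted, expected, rtol=0.2))
```

Any regression that shifts a value by between 10% and 20% would go unnoticed after the change, which is the masking risk described above.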

tswast · 2026-04-02T21:14:15Z

packages/bigframes/tests/system/small/ml/test_cluster.py

        .reset_index(drop=True)
    )
+
+    # FIX: Helper to ignore row order inside categorical_value lists


Good catch!

IMO, these tests are probably too flaky even after this change. I'd be happy if we just checked to make sure we had the expected columns + expected number of rows. In fact, check if we get at least the expected columns, because BQML has been known to add more on us.

Same goes for the other bigframes.ml tests modified in this PR.
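A rough sketch of the looser check suggested here, written against plain column names and a row count so it does not depend on any particular DataFrame API. The helper name `check_minimum_shape` and the column names in the usage are illustrative assumptions.

```python
def check_minimum_shape(actual_columns, actual_row_count, expected_columns, expected_row_count):
    """Assert that all expected columns are present (extras allowed) and the row count matches."""
    missing = set(expected_columns) - set(actual_columns)
    assert not missing, f"missing columns: {sorted(missing)}"
    assert actual_row_count == expected_row_count, (
        f"expected {expected_row_count} rows, got {actual_row_count}"
    )


# Illustrative usage: an extra column in the result does not fail the check.
check_minimum_shape(
    actual_columns=["centroid_id", "feature", "numerical_value", "categorical_value", "new_column"],
    actual_row_count=4,
    expected_columns=["centroid_id", "feature", "numerical_value", "categorical_value"],
    expected_row_count=4,
)
```

With a pandas result, the call would be something like `check_minimum_shape(result.columns, len(result), expected_columns, expected_row_count)`.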

chalmerlowe added 3 commits April 1, 2026 17:55

fix: updates arima tests to account for additional response value issued by bigquery

2616d08

fix: resolve sort order issues in K-Means centroids and PCA components & value fluctuations

da549f2

fix: resolve addt'l sort order issues in K-Means centroids and PCA components & value fluctuations

a54faff

gemini-code-assist bot reviewed Apr 1, 2026

fix: re-enable the system tests to confirm whether the edits help

5b2c7ab

chalmerlowe added 9 commits April 1, 2026 22:19

fix: experimenting with system test version

190a47e

experiment: trigger

419bfe2

fix: add functions to ensure ml output order and sign stability

3cad601

fix: update expected values produced by ML model to more accurately match actuals

541ed2e

fix: update expected values to accommodate lists OR np.ndarrays

0d5c28e

fix: adjust assertion to account for varying numbers of tables

867cfec

fix: updates import statement to include numpy and refines isinstance check for np.ndarray

6ae402e

chore: removed sentinel change used to temporarily launch system tests

4912949

chore: reformat/linting

e4c9c81

chalmerlowe marked this pull request as ready for review April 2, 2026 20:35

chalmerlowe requested review from a team as code owners April 2, 2026 20:35

tswast reviewed Apr 2, 2026

chalmerlowe wants to merge 13 commits into main from fix-update-tests-for-bigframes-package

3 participants
