Skip to content

WIP: fix: updates tests for bigframes package#16525

Open
chalmerlowe wants to merge 13 commits intomainfrom
fix-update-tests-for-bigframes-package
Open

WIP: fix: updates tests for bigframes package#16525
chalmerlowe wants to merge 13 commits intomainfrom
fix-update-tests-for-bigframes-package

Conversation

@chalmerlowe
Copy link
Copy Markdown
Contributor

WIP: First crack at resolving a number of the concerns in ISSUE #16489

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates system tests for K-Means, PCA, and ARIMA models. Key changes include implementing helper functions to sort categorical lists to ensure test stability, increasing relative tolerance (rtol) for PCA tests to accommodate numerical drift, and adding missing metrics to ARIMA score tests. Feedback highlights a potential logic error in the ARIMA test where the expected Mean Absolute Scaled Error is set to 0.0 despite non-zero RMSE, and identifies code duplication between the K-Means and PCA test files, suggesting that the categorical sorting logic should be centralized in a shared utility module.

Comment on lines +147 to +150
def sort_categorical_lists(val):
if isinstance(val, list) and len(val) > 0:
return sorted(val, key=lambda x: x["category"])
return val
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This helper function for sorting categorical lists is duplicated in packages/bigframes/tests/system/small/ml/test_decomposition.py (where it is named sort_categorical). To improve maintainability and ensure consistency across the test suite, consider moving this logic to a shared utility module, such as bigframes.testing.utils.

References
  1. To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary before returning it (e.g., using dict(sorted(metadata.items()))) instead of relying on manual ordering in the code.

Comment on lines +107 to +110
def sort_categorical(val):
if isinstance(val, list) and len(val) > 0:
return sorted(val, key=lambda x: x["category"])
return val
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This helper function is duplicated in packages/bigframes/tests/system/small/ml/test_cluster.py (where it is named sort_categorical_lists). Consider centralizing this logic in a shared utility module like bigframes.testing.utils.

References
  1. To ensure dictionary keys remain sorted without manual effort, programmatically sort the dictionary before returning it (e.g., using dict(sorted(metadata.items()))) instead of relying on manual ordering in the code.

expected,
check_exact=False,
rtol=0.1,
rtol=0.2, # FIX: Slightly increased rtol for numerical drift (from 0.1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Increasing the relative tolerance (rtol) from 0.1 to 0.2 is a significant change (100% increase). While numerical drift is common in ML system tests, a 20% relative error tolerance is quite high and might mask regressions. This change is applied throughout this file; consider if a tighter tolerance can be maintained or if the source of the drift can be addressed.

avijitsahaaornoy

This comment was marked as spam.

@chalmerlowe chalmerlowe marked this pull request as ready for review April 2, 2026 20:35
@chalmerlowe chalmerlowe requested review from a team as code owners April 2, 2026 20:35
.reset_index(drop=True)
)

# FIX: Helper to ignore row order inside categorical_value lists
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

IMO, these tests are probably too flakey even after this change. I'd be happy if we just checked to make sure we had the expected columns + expected number of rows. In fact, check if we get at least the expected columns, because BQML has been known to add more on us.

Same goes for the other bigframes.ml tests modified in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants