
Fix output_rows_skew sqllogictest flake #21958

Draft
geoffreyclaude wants to merge 1 commit into apache:main from geoffreyclaude:codex/fix-output-rows-skew-slt

geoffreyclaude (Contributor) commented Apr 30, 2026

Which issue does this PR close?

  • None.

Rationale for this change

The output_rows_skew sqllogictest can fail nondeterministically since #21351 added dynamic FileStream work scheduling, under which reorderable sibling file scan streams may share a single queue of unopened files. Which output partition ends up reading a given file then depends on runtime timing. The failing case expects the four files to be attributed to output partitions as [4, 0, 1, 0], producing output_rows_skew=84.31%; CI has observed the same scan reported as 100% instead.

This was seen on PR #21927 and independently on main, so it is a pre-existing test flake.

What changes are included in this PR?

Adds WITH ORDER (x) to the single CREATE EXTERNAL TABLE skew_parquet statement used by the four-file case in explain_analyze.slt. That makes the scan order-preserving, which disables shared file work stealing for this assertion and keeps the expected per-partition metric deterministic.
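To illustrate why a shared queue makes the per-partition numbers timing-dependent while an order-preserving scan does not, here is a minimal Python sketch. It is a toy model, not DataFusion code: `per_partition_rows`, the row counts, and the poll orders are all hypothetical, and `poll_order` stands in for whichever stream happens to ask for its next file first at runtime.

```python
from collections import deque


def per_partition_rows(initial, shared, poll_order):
    """Hypothetical model of file-to-partition attribution (not a DataFusion API).

    initial:    initial[p] is the list of files (given as row counts) planned
                for output partition p.
    shared:     True models one shared queue of unopened files (work stealing);
                False models an order-preserving scan where each partition only
                reads its own planned files.
    poll_order: partition indices in the order they request their next file;
                this stands in for nondeterministic runtime timing.
    """
    rows = [0] * len(initial)
    if shared:
        # One queue for all unopened files: whoever polls first takes the next file.
        queue = deque(r for files in initial for r in files)
        for p in poll_order:
            if queue:
                rows[p] += queue.popleft()
    else:
        # One queue per partition: poll order cannot move files between partitions.
        queues = [deque(files) for files in initial]
        for p in poll_order:
            if queues[p]:
                rows[p] += queues[p].popleft()
    return rows


files = [[10], [20], [30], [40]]
balanced = [0, 1, 2, 3]
one_fast = [0, 0, 0, 0, 1, 2, 3]  # partition 0 drains the queue before the others poll

print(per_partition_rows(files, shared=True, poll_order=balanced))   # [10, 20, 30, 40]
print(per_partition_rows(files, shared=True, poll_order=one_fast))   # [100, 0, 0, 0]
print(per_partition_rows(files, shared=False, poll_order=one_fast))  # [10, 20, 30, 40]
```

With the shared queue, one fast partition can take every file, which is the kind of all-in-one-partition outcome that would push a skew metric to its maximum; with per-partition queues the attribution is identical for any poll order. The actual scheduling added in #21351 is not modeled here.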

Are these changes tested?

Yes: the previously flaky test now passes consistently.

Are there any user-facing changes?

No.

github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) label Apr 30, 2026
geoffreyclaude force-pushed the codex/fix-output-rows-skew-slt branch from e3586f7 to 0473e9f April 30, 2026 12:48
adriangb (Contributor) commented:

I wonder if we wouldn't be better off accepting that scan orders are going to be mostly non-deterministic and either add an ORDER BY to the queries themselves or classify this metric via #21160

geoffreyclaude (Contributor, Author) commented:

@adriangb I'll admit I didn't look at this in much detail; my Codex just happened to fix it while working on another task. So if you have a cleaner fix, please go ahead!
