Commit 1210ef6

kgpai authored and meta-codesync[bot] committed
build: Split ubuntu-debug into separate build and test jobs (facebookincubator#16938)
Summary:

This PR makes two improvements to the Linux CI workflow:

### 1. Split BUILD/TEST status jobs for failure attribution

Split CI job failures into clearly labeled **BUILD** and **TEST** status jobs across all Linux CI workflows, so contributors immediately know what failed from the GitHub Actions UI.

Each job (adapters, ubuntu-debug, fedora-debug) now:

1. Tags build/test steps with `id: build` / `id: tests`
2. Uses `continue-on-error: true` on test steps so the job always completes
3. Exports `build-outcome` and `test-outcome` via job outputs
4. Has lightweight status jobs (`BUILD:` / `TEST:` prefixed) for instant recognition

### 2. Rebalance exec test groups for ~29% faster test execution

The `velox_exec_test` suite uses `velox_add_grouped_tests()`, which batches source files positionally into groups of 10. On `main`, the alphabetical ordering creates severely imbalanced groups: the worst group combines IndexLookupJoinTest (971s) + HashTableTest (60s) + MergeJoinTest (111s) = **~1,145s**.

**Changes:**

- **Split IndexLookupJoinTest.cpp** into IndexLookupJoinTest.cpp (~783s) and IndexLookupJoinTestExtra.cpp (~188s) to break up the 971s monolith
- **Split HashJoinTest.cpp** by redistributing 20 MultiThreadedHashJoinTest tests to HashJoinTestExtra.cpp (renamed from HashJoinTestSpill.cpp), balancing ~575s vs ~585s instead of 919s vs 242s
- **Reorder source files** using greedy bin-packing with measured EC2 timings so each group has roughly equal execution time

### Test time improvement

| Metric | main (alphabetical) | This PR (optimized) | Improvement |
|--------|----------------------|---------------------|-------------|
| Slowest group | ~1,145s (19 min) | 817s (13.6 min) | **29% faster** |
| Total wall time (ctest -j24) | ~1,145s | 911s (15.2 min) | **20% faster** |
| Empty groups (wasted parallelism) | 2 groups (~5s each) | 0 | All groups utilized |

**CI-measured group timings (this PR):**

| Group | Anchor Test(s) | CI Time |
|-------|---------------|---------|
| 0 | IndexLookupJoinTest | 745s |
| 1 | MultiFragmentTest | 493s |
| 2 | HashJoinTestExtra | 817s |
| 3 | SpillerTest | 532s |
| 4 | HashJoinTest | 402s |
| 5 | TableWriter + OutputBufMgr + TopNRowNumber + StreamingAgg | 480s |
| 6 | TableScan + Aggregation + OrderBy + HashTable + RowNumber | 556s |
| 7 | IndexLookupJoinExtra + MergeJoin + ScaleWriter + Exchange | 486s |

**100% tests passed (530/530), 0 failures.**

Pull Request resolved: facebookincubator#16938

Test Plan:

- [x] All 8 exec test groups build and link successfully
- [x] All tests pass on EC2 (standalone validation per group)
- [x] CI run validates end-to-end: 530 tests, 0 failures, 911s total
- [x] BUILD/TEST status labels appear correctly in Actions UI
- [x] HashJoinTest split validated: group4 (34 MultiThreaded tests) = 402s, group2 (20 MultiThreaded + 75 HashJoinTest) = 817s
- [x] IndexLookupJoinTest split validated: group0 = 745s, group7 = 486s

Reviewed By: pratikpugalia

Differential Revision: D99000024

Pulled By: kgpai

fbshipit-source-id: 0b503728bb9f3223ed2a9f95f7c5fee69045cdeb
1 parent 5a3a35f commit 1210ef6
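
The summary mentions greedy bin-packing over measured timings but does not spell out the procedure. The following is only a minimal sketch of one common variant (longest-processing-time-first), using the per-file timings quoted in this commit. The function name `greedy_pack` is hypothetical, and the sketch ignores the real constraint that each group must hold exactly 10 source files, so its grouping will not match the committed order.

```python
import heapq

# Per-file timings in seconds, copied from the timings quoted in this commit.
TIMINGS = {
    "IndexLookupJoinTest": 783, "MultiFragmentTest": 599,
    "HashJoinTestExtra": 585, "SpillerTest": 584, "HashJoinTest": 575,
    "TableWriterTest": 225, "TableScanTest": 189,
    "IndexLookupJoinTestExtra": 188, "MergeJoinTest": 111,
    "AggregationTest": 105, "OutputBufferManagerTest": 98,
    "ScaleWriterLocalPartitionTest": 96, "OrderByTest": 71,
    "TopNRowNumberTest": 64, "HashTableTest": 60,
    "StreamingAggregationTest": 58, "ExchangeClientTest": 54,
    "RowNumberTest": 42,
}


def greedy_pack(timings, num_groups):
    """Hypothetical helper: assign each file, slowest first, to the
    currently least-loaded group (longest-processing-time-first)."""
    heap = [(0, g) for g in range(num_groups)]  # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    for name, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
        load, g = heapq.heappop(heap)
        groups[g].append(name)
        heapq.heappush(heap, (load + secs, g))
    loads = [sum(timings[n] for n in grp) for grp in groups]
    return groups, loads


groups, loads = greedy_pack(TIMINGS, num_groups=8)
for g, (grp, load) in enumerate(zip(groups, loads)):
    print(f"group{g}: {load}s {grp}")
```

Under these timings the slowest group is bounded by the single heaviest file, which is consistent with the "Max group ~784s" note in the CMake comment.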

6 files changed: +9853 -9466 lines changed

.github/workflows/linux-build-base.yml

Lines changed: 103 additions & 1 deletion
@@ -94,6 +94,8 @@ jobs:
       USE_CLANG: "${{ inputs.use-clang && 'true' || 'false' }}"
     outputs:
       cudf-changes: ${{ steps.changes.outputs.cudf }}
+      build-outcome: ${{ steps.build.outcome }}
+      test-outcome: ${{ steps.tests.outcome }}
     steps:
       - uses: actions/checkout@v5
         with:
@@ -139,6 +141,7 @@ jobs:
           ccache -sz
 
       - name: Make Release Build
+        id: build
         env:
           MAKEFLAGS: NUM_THREADS=32 MAX_HIGH_MEM_JOBS=12 MAX_LINK_JOBS=12
           CUDA_ARCHITECTURES: 70
@@ -184,6 +187,9 @@ jobs:
           key: ccache-linux-adapters-${{ inputs.use-clang && 'clang' || 'gcc' }}
 
       - name: Run Tests
+        id: tests
+        continue-on-error: true
+        if: steps.build.outcome == 'success'
         env:
           LIBHDFS3_CONF: ${{ github.workspace }}/scripts/ci/hdfs-client.xml
         working-directory: _build/release
@@ -198,7 +204,7 @@ jobs:
       # need to run clang-tidy for that.
       # Let's also run this as last step so that if skipped it doesn't affect subsequent steps.
       - name: Install and run clang-tidy
-        if: ${{ ! inputs.use-clang && needs.get-changes.outputs.run-clang-tidy == 'true' }}
+        if: ${{ steps.build.outcome == 'success' && ! inputs.use-clang && needs.get-changes.outputs.run-clang-tidy == 'true' }}
         env:
           FILES: ${{ needs.get-changes.outputs.changed-files }}
           RANGE: ${{ needs.get-changes.outputs.diff-range }}
@@ -282,6 +288,41 @@ jobs:
             _build/release/velox/experimental/cudf/cudf-libs.tar
           retention-days: ${{ env.RETENTION }}
 
+  adapters-build-status:
+    if: always()
+    needs: adapters
+    runs-on: ubuntu-latest
+    name: "BUILD: Linux adapters release"
+    steps:
+      - run: |
+          if [[ "$BUILD_OUTCOME" != "success" ]]; then
+            echo "Build failed or was cancelled."
+            exit 1
+          fi
+          echo "Build succeeded."
+        env:
+          BUILD_OUTCOME: ${{ needs.adapters.outputs.build-outcome }}
+
+  adapters-test-status:
+    if: always()
+    needs: adapters
+    runs-on: ubuntu-latest
+    name: "TEST: Linux adapters release"
+    steps:
+      - run: |
+          if [[ "$BUILD_OUTCOME" != "success" ]]; then
+            echo "Build failed — tests did not run."
+            exit 1
+          fi
+          if [[ "$TEST_OUTCOME" != "success" ]]; then
+            echo "Tests failed."
+            exit 1
+          fi
+          echo "Tests passed."
+        env:
+          BUILD_OUTCOME: ${{ needs.adapters.outputs.build-outcome }}
+          TEST_OUTCOME: ${{ needs.adapters.outputs.test-outcome }}
+
   cudf-tests:
     runs-on: 4-core-ubuntu-gpu-t4
     needs: adapters
@@ -352,6 +393,9 @@ jobs:
     env:
       CCACHE_DIR: ${{ github.workspace }}/ccache
       USE_CLANG: ${{ inputs.use-clang && 'true' || 'false' }}
+    outputs:
+      build-outcome: ${{ steps.build.outcome }}
+      test-outcome: ${{ steps.tests.outcome }}
     defaults:
       run:
         shell: bash
@@ -401,6 +445,7 @@ jobs:
           ccache -sz
 
       - name: Make Debug Build
+        id: build
         env:
           VELOX_DEPENDENCY_SOURCE: SYSTEM
           ICU_SOURCE: SYSTEM
@@ -430,6 +475,7 @@ jobs:
             EXTRA_CMAKE_FLAGS+=("-DVELOX_ENABLE_FAISS=ON")
             EXTRA_CMAKE_FLAGS+=("-DVELOX_ENABLE_REMOTE_FUNCTIONS=ON")
           fi
+          export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS[*]}"
           make debug
 
       - name: CCache after
@@ -442,10 +488,48 @@ jobs:
           key: ccache-ubuntu-debug-default-${{ inputs.use-clang && 'clang' || 'gcc' }}
 
       - name: Run Tests
+        id: tests
+        continue-on-error: true
+        if: steps.build.outcome == 'success'
         run: |
           ulimit -n 65536
           cd _build/debug && ctest -j 24 --timeout 1800 --output-on-failure --no-tests=error
 
+  ubuntu-debug-build-status:
+    if: always()
+    needs: ubuntu-debug
+    runs-on: ubuntu-latest
+    name: "BUILD: Ubuntu debug"
+    steps:
+      - run: |
+          if [[ "$BUILD_OUTCOME" != "success" ]]; then
+            echo "Build failed or was cancelled."
+            exit 1
+          fi
+          echo "Build succeeded."
+        env:
+          BUILD_OUTCOME: ${{ needs.ubuntu-debug.outputs.build-outcome }}
+
+  ubuntu-debug-test-status:
+    if: always()
+    needs: ubuntu-debug
+    runs-on: ubuntu-latest
+    name: "TEST: Ubuntu debug"
+    steps:
+      - run: |
+          if [[ "$BUILD_OUTCOME" != "success" ]]; then
+            echo "Build failed — tests did not run."
+            exit 1
+          fi
+          if [[ "$TEST_OUTCOME" != "success" ]]; then
+            echo "Tests failed."
+            exit 1
+          fi
+          echo "Tests passed."
+        env:
+          BUILD_OUTCOME: ${{ needs.ubuntu-debug.outputs.build-outcome }}
+          TEST_OUTCOME: ${{ needs.ubuntu-debug.outputs.test-outcome }}
+
   fedora-debug:
     runs-on: 32-core-ubuntu
     container: ghcr.io/facebookincubator/velox-dev:fedora
@@ -454,6 +538,8 @@ jobs:
     name: Fedora debug
     env:
       CCACHE_DIR: ${{ github.workspace }}/ccache
+    outputs:
+      build-outcome: ${{ steps.build.outcome }}
     defaults:
       run:
         shell: bash
@@ -505,6 +591,7 @@ jobs:
           ccache -sz
 
       - name: Make Debug Build
+        id: build
         env:
           VELOX_DEPENDENCY_SOURCE: SYSTEM
           faiss_SOURCE: BUNDLED
@@ -530,3 +617,18 @@ jobs:
         with:
           path: ${{ env.CCACHE_DIR }}
           key: ccache-fedora-debug-default-gcc
+
+  fedora-debug-build-status:
+    if: always()
+    needs: fedora-debug
+    runs-on: ubuntu-latest
+    name: "BUILD: Fedora debug"
+    steps:
+      - run: |
+          if [[ "$BUILD_OUTCOME" != "success" ]]; then
+            echo "Build failed or was cancelled."
+            exit 1
+          fi
+          echo "Build succeeded."
+        env:
+          BUILD_OUTCOME: ${{ needs.fedora-debug.outputs.build-outcome }}

velox/exec/tests/CMakeLists.txt

Lines changed: 89 additions & 63 deletions
@@ -41,87 +41,113 @@ target_link_libraries(
   GTest::gtest_main
 )
 
-# Split velox_exec_test into individual test binaries for parallel execution.
+# Sources are ordered via greedy bin-packing by measured EC2 execution time.
+# With VELOX_TESTS_PER_GROUP=10, CMake takes files positionally in batches of
+# 10 (files 1-10 = group0, 11-20 = group1, etc.).
+#
+# Measured per-file EC2 timings (sequential, 8-core, 30GB):
+# IndexLookupJoinTest 783s, MultiFragmentTest 599s,
+# HashJoinTestExtra ~585s, SpillerTest 584s, HashJoinTest ~575s,
+# TableWriterTest 225s, TableScanTest 189s, IndexLookupJoinTestExtra 188s,
+# MergeJoinTest 111s, AggregationTest 105s, OutputBufferManagerTest 98s,
+# ScaleWriterLocalPartitionTest 96s, OrderByTest 71s, TopNRowNumberTest 64s,
+# HashTableTest 60s, StreamingAggregationTest 58s, ExchangeClientTest 54s,
+# RowNumberTest 42s.
+#
+# Max group ~784s (bounded by IndexLookupJoinTest.cpp).
 set(
   VELOX_EXEC_TEST_SOURCES
+  # group0 (~784s): IndexLookupJoinTest 783 + 9 lightweight
+  IndexLookupJoinTest.cpp
+  ConcatFilesSpillMergeStreamTest.cpp
+  MixedUnionWithTableScanTest.cpp
+  MemoryReclaimerTest.cpp
+  EnforceDistinctTest.cpp
+  TraceUtilTest.cpp
+  HashPartitionFunctionTest.cpp
+  SpatialIndexTest.cpp
+  ValuesTest.cpp
+  ParallelProjectTest.cpp
+  # group1 (~599s): MultiFragmentTest 599 + 9 lightweight
+  MultiFragmentTest.cpp
+  EnforceSingleRowTest.cpp
+  FilterToExpressionTest.cpp
+  ScaledScanControllerTest.cpp
+  HilbertIndexTest.cpp
+  OperatorTraceTest.cpp
+  LimitTest.cpp
+  SplitListenerTest.cpp
   AddressableNonNullValueListTest.cpp
-  AggregationTest.cpp
-  AggregateFunctionRegistryTest.cpp
   ArrowStreamTest.cpp
-  AssignUniqueIdTest.cpp
-  AsyncConnectorTest.cpp
-  ConcatFilesSpillMergeStreamTest.cpp
-  ContainerRowSerdeTest.cpp
+  # group2 (~585s): HashJoinTestExtra ~585 + 9 lightweight
+  HashJoinTestExtra.cpp
+  AggregateFunctionRegistryTest.cpp
+  RoundRobinPartitionFunctionTest.cpp
   ColumnStatsCollectorTest.cpp
-  CustomJoinTest.cpp
-  EnforceSingleRowTest.cpp
-  ExchangeClientTest.cpp
+  MixedUnionTest.cpp
   ExpandTest.cpp
-  FilterProjectTest.cpp
-  FilterToExpressionTest.cpp
   FunctionResolutionTest.cpp
-  HashBitRangeTest.cpp
-  CountingJoinTest.cpp
-  HashJoinBridgeTest.cpp
-  HashJoinTest.cpp
-  HashPartitionFunctionTest.cpp
-  HashTableTest.cpp
-  IndexLookupJoinTest.cpp
-  LimitTest.cpp
-  LocalPartitionTest.cpp
-  MarkDistinctTest.cpp
+  SpillStatsTest.cpp
   MarkSortedTest.cpp
-  EnforceDistinctTest.cpp
-  MemoryReclaimerTest.cpp
-  MergeJoinTest.cpp
-  MergeTest.cpp
-  MergerTest.cpp
-  MixedUnionTest.cpp
-  MixedUnionWithTableScanTest.cpp
-  MultiFragmentTest.cpp
-  NestedLoopJoinTest.cpp
-  OrderByTest.cpp
-  OperatorTraceTest.cpp
-  OutputBufferManagerTest.cpp
-  ParallelProjectTest.cpp
   PartitionedOutputTest.cpp
-  PlanNodeSerdeTest.cpp
-  PlanNodeStatsTest.cpp
-  PlanNodeToStringTest.cpp
-  PlanNodeToSummaryStringTest.cpp
-  PrefixSortTest.cpp
-  PrintPlanWithStatsTest.cpp
-  ProbeOperatorStateTest.cpp
-  TraceUtilTest.cpp
-  RoundRobinPartitionFunctionTest.cpp
-  RowContainerTest.cpp
-  RowNumberTest.cpp
-  ScaledScanControllerTest.cpp
-  ScaleWriterLocalPartitionTest.cpp
-  SortBufferTest.cpp
-  SpatialIndexTest.cpp
-  HilbertIndexTest.cpp
-  SpillStatsTest.cpp
+  # group3 (~584s): SpillerTest 584 + 9 lightweight
   SpillerTest.cpp
-  SpillTest.cpp
-  SplitListenerTest.cpp
+  PlanNodeToSummaryStringTest.cpp
+  CountingJoinTest.cpp
+  CustomJoinTest.cpp
+  TaskListenerTest.cpp
   SplitTest.cpp
   SqlTest.cpp
-  StreamingAggregationTest.cpp
+  WindowFunctionRegistryTest.cpp
   StreamingEnforceDistinctTest.cpp
-  TableScanTest.cpp
+  HashBitRangeTest.cpp
+  # group4 (~575s): HashJoinTest ~575 + 9 lightweight
+  HashJoinTest.cpp
+  SpillTest.cpp
+  WindowTest.cpp
+  PrefixSortTest.cpp
+  MergerTest.cpp
+  LocalPartitionTest.cpp
+  PrintPlanWithStatsTest.cpp
+  ProbeOperatorStateTest.cpp
+  MarkDistinctTest.cpp
+  MergeTest.cpp
+  # group5 (~445s): TableWriterTest 225 + OutputBufMgr 98
+  # + TopNRowNumber 64 + StreamingAgg 58
   TableWriterTest.cpp
-  TaskListenerTest.cpp
-  ThreadDebugInfoTest.cpp
+  OutputBufferManagerTest.cpp
   TopNRowNumberTest.cpp
+  StreamingAggregationTest.cpp
+  ContainerRowSerdeTest.cpp
+  RowContainerTest.cpp
   TopNTest.cpp
+  WriterFuzzerUtilTest.cpp
+  PlanNodeStatsTest.cpp
+  ThreadDebugInfoTest.cpp
+  # group6 (~467s): TableScanTest 189 + AggregationTest 105
+  # + OrderBy 71 + HashTable 60 + RowNumber 42
+  TableScanTest.cpp
+  AggregationTest.cpp
+  OrderByTest.cpp
+  HashTableTest.cpp
+  RowNumberTest.cpp
   UnnestTest.cpp
-  UnorderedStreamReaderTest.cpp
-  ValuesTest.cpp
+  PlanNodeSerdeTest.cpp
+  HashJoinBridgeTest.cpp
+  SortBufferTest.cpp
   VectorHasherTest.cpp
-  WindowFunctionRegistryTest.cpp
-  WindowTest.cpp
-  WriterFuzzerUtilTest.cpp
+  # group7 (~449s): IndexLookupJoinTestExtra 188
+  # + MergeJoin 111 + ScaleWriter 96 + Exchange 54
+  IndexLookupJoinTestExtra.cpp
+  MergeJoinTest.cpp
+  ScaleWriterLocalPartitionTest.cpp
+  ExchangeClientTest.cpp
+  NestedLoopJoinTest.cpp
+  UnorderedStreamReaderTest.cpp
+  PlanNodeToStringTest.cpp
+  AssignUniqueIdTest.cpp
+  FilterProjectTest.cpp
+  AsyncConnectorTest.cpp
 )
 
 set(
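
Because the CMake comment says files are taken positionally in batches of 10, the order of `VELOX_EXEC_TEST_SOURCES` alone decides which files share a group. The sketch below models that batching in Python so a proposed ordering can be sanity-checked before building. It is not the implementation of `velox_add_grouped_tests()`. The helper name `positional_groups` is hypothetical, only the first 20 sources are listed, and files without a quoted timing are counted as 0s, which is an assumption made for brevity.

```python
def positional_groups(sources, per_group=10):
    """Hypothetical helper: slice the source list into consecutive batches,
    mirroring the 'files 1-10 = group0, 11-20 = group1' rule."""
    return [sources[i:i + per_group] for i in range(0, len(sources), per_group)]


# First 20 entries of the reordered list in this commit.
SOURCES = [
    "IndexLookupJoinTest.cpp", "ConcatFilesSpillMergeStreamTest.cpp",
    "MixedUnionWithTableScanTest.cpp", "MemoryReclaimerTest.cpp",
    "EnforceDistinctTest.cpp", "TraceUtilTest.cpp",
    "HashPartitionFunctionTest.cpp", "SpatialIndexTest.cpp",
    "ValuesTest.cpp", "ParallelProjectTest.cpp",
    "MultiFragmentTest.cpp", "EnforceSingleRowTest.cpp",
    "FilterToExpressionTest.cpp", "ScaledScanControllerTest.cpp",
    "HilbertIndexTest.cpp", "OperatorTraceTest.cpp",
    "LimitTest.cpp", "SplitListenerTest.cpp",
    "AddressableNonNullValueListTest.cpp", "ArrowStreamTest.cpp",
]

# Timings quoted in the CMake comment; everything else is assumed to be 0s here.
KNOWN_SECONDS = {"IndexLookupJoinTest.cpp": 783, "MultiFragmentTest.cpp": 599}

groups = positional_groups(SOURCES)
estimates = [sum(KNOWN_SECONDS.get(f, 0) for f in grp) for grp in groups]
for index, (grp, secs) in enumerate(zip(groups, estimates)):
    print(f"group{index}: {len(grp)} files, at least {secs}s")
```

Extending `SOURCES` and `KNOWN_SECONDS` to the full list would let the per-group comments in the CMake file be rechecked whenever a file is added or moved.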
