test_kmeans failed Mismatched elements #858

@pxLi

Description

First seen in spark-rapids-ml_nightly, run 646, on branch-25.02.

The same commit passed in previous runs, so this could be a non-deterministic failure.
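One plausible mechanism, offered as a hypothesis rather than a diagnosis: Lloyd-style k-means converges to a local optimum that depends on the initial centers. If anything feeding initialization differs between runs (for example a random seed or how rows are partitioned across workers, both assumptions here), the final centers can differ by far more than a 1e-3 tolerance. The pure-Python sketch below is not the spark-rapids-ml or cuML implementation; `lloyd_kmeans`, `inertia` and the four-point dataset are hypothetical, chosen so that two starting points give two different stable answers on the same data.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical data: corners of a 4 x 1 rectangle. A left/right split is the
# global optimum; a top/bottom split is a stable local optimum.
POINTS: List[Point] = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]


def lloyd_kmeans(points: Sequence[Point], init: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Run plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in init]
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance (ties go to the lower index).
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points; an empty cluster keeps its center.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # no center moved, so the run has converged
            break
        centers = new_centers
    return centers


def inertia(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers)
        for p in points
    )


if __name__ == "__main__":
    local = lloyd_kmeans(POINTS, [(0.0, 0.0), (0.0, 1.0)])
    best = lloyd_kmeans(POINTS, [(0.0, 0.0), (4.0, 0.0)])
    print(local, inertia(POINTS, local))  # [(2.0, 0.0), (2.0, 1.0)] 16.0
    print(best, inertia(POINTS, best))    # [(0.0, 0.5), (4.0, 0.5)] 1.0
```

Both runs are fully deterministic given their starting centers; the point is only that the answer is a function of initialization, so a flaky initialization path would surface as large, all-elements mismatches like the ones below rather than last-digit noise.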

12:46:22  =========================== short test summary info ============================
12:46:22  SKIPPED [2] tests/test_logistic_regression.py:2346: test_sparse_one_gpu_zeroes requires at least 2 GPUs
12:46:22  FAILED tests/test_kmeans.py::test_kmeans[10000-float64-(1000, 20)-vector] - assert [-6.200210094...58984375, ...] == approx([-5.78...66 ± 2.8e-03])
12:46:22    
12:46:22    comparison failed. Mismatched elements: 20 / 20:
12:46:22    Max absolute difference: 0.6937029361724854
12:46:22    Max relative difference: 0.2938384018578477
12:46:22    Index | Obtained            | Expected                     
12:46:22    0     | -6.200210094451904  | -5.787682056427002 ± 5.8e-03 
12:46:22    1     | 1.857926368713379   | 2.359571933746338 ± 2.4e-03  
12:46:22    2     | -4.016595363616943  | -3.8533520698547363 ± 3.9e-03
12:46:22    3     | -1.3698656558990479 | -1.5132267475128174 ± 1.5e-03
12:46:22    4     | -5.473804473876953  | -5.518653392791748 ± 5.5e-03 
12:46:22    5     | 7.518157958984375   | 7.688424587249756 ± 7.7e-03  
12:46:22    6     | 2.779451608657837   | 2.814058303833008 ± 2.8e-03  
12:46:22    7     | 8.553401947021484   | 8.606598854064941 ± 8.6e-03  
12:46:22    8     | 1.556957721710205   | 1.5148929357528687 ± 1.5e-03 
12:46:22    9     | 0.781173586845398   | 1.0107123851776123 ± 1.0e-03 
12:46:22    10    | -3.5970232486724854 | -2.9033203125 ± 2.9e-03      
12:46:22    11    | 4.501270771026611   | 4.528228282928467 ± 4.5e-03  
12:46:22    12    | -4.595738887786865  | -4.439876556396484 ± 4.4e-03 
12:46:22    13    | 2.8565192222595215  | 2.3768301010131836 ± 2.4e-03 
12:46:22    14    | 1.706239104270935   | 1.6383565664291382 ± 1.6e-03 
12:46:22    15    | -2.456780433654785  | -2.5442116260528564 ± 2.5e-03
12:46:22    16    | -5.350193977355957  | -5.256896495819092 ± 5.3e-03 
12:46:22    17    | -2.0196115970611572 | -1.8588895797729492 ± 1.9e-03
12:46:22    18    | -8.099149703979492  | -8.18447208404541 ± 8.2e-03  
12:46:22    19    | 3.167498826980591   | 2.8390140533447266 ± 2.8e-03
12:46:22  FAILED tests/test_kmeans.py::test_kmeans[10000-float64-(1000, 20)-array] - assert [-6.200210094...82147217, ...] == approx([-5.78...43 ± 2.8e-03])
12:46:22    
12:46:22    comparison failed. Mismatched elements: 20 / 20:
12:46:22    Max absolute difference: 0.6937034130096436
12:46:22    Max relative difference: 0.29383850962008007
12:46:22    Index | Obtained            | Expected                     
12:46:22    0     | -6.200210094451904  | -5.787683010101318 ± 5.8e-03 
12:46:22    1     | 1.8579264879226685  | 2.359572172164917 ± 2.4e-03  
12:46:22    2     | -4.016595363616943  | -3.8533520698547363 ± 3.9e-03
12:46:22    3     | -1.3698655366897583 | -1.513227105140686 ± 1.5e-03 
12:46:22    4     | -5.473804473876953  | -5.518652439117432 ± 5.5e-03 
12:46:22    5     | 7.518157482147217   | 7.6884236335754395 ± 7.7e-03 
12:46:22    6     | 2.779451608657837   | 2.814059257507324 ± 2.8e-03  
12:46:22    7     | 8.5534029006958     | 8.606597900390625 ± 8.6e-03  
12:46:22    8     | 1.5569578409194946  | 1.514892816543579 ± 1.5e-03  
12:46:22    9     | 0.7811737060546875  | 1.0107126235961914 ± 1.0e-03 
12:46:22    10    | -3.5970232486724854 | -2.903319835662842 ± 2.9e-03 
12:46:22    11    | 4.5012712478637695  | 4.528229713439941 ± 4.5e-03  
12:46:22    12    | -4.595738887786865  | -4.439876079559326 ± 4.4e-03 
12:46:22    13    | 2.8565189838409424  | 2.3768298625946045 ± 2.4e-03 
12:46:22    14    | 1.706239104270935   | 1.6383564472198486 ± 1.6e-03 
12:46:22    15    | -2.456780433654785  | -2.5442111492156982 ± 2.5e-03
12:46:22    16    | -5.350195407867432  | -5.256896018981934 ± 5.3e-03 
12:46:22    17    | -2.0196115970611572 | -1.8588895797729492 ± 1.9e-03
12:46:22    18    | -8.099149703979492  | -8.184473037719727 ± 8.2e-03 
12:46:22    19    | 3.167498826980591   | 2.839015007019043 ± 2.8e-03
12:46:22  FAILED tests/test_kmeans.py::test_kmeans[10000-float64-(1000, 20)-multi_cols] - assert [-6.200211524...35821533, ...] == approx([-5.78...64 ± 2.8e-03])
12:46:22    
12:46:22    comparison failed. Mismatched elements: 20 / 20:
12:46:22    Max absolute difference: 0.6937038898468018
12:46:22    Max relative difference: 0.2938389045069418
12:46:22    Index | Obtained            | Expected                     
12:46:22    0     | -6.200211524963379  | -5.787683010101318 ± 5.8e-03 
12:46:22    1     | 1.857926845550537   | 2.3595714569091797 ± 2.4e-03 
12:46:22    2     | -4.016595363616943  | -3.8533520698547363 ± 3.9e-03
12:46:22    3     | -1.3698657751083374 | -1.5132272243499756 ± 1.5e-03
12:46:22    4     | -5.473804473876953  | -5.518653392791748 ± 5.5e-03 
12:46:22    5     | 7.518158435821533   | 7.688422203063965 ± 7.7e-03  
12:46:22    6     | 2.779451608657837   | 2.8140580654144287 ± 2.8e-03 
12:46:22    7     | 8.5534029006958     | 8.606599807739258 ± 8.6e-03  
12:46:22    8     | 1.5569579601287842  | 1.5148930549621582 ± 1.5e-03 
12:46:22    9     | 0.7811734676361084  | 1.0107126235961914 ± 1.0e-03 
12:46:22    10    | -3.5970234870910645 | -2.9033195972442627 ± 2.9e-03
12:46:22    11    | 4.501271724700928   | 4.528229713439941 ± 4.5e-03  
12:46:22    12    | -4.595738410949707  | -4.439875602722168 ± 4.4e-03 
12:46:22    13    | 2.8565189838409424  | 2.3768298625946045 ± 2.4e-03 
12:46:22    14    | 1.706239104270935   | 1.63835608959198 ± 1.6e-03   
12:46:22    15    | -2.456780195236206  | -2.5442116260528564 ± 2.5e-03
12:46:22    16    | -5.350193977355957  | -5.256896018981934 ± 5.3e-03 
12:46:22    17    | -2.0196115970611572 | -1.8588894605636597 ± 1.9e-03
12:46:22    18    | -8.099148750305176  | -8.184473037719727 ± 8.2e-03 
12:46:22    19    | 3.16749906539917    | 2.839014768600464 ± 2.8e-03
12:46:22  ==== 3 failed, 688 passed, 2 skipped, 108391 warnings in 3375.98s (0:56:15) ====
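Reading the failures above: every "±" value equals 1e-3 times |expected| (for example 5.787682 x 1e-3 ≈ 5.8e-03), which suggests the assertion uses `pytest.approx` with `rel=1e-3`. That is inferred from the log, not read from `tests/test_kmeans.py`. The sketch below re-implements the element-wise rule (tolerance = max(rel * |expected|, abs), with pytest's default abs of 1e-12) in plain Python; `approx_report` is a hypothetical helper. One detail worth knowing when reading the log: the reported "Max relative difference" is divided by the obtained value, not the expected one (index 9: 0.2295 / 0.7812 ≈ 0.2938).

```python
from typing import Sequence, Tuple

# "vector" parametrization, copied from the Obtained / Expected columns above (indices 0..19).
OBTAINED = [
    -6.200210094451904, 1.857926368713379, -4.016595363616943, -1.3698656558990479,
    -5.473804473876953, 7.518157958984375, 2.779451608657837, 8.553401947021484,
    1.556957721710205, 0.781173586845398, -3.5970232486724854, 4.501270771026611,
    -4.595738887786865, 2.8565192222595215, 1.706239104270935, -2.456780433654785,
    -5.350193977355957, -2.0196115970611572, -8.099149703979492, 3.167498826980591,
]
EXPECTED = [
    -5.787682056427002, 2.359571933746338, -3.8533520698547363, -1.5132267475128174,
    -5.518653392791748, 7.688424587249756, 2.814058303833008, 8.606598854064941,
    1.5148929357528687, 1.0107123851776123, -2.9033203125, 4.528228282928467,
    -4.439876556396484, 2.3768301010131836, 1.6383565664291382, -2.5442116260528564,
    -5.256896495819092, -1.8588895797729492, -8.18447208404541, 2.8390140533447266,
]


def approx_report(
    obtained: Sequence[float],
    expected: Sequence[float],
    rel: float = 1e-3,
    abs_tol: float = 1e-12,
) -> Tuple[int, float, float]:
    """Return (mismatched count, max absolute diff, max relative diff) over mismatches only."""
    mismatched = 0
    max_abs_diff = 0.0
    max_rel_diff = 0.0
    for got, want in zip(obtained, expected):
        tolerance = max(rel * abs(want), abs_tol)  # the "±" column in the log
        diff = abs(want - got)
        if diff <= tolerance:
            continue
        mismatched += 1
        max_abs_diff = max(max_abs_diff, diff)
        # Relative difference is taken against the obtained value, matching the log.
        max_rel_diff = float("inf") if got == 0 else max(max_rel_diff, diff / abs(got))
    return mismatched, max_abs_diff, max_rel_diff


if __name__ == "__main__":
    count, max_abs, max_rel = approx_report(OBTAINED, EXPECTED)
    print(f"Mismatched elements: {count} / {len(EXPECTED)}")
    print("Max absolute difference:", max_abs)
    print("Max relative difference:", max_rel)
```

Feeding in the `array` or `multi_cols` rows instead gives the same 20 / 20 count and maxima that agree to about 1e-6, consistent with all three input formats failing the same way.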
[2025-03-04T04:46:22.002Z] a_clusters = [[-6.823820114135742, 1.3099526166915894, 9.101048469543457, -1.0847231149673462, 9.752960205078125, 9.887101173400879...6747751236, -3.943955898284912, 6.282511234283447, -7.419498443603516, 5.677783489227295, -8.57911205291748, ...], ...]
[2025-03-04T04:46:22.002Z] b_clusters = [[-6.823819160461426, 1.3099526166915894, 9.10104751586914, -1.0847229957580566, 9.752960205078125, 9.887102127075195,...57744598, -3.943956136703491, 6.2825117111206055, -7.419497966766357, 5.677781105041504, -8.579113006591797, ...], ...]
[2025-03-04T04:46:22.002Z] tolerance = 0.001
...
[2025-03-04T04:46:22.003Z] a_clusters = [[-6.8238205909729, 1.3099524974822998, 9.10104751586914, -1.0847231149673462, 9.752961158752441, 9.887101173400879, ....71947860718, -3.943956136703491, 6.282512187957764, -7.419497966766357, 5.67778205871582, -8.57911205291748, ...], ...]
[2025-03-04T04:46:22.003Z] b_clusters = [[-6.823820114135742, 1.3099526166915894, 9.101048469543457, -1.084722876548767, 9.752962112426758, 9.887102127075195,...57744598, -3.943956136703491, 6.2825117111206055, -7.419498920440674, 5.677781105041504, -8.579113960266113, ...], ...]
[2025-03-04T04:46:22.003Z] tolerance = 0.001
...
[2025-03-04T04:46:22.004Z] a_clusters = [[-6.823819160461426, 1.3099524974822998, 9.101049423217773, -1.0847231149673462, 9.752959251403809, 9.887102127075195...6896762848, -3.943955659866333, 6.282510280609131, -7.419498920440674, 5.677780628204346, -8.57911205291748, ...], ...]
[2025-03-04T04:46:22.004Z] b_clusters = [[-6.823818683624268, 1.3099526166915894, 9.101048469543457, -1.0847231149673462, 9.752958297729492, 9.887103080749512...437976837, -3.9439563751220703, 6.282511234283447, -7.419497966766357, 5.677781581878662, -8.57911205291748, ...], ...]
[2025-03-04T04:46:22.004Z] tolerance = 0.001
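The `a_clusters` / `b_clusters` dumps with `tolerance = 0.001` indicate that two sets of cluster centers are also being compared; the helper that printed them is not shown here. As a hedged sketch only, an order-insensitive comparison could look like the following (`centers_match` is a hypothetical name, not a spark-rapids-ml API). Cluster labels from k-means are arbitrary, so two equivalent runs may list the same centers in a different order.

```python
from typing import List, Sequence

Center = Sequence[float]


def centers_match(
    a_clusters: Sequence[Center],
    b_clusters: Sequence[Center],
    tolerance: float = 0.001,
) -> bool:
    """True if each center in a_clusters has a distinct counterpart in b_clusters.

    Two centers match when every coordinate differs by at most `tolerance`
    (absolute difference). Order of the centers is ignored.
    """
    if len(a_clusters) != len(b_clusters):
        return False
    unused: List[Center] = list(b_clusters)
    for center in a_clusters:
        for j, candidate in enumerate(unused):
            if len(candidate) == len(center) and all(
                abs(x - y) <= tolerance for x, y in zip(center, candidate)
            ):
                del unused[j]  # each b center can be matched at most once
                break
        else:  # no candidate within tolerance for this center
            return False
    return True
```

Greedy matching is adequate when distinct centers are separated by much more than `tolerance`; if centers could lie within `tolerance` of each other, an optimal assignment (for example the Hungarian algorithm) would be the safer choice.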
[2025-03-04T03:33:18.967Z] +-----------------------------------------------------------------------------------------+
[2025-03-04T03:33:18.967Z] | NVIDIA-SMI 570.86.10              Driver Version: 570.86.10      CUDA Version: 12.8     |
[2025-03-04T03:33:18.967Z] |-----------------------------------------+------------------------+----------------------+
[2025-03-04T03:33:18.967Z] | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
[2025-03-04T03:33:18.967Z] | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
[2025-03-04T03:33:18.967Z] |                                         |                        |               MIG M. |
[2025-03-04T03:33:18.967Z] |=========================================+========================+======================|
[2025-03-04T03:33:18.967Z] |   0  NVIDIA H100 NVL                On  |   00000000:C1:00.0 Off |                    0 |
[2025-03-04T03:33:18.967Z] | N/A   43C    P0             59W /  310W |       1MiB /  95830MiB |      0%      Default |
[2025-03-04T03:33:18.967Z] |                                         |                        |             Disabled |
[2025-03-04T03:33:18.967Z] +-----------------------------------------+------------------------+----------------------+
[2025-03-04T03:33:18.967Z]                                                                                          
[2025-03-04T03:33:18.967Z] +-----------------------------------------------------------------------------------------+
[2025-03-04T03:33:18.968Z] | Processes:                                                                              |
[2025-03-04T03:33:18.968Z] |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
[2025-03-04T03:33:18.968Z] |        ID   ID                                                               Usage      |
[2025-03-04T03:33:18.968Z] |=========================================================================================|
[2025-03-04T03:33:18.968Z] |  No running processes found                                                             |
[2025-03-04T03:33:18.968Z] +-----------------------------------------------------------------------------------------+

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
