[BUG] GPU JSON reader incorrectly returns null/drops rows for non-timestamp values after isTimestamp validation change in incompatible date formats path #14532
Open
Labels
? - Needs Triage (Need team to review and classify), bot_watch (Slack bot watched issue for LLM analyzer), bug (Something isn't working)
Description
Describe the bug
Build: rapids_it-non-utc-dev/641
42 pytest tests in json_test.py failed during the NON_UTC_TZ (Asia/Shanghai) integration test run on Spark 3.3.0. The GPU returns None where the CPU returns valid values (integers, booleans, strings, floats), and the GPU produces fewer rows than the CPU in multiple variants of test_json_round_trip, test_json_infer_schema_round_trip, and test_json_input_meta. The failures are consistent across both the v1 and v2 source list variants. The pattern suggests that the GPU JSON reader has been incorrectly nullifying or dropping valid values starting with commit 19e6502 (PR #14502), which added isTimestamp validation in the incompatible date formats path of GpuToTimestamp.
Error logs:
FAILED json_test.py::test_json_round_trip[-Byte] - AssertionError: GPU (None) and CPU (42) int values are different at [873, 'a']
FAILED json_test.py::test_json_round_trip[-Short] - AssertionError: GPU (None) and CPU (-30153) int values are different at [726, 'a']
FAILED json_test.py::test_json_round_trip[-Integer] - AssertionError: GPU (None) and CPU (-1599471000) int values are different at [535, 'a']
FAILED json_test.py::test_json_round_trip[-Long] - AssertionError: GPU (None) and CPU (9067314308808974443) int values are different at [363, 'a']
FAILED json_test.py::test_json_round_trip[-Boolean] - AssertionError: GPU (None) and CPU (True) boolean values are different at [726, 'a']
FAILED json_test.py::test_json_round_trip[-String0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1484
FAILED json_test.py::test_json_round_trip[-Double0] - AssertionError: GPU (None) and CPU (-1.8979207445002143e-109) float values are different at [325, 'a']
FAILED json_test.py::test_json_round_trip[-Float0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1814
FAILED json_test.py::test_json_round_trip[-String2] - AssertionError: GPU (None) and CPU (INfINiTy) string values are different at [575, 'a']
FAILED json_test.py::test_json_input_meta[] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1970 GPU: 1822
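For triage, the two failure shapes in the log (GPU returning None versus GPU returning fewer rows) can be separated mechanically. The following is a minimal sketch, not part of the spark-rapids test suite: the `classify` helper and the regular expressions are hypothetical and assume the assertion messages keep exactly the wording quoted above. It runs over a subset of the lines from this issue.

```python
import re
from collections import Counter

# Subset of the FAILED lines quoted in this issue, copied verbatim.
LOG = """\
FAILED json_test.py::test_json_round_trip[-Byte] - AssertionError: GPU (None) and CPU (42) int values are different at [873, 'a']
FAILED json_test.py::test_json_round_trip[-String0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1484
FAILED json_test.py::test_json_round_trip[-Float0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1814
FAILED json_test.py::test_json_input_meta[] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1970 GPU: 1822
"""

# Hypothetical patterns, written against the message wording shown above.
TEST_RE = re.compile(r"^FAILED (?P<test>\S+) - ")
NULL_RE = re.compile(r"GPU \(None\) and CPU \((?P<cpu>.+)\) (?P<kind>\w+) values are different")
LEN_RE = re.compile(r"different lengths at \[\] CPU: (?P<cpu>\d+) GPU: (?P<gpu>\d+)")


def classify(line):
    """Return (test_id, category, detail) for one FAILED line, or None."""
    head = TEST_RE.match(line)
    if head is None:
        return None
    test = head.group("test")
    m = LEN_RE.search(line)
    if m:
        # detail = number of rows present on CPU but missing on GPU
        return test, "dropped_rows", int(m.group("cpu")) - int(m.group("gpu"))
    m = NULL_RE.search(line)
    if m:
        # detail = value kind reported by the assertion (int, boolean, ...)
        return test, "gpu_null", m.group("kind")
    return test, "other", None


results = [r for r in map(classify, LOG.splitlines()) if r is not None]
counts = Counter(category for _, category, _ in results)
missing_rows = {test: detail for test, cat, detail in results if cat == "dropped_rows"}
```

Applied to the full pytest summary, `counts` would show how the 42 failures split between the two shapes, and `missing_rows` would show how many rows the GPU dropped per test.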
Environment details
- Spark version: 3.3.0
- Scala: 2.12
- Test mode: NON_UTC_TZ (Asia/Shanghai)
- Commit: 19e6502
- DATAGEN_SEED: 1775157023