[BUG] GPU JSON reader incorrectly returns null/drops rows for non-timestamp values after isTimestamp validation change in incompatible date formats path #14532

@pxLi

Description

Describe the bug
Build: rapids_it-non-utc-dev/641

42 pytest tests in json_test.py failed during the NON_UTC_TZ (Asia/Shanghai) integration test run on Spark 3.3.0. In multiple variants of test_json_round_trip, test_json_infer_schema_round_trip, and test_json_input_meta, the GPU returns None where the CPU returns valid values (integers, booleans, strings, floats), and in other variants the GPU produces fewer rows than the CPU. The failures are consistent across both the v1 and v2 source list variants. The pattern suggests that, after commit 19e6502 (PR #14502) added isTimestamp validation to the incompatible date formats path of GpuToTimestamp, the GPU JSON reader incorrectly nullifies or drops valid values.

Error logs (excerpt):

FAILED json_test.py::test_json_round_trip[-Byte] - AssertionError: GPU (None) and CPU (42) int values are different at [873, 'a']
FAILED json_test.py::test_json_round_trip[-Short] - AssertionError: GPU (None) and CPU (-30153) int values are different at [726, 'a']
FAILED json_test.py::test_json_round_trip[-Integer] - AssertionError: GPU (None) and CPU (-1599471000) int values are different at [535, 'a']
FAILED json_test.py::test_json_round_trip[-Long] - AssertionError: GPU (None) and CPU (9067314308808974443) int values are different at [363, 'a']
FAILED json_test.py::test_json_round_trip[-Boolean] - AssertionError: GPU (None) and CPU (True) boolean values are different at [726, 'a']
FAILED json_test.py::test_json_round_trip[-String0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1484
FAILED json_test.py::test_json_round_trip[-Double0] - AssertionError: GPU (None) and CPU (-1.8979207445002143e-109) float values are different at [325, 'a']
FAILED json_test.py::test_json_round_trip[-Float0] - AssertionError: CPU and GPU list have different lengths at [] CPU: 2048 GPU: 1814
FAILED json_test.py::test_json_round_trip[-String2] - AssertionError: GPU (None) and CPU (INfINiTy) string values are different at [575, 'a']
FAILED json_test.py::test_json_input_meta[] - AssertionError: CPU and GPU list have different lengths at [] CPU: 1970 GPU: 1822
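The log shows two distinct failure modes: a single value that is None on the GPU, and a row-count mismatch. The sketch below is a simplified, hypothetical model of a CPU-vs-GPU result comparison whose messages mirror the log lines above. It is not the integration tests' actual comparison code; `compare_rows` and the rows-as-dicts representation are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mapping from Python types to the type words used in the log lines.
_KIND = {bool: "boolean", int: "int", float: "float", str: "string"}


def compare_rows(cpu: List[Dict[str, Any]],
                 gpu: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first CPU/GPU mismatch as a message, or None if they agree.

    Simplified, hypothetical model of a CPU-vs-GPU result check; it is not
    the integration tests' real comparison code (NaN and row order ignored).
    """
    # Row counts are compared first: dropped rows surface as a length error.
    if len(cpu) != len(gpu):
        return (f"CPU and GPU list have different lengths at [] "
                f"CPU: {len(cpu)} GPU: {len(gpu)}")
    # Then values are compared cell by cell: a nulled value surfaces here.
    for i, (cpu_row, gpu_row) in enumerate(zip(cpu, gpu)):
        for col, cpu_val in cpu_row.items():
            gpu_val = gpu_row.get(col)
            if cpu_val != gpu_val:
                kind = _KIND.get(type(cpu_val), type(cpu_val).__name__)
                return (f"GPU ({gpu_val}) and CPU ({cpu_val}) {kind} values "
                        f"are different at [{i}, {col!r}]")
    return None


# Usage: a single nulled cell vs. a dropped row.
print(compare_rows([{"a": 1}, {"a": 42}], [{"a": 1}, {"a": None}]))
# -> GPU (None) and CPU (42) int values are different at [1, 'a']
print(compare_rows([{"a": 1}] * 3, [{"a": 1}] * 2))
# -> CPU and GPU list have different lengths at [] CPU: 3 GPU: 2
```

In this model the length check runs first, so a variant that both drops rows and nulls values (such as String0 or Float0 could be) would report only the length mismatch. Whether the real harness behaves the same way is not confirmed here.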

Environment details

  • Spark version: 3.3.0
  • Scala: 2.12
  • Test mode: NON_UTC_TZ (Asia/Shanghai)
  • Commit: 19e6502
  • DATAGEN_SEED: 1775157023
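The settings above can be collected into a re-run recipe. The Python sketch below only builds the environment and command line; it does not execute anything. The launcher path `./integration_tests/run_pyspark_from_build.sh`, its pytest-style `-k` filter, and the assumption that `TZ` and `DATAGEN_SEED` are read from the environment describe a typical local setup and are assumptions, not something this report confirms. `repro_invocation` is a hypothetical helper.

```python
import os
import shlex
from typing import Dict, List, Tuple

# Values taken from "Environment details" in this report.
REPRO_TZ = "Asia/Shanghai"       # NON_UTC_TZ test mode
REPRO_SEED = "1775157023"        # DATAGEN_SEED of the failing build

# Assumption: launcher path and its "-k" filter; adjust to the local setup.
LAUNCHER = "./integration_tests/run_pyspark_from_build.sh"


def repro_invocation(test_filter: str) -> Tuple[Dict[str, str], List[str]]:
    """Build (env, argv) for re-running the failing tests; does not run them."""
    env = dict(os.environ)
    env["TZ"] = REPRO_TZ              # assumed to select the non-UTC time zone
    env["DATAGEN_SEED"] = REPRO_SEED  # assumed to pin the random data generator
    argv = [LAUNCHER, "-k", test_filter]
    return env, argv


env, argv = repro_invocation(
    "test_json_round_trip or test_json_infer_schema_round_trip "
    "or test_json_input_meta")
# Print a shell-ready line; shlex.join quotes the multi-word filter.
print(f"TZ={env['TZ']} DATAGEN_SEED={env['DATAGEN_SEED']} {shlex.join(argv)}")
```

Pinning the seed matters because the failing rows (for example row 873 in the Byte variant) depend on the generated data.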

Metadata

Assignees

No one assigned

    Labels

    ? - Needs Triage (Need team to review and classify), bot_watch (Slack bot watched issue for LLM analyzer), bug (Something isn't working)
