null structarrays are poorly handled by cast #37072

Open
@spenczar

Description

Describe the bug, including details regarding any error messages, version, and platform.

This code should work:

import pyarrow as pa
struct_type = pa.struct([pa.field("x", pa.int32(), nullable=False)])
nulls = pa.nulls(5, struct_type)

# The following is an error:
nulls = nulls.cast(struct_type)

The error message is:

ArrowInvalid: Can't view array of type struct<x: int32 not null> as struct<x: int32 not null>: nulls in input cannot be viewed as non-nullable

Indeed, if we print(nulls), it contains null values in the non-nullable field x:

-- is_valid:
  [
    false,
    false,
    false,
    false,
    false
  ]
-- child 0 type: int32
  [
    null,
    null,
    null,
    null,
    null
  ]

But those rows are all invalid at the top level anyway, so there is no reason for cast to care about them. The alternative would be to make it impossible to call pa.nulls on a struct type with a non-nullable field anywhere in its hierarchy of fields, but that seems wrong too: it would imply that a single non-nullable field makes the whole struct non-nullable, which clearly is not the intent. You should be able to have a null struct with non-nullable fields.
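To make the argument concrete, here is a minimal plain-Python sketch of the masking rule described above. It is only a model, not Arrow's implementation: the helper name `visible_child_nulls` is hypothetical, and Python lists stand in for the validity bitmap and the child array. The idea is that a null in a non-nullable child should only count as a violation when the parent struct is valid at that row.

```python
from typing import List, Optional


def visible_child_nulls(
    parent_valid: List[bool], child_values: List[Optional[int]]
) -> List[int]:
    """Return the row indices where a child null is actually observable.

    A child null is observable only if the parent struct is valid at that
    row; rows where the parent itself is null mask whatever the child holds.
    (Hypothetical helper for illustration, not part of pyarrow.)
    """
    return [
        i
        for i, (valid, value) in enumerate(zip(parent_valid, child_values))
        if valid and value is None
    ]


# Layout from the report: five null structs, child "x" null in every row.
parent_valid = [False] * 5
child_x: List[Optional[int]] = [None] * 5
violations = visible_child_nulls(parent_valid, child_x)
print(violations)  # [] : every child null is masked by a null parent

# Contrast: a valid struct whose non-nullable child is null is a real problem.
contrast = visible_child_nulls([True, False], [None, None])
print(contrast)  # [0] : only the row with a valid parent is flagged
```

Under this model the array from the report has no observable violations, which is why a cast between identical struct types would be expected to succeed.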

Ultimately, this is a C++ issue; Python is merely calling those functions.

Version

12.0.1

Component(s)

C++
