Skip to content

[Python] Exercise fallback case on tests for parquet.read_table in case dataset is not available #46373

Open
@raulcd

Description

@raulcd

Describe the enhancement requested

While reviewing:

I've realized that the following code path is never tested:

except ImportError:
# fall back on ParquetFile for simple cases when pyarrow.dataset
# module is not available
if filters is not None:
raise ValueError(
"the 'filters' keyword is not supported when the "
"pyarrow.dataset module is not available"
)
if partitioning != "hive":
raise ValueError(
"the 'partitioning' keyword is not supported when the "
"pyarrow.dataset module is not available"
)
if schema is not None:
raise ValueError(
"the 'schema' argument is not supported when the "
"pyarrow.dataset module is not available"
)
filesystem, path = _resolve_filesystem_and_path(source, filesystem)
if filesystem is not None:
source = filesystem.open_input_file(path)
# TODO test that source is not a directory or a list
dataset = ParquetFile(
source, read_dictionary=read_dictionary,
memory_map=memory_map, buffer_size=buffer_size,
pre_buffer=pre_buffer,
coerce_int96_timestamp_unit=coerce_int96_timestamp_unit,
decryption_properties=decryption_properties,
thrift_string_size_limit=thrift_string_size_limit,
thrift_container_size_limit=thrift_container_size_limit,
page_checksum_verification=page_checksum_verification,
)

I've tried to enable Parquet locally on the minimal Python example build but this would require to build with snappy and others. It would be good to have a test that exercises this code path.

This was tested before due to the use of legacy_dataset flag.
See:

Component(s)

Python

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions