Commit e8c126e

fix batch parquet read
1 parent f6cca76 commit e8c126e

1 file changed: +4 -4 lines changed

src/yandex_cloud_ml_sdk/_utils/pyarrow.py

Lines changed: 4 additions & 4 deletions
@@ -25,12 +25,12 @@ def get_next() -> RecordType | None:
 
 
 def read_dataset_records_sync(path: str, batch_size: int | None) -> Iterator[RecordType]:
-    import pyarrow.dataset as pd  # pylint: disable=import-outside-toplevel
+    import pyarrow.parquet as pq  # pylint: disable=import-outside-toplevel
 
     # we need use kwargs method to preserve original default value
     kwargs = {}
     if batch_size is not None:
         kwargs['batch_size'] = batch_size
-    dataset = pd.dataset(source=path, format='parquet')
-    for batch in dataset.to_batches(**kwargs):  # type: ignore[arg-type]
-        yield from batch.to_pylist()
+    with pq.ParquetFile(path) as reader:
+        for batch in reader.iter_batches(**kwargs):
+            yield from batch.to_pylist()
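
The kwargs pattern in this function (forward batch_size only when the caller supplied one, so that None falls back to the reader's own default) can be sketched without pyarrow. In the sketch below, StubReader, read_records, and the default of 4 are hypothetical stand-ins chosen for illustration; they are not the pyarrow API or the SDK's code, and only mimic the shape of reader.iter_batches(**kwargs).

```python
from __future__ import annotations

from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


class StubReader:
    """Hypothetical stand-in for a batch reader; not the pyarrow API."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def iter_batches(self, batch_size: int = 4) -> Iterator[List[Record]]:
        # Yield consecutive slices of at most batch_size rows.
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start:start + batch_size]


def read_records(reader: StubReader, batch_size: int | None) -> Iterator[Record]:
    # Only forward batch_size when the caller set it, so that None
    # falls back to the reader's own default instead of overriding it.
    kwargs: Dict[str, int] = {}
    if batch_size is not None:
        kwargs['batch_size'] = batch_size
    for batch in reader.iter_batches(**kwargs):
        # Flatten each batch into individual records.
        yield from batch


rows = [{'id': i} for i in range(10)]
batch_sizes_default = [len(b) for b in StubReader(rows).iter_batches()]
records = list(read_records(StubReader(rows), batch_size=3))
```

Passing batch_size=None straight through would replace the reader's default with None, which is why the function builds kwargs conditionally rather than always forwarding the argument.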
