read_values (and read_pandas) fail on large data (e.g., a large DataFrame) #1523

@shcheklein

Description

We are getting the following error:

Job failed with error RuntimeError: UDF function/class data is too large!

for code like this:

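import datachain as dc
# parsed_df: a large pandas DataFrame defined elsewhere, with a "test" column
data_sample = parsed_df
input_df = dc.read_pandas(data_sample).select("test")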

This happens when the data is large enough to exceed the 10M limit.
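A minimal sketch of why this can fail, under stated assumptions: if read_values wraps the input values in a gen UDF, the whole dataset becomes part of the UDF object, which is serialized before the job runs and rejected once it passes a size cap. The names `ValuesGen`, `serialize_udf` and `MAX_UDF_DATA_BYTES` are hypothetical, plain `pickle` stands in for whatever serializer datachain actually uses, and "10M" is read here as 10 MiB. None of this is datachain's real code.

```python
import pickle

# Assumption: the "10M" limit from the issue, interpreted as 10 MiB of serialized data.
MAX_UDF_DATA_BYTES = 10 * 1024 * 1024


class ValuesGen:
    """Hypothetical gen-style UDF that carries the full input dataset with it."""

    def __init__(self, values):
        self.values = values  # the entire dataset is stored on the UDF object

    def __call__(self):
        # Emit the stored values one by one, like a generator UDF would.
        yield from self.values


def serialize_udf(udf) -> bytes:
    """Serialize a UDF and reject it if the payload exceeds the size cap."""
    data = pickle.dumps(udf)  # the stored values are pickled along with the object
    if len(data) > MAX_UDF_DATA_BYTES:
        raise RuntimeError("UDF function/class data is too large!")
    return data


small = ValuesGen(list(range(10)))
serialize_udf(small)  # fine: a few bytes of state

big = ValuesGen(["x" * (11 * 1024 * 1024)])  # roughly 11 MiB of values
try:
    serialize_udf(big)
except RuntimeError as err:
    print(err)
```

The error message matches the one in the report; the surrounding structure is illustrative only.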

We need to rewrite read_values so that it no longer goes through gen and instead performs the operations directly.
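One way to read "direct operations" is to stream rows straight into storage in bounded batches, so that no serialized object ever carries the full dataset. The sketch below assumes an in-memory `sqlite3` table as a stand-in for datachain's storage; `insert_values_directly` and its `batch_size` parameter are hypothetical names, not datachain APIs.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Sequence


def insert_values_directly(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    rows: Iterable[tuple],
    batch_size: int = 10_000,
) -> int:
    """Hypothetical helper: write rows into `table` batch by batch, without a UDF."""
    # Table and column names are interpolated, so they must be trusted identifiers;
    # row values go through "?" placeholders.
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

    it = iter(rows)
    total = 0
    while True:
        # Materialize at most `batch_size` rows at a time to bound memory use.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (test TEXT)")
inserted = insert_values_directly(
    conn, "items", ["test"], ((f"row-{i}",) for i in range(25_000)), batch_size=10_000
)
print(inserted, conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

Because rows are consumed lazily from an iterator, the input can itself be a generator over a large DataFrame without ever being copied into one serialized object.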

Metadata

Labels

enhancement (New feature or request)