
Commit 58c6ded

materialize-iceberg: do not limit result set size
We must handle arbitrarily large result sets for load queries, and Spark's default 1 GiB limit on collected result size is not compatible with that requirement. This change removes the result set size limit by setting spark.driver.maxResultSize=0. The Spark driver itself may still OOM if the result set is too large; in that case the user will need to configure higher limits for the EMR Application.
1 parent: 9beaea7

File tree

1 file changed (+1, -1 lines)


materialize-iceberg/emr.go

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ func (e *emrClient) runJob(ctx context.Context, input any, entryPointUri, pyFile
 			ExecutionRoleArn: aws.String(e.cfg.ExecutionRoleArn),
 			JobDriver: &emrTypes.JobDriverMemberSparkSubmit{
 				Value: emrTypes.SparkSubmit{
-					SparkSubmitParameters: aws.String(fmt.Sprintf("--py-files %s", pyFilesCommonURI)),
+					SparkSubmitParameters: aws.String(fmt.Sprintf("--py-files %s --conf spark.driver.maxResultSize=0", pyFilesCommonURI)),
 					EntryPoint:            aws.String(entryPointUri),
 					EntryPointArguments:   args,
 				},
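The one-line change above only affects how the spark-submit parameter string is assembled. A minimal standalone sketch of that string construction (the `buildSparkSubmitParameters` helper is hypothetical, introduced here for illustration; the real code inlines the `fmt.Sprintf` call shown in the diff):

```go
package main

import (
	"fmt"
	"strings"
)

// buildSparkSubmitParameters mirrors the diff: alongside --py-files, it passes
// --conf spark.driver.maxResultSize=0, which tells Spark not to cap the total
// size of results collected at the driver (0 means unlimited).
func buildSparkSubmitParameters(pyFilesURI string) string {
	return fmt.Sprintf("--py-files %s --conf spark.driver.maxResultSize=0", pyFilesURI)
}

func main() {
	// Example S3 URI; in the real job this comes from pyFilesCommonURI.
	params := buildSparkSubmitParameters("s3://bucket/common.zip")
	fmt.Println(params)

	if !strings.Contains(params, "spark.driver.maxResultSize=0") {
		panic("maxResultSize override missing from spark-submit parameters")
	}
}
```

Note that lifting the driver-side cap does not raise the driver's memory; a result set larger than the driver heap will still OOM, which is why the commit message points users at the EMR Application limits.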
