
OOM with Secondary Index Enabled #14077

@vinishjail97

Description

Bug Description

What happened:
Out of Memory (OOM) errors occur when building the secondary index (SI) on large tables. The error manifests during metadata table write operations (see the stack trace below).

What you expected:
Can we avoid populating in-memory hash maps and lists and instead return an iterator directly, so the index build does not accumulate memory pressure? (A sketch of the idea follows the link below.)
https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L200
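A minimal sketch of the requested change, assuming a generic record-to-index-record transformation; the names and signatures below are illustrative and are not Hudi's actual API. The eager variant materializes every index record for a file slice in a list, while the lazy variant wraps the source iterator so only one record is resident at a time and the writer can flush as it goes:

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;
  import java.util.function.Function;

  public class StreamingIndexSketch {

    // Eager approach: every index record is materialized up front,
    // so heap usage grows with the number of records per file.
    static <R, I> List<I> buildEagerly(Iterator<R> dataRecords, Function<R, I> toIndexRecord) {
      List<I> indexRecords = new ArrayList<>();
      while (dataRecords.hasNext()) {
        indexRecords.add(toIndexRecord.apply(dataRecords.next()));
      }
      return indexRecords;
    }

    // Streaming approach: convert records lazily and hand the iterator
    // straight to the writer, avoiding the intermediate collection.
    static <R, I> Iterator<I> buildLazily(Iterator<R> dataRecords, Function<R, I> toIndexRecord) {
      return new Iterator<I>() {
        @Override
        public boolean hasNext() {
          return dataRecords.hasNext();
        }

        @Override
        public I next() {
          return toIndexRecord.apply(dataRecords.next());
        }
      };
    }
  }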

Steps to reproduce:
Build the secondary index for a ~100 GB table with Parquet files of 100 MB+ each (a reproduction sketch follows).
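
A hedged reproduction sketch: it assumes an existing Hudi table of roughly this size, the table and column names are placeholders, and the CREATE INDEX ... USING secondary_index DDL follows the Hudi 1.0 Spark SQL syntax (verify against your Hudi version):

  import org.apache.spark.sql.SparkSession;

  public class ReproduceSecondaryIndexOom {
    public static void main(String[] args) {
      // Placeholder session config; run with standard executor memory to hit the OOM.
      SparkSession spark = SparkSession.builder()
          .appName("hudi-si-oom-repro")
          .getOrCreate();

      // Building the secondary index triggers the metadata table writes
      // where the OOM is observed. Table and column names are placeholders.
      spark.sql("CREATE INDEX idx_secondary ON hudi_table_100gb USING secondary_index(some_column)");
    }
  }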

Environment

  • Hudi Version: 1.0.0+ (any version with Secondary Index support)
  • Spark Version: 3.5.x
  • Table Type: MOR or COW
  • Table Size: 10M+ records, 100+ files
  • Heap Size: Standard executor memory (insufficient for the non-streaming approach)

Logs and Stack Trace

  java.lang.OutOfMemoryError: Java heap space
      at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
      at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
      at org.apache.avro.io.BufferedBinaryEncoder.flushBuffer(BufferedBinaryEncoder.java:96)
      at org.apache.hudi.avro.HoodieAvroUtils.indexedRecordToBytesStream(HoodieAvroUtils.java:152)
      at org.apache.hudi.common.util.HFileUtils.serializeRecordsToLogBlock(HFileUtils.java:221)
      at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:501)
      at org.apache.hudi.io.HoodieAppendHandle.flushToDiskIfRequired(HoodieAppendHandle.java:681)
