Bug: Session reduce - Reached heap limit Allocation failed - JavaScript heap out of memory #31

@BulkBeing

Description

While running examples/accumulator, the UDF crashes with an out-of-memory error:

Received datum 35815: event_time=Wed Dec 10 2025 00:07:46 GMT+0000 (Coordinated Universal Time), watermark=Wed Dec 10 2025 00:05:23 GMT+0000 (Coordinated Universal Time), value=123,34,117,115,101,114,95,105,100,34,58,34,56,51,52,101,48,49,50,48,45,98,101,102,54,45,52,55,54,48,45,56,100,102,99,45,52,101,55,49,102,99,52,53,97,101,49,52,34,44,34,112,97,103,101,95,105,100,34,58,34,57,50,54,54,101,52,57,98,45,57,99,99,50,45,52,52,51,102,45,57,51,52,97,45,55,48,101,53,54,54,97,98,100,101,97,56,34,44,34,97,100,95,105,100,34,58,34,48,98,97,56,51,49,48,100,45,100,50,100,56,45,52,102,97,54,45,57,53,98,52,45,50,101,102,53,99,49,54,56,100,49,54,101,34,44,34,97,100,95,116,121,112,101,34,58,34,98,97,110,110,101,114,34,44,34,101,118,101,110,116,95,116,121,112,101,34,58,34,118,105,101,119,34,44,34,101,118,101,110,116,95,116,105,109,101,34,58,34,49,55,54,53,51,50,53,50,54,54,57,55,51,34,44,34,105,112,95,97,100,100,114,101,115,115,34,58,34,56,48,46,51,53,46,50,52,53,46,55,34,125
Buffer size: 106809
Received datum 35686: event_time=Wed Dec 10 2025 00:07:46 GMT+0000 (Coordinated Universal Time), watermark=Wed Dec 10 2025 00:05:23 GMT+0000 (Coordinated Universal Time), value=123,34,117,115,101,114,95,105,100,34,58,34,50,50,50,54,53,102,48,56,45,97,54,50,97,45,52,55,100,98,45,57,100,53,57,45,102,98,51,98,54,51,49,51,55,57,56,53,34,44,34,112,97,103,101,95,105,100,34,58,34,100,49,51,53,56,98,98,97,45,53,101,56,54,45,52,99,55,50,45,98,97,102,102,45,102,97,50,55,101,52,97,100,57,49,48,50,34,44,34,97,100,95,105,100,34,58,34,97,100,102,50,49,52,56,97,45,101,48,52,55,45,52,49,49,55,45,97,100,97,101,45,98,52,50,102,98,50,49,54,99,55,52,98,34,44,34,97,100,95,116,121,112,101,34,58,34,98,97,110,110,101,114,34,44,34,101,118,101,110,116,95,116,121,112,101,34,58,34,118,105,101,119,34,44,34,101,118,101,110,116,95,116,105,109,101,34,58,34,49,55,54,53,51,50,53,50,54,54,52,53,55,34,44,34,105,112,95,97,100,100,114,101,115,115,34,58,34,54,56,46,49,52,55,46,51,51,46,49,51,57,34,125
Buffer size: 106810
Received datum 35727: event_time=Wed Dec 10 2025 00:07:46 GMT+0000 (Coordinated Universal Time), watermark=Wed Dec 10 2025 00:05:23 GMT+0000 (Coordinated Universal Time), value=123,34,117,115,101,114,95,105,100,34,58,34,97,101,57,97,52,100,57,98,45,56,51,49,53,45,52,100,48,57,45,97,57,49,54,45,102,99,99,101,49,98,98,51,56,57,49,99,34,44,34,112,97,103,101,95,105,100,34,58,34,55,100,50,51,97,102,55,50,45,48,101,52,99,45,52,56,98,98,45,56,102,48,99,45,54,101,57,57,51,50,52,55,97,53,101,55,34,44,34,97,100,95,105,100,34,58,34,99,48,97,101,102,51,102,55,45,97,56,48,52,45,52,51,102,99,45,97,102,57,97,45,55,100,101,101,51,50,100,52,54,52,52,51,34,44,34,97,100,95,116,121,112,101,34,58,34,98,97,110,110,101,114,34,44,34,101,118,101,110,116,95,116,121,112,101,34,58,34,118,105,101,119,34,44,34,101,118,101,110,116,95,116,105,109,101,34,58,34,49,55,54,53,51,50,53,50,54,54,57,55,53,34,44,34,105,112,95,97,100,100,114,101,115,115,34,58,34,57,50,46,56,46,57,48,46,49,57,53,34,125
Buffer size: 106811
Received datum 35816: event_time=Wed Dec 10 2025 00:07:46 GMT+0000 (Coordinated Universal Time), watermark=Wed Dec 10 2025 00:05:23 GMT+0000 (Coordinated Universal Time), value=123,34,117,115,101,114,95,105,100,34,58,34,97,101,57,97,52,100,57,98,45,56,51,49,53,45,52,100,48,57,45,97,57,49,54,45,102,99,99,101,49,98,98,51,56,57,49,99,34,44,34,112,97,103,101,95,105,100,34,58,34,55,100,50,51,97,102,55,50,45,48,101,52,99,45,52,56,98,98,45,56,102,48,99,45,54,101,57,57,51,50,52,55,97,53,101,55,34,44,34,97,100,95,105,100,34,58,34,99,48,97,101,102,51,102,55,45,97,56,48,52,45,52,51,102,99,45,97,102,57,97,45,55,100,101,101,51,50,100,52,54,52,52,51,34,44,34,97,100,95,116,121,112,101,34,58,34,98,97,110,110,101,114,34,44,34,101,118,101,110,116,95,116,121,112,101,34,58,34,118,105,101,119,34,44,34,101,118,101,110,116,95,116,105,109,101,34,58,34,49,55,54,53,51,50,53,50,54,54,57,55,53,34,44,34,105,112,95,97,100,100,114,101,115,115,34,58,34,57,50,46,56,46,57,48,46,49,57,53,34,125
Buffer size: 106812
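
For anyone reading the datum lines above: the `value=` field is the payload's raw bytes printed as comma-separated decimals, and they are UTF-8 JSON (each one starts with `123,34,117,115,101,114,95,105,100,34`, i.e. `{"user_id"`). A minimal sketch for turning such a logged string back into an object, assuming the payload is always valid UTF-8 JSON; `decodeLoggedValue` is a hypothetical helper, not part of the SDK or the example:

```typescript
// Hypothetical helper: turn the comma-separated byte list printed in a
// "value=" log field back into the JSON object it encodes.
function decodeLoggedValue(logged: string): unknown {
  // "123,34,..." -> [123, 34, ...] -> Uint8Array
  const bytes = Uint8Array.from(
    logged.split(",").map((part) => Number(part.trim())),
  );
  // Assumption: the payload is UTF-8 encoded JSON, as in the lines above.
  const text = new TextDecoder("utf-8").decode(bytes);
  return JSON.parse(text);
}

// Usage with a short made-up payload (not copied from the log):
const sample = Array.from(
  new TextEncoder().encode('{"event_type":"view"}'),
).join(",");
console.log(decodeLoggedValue(sample));
```

Applied to the lines above, this gives ad-event records with the fields `user_id`, `page_id`, `ad_id`, `ad_type`, `event_type`, `event_time` and `ip_address`.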

<--- Last few GCs --->
= [1:0xc58f000]    12286 ms: Mark-Compact (reduce) 255.6 (256.7) -> 254.9 (256.9) MB, pooled: 0 MB, 11.57 / 0.00 ms  (+ 1.3 ms in 104 steps since start of marking, biggest step 0.1 ms, walltime since start of marking 54 ms) (average mu = 0.760, current mu =[1:0xc58f000]    12377 ms: Mark-Compact (reduce) 255.9 (256.9) -> 255.1 (257.2) MB, pooled: 0 MB, 69.56 / 0.00 ms  (+ 0.3 ms in 50 steps since start of marking, biggest step 0.1 ms, walltime since start of marking 84 ms) (average mu = 0.473, current mu = 
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x72be1c node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0xb9dc10  [node]
 3: 0xb9dcff  [node]
 4: 0xe367e5  [node]
 5: 0xe4796c  [node]
 6: 0xe1d5d3  [node]
 7: 0xdf3270  [node]
 8: 0x12e90f8  [node]
 9: 0x19f2636  [node]
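
A rough back-of-the-envelope check, using only the numbers in the output above: the heap was at about 255 MB after the last Mark-Compact, and the last reported buffer size was 106812. The sketch below assumes the whole heap is held by the buffer (which overstates it) and that V8's "MB" means 1024 * 1024 bytes, so the result is only an upper bound on heap per buffered datum; `maxBytesPerEntry` is a hypothetical name used for illustration:

```typescript
// Upper bound on heap bytes per buffered entry, assuming (pessimistically)
// that the entire heap is occupied by the buffer.
function maxBytesPerEntry(heapMb: number, entries: number): number {
  const bytesPerMb = 1024 * 1024; // assumption: V8 reports MB as MiB
  return (heapMb * bytesPerMb) / entries;
}

// Figures copied from the report above.
const heapAfterLastGcMb = 255.1; // "-> 255.1 (257.2) MB" in the last GC line
const bufferedEntries = 106812; // last "Buffer size" line before the crash

const bound = maxBytesPerEntry(heapAfterLastGcMb, bufferedEntries);
console.log(`at most ~${Math.round(bound)} bytes of heap per buffered datum`);
```

This does not show where the memory actually goes; a heap snapshot would be needed for that.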

Numa logs:

2025-12-10T00:38:18.441918Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Unaligned reducer is in shutdown mode, ignoring the message
2025-12-10T00:38:18.441920Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Unaligned reducer is in shutdown mode, ignoring the message
2025-12-10T00:38:18.441923Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Unaligned reducer is in shutdown mode, ignoring the message
2025-12-10T00:38:18.441971Z  INFO numaflow_core::reduce::wal::segment::append: FileWriterActor spawned and running.
2025-12-10T00:38:18.441985Z  INFO numaflow_core::reduce::wal::segment::append: Stopping, doing a final flush and rotate! self.wal_type=Data
2025-12-10T00:38:18.442423Z  INFO numaflow_core::reduce::wal::segment::append: Rotating WAL segment file current_size=939314 file_name="/var/numaflow/pbq/compaction_10_1765327098411654.wal"
2025-12-10T00:38:18.442580Z  INFO numaflow_core::reduce::wal::segment::append: rename successful self.current_file_name="/var/numaflow/pbq/compaction_10_1765327098411654.wal" to_file_name="/var/numaflow/pbq/compaction_10_1765327098411654.wal.frozen"
2025-12-10T00:38:18.442716Z  INFO numaflow_core::reduce::wal::segment::append: Stopping, doing a final flush and rotate! self.wal_type=Compact
2025-12-10T00:38:18.442731Z  INFO numaflow_core::reduce::wal::segment::compactor: Compaction task completed
2025-12-10T00:38:18.442737Z  INFO numaflow_core::reduce::pbq: PBQ streaming read completed
2025-12-10T00:38:18.442752Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Unaligned Reduce component is shutting down, waiting for active reduce tasks to complete
2025-12-10T00:38:18.442760Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Waiting for 1 active reduce tasks to complete
2025-12-10T00:38:18.442766Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Reduce task for window completed window_id="GLOBAL_SLOT"
2025-12-10T00:38:18.442774Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: All reduce tasks completed
2025-12-10T00:38:18.442789Z  INFO numaflow_core::reduce::wal::segment::append: Stopping, doing a final flush and rotate! self.wal_type=Gc
2025-12-10T00:38:18.442798Z  INFO numaflow_core::reduce::wal::segment::append: Rotating WAL segment file current_size=67 file_name="/var/numaflow/pbq/gc_0_1765327027552152.wal"
2025-12-10T00:38:18.442813Z  INFO numaflow_core::reduce::reducer::unaligned::reducer: Unaligned Reduce component successfully completed status=Err(Grpc(Status { code: Unknown, message: "h2 protocol error: error reading a body from connection", source: Some(hyper::Error(Body, Error { kind: Io(Kind(ConnectionReset)) })) }))
2025-12-10T00:38:18.442872Z ERROR numaflow_core::pipeline::forwarder::reduce_forwarder: Error in reducer e=Grpc(Status { code: Unknown, message: "h2 protocol error: error reading a body from connection", source: Some(hyper::Error(Body, Error { kind: Io(Kind(ConnectionReset)) })) })
2025-12-10T00:38:18.442911Z  INFO numaflow_core::metrics: Stopped the Lag-Reader Expose tasks
2025-12-10T00:38:18.442897Z  INFO numaflow_core::reduce::wal::segment::append: rename successful self.current_file_name="/var/numaflow/pbq/gc_0_1765327027552152.wal" to_file_name="/var/numaflow/pbq/gc_0_1765327027552152.wal.frozen"
2025-12-10T00:38:18.442960Z ERROR numaflow_core: Pipeline failed because of UDF failure error=Status { code: Unknown, message: "h2 protocol error: error reading a body from connection", source: Some(hyper::Error(Body, Error { kind: Io(Kind(ConnectionReset)) })) }
2025-12-10T00:38:18.443210Z  INFO numaflow_core: Gracefully Exiting...
2025-12-10T00:38:18.443225Z  INFO numaflow: Exited.

The UDF crash doesn't cause numa to terminate immediately; numa exits only some time later.
