
[Bug]: FlinkRunner never calls finish_bundle, and workers eventually run out of memory (OOM) #34178

Open
@muyangyuapple


What happened?

Hi Beam community,

I am using Flink 1.19 with Beam 2.61.0 (via the FlinkRunner) to process data. However, I notice that memory usage on the workers (Flink TaskManagers) keeps growing linearly over time, and they eventually run out of memory (OOM).

I believe Beam should flush data from memory to disk at the end of each bundle, so I tried setting max_bundle_size=10 and added logging to start_bundle() and finish_bundle() in my DoFns.

However, memory usage still accumulates: the log line in start_bundle() is printed only once, and the one in finish_bundle() is never printed.
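To make the expected behavior concrete, here is a small pure-Python model of the start_bundle / process / finish_bundle lifecycle. It is not Beam code: `BundleLoggingFn` and `run_in_bundles` are hypothetical names, and the driver only assumes that a runner honoring a maximum bundle size of 10 closes each bundle (calling finish_bundle) after at most 10 elements. Under that assumption, 25 elements should produce three start_bundle and three finish_bundle log lines, rather than the single start_bundle and zero finish_bundle lines reported above.

```python
import logging
from typing import Iterable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bundle-model")


class BundleLoggingFn:
    """Hypothetical stand-in for a DoFn that logs its bundle lifecycle and
    buffers elements until finish_bundle (not the real apache_beam.DoFn)."""

    def __init__(self) -> None:
        self.buffer: List[int] = []
        self.events: List[str] = []          # lifecycle calls, for inspection
        self.flushed: List[List[int]] = []   # one entry per finished bundle

    def start_bundle(self) -> None:
        self.buffer = []
        self.events.append("start_bundle")
        log.info("start_bundle")

    def process(self, element: int) -> None:
        # Elements accumulate in memory until the bundle is finished.
        self.buffer.append(element)

    def finish_bundle(self) -> None:
        # Releasing the buffer here keeps memory bounded per bundle.
        self.flushed.append(self.buffer)
        self.events.append("finish_bundle")
        log.info("finish_bundle: flushed %d elements", len(self.buffer))
        self.buffer = []


def run_in_bundles(fn: BundleLoggingFn, elements: Iterable[int],
                   max_bundle_size: int) -> None:
    """Hypothetical driver: closes a bundle after at most max_bundle_size elements."""
    in_bundle = False
    count = 0
    for element in elements:
        if not in_bundle:
            fn.start_bundle()
            in_bundle = True
            count = 0
        fn.process(element)
        count += 1
        if count == max_bundle_size:
            fn.finish_bundle()
            in_bundle = False
    if in_bundle:  # close a trailing partial bundle
        fn.finish_bundle()


if __name__ == "__main__":
    fn = BundleLoggingFn()
    run_in_bundles(fn, range(25), max_bundle_size=10)
    print(fn.events.count("start_bundle"), fn.events.count("finish_bundle"))  # prints "3 3"
```

If finish_bundle is never invoked, `buffer` in this model grows without bound, which matches the linear memory growth described in this issue.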

Do you have any idea what the issue might be?

Thanks,
Muyang

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
