Description
What happened?
Hi Beam community,
I am using Flink 1.19 + Beam 2.61.0 (via FlinkRunner) to process data, but I notice that memory usage on the workers (Flink task managers) keeps growing linearly over time, eventually ending in an OOM.
I believe Beam should flush data from memory to disk at the end of each bundle, so I tried setting max_bundle_size=10 and added logging to the start_bundle() and finish_bundle() methods of my DoFns.
But memory usage still accumulates: the log in start_bundle() is printed only once, and the log in finish_bundle() is never printed.
Do you have any idea what the issue might be?
Thanks,
Muyang
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Flink Runner