
[Bug]: Flink Job manager metaspace classloader leak #32372

Open
@gli-marc-hurabielle

What happened?

Similar issues have been reported and fixed in Beam (#29890, #25510) and in Flink (https://issues.apache.org/jira/browse/FLINK-28248).

We are using Apache Beam with the Python SDK to submit some batch jobs with the Flink REST API.

  • beam: apache-beam==2.58.0
  • flink: 1.18.1

The JobManager metaspace usage only ever grows (it increases after each job submission and is never released):

Screenshot 2024-08-30 at 18 21 35

At some point it is no longer possible to submit new jobs (the Flink REST API hangs).
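
For anyone who wants to reproduce the growth without a dashboard, here is a minimal sketch that reads the metaspace usage from the JobManager REST API. It assumes the JobManager is reachable at http://localhost:8081 (a placeholder) and that the metric is exposed as Status.JVM.Memory.Metaspace.Used; the helper names are made up for illustration, and the metric name should be checked against your Flink version.

```python
import json
import urllib.request

# Assumption: metric name as exposed by the JobManager in recent Flink versions.
METRIC = "Status.JVM.Memory.Metaspace.Used"


def parse_metric(payload, metric=METRIC):
    """Return the metric value in bytes from a metrics response, or None.

    `payload` is expected to be a list of {"id": ..., "value": ...} entries.
    """
    for entry in payload:
        if entry.get("id") == metric:
            return int(float(entry["value"]))
    return None


def fetch_metaspace_used(base_url="http://localhost:8081"):
    """Query the JobManager metrics endpoint (base_url is a placeholder)."""
    url = f"{base_url}/jobmanager/metrics?get={METRIC}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metric(json.load(resp))
```

Calling fetch_metaspace_used() before and after each job submission and comparing the two values should show whether metaspace is released once a job finishes.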

I am not really used to debugging JVM memory leaks. We ran jmap -clstats 1 and saw many duplicate classes, such as:

  • org.apache.beam.vendor.grpc.v1p60p1.com.google.protobuf.DescriptorProtos$DescriptorProto
  • org.apache.beam.vendor.grpc.v1p60p1.com.google.protobuf.Descriptors$Descriptor

However, their sizes are really small, so I am not sure they are the problem.
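
The individual sizes may matter less than the number of copies: the JVM identifies a class by its name plus its defining classloader, so the same class name appearing several times suggests several classloaders are still alive. A minimal sketch for counting such duplicates follows. It assumes you have already extracted one fully qualified class name per line from whatever tool you use; the function name and the input format are hypothetical, and this does not parse jmap output directly.

```python
from collections import Counter


def duplicated_classes(class_names, min_copies=2):
    """Map each class name seen at least `min_copies` times to its count.

    `class_names` is any iterable of strings, one loaded class per item
    (assumed input format; blank lines are ignored).
    """
    counts = Counter(name.strip() for name in class_names if name.strip())
    return {name: n for name, n in counts.items() if n >= min_copies}
```

If the counts for the vendored org.apache.beam classes go up by one after every job submission, that would point to one leaked classloader per job.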

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
