[BUG] HadoopFileIO does not close the FileSystem, causing a leak of sdk-ScheduledExecutor threads #5351

@psantos-denodo

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • [x] Kernel
  • Other (fill in here)

Describe the problem

Steps to reproduce

Use DefaultFileSystemClient to perform any operation on data in S3, using the latest version of hadoop-aws (3.4.2).

Observed results

If you take a thread dump, you will see several threads named "sdk-ScheduledExecutor".
They are created by the Hadoop FileSystem and stay alive because the FileSystem instances are never closed.
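
As an alternative to reading a full thread dump, the leak can be watched from inside the JVM by counting live threads by name. This is a minimal sketch, not part of Delta Lake: the class name ThreadLeakCheck and the helper countThreads are hypothetical, and matching on the prefix "sdk-ScheduledExecutor" assumes the leaked threads all start with that name.

```java
import java.util.Set;

public class ThreadLeakCheck {

    /** Counts live threads whose name starts with the given prefix. */
    public static long countThreads(String prefix) {
        // getAllStackTraces() returns a snapshot of all live threads in this JVM.
        Set<Thread> live = Thread.getAllStackTraces().keySet();
        return live.stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Assumed prefix, taken from the thread name reported in this issue.
        String prefix = "sdk-ScheduledExecutor";
        System.out.println(prefix + " threads: " + countThreads(prefix));
    }
}
```

Calling countThreads before and after a batch of table operations should show whether the count keeps growing.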

Expected results

No thread leak

Environment information

  • Delta Lake version: master, and also previous versions. For instance, in 3.1.0 the FileSystems created in DefaultJsonHandler and DefaultFileSystemClient are never closed.
  • DefaultTableClient should be Closeable, and its close method should close the resources held by the related classes.
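
The Closeable suggestion above could look roughly like the following. This is only a sketch of the pattern, not the Delta Kernel API: ClosingTableClient and register are hypothetical names, and the code deliberately depends only on java.io.Closeable (which Hadoop's FileSystem implements) so that it compiles without Hadoop on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a table client that owns closeable resources. */
public class ClosingTableClient implements Closeable {

    private final List<Closeable> resources = new ArrayList<>();

    /** Registers a resource to close later, e.g. a FileSystem created by a handler. */
    public <T extends Closeable> T register(T resource) {
        resources.add(resource);
        return resource;
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        // Close in reverse registration order; keep going if one close fails.
        for (int i = resources.size() - 1; i >= 0; i--) {
            try {
                resources.get(i).close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        // Clearing the list makes a second close() call a no-op.
        resources.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

Used with try-with-resources, the client would release its FileSystems when the caller is done. One caveat for any real fix: FileSystem.get returns cached instances that may be shared with other code, so closing them is only safe when the client owns the instance, for example one obtained through FileSystem.newInstance.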

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

Metadata

    Labels

    bug
