
Cache Hadoop FileSystem instances on the Gravitino server to improve performance #6561

Open
@yuqi1129

Description


Currently, all Gravitino file system providers use the following code (taking `HDFSFileSystemProvider` as an example):

(Image: code snippet from `HDFSFileSystemProvider`)

`FileSystem.newInstance` always creates a new `FileSystem` object, even though most of these instances are identical. Hadoop's `FileSystem` already has a cache: if we call `FileSystem.get` instead, instances are reused. However, the Gravitino Virtual File System (GVFS) client shares the same file system providers and supports separate credentials for each path, while Hadoop's cache key is built from the URI scheme, the authority, and the current user rather than from the full configuration. A cached instance could therefore be returned for a path that needs different credentials, so we should be cautious when enabling the cache. In summary:

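  • On the Gravitino server side, we can enable the cache at the Hadoop `FileSystem` level.
  • In GVFS, we need to disable the `FileSystem`-level cache (for example, through Hadoop's `fs.<scheme>.impl.disable.cache` setting) and cache file system instances at the GVFS level instead.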
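A GVFS-level cache like the one proposed in the second bullet could look roughly like the sketch below. It is a hypothetical illustration using only the JDK, not Gravitino code: `GvfsFileSystemCache`, `FsCacheKey`, and the factory function (a stand-in for a provider's `getFileSystem` call) are all assumed names. The point it shows is that, unlike Hadoop's built-in cache, the key also includes the credential, so two paths on the same cluster with different credentials never share an instance.

```java
import java.net.URI;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical GVFS-level cache. The key includes the credential used for the
// path, so different credentials never receive the same cached instance.
public class GvfsFileSystemCache<F> {

    // Hypothetical key: scheme + authority (as Hadoop uses) plus the credential.
    record FsCacheKey(String scheme, String authority, String credential) {
        static FsCacheKey of(URI uri, String credential) {
            return new FsCacheKey(uri.getScheme(), uri.getAuthority(), credential);
        }
    }

    private final ConcurrentMap<FsCacheKey, F> cache = new ConcurrentHashMap<>();
    // Stand-in for a provider call that always creates a new file system.
    private final Function<FsCacheKey, F> factory;

    public GvfsFileSystemCache(Function<FsCacheKey, F> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    // Returns the cached instance for this key, creating it at most once.
    public F get(URI uri, String credential) {
        return cache.computeIfAbsent(FsCacheKey.of(uri, credential), factory);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        GvfsFileSystemCache<Object> c = new GvfsFileSystemCache<>(k -> new Object());
        URI a = URI.create("hdfs://nn1:8020/data/a");
        URI b = URI.create("hdfs://nn1:8020/data/b");
        Object fs1 = c.get(a, "token-1");
        Object fs2 = c.get(b, "token-1"); // same scheme, authority, credential
        Object fs3 = c.get(a, "token-2"); // different credential
        System.out.println(fs1 == fs2);   // true
        System.out.println(fs1 == fs3);   // false
        System.out.println(c.size());     // 2
    }
}
```

On the server side, where per-path credentials are not a concern, switching the provider from `FileSystem.newInstance` to `FileSystem.get` would be enough to reuse Hadoop's own cache. A production GVFS cache would also need eviction and would have to close evicted `FileSystem` instances; both are omitted from this sketch.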
