[File system] Add Google Cloud Storage integration #631

Open
wants to merge 18 commits into
base: main

gkatzioura

Purpose

Linked issue: close #630

Brief change log

Added the GSFileSystem based on the given Hadoop Google Cloud Storage file system

Tests

API and Format

No

Documentation

It adds support for the Google Cloud Storage Filesystem

@gkatzioura
Author

I will add integration tests. Moving the PR to draft temporarily.

@gkatzioura gkatzioura marked this pull request as draft March 19, 2025 07:37
@polyzos polyzos changed the title from "630" to "[Feature] Add GSFileSystem Support" Mar 19, 2025
@polyzos polyzos changed the title from "[Feature] Add GSFileSystem Support" to "630" Mar 19, 2025
@gkatzioura gkatzioura marked this pull request as ready for review March 20, 2025 09:20
@gkatzioura gkatzioura changed the title from "630" to "[File system] Add Google Cloud Storage integration" Mar 20, 2025
@gkatzioura
Author

gkatzioura commented Mar 20, 2025

Added an integration test based on FileSystemBehaviorTestSuite.
To run, the integration test requires a Google Cloud Storage bucket, named in the variable IT_CASE_GS_BUCKET,
and the path to the service account key file in IT_CASE_GS_ACCESS_KEY. The service account file needs to be mounted during the CI process.

Since the service account key is a file rather than a string, I see two options:

  1. Change ci.yaml to read the service account file (base64) from secrets and store it at a path.
  2. During integration test initialization, read the service account file (base64) and store it at a path.

In the PR I followed the second approach.
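
As a rough illustration of the second approach, the sketch below decodes a base64-encoded key and writes it to a temporary file whose path can then be handed to the file system configuration. It is not the code from this PR: the class name `ServiceAccountKeyFile`, the method `writeDecodedKey`, and the temporary file prefix are hypothetical, and it sticks to Java 8 APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the PR: turns a base64 secret into a key file.
final class ServiceAccountKeyFile {

    private ServiceAccountKeyFile() {}

    /** Decodes the base64 text and writes the raw bytes to a temporary file. */
    static Path writeDecodedKey(String base64Key) throws IOException {
        // The MIME decoder ignores line breaks, which CI secrets often contain.
        byte[] decoded = Base64.getMimeDecoder().decode(base64Key);
        Path keyFile = Files.createTempFile("gs-service-account", ".json");
        // Ask the JVM to delete the key file on normal exit.
        keyFile.toFile().deleteOnExit();
        Files.write(keyFile, decoded);
        return keyFile;
    }
}
```

A test setup method could call `writeDecodedKey` with the secret's value and use the returned path wherever the service account file location is expected.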

@wuchong
Member

wuchong commented Mar 22, 2025

@luoyuxia could you help to review it?

@wuchong
Member

wuchong commented Mar 22, 2025

@gkatzioura , could you check how Flink tests the gs-filesystem in CI?

@gkatzioura
Author

gkatzioura commented Apr 4, 2025

Hi @wuchong , after checking, Flink has no tests for the actual filesystem integration.
I will proceed by adding a mock server, which avoids the need to integrate with an actual service account.

@wuchong
Member

wuchong commented Apr 4, 2025

Thank you @gkatzioura , please ping me when you have finished the mock server.

@gkatzioura
Author

Hi @wuchong & @luoyuxia
I implemented a server that simulates Google Cloud Storage, so the test now runs against that server. No credentials or actual GCP account are needed.
The test is similar to the ones found in FileSystemTest.java.
If we need to bring back the real Google Cloud Storage integration test, I can do so on demand.
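
To illustrate the general idea of testing against a local stand-in instead of a real bucket, here is a minimal sketch of an in-memory HTTP object store built on the JDK's `com.sun.net.httpserver`. It is an assumption-laden toy, not the server from this PR: the class name `InMemoryObjectServer` is hypothetical, and it does not implement the Google Cloud Storage API. It only stores bytes on `PUT` and returns them on `GET`, keyed by the request path.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy server: PUT stores an object, GET returns it. Not the GCS API.
final class InMemoryObjectServer {

    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();
    private final HttpServer server;

    InMemoryObjectServer() throws IOException {
        // Port 0 lets the operating system pick a free port.
        server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/", this::handle);
    }

    /** Starts the server and returns the port it is listening on. */
    int start() {
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    private void handle(HttpExchange exchange) throws IOException {
        String key = exchange.getRequestURI().getPath();
        String method = exchange.getRequestMethod();
        byte[] stored = objects.get(key);
        if ("PUT".equals(method)) {
            objects.put(key, readAll(exchange.getRequestBody()));
            exchange.sendResponseHeaders(200, -1); // -1 means no response body
        } else if ("GET".equals(method) && stored != null) {
            if (stored.length == 0) {
                exchange.sendResponseHeaders(200, -1);
            } else {
                exchange.sendResponseHeaders(200, stored.length);
                exchange.getResponseBody().write(stored);
            }
        } else {
            exchange.sendResponseHeaders(404, -1);
        }
        exchange.close();
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A test would start the server, point the client under test at `http://localhost:<port>`, and stop the server afterwards. A realistic simulator additionally has to speak the storage service's actual request and response formats, which this sketch deliberately leaves out.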

<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>9</source>
<target>9</target>
Member

Can we remove this? It breaks the CI build. Currently, we still need to ship artifacts built with Java 8.

Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project fluss-fs-gs: Fatal error compiling: invalid target release: 9 -> [Help 1]

https://github.com/alibaba/fluss/actions/runs/14344481121/job/40238125178?pr=631

@gkatzioura gkatzioura marked this pull request as draft April 14, 2025 06:27
@gkatzioura gkatzioura marked this pull request as ready for review April 14, 2025 08:30

Successfully merging this pull request may close these issues.

[File system] Add Google Cloud Storage integration
3 participants