Parallel download testsuit refactor #1235

Merged
yaozile123 merged 1 commit into GoogleCloudPlatform:main from
yaozile123:parallel-download-testsuit-refactor
Mar 23, 2026

Conversation

@yaozile123
Collaborator

@yaozile123 yaozile123 commented Feb 26, 2026

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind feature

/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind flake

What this PR does / why we need it:
This PR introduces the foundational infrastructure to dynamically generate GKE CSI Integration tests directly from the upstream GCSFuse repository's canonical source of truth (test_config.yaml).

Historically, GCSFuse and GKE CSI driver teams maintained duplicated test suites, leading to synchronization delays and engineering toil. This migration resolves that by fetching and parsing the native test_config.yaml to dynamically build the Ginkgo test tree, ensuring both teams are always testing the exact same configurations.

Specifically, this PR:

  1. Fetches GCSFuse Version Before Ginkgo Starts: Previously, the GCSFuse version was fetched in ginkgo.SynchronizedBeforeSuite. However, that hook executes after Ginkgo has already generated the test cases. Since we now need the GCSFuse version first in order to fetch test_config.yaml and dynamically generate the test tree, the version-fetching logic was moved to handler.go so that it runs before Ginkgo starts. The fetched version is then exported as an environment variable for test suites to use.
  2. Defines YAML Data Structures: Adds TestPackage, TestConfig, and TestBucketType to correctly model the upstream YAML.
  3. Implements Config Fetching: Implements LoadTestConfig() and ParseTestConfig() to fetch the configuration securely during the test initialization hook (prior to Ginkgo tree construction).
  4. Translates Mount Options: Adds flag parsing utilities (ParseConfigFlags, ExtractOnlyDirFromMountOptions) to safely extract and map raw GCSFuse arguments into CSI driver mount options and capabilities.
  5. Optimizes HNS Testing: Skips redundant HNS test suites when Zonal Buckets (ZB) are enabled, since enabling ZB inherently covers identical HNS testing paths.
  6. Introduces Test Generators: Sets up the scaffolding (gcsfuseIntegrationFileCacheTestNew) to begin generating dynamic ginkgo.It() blocks based on the parsed config.
  7. Backward Compatibility Fallback: Includes logic to gracefully fall back to the manually hardcoded test cases if the tested GCSFuse version does not meet the requirements for the new dynamic, test_config.yaml-based test execution. Currently, test_config.yaml is only supported in GCSFuse v3.7+.
  8. Fix parallel e2e test output interceptor failures: Prevented Ginkgo's output interceptor from getting stuck during parallel execution by updating cmd.Stdout and cmd.Stderr to ginkgo.GinkgoWriter.
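
As a rough illustration of the version gate described in item 7, the sketch below chooses between dynamic and static test generation. The names useDynamicTests and parseMajorMinor are hypothetical and are not the PR's actual helpers; only the v3.7 threshold is taken from the description above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Minimum GCSFuse version assumed to ship test_config.yaml support ("v3.7+").
const (
	minMajor = 3
	minMinor = 7
)

// parseMajorMinor extracts the major and minor numbers from version strings
// such as "v3.7.0" or "3.5.6-gke.0". Hypothetical helper for illustration.
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version format: %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major, minor, nil
}

// useDynamicTests reports whether tests should be generated from
// test_config.yaml. Versions that cannot be parsed fall back to static tests.
func useDynamicTests(version string) bool {
	major, minor, err := parseMajorMinor(version)
	if err != nil {
		return false
	}
	return major > minMajor || (major == minMajor && minor >= minMinor)
}

func main() {
	for _, v := range []string{"v3.7.0", "3.5.6-gke.0"} {
		fmt.Printf("%s -> dynamic tests: %v\n", v, useDynamicTests(v))
	}
}
```

Falling back to the static tests on a parse error is a design choice made for this sketch so that an unexpected version string never blocks the suite from running.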

Which issue(s) this PR fixes:

Fixes b/483387111
Fixes b/483387489
Fixes b/483387565
Fixed b/483388635

Special notes for your reviewer:
This is the first major step in migrating our E2E framework away from duplicated ginkgo.It() blocks. The primary focus here is establishing the robust parsing and translation layer for test_config.yaml. Subsequent PRs will focus on fully migrating specific test suites (like the File Cache test suite and the regular GCSFuse test suite) to utilize this new dynamic framework. Although this PR is large, it provides the complete implementation required to establish the new dynamic infrastructure and test it end-to-end.

Tested on standard cluster with GKE version 1.35.0-gke.3047002, 18 e2-standard-4 non-managed driver

make e2e-test   E2E_TEST_USE_GKE_MANAGED_DRIVER=false   E2E_TEST_BUILD_DRIVER=true  
BUILD_GCSFUSE_FROM_SOURCE=true   STAGINGVERSION=prow-gob-internal-boskos-metrics-1
REGISTRY=$REGISTRY


Ran 359 of 495 Specs in 10644.592 seconds
SUCCESS! -- 359 Passed | 0 Failed | 0 Pending | 136 Skipped


Ginkgo ran 1 suite in 2h57m51.7923277s
Test Suite Passed
make e2e-test   E2E_TEST_USE_GKE_MANAGED_DRIVER=false   E2E_TEST_BUILD_DRIVER=true 
BUILD_GCSFUSE_FROM_SOURCE=true   GCSFUSE_TAG="v3.7.0"   E2E_TEST_FOCUS="gcsfuseIntegration" 
REGISTRY=$REGISTRY
Ran 178 of 495 Specs in 11938.206 seconds
SUCCESS! -- 178 Passed | 0 Failed | 0 Pending | 317 Skipped

Tested on standard cluster with GKE version 1.35.0-gke.3047002, 10 e2-standard-4 managed driver

make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true E2E_TEST_FOCUS="gcsfuseIntegration"
Ran 178 of 495 Specs in 8376.762 seconds
SUCCESS! -- 178 Passed | 0 Failed | 0 Pending | 317 Skipped
make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true E2E_TEST_FOCUS="gcsfuseIntegration" ENABLE_ZB=true
Ran 86 of 401 Specs in 4658.316 seconds
SUCCESS! -- 86 Passed | 0 Failed | 0 Pending | 315 Skipped

Tested on standard cluster with GKE version 1.34.4-gke.1047000, GCSFuse Version 3.5.6-gke.0
10 e2-standard-4 managed driver

make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true E2E_TEST_FOCUS="gcsfuseIntegration"

Ran 164 of 481 Specs in 7928.124 seconds
SUCCESS! -- 164 Passed | 0 Failed | 0 Pending | 317 Skipped


Ginkgo ran 1 suite in 2h12m35.30571964s
Test Suite Passed

Tested on AP cluster with GKE version 1.35.0-gke.3047002

make e2e-test   E2E_TEST_USE_GKE_AUTOPILOT=true   E2E_TEST_USE_GKE_MANAGED_DRIVER=true
E2E_TEST_FOCUS=failedMount

Ran 33 of 495 Specs in 254.546 seconds
SUCCESS! -- 33 Passed | 0 Failed | 0 Pending | 462 Skipped


Ginkgo ran 1 suite in 4m21.826440647s
Test Suite Passed

Does this PR introduce a user-facing change?:

NONE

@google-oss-prow

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@google-oss-prow

@yaozile123: The label(s) kind/feature cannot be applied, because the repository doesn't have them.

Details

In response to this:

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind feature

/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind flake

What this PR does / why we need it:
This PR introduces the foundational infrastructure to dynamically generate GKE CSI Integration tests directly from the upstream GCSFuse repository's canonical source of truth (test_config.yaml).

Historically, GCSFuse and GKE CSI driver teams maintained duplicated test suites, leading to synchronization delays and engineering toil. This migration resolves that by fetching and parsing the native test_config.yaml to dynamically build the Ginkgo test tree, ensuring both teams are always testing the exact same configurations.

Specifically, this PR:

  1. Defines YAML Data Structures: Adds TestPackage, TestConfig, and TestBucketType to correctly model the upstream YAML.
  2. Implements Config Fetching: Implements LoadTestConfig() and ParseTestConfig() to fetch the configuration securely during the test initialization hook (prior to Ginkgo tree construction).
  3. Translates Mount Options: Adds flag parsing utilities (ParseConfigFlags, ExtractOnlyDirFromMountOptions) to safely extract and map raw GCSFuse arguments into CSI driver mount options and capabilities.
  4. Optimizes HNS Testing: Skips redundant HNS test suites when Zonal Buckets (ZB) are enabled, since enabling ZB inherently covers identical HNS testing paths.
  5. Introduces Test Generators: Sets up the scaffolding (gcsfuseIntegrationFileCacheTestNew) to begin generating dynamic ginkgo.It() blocks based on the parsed config.
  6. Backward Compatibility Fallback: Includes logic to gracefully fall back to the manually hardcoded test cases if the tested GCSFuse version does not meet the requirements for the new dynamic, test_config.yaml-based test execution. Currently, test_config.yaml is only supported in GCSFuse v3.5+.

Which issue(s) this PR fixes:

Fixes b/483387111
Fixes b/483387489
Fixes b/483387565
Fixed b/483388635

Special notes for your reviewer:
This is the first major step in migrating our E2E framework away from duplicated ginkgo.It() blocks. The primary focus here is establishing the robust parsing and translation layer for test_config.yaml. Subsequent PRs will focus on fully migrating specific test suites (like the File Cache test suite and the regular GCSFuse test suite) to utilize this new dynamic framework. Additionally, this PR ensures that test suites safely fall back to legacy hardcoded test cases if the tested GCSFuse version is incompatible with dynamic generation.

Does this PR introduce a user-facing change?:

NONE

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the GKE CSI Integration tests to dynamically generate test cases from the upstream GCSFuse test_config.yaml. This change aims to reduce engineering toil and ensure synchronization between GCSFuse and GKE CSI driver teams by always testing against the exact same configurations. It establishes the necessary parsing and translation layers for GCSFuse flags and includes a fallback mechanism to maintain compatibility with older GCSFuse versions, setting the stage for a full migration of test suites to this new dynamic framework.

Highlights

  • Dynamic Test Generation: Introduced foundational infrastructure to dynamically generate GKE CSI Integration tests directly from the upstream GCSFuse repository's test_config.yaml, eliminating duplicated test suites.
  • YAML Data Structures and Parsing: Defined Go YAML data structures (TestPackage, TestConfig, TestBucketType) and implemented LoadTestConfig() and ParseTestConfig() to fetch and parse the upstream configuration.
  • Mount Option Translation: Added utility functions (ParseConfigFlags, ExtractOnlyDirFromMountOptions) to safely extract and map raw GCSFuse arguments into CSI driver mount options and capabilities.
  • Optimized HNS Testing: Implemented logic to skip redundant HNS test suites when Zonal Buckets (ZB) are enabled, as ZB inherently covers identical HNS testing paths.
  • Backward Compatibility: Ensured graceful fallback to manually hardcoded test cases if the tested GCSFuse version is older than v3.5 and does not support the new dynamic test_config.yaml based test execution.
Changelog
  • test/e2e/e2e_test.go
    • Reordered imports for better organization.
    • Commented out the LoadTestConfig call, indicating future activation.
    • Added conditional logic to skip HNS test suites when Zonal Buckets are enabled.
  • test/e2e/testsuites/gcsfuse_integration.go
    • Defined a new constant gkeTempDir for consistent temporary directory paths.
  • test/e2e/testsuites/gcsfuse_integration_file_cache_parallel_downloads.go
    • Refactored existing static test cases into a dedicated generateStaticTests function.
    • Introduced gcsfuseIntegrationFileCacheTestNew and generateDynamicTests functions to support dynamic test generation from test_config.yaml.
    • Implemented logic to switch between dynamic and static test generation based on the detected GCSFuse version.
  • test/e2e/utils/iam_utils.go
    • Corrected klog.Warning and klog.Info calls to use klog.Warningf and klog.Infof for proper formatted string logging.
  • test/e2e/utils/utils.go
    • Imported io, net/http, and gopkg.in/yaml.v3 for network requests and YAML parsing.
    • Added new constants including MinGCSFuseTestConfigVersion and various flag prefixes.
    • Defined Go structs (TestPackage, TestConfig, TestBucketType, TestPackages, ParsedConfig) to represent the structure of test_config.yaml.
    • Implemented LoadTestConfig and ParseTestConfig functions to fetch and unmarshal the test configuration.
    • Added utility functions IsParallelDownloadsEnabled, IsFileCacheEnabled, IsOnlyDirEnabled, ParseConfigFlags, IsReadFromTestConfig, and ExtractOnlyDirFromMountOptions for flag processing and version checks.
  • test/e2e/utils/utils_test.go
    • Imported github.com/google/go-cmp/cmp for enhanced test comparisons.
    • Added TestParseConfigFlags to validate the new flag parsing utility function.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring of the GKE CSI Integration tests for GCSFuse, moving from hardcoded test suites to dynamically generating them from the upstream test_config.yaml. This is a great step towards reducing maintenance overhead and ensuring test synchronization. My review focuses on the robustness of the new parsing and test generation logic. I've identified a critical issue in the flag parsing logic that could lead to test failures, and a medium-severity issue related to string manipulation of mount options. Addressing these will make the new test framework more reliable.

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch 7 times, most recently from a111baf to 8138dfb Compare March 10, 2026 03:54
@yaozile123
Collaborator Author

/gemini summary

@gemini-code-assist

Summary of Changes

This pull request significantly refactors the GKE CSI Integration test framework by introducing dynamic test generation. It addresses the historical issue of duplicated test suites between GCSFuse and the GKE CSI driver teams, which led to synchronization delays. By fetching and parsing the canonical test_config.yaml from the upstream GCSFuse repository, the system can now dynamically build the Ginkgo test tree, ensuring consistent testing configurations across both teams. This change lays the groundwork for a more efficient and synchronized E2E testing environment.

Highlights

  • Dynamic Test Generation: Established infrastructure to dynamically generate GKE CSI Integration tests from the upstream GCSFuse test_config.yaml, eliminating duplicated test suites and ensuring synchronization.
  • GCSFuse Version Pre-fetching: Moved GCSFuse version fetching to run before Ginkgo starts, enabling dynamic test tree generation based on the version. The fetched version is exported as an environment variable.
  • YAML Configuration Handling: Introduced Go data structures (TestPackage, TestConfig, TestBucketType) and functions (LoadTestConfig, ParseTestConfig) to model, fetch, and parse the upstream test_config.yaml.
  • Mount Option Translation: Developed flag parsing utilities (ParseConfigFlags, ExtractOnlyDirFromMountOptions) to accurately map raw GCSFuse arguments to CSI driver mount options and capabilities.
  • HNS Test Optimization: Implemented logic to skip redundant HNS test suites when Zonal Buckets (ZB) are enabled, as ZB inherently covers similar HNS testing paths.
  • Backward Compatibility: Included a fallback mechanism to use manually hardcoded test cases if the GCSFuse version does not meet the requirements for dynamic test_config.yaml based test execution (e.g., versions older than v3.7).
Changelog
  • test/e2e/e2e_test.go
    • Removed unused context import.
    • Added gomega import.
    • Moved testing.Init() call for earlier execution.
    • Assigned specs.GetGCSFuseVersion() to testsuites.GCSFuseVersionStr before test suite definition.
    • Removed the ginkgo.SynchronizedBeforeSuite block that previously fetched the GCSFuse version.
    • Added conditional logic to skip HNS test suites if Zonal Buckets are enabled.
  • test/e2e/specs/specs.go
    • Removed klog/v2 and clientcmd imports.
    • Removed the GcsfuseVersionVarName constant.
    • Refactored GetGCSFuseVersion to retrieve the GCSFuse version directly from an environment variable, removing the previous logic that deployed a temporary pod.
    • Updated GCSFuseVersionAndBranch to no longer require a context.Context parameter.
  • test/e2e/specs/testdriver.go
    • Updated the call to GCSFuseVersionAndBranch to remove the ctx parameter, aligning with the refactored function signature.
  • test/e2e/testsuites/failed_mount.go
    • Updated calls to specs.GetGCSFuseVersion to remove the ctx parameter and consistently use the global GCSFuseVersionStr.
  • test/e2e/testsuites/gcsfuse_integration.go
    • Added a new constant gkeTempDir.
    • Changed the package-level variable gcsfuseVersionStr to GCSFuseVersionStr for global access.
    • Updated calls to isKernelParamSupported and skipTestOrProceedWithBranch to remove the ctx parameter and utilize the global GCSFuseVersionStr.
    • Introduced logic to skip rename_symlink tests based on GCSFuse version and branch.
  • test/e2e/testsuites/gcsfuse_integration_file_cache.go
    • Updated calls to specs.GetGCSFuseVersion to remove the ctx parameter and consistently use the global GCSFuseVersionStr.
  • test/e2e/testsuites/gcsfuse_integration_file_cache_parallel_downloads.go
    • Added klog/v2 import.
    • Removed direct usage of gcsfuseVersionStr in gcsfuseIntegrationFileCacheTest.
    • Introduced gcsfuseIntegrationFileCacheTestNew and generateDynamicTests functions to support dynamic test generation from test_config.yaml.
    • Refactored existing static test cases into a generateStaticTests function.
    • Implemented conditional logic to switch between dynamic and static test generation based on the detected GCSFuse version.
  • test/e2e/testsuites/kernel_params.go
    • Updated calls to skipIfKernelParamsNotSupported and specs.GCSFuseVersionAndBranch to remove the ctx parameter.
  • test/e2e/testsuites/mount.go
    • Updated calls to skipIfKernelParamsNotSupported to remove the ctx parameter.
  • test/e2e/utils/handler.go
    • Added imports for context, path/filepath, clientset, and clientcmd.
    • Implemented logic to dynamically fetch the GCSFuse version from the cluster using a temporary pod and export it as an environment variable (GcsfuseVersionVarName) before Ginkgo test execution.
  • test/e2e/utils/iam_utils.go
    • Corrected logging format specifiers for klog.Warningf and klog.Infof.
  • test/e2e/utils/utils.go
    • Added imports for context, io, net/http, webhook, ginkgo/v2, corev1, metav1, ptr, and yaml.v3.
    • Introduced new constants including MinGCSFuseTestConfigVersion, GcsfuseVersionVarName, and various flag prefixes for parsing.
    • Defined Go structs (TestPackage, TestConfig, TestBucketType, TestPackages, ParsedConfig) to model the structure of test_config.yaml.
    • Implemented LoadTestConfig and ParseTestConfig functions to fetch and unmarshal the test configuration from a URL.
    • Added utility functions: IsParallelDownloadsEnabled, IsFileCacheEnabled, IsOnlyDirEnabled, ParseConfigFlags (for translating GCSFuse flags), IsReadFromTestConfig (to check test_config.yaml support), ExtractOnlyDirFromMountOptions, and FetchGCSFuseVersion (to get GCSFuse version via a temporary pod).
    • Modified runCommand to direct output to ginkgo.GinkgoWriter.
  • test/e2e/utils/utils_test.go
    • Added github.com/google/go-cmp/cmp import for enhanced test comparisons.
    • Added a new test suite for validating the ParseConfigFlags utility function.
Activity
  • The google-oss-prow[bot] skipped CI for the draft pull request and noted that the kind/feature label could not be applied.
  • The google-oss-prow[bot] also indicated that the pull request is not approved and requires approval from an OWNERS file approver.
  • gemini-code-assist[bot] provided an initial summary of changes and a changelog.
  • gemini-code-assist[bot] raised a critical review comment regarding the ParseConfigFlags function, specifically about the file-cache-max-size-mb flag not being translated to the config file format for mount options.
  • yaozile123 responded to the critical comment, clarifying that file-cache-max-size-mb is handled by the CSI driver and automatically translated, and other mentioned flags are valid GCSFuse CLI flags.
  • gemini-code-assist[bot] also provided a medium priority review comment suggesting a more robust method for filtering logging:severity:info from mount options, recommending splitting and joining the string instead of strings.ReplaceAll.

@yaozile123
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR is a significant and well-thought-out refactoring of the e2e test suite infrastructure. By dynamically generating tests from the upstream test_config.yaml, it effectively eliminates test duplication and ensures synchronization with the canonical GCSFuse tests. The changes, including moving the GCSFuse version fetching, introducing YAML data structures, and implementing a backward compatibility fallback, are robust and set a strong foundation for future test migrations. I've included a few suggestions to improve efficiency, correctness, and thread-safety in the tests.

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch from 8138dfb to 379a523 Compare March 10, 2026 16:36
@yaozile123 yaozile123 marked this pull request as ready for review March 10, 2026 16:37
@amacaskill
Collaborator

amacaskill commented Mar 11, 2026

@yaozile123 I see you have tested by running:

make e2e-test   E2E_TEST_USE_GKE_MANAGED_DRIVER=false   E2E_TEST_BUILD_DRIVER=true  
BUILD_GCSFUSE_FROM_SOURCE=true   STAGINGVERSION=prow-gob-internal-boskos-metrics-1
REGISTRY=$REGISTRY

Can you please also test other cases so we know this won't break any of our testgrids? You don't need to run the entire test suite, just the relevant tests.

  1. Test with AP cluster + E2E_TEST_FOCUS=gcsfuseIntegrationFileCacheParallelDownloads.* : You will need to create an AP cluster (this has the driver enabled by default and it can't be disabled), then run the tests following the documentation. I believe all you would need is make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true E2E_TEST_FOCUS=gcsfuseIntegrationFileCacheParallelDownloads.*
  2. Test with ZB enabled: I believe now that ZB is GA, you should be able to pass ENABLE_ZB to make e2e-test and this will create ZB instead of regional buckets: https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/blob/main/test/README.md#run-end-to-end-test. I believe they should have removed allowlisting now that the feature is GA. This can be with non-managed or managed. For this, please also run HNS testsuites to ensure you skip them when ZB is enabled


func runCommand(action string, cmd *exec.Cmd) error {
-	cmd.Stdout = os.Stdout
+	cmd.Stdout = ginkgo.GinkgoWriter
Collaborator

I see PR description says

"Prevented Ginkgo's output interceptor from getting stuck during parallel execution by updating cmd.Stdout and cmd.Stderr to ginkgo.GinkgoWriter."

Just curious, what do you mean it was getting stuck? which change caused that? I don't remember us having that issue on the testgrids

Collaborator Author

Here is the message I got from ginkgo when we were using os.Stdout:

When running in parallel, Ginkgo captures stdout and stderr output
  and attaches it to the running spec.  It looks like that process is getting
  stuck for this suite.

  This usually happens if you, or a library you are using, spin up an external
  process and set cmd.Stdout = os.Stdout and/or cmd.Stderr = os.Stderr.  This
  causes the external process to keep Ginkgo's output interceptor pipe open and
  causes output interception to hang.

  Ginkgo has detected this and shortcircuited the capture process.  The specs
  will continue running after this message however output from the external
  process that caused this issue will not be captured.

  You have several options to fix this.  In preferred order they are:

  1. Pass GinkgoWriter instead of os.Stdout or os.Stderr to your process.
  2. Ensure your process exits before the current spec completes.  If your
  process is long-lived and must cross spec boundaries, this option won't
  work for you.
  3. Pause Ginkgo's output interceptor before starting your process and then
  resume it after.  Use PauseOutputInterception() and ResumeOutputInterception()
  to do this.
  4. Set --output-interceptor-mode=none when running your Ginkgo suite.  This will
  turn off all output interception but allow specs to run in parallel without this
  issue.  You may miss important output if you do this including output from Go's
  race detector.

Collaborator

@amacaskill amacaskill Mar 17, 2026

I see thanks for explaining. Do the test logs still look the same as the current test output on the testgrids? I just want to make sure this doesn't make it harder to debug failing tests in any way

Collaborator Author

Yes, the logs look the same

// It removed the leading '-' from each flag and appends the flag to MountOptions.
func ParseConfigFlags(flagStr string) ParsedConfig {
parsed := ParsedConfig{
FileCacheCapacity: "50Mi",
Collaborator

Why are we hardcoding FileCacheCapacity?

Collaborator Author

I added that as a default fallback value just in case the file-cache-max-size-mb flag is missing.

Collaborator

@amacaskill amacaskill Mar 17, 2026

I don't think we should be hardcoding anything as that will change what the test does. Setting file-cache-max-size-mb will enable the file cache, which will change what the test does. https://docs.cloud.google.com/storage/docs/cloud-storage-fuse/cli-options#file-cache-max-size-mb

When would file-cache-max-size-mb be missing when it should be set?

Collaborator Author

I saw that test_config.yaml doesn't set file-cache-max-size-mb in some of the read_cache test cases.
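
For readers following this thread, the sketch below shows one way the flag translation could avoid a hardcoded default: it strips the leading dashes, keeps every flag as a mount option, and records the file-cache-max-size-mb value only when the flag is actually present. The parsedFlags type and parseFlags function are hypothetical and are not the PR's ParsedConfig or ParseConfigFlags; the sample flag string is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

const fileCachePrefix = "file-cache-max-size-mb="

// parsedFlags is a hypothetical, reduced stand-in for the PR's ParsedConfig.
type parsedFlags struct {
	MountOptions       []string
	FileCacheMaxSizeMB string // empty when the flag is absent
}

// parseFlags splits a whitespace-separated flag string, removes the leading
// dashes from each flag, and appends it to MountOptions. The file cache size
// is captured as a side note for the caller; no default value is injected.
func parseFlags(flagStr string) parsedFlags {
	var p parsedFlags
	for _, flag := range strings.Fields(flagStr) {
		flag = strings.TrimLeft(flag, "-")
		if flag == "" {
			continue
		}
		if strings.HasPrefix(flag, fileCachePrefix) {
			p.FileCacheMaxSizeMB = strings.TrimPrefix(flag, fileCachePrefix)
		}
		p.MountOptions = append(p.MountOptions, flag)
	}
	return p
}

func main() {
	p := parseFlags("--implicit-dirs --file-cache-max-size-mb=9")
	fmt.Printf("mount options: %q, file cache size: %q\n", p.MountOptions, p.FileCacheMaxSizeMB)
}
```

With this shape, a caller can tell "flag missing" apart from "flag set" and decide explicitly whether a fallback is appropriate for a given test package.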

continue
}

// TODO: Remove this block after boolean flags formatting bug is fixed.
Collaborator

@amacaskill amacaskill Mar 11, 2026

@uriel-guzman Please address this once your PR is merged: #1231

return v, fmt.Sprintf(gcsfuseReleaseBranchFormat, v.Major(), v.Minor(), v.Patch())
}

type TestPackage struct {
Collaborator

In https://raw.githubusercontent.com/GoogleCloudPlatform/gcsfuse/v3.7.1/tools/integration_tests/test_config.yaml, I see rapid_appends test package defines a secondary_flags property under its configs list for tests that use dual mounts. Are we going to do that in a follow up PR? If so, how will you handle it in TestPackage? Will it be another field like SecondaryFlags?

Do we expect GCSFuse team to add new Flags like secondary_flags in the future? If so, how can we make sure that we don't need to do manual changes each time? Or how can we make the manual changes each time as easy as possible?

Collaborator Author

I've added the secondary_flags field to the struct. Currently the yaml parser only processes fields defined in the struct and ignores others without failing, but behavioral changes (like secondary mounts) will still require manual updates as GCSFuse evolves.

Collaborator

So if there are any changes to the config struct in the GCSFuse repo, will it fail on our end because we haven't made equivalent changes in our struct?

Collaborator Author

No, it won't fail. The YAML parser ignores any fields that are not defined in our struct.

Collaborator

What is the plan to add tests with mounted_directory_secondary? I don't see this being used anywhere? Is this for the append tests?

Collaborator Author

mounted_directory_secondary will be used by the rapid_appends test package. It will be included during the refactoring of the flat gcsfuse_integration test suite.

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch from 6b674aa to a52eb74 Compare March 13, 2026 17:57

// Filter out duplicate logging:severity:info from testdriver set up
mo := l.volumeResource.VolSource.CSI.VolumeAttributes["mountOptions"]
mo = strings.ReplaceAll(mo, "logging:severity:info", fmt.Sprintf("logging:severity:%v", config.LogSeverity))

Collaborator

Can we update testdriver setup to not add these duplicates?

Collaborator Author

The default logging:severity is info, and the GCSFuse team added their own logging:severity to the test config YAML. We will only see duplicates in the gcsfuse integration test suites.
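For reference, the replacement shown in the diff above can be read as the following standalone sketch. The function name overrideLogSeverity and the sample option string are assumptions for illustration; it also assumes mountOptions is a single comma-separated string.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSeverityOption is the option the test driver adds by default, per the thread.
const defaultSeverityOption = "logging:severity:info"

// overrideLogSeverity rewrites the default severity option in a comma-separated
// mountOptions string so that only the requested severity is present.
func overrideLogSeverity(mountOptions, severity string) string {
	return strings.ReplaceAll(mountOptions, defaultSeverityOption, "logging:severity:"+severity)
}

func main() {
	// Prints: implicit-dirs,logging:severity:trace
	fmt.Println(overrideLogSeverity("implicit-dirs,logging:severity:info", "trace"))
}
```

Rewriting the default in place, rather than appending a second severity option, is what keeps the final option list free of duplicates.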

}
// The YAML parser treats test package as a list because of the '-' syntax.
// But there is only one configuration item under each package, so we take the first element.
pkg := pkgList[0]

Collaborator

Can we check with GCS Fuse team if they are willing to fix this?

Collaborator Author

Let me check with them. A map-based structure for test packages would be cleaner, but the gcsfuse side currently expects lists. Transitioning would require a significant refactor of the upstream integration test framework.
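Until the upstream format changes, the single-element access can be wrapped in a small helper. This is only a sketch: firstPackage and the reduced TestPackage struct are hypothetical, and the diff context above may already include an equivalent length check.

```go
package main

import "fmt"

// TestPackage is a hypothetical, reduced version of the struct in this PR.
type TestPackage struct {
	MountedDirectory string
}

// firstPackage returns the single entry expected under a package key and
// reports an error instead of panicking when the list is empty.
func firstPackage(name string, pkgList []TestPackage) (TestPackage, error) {
	if len(pkgList) == 0 {
		return TestPackage{}, fmt.Errorf("test package %q has no entries", name)
	}
	return pkgList[0], nil
}

func main() {
	pkg, err := firstPackage("read_cache", []TestPackage{{MountedDirectory: "/mnt/a"}})
	fmt.Println(pkg.MountedDirectory, err)

	_, err = firstPackage("empty_pkg", nil)
	fmt.Println(err)
}
```

Keeping the list-to-single conversion in one place would also make a later move to a map-based layout a one-function change.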

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch 2 times, most recently from 5fe37f2 to 94e9e2e Compare March 13, 2026 23:06

BINDIR ?= $(shell pwd)/bin

ifeq ($(ENABLE_ZB), true)

Collaborator

I see we set this variable when launching the internal testgrid. Do we need it here?

Collaborator Author

We don't need this for the internal testgrid, only for manual testing.


// disallowedFlagsMapping maps the disallowed flags to their config file representation.
// See: https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/blob/585b8addb42335e0742be7059fd6570c78b62bc6/pkg/sidecar_mounter/sidecar_mounter_config.go#L83
// Note: If you add more disallowed flags in sidecar_mounter_config.go or use new ones in gcsfuse tests,

Collaborator

@amacaskill amacaskill Mar 17, 2026

Please add this same note at var disallowedFlags = map[string]bool{ as well.

Collaborator

I would also prefer we use the same map in both cases so we don't accidentally update one and not the other.

Collaborator Author

I have updated the note and will switch to using the same map in the follow-up PR for the file cache test suite refactoring.

Comment on lines +277 to +278
// They must be handled specifically: either parsed manually to configure the test pod,
// or translated into the GCSFuse config file representation (x:y).

Collaborator

@amacaskill amacaskill Mar 17, 2026

What does "parsed manually to configure the test pod" mean?

And can you please explain why "translated into the GCSFuse config file representation (x:y)" is a workaround? I think it is because the sidecar disallowed flag map only lists the CLI format of each flag, not the config file format, so we work around it in the test suite by using the config file format instead. Is that correct? If so, then how does this help for flags that are already in config file format, like cache-dir?

From my other comment, #1235 (comment), I'm wondering what happens when GCSFuse adds a new test that uses one of these flags. How can we prevent that in the first place, or make sure it doesn't require a manual update? If we can't, then I want to understand what cases will require a manual update, so we can document it somewhere as mentioned here

Collaborator Author

@yaozile123 yaozile123 Mar 17, 2026

Yes, that's correct. For cache-dir, the sidecar mounter automatically populates cache-dir in its config file map when the file cache is enabled.

Collaborator Author

If GCSFuse adds a new test that uses a disallowed flag not yet handled in our specific cases, the flag will be swallowed in the default block of ParseConfigFlags. We've added a klog to this block to surface these instances. Let me write a short doc about this.
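A rough sketch of the flow described here, to make the "swallowed in the default block" case concrete. This is not the actual ParseConfigFlags: the function name parseConfigFlags, both map contents, and the pass-through format are illustrative assumptions, and the standard log package stands in for klog.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// disallowedFlags is an illustrative stand-in for the sidecar's disallowed set.
var disallowedFlags = map[string]bool{"log-format": true, "temp-dir": true}

// disallowedFlagsMapping is an illustrative excerpt mapping a disallowed CLI
// flag to its config file representation.
var disallowedFlagsMapping = map[string]string{"log-format": "logging:format"}

// parseConfigFlags converts CLI-style flags into mount options. Disallowed
// flags with a known mapping are translated to x:y form; disallowed flags
// without one are dropped and logged.
func parseConfigFlags(flags []string) []string {
	var opts []string
	for _, f := range flags {
		trimmed := strings.TrimPrefix(f, "--")
		name, value, hasValue := strings.Cut(trimmed, "=")
		mapped, known := disallowedFlagsMapping[name]
		switch {
		case !disallowedFlags[name]:
			opts = append(opts, trimmed) // allowed flag: pass through
		case known && hasValue:
			opts = append(opts, mapped+":"+value)
		default:
			log.Printf("dropping unhandled disallowed flag %q; a manual update may be needed", f)
		}
	}
	return opts
}

func main() {
	fmt.Println(parseConfigFlags([]string{"--implicit-dirs", "--log-format=json", "--temp-dir=/tmp"}))
}
```

In this sketch, the only cases that need a manual update are disallowed flags with no mapping entry, which is exactly what the log line in the default branch surfaces.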

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amacaskill, yaozile123

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch from 94e9e2e to cf9f88a Compare March 17, 2026 21:43
@google-oss-prow google-oss-prow bot removed the lgtm label Mar 17, 2026
@google-oss-prow

New changes are detected. LGTM label has been removed.

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch 4 times, most recently from 60f7b41 to 1db06e8 Compare March 19, 2026 20:38
@yaozile123
Collaborator Author

We noticed that some read cache test cases do not set --file-cache-max-size-mb. As discussed with @uriel-guzman and @amacaskill, enabling the file cache works differently in the GCSFuse CSI driver compared to standalone GCSFuse. While standalone GCSFuse only requires setting --cache-dir, the CSI driver requires --file-cache-max-size-mb to be explicitly set. When it is, the sidecar mounter detects it and automatically configures the internal cache-dir.

Currently, file-cache-max-size-mb in GCSFuse defaults to -1. In our file-cache related tests (where cache-dir is present in the test flags), we had hardcoded file-cache-max-size-mb to -1.
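As a sketch of the rule above: when a test's flags include cache-dir but no file-cache-max-size-mb, the size flag is added with the default value of -1. The helper names are hypothetical, and leaving an explicitly set size untouched is an assumption, not something confirmed in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	cacheDirFlag     = "cache-dir"
	maxSizeFlag      = "file-cache-max-size-mb"
	defaultMaxSizeMB = "-1" // GCSFuse default, per the comment above
)

// flagName strips the leading dashes and any "=value" suffix from a flag.
func flagName(f string) string {
	name, _, _ := strings.Cut(strings.TrimPrefix(f, "--"), "=")
	return name
}

// ensureFileCacheMaxSize returns flags with --file-cache-max-size-mb=-1
// appended when cache-dir is present and no size was set explicitly.
func ensureFileCacheMaxSize(flags []string) []string {
	hasCacheDir, hasMaxSize := false, false
	for _, f := range flags {
		switch flagName(f) {
		case cacheDirFlag:
			hasCacheDir = true
		case maxSizeFlag:
			hasMaxSize = true
		}
	}
	if !hasCacheDir || hasMaxSize {
		return flags
	}
	// Copy before appending so the caller's slice is not modified.
	out := append([]string{}, flags...)
	return append(out, "--"+maxSizeFlag+"="+defaultMaxSizeMB)
}

func main() {
	fmt.Println(ensureFileCacheMaxSize([]string{"--cache-dir=/tmp/cache", "--implicit-dirs"}))
}
```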

@yaozile123 yaozile123 force-pushed the parallel-download-testsuit-refactor branch from 0d43013 to 667d926 Compare March 20, 2026 16:26
@yaozile123 yaozile123 merged commit 3e28452 into GoogleCloudPlatform:main Mar 23, 2026
8 of 10 checks passed