GCP Batch: Possible to remove Bucket Prefix in Inputs + Outputs? #1272

@lbeckman314

Overview 🌀

Currently, submitting a job that uses Google Cloud Storage via Funnel's GCP Batch backend requires users to add a "bucket prefix" (i.e. /mnt/disks/<BUCKET>/) to the path of every input and output.

Expected Behavior ✔️

{
  "name": "Input/Output Test",
  "inputs": [{
    "url": "gs://tes-batch-integration/README.md",
    "path": "/README.md"          // <---- Arbitrary path (e.g. root) with no "bucket prefix" ✔️
  }],
  "outputs": [{
    "url": "gs://tes-batch-integration/README.md.sha256",
    "path": "/README.md.sha256"   // <---- Arbitrary path (e.g. root) with no "bucket prefix" ✔️
  }],
  "executors": [{
    "image": "alpine",
    "command": ["sh", "-c", "sha256sum /README.md | tee /README.md.sha256"]
  }]
}

Actual Behavior ❌

{
  "name": "Input/Output Test",
  "inputs": [{
    "url": "gs://tes-batch-integration/README.md",
    "path": "/mnt/disks/tes-batch-integration/README.md"          // <---- Path includes bucket prefix ❌
  }],
  "outputs": [{
    "url": "gs://tes-batch-integration/README.md.sha256",
    "path": "/mnt/disks/tes-batch-integration/README.md.sha256"   // <---- Path includes bucket prefix ❌
  }],
  "executors": [{
    "image": "alpine",
    "command": ["sh", "-c", "sha256sum /mnt/disks/tes-batch-integration/README.md | tee /mnt/disks/tes-batch-integration/README.md.sha256"]
  }]
}

Next Steps ⚙️

  • Check whether Google Storage inputs/outputs can be mounted at arbitrary (i.e. non-prefixed) paths in their volumes.
  • Update gcp_batch/backend to convert user-provided paths (e.g. /) to Google Storage paths (e.g. /mnt/disks/<BUCKET>/).
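
For discussion, here is a rough sketch of the second step. It assumes the bucket ends up mounted at /mnt/disks/<BUCKET>/ (as in the "Actual Behavior" paths above) and that the backend could then bind each mounted file to the user's arbitrary path. It is written in Python only to keep it short. The names gcs_mount_path and volume_bindings are hypothetical, not existing Funnel functions, and the "host:container" string format is an assumption about how the binding might be expressed.

```python
import posixpath
from urllib.parse import urlparse

# Assumed mount root, taken from the paths shown in this issue.
MOUNT_ROOT = "/mnt/disks"


def gcs_mount_path(url: str, mount_root: str = MOUNT_ROOT) -> str:
    """Map gs://<bucket>/<key> to <mount_root>/<bucket>/<key> (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a Google Storage URL: {url!r}")
    # parsed.netloc is the bucket name, parsed.path is "/<key>".
    return posixpath.join(mount_root, parsed.netloc, parsed.path.lstrip("/"))


def volume_bindings(task: dict) -> list:
    """Build 'host:container' bind strings for every input and output of a task."""
    bindings = []
    for item in task.get("inputs", []) + task.get("outputs", []):
        # The user keeps an arbitrary container path; only the host side is prefixed.
        bindings.append(f"{gcs_mount_path(item['url'])}:{item['path']}")
    return bindings


task = {
    "outputs": [{
        "url": "gs://tes-batch-integration/README.md.sha256",
        "path": "/README.md.sha256",
    }],
}
print(volume_bindings(task))
# ['/mnt/disks/tes-batch-integration/README.md.sha256:/README.md.sha256']
```

With a mapping like this, the task JSON could keep the non-prefixed paths from "Expected Behavior" and the backend would handle the prefix internally. Whether GCP Batch allows such per-file bindings is exactly what the first step needs to confirm.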

Additional Resources 📚

  • GCP's API Sample demonstrating the expected "Mount Path" → MOUNT_PATH=/mnt/disks/gcs

Tip

Nextflow has a toContainerMount function that converts user-provided paths (e.g. /) to Google Storage paths (e.g. /mnt/disks/<BUCKET>/).
