Skip to content

Adding Google Storage Requester pays feature to Golang SDK. #33236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

LeoCBS
Copy link

@LeoCBS LeoCBS commented Nov 27, 2024

Setting UserProject on Google Storage Bucket operations to enable requester pays feature.

Requester pays project ID will come from environment variable named BILLING_PROJECT_ID

More information about Google storage requester pays feature here https://cloud.google.com/storage/docs/requester-pays

fixes #30747

Setting UserProject on Google Storage Bucket operations to enable requester pays
feature.

Requester pays project ID will come from environment variable named
`BILLING_PROJECT_ID`

More information about Google storage requester pays feature here https://cloud.google.com/storage/docs/requester-pays
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

github-actions bot commented Dec 5, 2024

Reminder, please take a look at this pr: @lostluck @chamikaramj

Copy link
Contributor

github-actions bot commented Dec 9, 2024

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.
R: @ahmedabu98 for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

Copy link

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 57.41%. Comparing base (ad8545c) to head (bfc9077).
Report is 1018 commits behind head on master.

Files with missing lines Patch % Lines
sdks/go/pkg/beam/io/filesystem/gcs/gcs.go 93.75% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #33236      +/-   ##
============================================
- Coverage     58.92%   57.41%   -1.51%     
+ Complexity     3112     1475    -1637     
============================================
  Files          1133      970     -163     
  Lines        174889   154431   -20458     
  Branches       3343     1076    -2267     
============================================
- Hits         103045    88661   -14384     
+ Misses        68495    63569    -4926     
+ Partials       3349     2201    -1148     
Flag Coverage Δ
go ∅ <93.75%> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This largely looks good, and it does seem that the storage package code handles an empty string there properly, despite not being in the documentation.

https://pkg.go.dev/cloud.google.com/go/storage#BucketHandle.UserProject

The main concern I have is whether the environment variable is automatically set in GCE VMs or not.

Have you tried this on Google Cloud Dataflow? (it's OK if you haven't, or your usage of Beam doesn't require it, but it's something I'll need to verify to ensure the feature works well...).

Once you get back to me on that, I'm happy to merge this in.

@lostluck
Copy link
Contributor

I think the only thing I don't really love is using an environment variables for this, as it's not necessarily going to be set or not, and there doesn't seem to be any standardization around it within Beam or GCP (outside of a few scattered unrelated examples).

It would probably be better to do something as a configurable hook at Pipeline submission time, that is triggered when a worker is initialized.

@LeoCBS
Copy link
Author

LeoCBS commented Dec 12, 2024

Hi @lostluck, thank you for your feedback. I will change the implementation to get the billing project from the pipeline variable instead of using the environment variable.

@lostluck
Copy link
Contributor

If it were me, I'd add a function to the GCS package that sets the value when the fs is constructed. eg. SetRequesterBillingProject(project string)

Then do something like the harness opts do for the side input cache:

Defines and installs the hook URN.

hooks.RegisterHook("beam:go:hook:sideinputcache:capacity", hf)

Has a method to be called at before pipeline submission to enable the hook, and call SetRequesterBillingProject with the value stored in the hook. Note, it just needs to be an arbitrary string (which could then be any serialized data that you like). This gets stored as a pipeline option, so it can be retrieved worker side.

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/harnessopts/cache.go

This would affect the whole pipeline (just as this change does), but if a pipeline author wants to do trickier bits where the billing requester is changing, then we can figure out other options.
Ultimately, the goal is to enable most uses, but not all uses.

Copy link
Contributor

Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98

@LeoCBS
Copy link
Author

LeoCBS commented Dec 22, 2024

@lostluck check if i understand your suggestion:

in file sdks/go/pkg/beam/io/filesystem/gcs/gcs.go i will register a hook to get billing project information:

var (
    billingProject string = ""
)

func init() {
	hf := func(opts []string) hooks.Hook {
		return hooks.Hook{
			Init: func(ctx context.Context) (context.Context, error) {
				if len(opts) == 0 {
					return ctx, nil
				}
				if len(opts) > 1 {
					return ctx, fmt.Errorf("expected 1 option, got %v: %v", len(opts), opts)
				}

				billingProject = opts[0]
				return ctx, nil
			},
		}
	}
	hooks.RegisterHook("beam:go:hook:gcstorage:requesterbillingproject", hf)
}

and on beam/sdks/go/pkg/beam/util/harnessopts/gcs.go i will create a function like this?

const (
	billingProjectHook = "beam:go:hook:requesterbillingproject:capacity"
)

func SetRequesterBillingProject(billingProject string) error {
	return hooks.EnableHook(billingProjectHook, billingProject)
}

I have some questions:

* who will call function SetRequesterBillingProject?
* How will i test if hook was called correcty?
* Will i passing billing project by CLI args? like the runner value, `--runner dataflow`?

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.
R: @damondouglas for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@lostluck
Copy link
Contributor

@LeoCBS Thanks for your patience over this holiday season.

  1. That looks about right!
  2. There's no need to put the enabling function in the harnessopts package. The harness options were split up for package privacy reasons. Due to how tightly bound this feature is to GCS, please keep enabling function in the GCS filesystem package.
  3. Pipelines that would need to have that set will need to call it in the before pipeline submission in the binary's 'main'. It must be before pipeline submission, but doesn't have to be before beam.Init. Note that whatever is doing the call for this is allowed to use whatever environment variable you like. You can automate it that way if that's how you prefer. (eg. A final user specific package to call it at package init time, that looks for your environment variable, and then calls the enabler method.) (or as you suggest, a flag, that gets set by the binary, and then is used when calling the enabling function).
  4. You can add a test in the GCS filesystem package, that calls hooks.RunInitHooks -> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/hooks/hooks.go#L115 That will execute the enabled hooks. You should be able to verify that the billingProject package variable was set as expected there.

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just swapping this to requestChanges from approved since the change is being made anyway.

Copy link
Contributor

github-actions bot commented Jan 7, 2025

Reminder, please take a look at this pr: @lostluck @damondouglas

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

Copy link
Contributor

Reminder, please take a look at this pr: @jrmccluskey @chamikaramj

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.
R: @damondouglas for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@lostluck
Copy link
Contributor

waiting on author

@LeoCBS
Copy link
Author

LeoCBS commented Feb 16, 2025

@lostluck sorry for the delay, I was on vacation. I'm getting back to work on this MR :)

Copy link
Contributor

Reminder, please take a look at this pr: @lostluck @Abacn

Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@lostluck
Copy link
Contributor

waiting on author

@claudevdm
Copy link
Collaborator

Hi @LeoCBS is this PR still being worked on?

@LeoCBS
Copy link
Author

LeoCBS commented Apr 21, 2025

Hi @LeoCBS is this PR still being worked on?

Hi @claudevdm

I'm working on this issue, I should be finishing it soon.

Copy link
Contributor

Reminder, please take a look at this pr: @jrmccluskey @chamikaramj

Copy link
Contributor

github-actions bot commented May 1, 2025

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@LeoCBS
Copy link
Author

LeoCBS commented May 11, 2025

Hi @lostluck , sorry for the delay to update this PR, could you check if my changes makes sense? I'm just missing one last point, how can i assert that the value set by hook was correctly set?

if err != nil {
t.Errorf("error to init hooks = %v", err)
}
//TODO how assert project configured by hook?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lostluck how can i assert project set by hook?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call: hooks.IsEnabled. It will return if the hook is enabled, and the options that were set.

https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam/core/util/hooks#IsEnabled

// See the License for the specific language governing permissions and
// limitations under the License.

package harness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put this into the harness package. It can live in the GCS package, registered to the the hook. The harness already triggers hooks on worker startup.

harnessopts does things in the harness package because they're options for the harness, but that separation doesn't need to exist for the gcs package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lostluck could you review changes again?

if err != nil {
t.Errorf("error to init hooks = %v", err)
}
//TODO how assert project configured by hook?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call: hooks.IsEnabled. It will return if the hook is enabled, and the options that were set.

https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam/core/util/hooks#IsEnabled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Access GCS bucket with requester pays
3 participants