Propagate custom `env` variables declared in the Buildkite pipeline to the VM

See also: D161538#3028347-code

## Current state

Currently, [the bash script generated by `hostmgr generate buildkite-job`](https://github.com/Automattic/hostmgr/blob/trunk/Sources/hostmgr/commands/generate/GenerateBuildkiteJobScript.swift), which is then [passed to the VM via SSH to kick off the build within the VM](https://github.com/Automattic/buildkite-ci/blob/1086ae34ea00af5a7c1c33ff28789a513eb503cd/src/agents/macos-hosts/resources/buildkite-hooks/command#L35-L41), only [exports env vars that are prefixed with `BUILDKITE_`](https://github.com/Automattic/hostmgr/blob/trunk/Sources/hostmgr/commands/generate/GenerateBuildkiteJobScript.swift#L40). It also [overrides some of them](https://github.com/Automattic/hostmgr/blob/trunk/Sources/hostmgr/commands/generate/GenerateBuildkiteJobScript.swift#L14-L24) and [filters out some others](https://github.com/Automattic/hostmgr/blob/trunk/Sources/hostmgr/commands/generate/GenerateBuildkiteJobScript.swift#L26-L32) to account for the VM environment being different from the host.

This is done on purpose, for security reasons, and because we don't want to leak or overwrite unexpected env vars (`SHELL`, `USER`, …) from the host into the VM. So we should definitely keep this filtering (as opposed to blindly exporting all existing env vars from the host).

## The issue

That being said, it would sometimes be useful to transfer some specific env vars to the VM so that they can be resolved when they are used in the `.buildkite/commands/*.sh` scripts we call from the `command:` attribute of our pipelines.

One example of that is the env vars we use in our ReleaseV2 scenario to pass values like `RELEASE_VERSION` to the pipeline.
1. Currently, if those env vars are referenced directly by the `command:` attribute of the `.yml` pipeline, there's no issue. Those env vars are resolved when `buildkite-agent pipeline upload` parses the pipeline [and interpolates those values at that time](https://buildkite.com/docs/pipelines/environment-variables#variable-interpolation). By the time the command to run is passed over to the VM via `hostmgr generate buildkite-job`, the value has already been resolved, so the VM receives the value of the env var, not the reference to it.
2. But if those env vars are referenced in a `.buildkite/commands/*.sh` script that is called by the `command:` attribute of the `.yml` (in other words, we have an additional level of indirection), then the env var will only be evaluated when the VM runs the `BUILDKITE_COMMAND` that tells it to call that `.buildkite/commands/*.sh` script, and at that point the env var is not available in the VM itself.
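The difference between the two cases can be reproduced locally with a rough analogy. In this sketch, `env -i` stands in for the hop to the VM (it starts a child shell with an empty environment); this is only an illustration of where the expansion happens, not how `hostmgr` actually hands the job over.

```shell
# Analogy only: `env -i` plays the role of the VM, which does not inherit
# custom env vars from the host.
export RELEASE_VERSION="1.2.3"

# Case 1: the reference is expanded on the "host" side (like interpolation at
# `pipeline upload` time), so the child shell only ever sees the literal value.
resolved_before_handoff() {
  env -i /bin/sh -c "echo \"version=${RELEASE_VERSION}\""
}

# Case 2: the reference is expanded inside the child shell (like a `.sh`
# script run in the VM), where the variable does not exist.
resolved_after_handoff() {
  env -i /bin/sh -c 'echo "version=${RELEASE_VERSION:-<unset>}"'
}

resolved_before_handoff   # prints: version=1.2.3
resolved_after_handoff    # prints: version=<unset>
```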

## Workaround in the meantime

Until we fix the issue, we should rely on the fact that point 1 above works: env vars referenced by the `command:` attribute in the `.yml` are interpolated at `pipeline upload`-time, and the limitation only applies to env vars referenced in the `.sh` scripts run by the VM.

So this means that typically, if your `.sh` script needs to access a `FOO` env var, you should instead:
 - Make the `.sh` script take the value it needs as an input parameter to the script (`$1`, `$2`, …)
    - You could then start your `.sh` script with `FOO="$1"` etc. to re-assign those parameters locally within the script
 - Then in your `pipeline.yml`, pass the env var's value in the `command:` attribute calling the script (e.g. `.buildkite/commands/myscript.sh "$FOO"`)

That way, the `$FOO` env var will be interpolated by the agent during `pipeline upload`, and will already be resolved to its real value when it's passed to the script that will be run by the VM (`.buildkite/commands/myscript.sh "value-of-foo"`), avoiding the problem altogether.
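As a minimal sketch of what such a script could look like, assuming a hypothetical `.buildkite/commands/myscript.sh` that needs a release version (the function name and the message are made up for illustration):

```shell
# Hypothetical contents of .buildkite/commands/myscript.sh.
# The value arrives as a positional parameter instead of an env var.
run_release_step() {
  # Re-assign the parameter to a named variable; abort with a usage message
  # if the caller forgot to pass it.
  release_version="${1:?usage: myscript.sh <release-version>}"
  echo "Building release ${release_version}"
}

# In the real script this would be `run_release_step "$@"`. A literal value is
# used here to mimic what the VM receives once the agent has interpolated it.
run_release_step "1.2.3"   # prints: Building release 1.2.3
```

The matching `command:` attribute in the pipeline would then be `.buildkite/commands/myscript.sh "$RELEASE_VERSION"`.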

---

Of course, in the long run, we want to fix the core of the issue, so:

## Proposed Solution

It could be worth checking if there's a way to access the list of env vars listed in the `env:` attribute of the `step` being run by the YAML pipeline (+ the ones declared in `env:` at the root of the pipeline and that apply to all steps). If so, we could make `hostmgr` also export those env vars in the `hostmgr generate buildkite-job` script passed by SSH to the VM, allowing us to reference those from within our `.buildkite/commands/*.sh` scripts.

The nice thing with that approach is that we'd still keep the security aspect of not transferring all env vars blindly but only the ones explicitly declared for that `step`, using the `env` attribute as an allowlist of env vars to transfer to the VM for that job.

---

## Potential technical solution

I found multiple ways to get the list of env vars known to a job:
 - Reading the file at path [`$BUILDKITE_ENV_FILE`](https://buildkite.com/docs/pipelines/environment-variables#BUILDKITE_ENV_FILE)
 - Using [the REST API](https://buildkite.com/docs/apis/rest-api/jobs#get-a-jobs-environment-variables)
 - Using [the Job API (Unix socket)](https://buildkite.com/docs/agent/v3#promoted-experiments-job-api)[^1]
 - ~Using [`buildkite-agent env dump`](https://buildkite.com/docs/agent/v3/cli-env)~: never mind, this dumps the env of the `buildkite-agent` process itself (including `HOME`, `USER`, etc.), not the env of the job.

[^1]: Note that the `job-api-experiment` [has been promoted to an official feature in agent version `3.64`](https://buildkite.com/docs/agent/v3#promoted-experiments-job-api), and [we currently use `3.65`](https://github.com/Automattic/buildkite-ci/blob/1086ae34ea00af5a7c1c33ff28789a513eb503cd/src/agents/macos-hosts/group_vars/all.yml#L16-L17). So this should already be available and working on our macOS hosts.

All of those contain more env vars than just the ones declared in the `env:` attribute of the step. In particular, they seem to also contain:

1. `env:` vars passed when calling the API to trigger a new build (in particular: the `PIPELINE=` we pass when we want to trigger a different pipeline than the default one via API call)
2. `env:` vars provided at the pipeline root level (like we often do for `IMAGE_ID`, instead of repeating that one on each `step`)
3. `env:` vars provided at the step level
4. `BUILDKITE_*` env vars declared by Buildkite itself

But it should be easy to make `GenerateBuildkiteJobScript` filter out the `BUILDKITE_*` ones from that list and only call `addEnvironmentVariable(name:,value:)` for the remaining ones[^2].
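The filtering step itself can be sketched in shell. This assumes the file at `$BUILDKITE_ENV_FILE` contains one `NAME=value` entry per line, which should be checked against a real job before relying on it; the function name `list_custom_env_vars` is hypothetical, and the real change would live in the Swift `GenerateBuildkiteJobScript` command rather than in a shell helper.

```shell
# Sketch: print only the entries of a job env file that are NOT `BUILDKITE_*`.
# Assumption: one NAME=value entry per line.
list_custom_env_vars() {
  env_file="$1"
  # `|| [ -n "$line" ]` keeps a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    name="${line%%=*}"   # everything before the first `=`
    case "$line" in
      *=*) ;;            # looks like NAME=value, keep going
      *) continue ;;     # not an assignment, skip it
    esac
    case "$name" in
      ''|BUILDKITE_*) continue ;;   # drop empty names and Buildkite's own vars
    esac
    printf '%s\n' "$line"
  done < "$env_file"
}
```

Called as `list_custom_env_vars "$BUILDKITE_ENV_FILE"`, this would leave entries such as `PIPELINE`, `IMAGE_ID`, or step-level vars, which are the ones `addEnvironmentVariable(name:,value:)` would then be called for.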

[^2]: One might think that we could also just keep the `BUILDKITE_*` ones from that list, and remove the call to `copyEnvironmentVariables(prefixedBy:)` in our script instead. But that would not be equivalent, because `copyEnvironmentVariables` [is based on the list of env vars from `ProcessInfo.processInfo.environment`](https://github.com/Automattic/hostmgr/blob/trunk/Sources/libhostmgr/BuildkiteScriptBuilder.swift#L41-L43), which includes additional `BUILDKITE_*` env vars that are exposed to the agent itself (e.g. `BUILDKITE_AGENT_ACCESS_TOKEN`, etc.), not just the ones exposed to the job.
