Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[communicator] add proxy_command support to connection block #36643

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mattlqx
Copy link

@mattlqx mattlqx commented Mar 5, 2025

I have a Terraform use-case where I am using an AWS EC2 Instance Connect Endpoint to access instances in different AWS accounts that do not have direct network access from the network location in which Terraform runs so that SSH connections can be established for use on remote-exec provisioner blocks. AWS's connection guide offers a few different methods:

  1. Using a pipe to aws ec2-instance-connect open-tunnel via an ssh -o ProxyCommand= argument.
  2. Running aws ec2-instance-connect open-tunnel --local-port as a background process to be used as a TCP proxy.
  3. Using aws ec2-instance-connect ssh to open an interactive ssh session.

Of these, using an out-of-the-box Terraform I attempted to get number 2 to work. It was hacky, using terraform_data with local-exec provisioners to control starting and stopping of the process in a detached tmux. I had some success with it but am running into issues with handling Terraform resource provisioners leaving the processes hanging. It would simply be much cleaner to implement the ability to use a ProxyCommand-like attribute on the Terraform ssh connections.

This PR implements a proxy_command attribute on the connection block that will pipe SSH communication through an exec'd process. This enables the ability to use aws ec2-instance-connect as described in number 1. No workarounds are then required to use Terraform with instances that are only reachable through EC2 Instance Connect or potentially any number of other proxy methods.

Target Release

1.12.x

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@crw
Copy link
Contributor

crw commented Mar 5, 2025

Thanks for this submission! I will raise it in triage.

@crw
Copy link
Contributor

crw commented Mar 12, 2025

In triage we were not sure if this was necessary due to ProxyJump (-J) being a more modern version of ProxyCommand. However, subsequent investigation revealed that ProxyJump, in not allowing for running arbitrary commands, does not solve the problem at hand which requires running the aws cli.

That said, this being related to enabling a provisioner falls under the general guidance around changes to Provisioners found in the Contributing guide: https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. I will raise this in triage again to see if this is something we would be willing to review in the short term. Thanks again for the submission!

@mattlqx
Copy link
Author

mattlqx commented Mar 13, 2025

Thanks for the update. Can you help explain a little more what the recommendation would be for preparing/communicating with an instance created by Terraform? It seems like in the docs you're pushed towards userdata, but the lack of direct feedback to the Terraform run via that mechanism isn't desirable. While I understand the desire to not use these general provisioners, they are a core part of Terraform's flexibility when dealing with operating systems on the resources being created (at minimum to kickstart or ensure operation of other management systems in real time).

I'm trying to envision what the alternative would be... a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I appreciate any more direction here.

@crw
Copy link
Contributor

crw commented Mar 13, 2025

a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I

Possibly? I agree that is the direction we seem to be pushing users when these types of use cases come up. When we review this again (next Tuesday) I will also raise this question.

Copy link
Member

@jbardin jbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the proposal still needs to go though triage, a few things caught my eye on a quick read through so I'm dropping notes in here for whomever ends up reviewing the proposal.

return nil, fmt.Errorf("Error parsing address: %s", err)
}

command := strings.Replace(proxyCommand, "%h", host, -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is injecting config values into an arbitrary command it does set up the possibility for command injection. Something to consider for a security review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after reading through the proposal, I'd probably opt to leave out this special replacement syntax since it's OpenSSH specific, and just make use of normal string interpolation for the arguments.


// Set up the command to run in its own process group
cmd.SysProcAttr = &syscall.SysProcAttr{
Setpgid: true, // Create a new process group
Copy link
Member

@jbardin jbardin Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a separate process group? It shouldn't be detached from the parent process, should it? This also won't compile on windows, so would need to be separated into a file with appropriate build constraints if it is actually required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I was battling here was with the aws ec2-instance-connect open-tunnel command specifically. The process would be left in a detached zombied state after running here. After further investigation, it seems to be an issue with that command not properly cleaning up its websockets and hanging around after we close the pipe to it. This PR over there fixes that issue.

But my goal here was still to cleanup any process that we spawn here and a process group is used to identify its child processes as well. Like in the case I was actually executing a shell script (mostly to avoid escaping issues) that performed an aws assume-role and then ran the instance connect command.


// Read reads data from the connection (the command's stdout)
func (c *proxyCommandConn) Read(b []byte) (int, error) {
if c.closed {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a data race, closed must always be protected under the same mutex.

if err != nil {
log.Printf("[ERROR] Error getting process group ID: %s", err)
// Fall back to killing just the process if we can't get the process group
if err := c.cmd.Process.Signal(syscall.SIGHUP); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the proxy command is meant to be generic, we can't count on it knowing what to do with a HUP, and windows has no HUP. Since this is a last resort it should probably use TERM/KILL signal. The entire process could also use CommandContext

if err := c.stdinPipe.Close(); err != nil {
log.Printf("[ERROR] Error closing stdin pipe: %s", err)
}
if err := c.stdoutPipe.Close(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious. You normally don't need to call close here, because you can only read output, and it's incorrect to call Wait until those reads have completed anyway.

@crw crw added enhancement waiting-response An issue/pull request is waiting for a response from the community provisioners labels Mar 19, 2025
@crw
Copy link
Contributor

crw commented Mar 19, 2025

Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

@jbardin
Copy link
Member

jbardin commented Mar 19, 2025

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.

@mattlqx
Copy link
Author

mattlqx commented Mar 21, 2025

Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks!

Thanks, I'll start addressing the feedback today.

@mattlqx
Copy link
Author

mattlqx commented Mar 21, 2025

Something else to consider, could the AWS provider create a ec2-instance-connect ephemeral resource which does this? An ssh tunnel was one of the canonical examples for ephemeral resources after all.

I'm not familiar with how that would work at a provider level. The command does need to be specified per-host however as the destination is determined by the arguments you execute it with.

@crw
Copy link
Contributor

crw commented Mar 21, 2025

@mattlqx I suspect GitHub notifications doesn't always notify me on certain PR actions, so give me a mention when this is ready for review again. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement provisioners waiting-response An issue/pull request is waiting for a response from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants