-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[communicator] add proxy_command
support to connection block
#36643
base: main
Are you sure you want to change the base?
Conversation
Thanks for this submission! I will raise it in triage. |
In triage we were not sure if this was necessary due to ProxyJump ( That said, this being related to enabling a provisioner falls under the general guidance around changes to Provisioners found in the Contributing guide: https://github.com/hashicorp/terraform/blob/main/.github/CONTRIBUTING.md#provisioners. I will raise this in triage again to see if this is something we would be willing to review in the short term. Thanks again for the submission! |
Thanks for the update. Can you help explain a little more what the recommendation would be for preparing/communicating with an instance created by Terraform? It seems like in the docs you're pushed towards userdata, but the lack of direct feedback to the Terraform run via that mechanism isn't desirable. While I understand the desire to not use these general provisioners, they are a core part of Terraform's flexibility when dealing with operating systems on the resources being created (at minimum to kickstart or ensure operation of other management systems in real time). I'm trying to envision what the alternative would be... a provider that we implement a custom communication method (in this case EC2 Instance Connect) to perform a set of commands like the provisioners would take anyway? I appreciate any more direction here. |
Possibly? I agree that is the direction we seem to be pushing users when these types of use cases come up. When we review this again (next Tuesday) I will also raise this question. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the proposal still needs to go though triage, a few things caught my eye on a quick read through so I'm dropping notes in here for whomever ends up reviewing the proposal.
return nil, fmt.Errorf("Error parsing address: %s", err) | ||
} | ||
|
||
command := strings.Replace(proxyCommand, "%h", host, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is injecting config values into an arbitrary command it does set up the possibility for command injection. Something to consider for a security review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, after reading through the proposal, I'd probably opt to leave out this special replacement syntax since it's OpenSSH specific, and just make use of normal string interpolation for the arguments.
|
||
// Set up the command to run in its own process group | ||
cmd.SysProcAttr = &syscall.SysProcAttr{ | ||
Setpgid: true, // Create a new process group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need a separate process group? It shouldn't be detached from the parent process, should it? This also won't compile on windows, so would need to be separated into a file with appropriate build constraints if it is actually required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I was battling here was with the aws ec2-instance-connect open-tunnel
command specifically. The process would be left in a detached zombied state after running here. After further investigation, it seems to be an issue with that command not properly cleaning up its websockets and hanging around after we close the pipe to it. This PR over there fixes that issue.
But my goal here was still to cleanup any process that we spawn here and a process group is used to identify its child processes as well. Like in the case I was actually executing a shell script (mostly to avoid escaping issues) that performed an aws assume-role and then ran the instance connect command.
|
||
// Read reads data from the connection (the command's stdout) | ||
func (c *proxyCommandConn) Read(b []byte) (int, error) { | ||
if c.closed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a data race, closed
must always be protected under the same mutex.
if err != nil { | ||
log.Printf("[ERROR] Error getting process group ID: %s", err) | ||
// Fall back to killing just the process if we can't get the process group | ||
if err := c.cmd.Process.Signal(syscall.SIGHUP); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the proxy command is meant to be generic, we can't count on it knowing what to do with a HUP, and windows has no HUP. Since this is a last resort it should probably use TERM/KILL signal. The entire process could also use CommandContext
if err := c.stdinPipe.Close(); err != nil { | ||
log.Printf("[ERROR] Error closing stdin pipe: %s", err) | ||
} | ||
if err := c.stdoutPipe.Close(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious. You normally don't need to call close here, because you can only read output, and it's incorrect to call Wait
until those reads have completed anyway.
Hi @mattlqx, we'd be happy to consider this change pending the above code review requests. Thanks! |
Something else to consider, could the AWS provider create a |
Thanks, I'll start addressing the feedback today. |
I'm not familiar with how that would work at a provider level. The command does need to be specified per-host however as the destination is determined by the arguments you execute it with. |
@mattlqx I suspect GitHub notifications doesn't always notify me on certain PR actions, so give me a mention when this is ready for review again. Thanks! |
I have a Terraform use-case where I am using an AWS EC2 Instance Connect Endpoint to access instances in different AWS accounts that do not have direct network access from the network location in which Terraform runs so that SSH connections can be established for use on
remote-exec
provisioner blocks. AWS's connection guide offers a few different methods:aws ec2-instance-connect open-tunnel
via anssh -o ProxyCommand=
argument.aws ec2-instance-connect open-tunnel --local-port
as a background process to be used as a TCP proxy.aws ec2-instance-connect ssh
to open an interactive ssh session.Of these, using an out-of-the-box Terraform I attempted to get number 2 to work. It was hacky, using
terraform_data
withlocal-exec
provisioners to control starting and stopping of the process in a detached tmux. I had some success with it but am running into issues with handling Terraform resource provisioners leaving the processes hanging. It would simply be much cleaner to implement the ability to use a ProxyCommand-like attribute on the Terraform ssh connections.This PR implements a
proxy_command
attribute on theconnection
block that will pipe SSH communication through an exec'd process. This enables the ability to useaws ec2-instance-connect
as described in number 1. No workarounds are then required to use Terraform with instances that are only reachable through EC2 Instance Connect or potentially any number of other proxy methods.Target Release
1.12.x
CHANGELOG entry