-
Notifications
You must be signed in to change notification settings - Fork 707
reconnect when agent is offline #1142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,6 +103,7 @@ public long check(EC2Computer c) { | |
| long currentTime = this.clock.millis(); | ||
|
|
||
| if (currentTime > nextCheckAfter) { | ||
| attemptReconnectIfOffline(c); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to do this even if
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean I can see it both ways. My opinion is that this is more along the lines of handling network instability versus what to do with a node if it is idle. |
||
| long intervalMins = internalCheck(c); | ||
| nextCheckAfter = currentTime + TimeUnit.MINUTES.toMillis(intervalMins); | ||
| return intervalMins; | ||
|
|
@@ -248,6 +249,17 @@ private long internalCheck(EC2Computer computer) { | |
| return CHECK_INTERVAL_MINUTES; | ||
| } | ||
|
|
||
| private void attemptReconnectIfOffline(EC2Computer computer) { | ||
| if (computer.isOffline()) { | ||
| LOGGER.warning("EC2Computer " + computer.getName() + " is offline"); | ||
| if (!computer.isConnecting()) { | ||
| // Keep retrying connection to agent until the job times out | ||
| LOGGER.warning("Attempting to reconnect EC2Computer " + computer.getName()); | ||
| computer.connect(false); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /* | ||
| * Checks if there are any items in the queue that are waiting for this node explicitly. | ||
| * This prevents a node from being taken offline while there are Ivy/Maven Modules waiting to build. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after rebasing this PR to master, I discovered that the offline check does not happen if the EC2 agent is set to never be terminated, i.e.
idleTerminationMinutes=0.I moved it here to guarantee the offline check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just use
git pullinside a PR branch and avoid force-pushing, which breaks incremental review in most cases. (Unlikely to matter for a PR with such a short diff, but a good habit.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad choice of words. I meant after i clicked the
update PRbutton. But yes, noted