
@vdemeester
Member

Changes

This PR fixes an intermittent flake in the authenticating-git-commands e2e test that occurs with Kubernetes native sidecar support.

Problem

The test was experiencing intermittent "Permission denied (publickey)" failures when attempting git clone operations. Analysis of the failure logs revealed:

  • SSH client attempted password authentication (not publickey)
  • Server logs showed "Failed password" errors
  • This indicated SSH wasn't finding/using the expected private key

Root Cause

The entrypoint's CopyCredsToHome() function recursively copies credentials from /tekton/creds/.ssh/ to $HOME/.ssh/ file-by-file:

  1. id_ssh-key-for-git (private key)
  2. config (SSH config pointing to the key)
  3. known_hosts (if present in secret)

The race condition: the test step's script could start running while the file-by-file copy was still in progress. If git clone ran before config had been copied, SSH had no instruction to use the non-standard key name id_ssh-key-for-git. It tried the default key names, found none, and fell back to password authentication, which failed.
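
To make the window concrete, here is a minimal sketch of what a concurrent reader would observe during a file-by-file copy. It is an illustration only: simulate_copy is a hypothetical helper, temporary directories stand in for /tekton/creds/.ssh and $HOME/.ssh, and the copy order simply follows the list above rather than the real CopyCredsToHome() implementation.

```shell
#!/bin/sh
# Illustration only: temporary directories stand in for /tekton/creds/.ssh
# and $HOME/.ssh, and the file contents are dummies.

# simulate_copy is a hypothetical helper, not part of the Tekton entrypoint.
simulate_copy() {
  src=$(mktemp -d)
  dst=$(mktemp -d)
  printf 'dummy key\n' > "$src/id_ssh-key-for-git"
  printf 'dummy config\n' > "$src/config"

  # Copy one file at a time and report what a concurrent reader would see.
  for name in id_ssh-key-for-git config; do
    cp "$src/$name" "$dst/$name"
    if [ -f "$dst/id_ssh-key-for-git" ] && [ -f "$dst/config" ]; then
      echo "after $name: ready"
    else
      echo "after $name: incomplete"
    fi
  done
  rm -rf "$src" "$dst"
}

simulate_copy
```

Between the two cp calls the destination holds the key but not the config, which is the state in which git clone would fall through to password authentication.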

Solution

Add a wait loop that explicitly checks for both required files (id_ssh-key-for-git AND config) before proceeding with git operations:

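waited=0
while [ ! -f /root/.ssh/id_ssh-key-for-git ] || [ ! -f /root/.ssh/config ]; do
  # wait up to 30 seconds, with diagnostic output on timeout
  if [ "$waited" -ge 30 ]; then ls -la /root/.ssh >&2; exit 1; fi
  sleep 1; waited=$((waited + 1))
done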

This ensures SSH has everything it needs for authentication before attempting git clone.

Testing

  • The fix applies to both v1 and beta versions of the test for consistency
  • Added detailed error messages showing expected files and directory contents if timeout occurs
  • 30-second timeout is generous for normal operation (credentials copy in < 1 second) while catching genuine issues
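
For readers who want to reuse the pattern, the wait-with-diagnostics behaviour described above could be factored into a small helper along these lines. This is a sketch, not the code in this PR: wait_for_files, its argument order, and the wording of its messages are all assumptions made for illustration.

```shell
#!/bin/sh
# wait_for_files is a hypothetical helper, not the code from this PR.
# Usage: wait_for_files TIMEOUT_SECONDS FILE...
# Returns 0 once every file exists, 1 if the timeout elapses first.
wait_for_files() {
  timeout=$1
  shift
  elapsed=0
  while :; do
    # Collect the files that are still missing on this pass.
    missing=""
    for f in "$@"; do
      [ -f "$f" ] || missing="$missing $f"
    done
    [ -z "$missing" ] && return 0
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s; still missing:$missing" >&2
      # List each expected file's directory to aid debugging.
      for f in "$@"; do
        ls -la "$(dirname "$f")" >&2 || true
      done
      return 1
    fi
    # Sleep interval and counter increment are both one second.
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example call (not executed here):
#   wait_for_files 30 /root/.ssh/id_ssh-key-for-git /root/.ssh/config || exit 1
```

Keeping the sleep interval and the counter increment in the same unit means the first argument really is the timeout in seconds.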

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
    • N/A - internal test fix only
  • Has Tests included if any functionality added or changed
    • N/A - this IS the test fix
  • pre-commit Passed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
    • Will add /kind flake after creation
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release
    • N/A - no user-facing changes

Release Notes

NONE

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Dec 18, 2025
@vdemeester
Member Author

/kind flake

@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/flake Categorizes issue or PR as related to a flakey test labels Dec 18, 2025
@waveywaves waveywaves self-assigned this Dec 18, 2025
- Wait for both private key and SSH config files before git operations
- Entrypoint copies credentials asynchronously from /tekton/creds to $HOME
- Without config file, SSH doesn't know to use non-standard key name
- Resolves "Failed password" errors from incomplete credential copy

Signed-off-by: Vincent Demeester <[email protected]>
@vdemeester vdemeester force-pushed the fix/authenticating-git-commands-race-condition branch from ff8db0b to 42f6aef Compare December 19, 2025 10:04
@vdemeester
Member Author

Thanks for catching that @waveywaves! You're absolutely right - the timeout calculation was off.

Fixed in the latest commit by changing sleep 0.1 to sleep 1 to match the increment. Now it correctly waits up to 30 seconds (not 3 seconds) for the SSH credentials to be copied.
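
The arithmetic behind that correction can be checked with a short sketch. max_wait_ms is a hypothetical helper, and the limit of 30 with an increment of 1 per iteration is inferred from the comment above rather than copied from the test script.

```shell
#!/bin/sh
# max_wait_ms is a hypothetical helper for illustration.
# Usage: max_wait_ms LIMIT INCREMENT SLEEP_MS
# Prints the longest time (in milliseconds) a loop can wait when it sleeps
# SLEEP_MS per iteration and adds INCREMENT to a counter until it reaches LIMIT.
max_wait_ms() {
  limit=$1
  increment=$2
  sleep_ms=$3
  # Number of iterations before the counter reaches the limit (rounded up).
  iterations=$(( (limit + increment - 1) / increment ))
  echo $(( iterations * sleep_ms ))
}

max_wait_ms 30 1 100    # sleep 0.1 per iteration: prints 3000 (3 seconds)
max_wait_ms 30 1 1000   # sleep 1 per iteration: prints 30000 (30 seconds)
```

With a 0.1-second sleep the counter still reaches 30 after 30 iterations, so the loop gives up after roughly 3 seconds; matching the sleep to the increment restores the intended 30.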

Member

@waveywaves waveywaves left a comment


@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: waveywaves

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 19, 2025
