Skip to content

Implement git-clone#45

Open
tnevrlka wants to merge 10 commits intokonflux-ci:mainfrom
tnevrlka:git-clone
Open

Implement git-clone#45
tnevrlka wants to merge 10 commits intokonflux-ci:mainfrom
tnevrlka:git-clone

Conversation

@tnevrlka
Copy link
Copy Markdown
Contributor

Rewrite of Bash logic of the git-clone task in Go.

See individual commits, they should form logical chunks.

Reviewer notes: I'm sorry, I tried

@tnevrlka tnevrlka requested a review from a team as a code owner January 30, 2026 08:11
@snyk-io
Copy link
Copy Markdown

snyk-io bot commented Jan 30, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Copy Markdown
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it to sparseCheckout so far. Surprisingly reviewable! 🌮

Having a separate package for a command makes a lot of sense, I might steal that for the build command as well 👍

Comment on lines +116 to +122
"user-home": {
Name: "user-home",
EnvVarName: "KBC_GIT_CLONE_USER_HOME",
TypeKind: reflect.String,
DefaultValue: "/tekton/home",
Usage: "Absolute path to the user's home directory. Set this explicitly if you are running the image as a non-root user.",
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid making the CLI tekton-specific. Can this default to $HOME? In the Task, we can set /tekton/home

EnvVarName: "KBC_GIT_CLONE_MERGE_SOURCE_DEPTH",
TypeKind: reflect.Int,
DefaultValue: "0",
Usage: "Perform a shallow fetch of the target branch, fetching only the most recent N commits. If empty, fetches the full history of the target branch.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 0 mean full history or shallow clone? If full history, then the description should mention that. If shallow clone, then it's inconsistent with the other depth option :harold:

Comment on lines +51 to +64
// Retry logic for network operations
var lastErr error
for attempt := 1; attempt <= maxAttempts; attempt++ {
if attempt > 1 {
l.Logger.Infof("Retrying fetch (attempt %d of %d)", attempt, maxAttempts)
}

err := c.CliWrappers.GitCli.FetchWithRefspec(checkoutDir, "origin", refspec, c.Params.Depth)
if err == nil {
return nil
}
lastErr = err
l.Logger.Warnf("Fetch attempt %d failed: %v", attempt, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does our generic retryer API not work here?

Comment on lines +32 to +42
entries, err := os.ReadDir(checkoutDir)
if err != nil {
return fmt.Errorf("failed to read checkout directory: %w", err)
}

for _, entry := range entries {
entryPath := filepath.Join(checkoutDir, entry.Name())
if err := os.RemoveAll(entryPath); err != nil {
return fmt.Errorf("failed to remove %s: %w", entryPath, err)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about why it's done this way rather than os.RemoveAll(checkoutDir)?

I assume checkoutDir="." would cause problems otherwise

RemoteAdd(workdir, name, url string) (string, error)
FetchWithRefspec(workdir, remote, refspec string, depth int) error
Checkout(workdir, ref string) error
SubmoduleUpdate(workdir string, init bool, paths string) error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the API, paths should probably be a []string and the caller should decide how it splits CLI arguments into an array

RemoteAdd(workdir, name, url string) (string, error)
FetchWithRefspec(workdir, remote, refspec string, depth int) error
Checkout(workdir, ref string) error
SetSparseCheckout(workdir, sparseCheckoutDirectories string) error
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as submodule paths

Comment on lines +86 to +90
// setupBasicAuth sets up git credentials from a basic-auth workspace.
// Supports two formats:
// 1. .git-credentials and .gitconfig files (copied directly)
// 2. username and password files (kubernetes.io/basic-auth secret format)
func (c *GitClone) setupBasicAuth() error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if some parts of this function could be reused in the prefetch command, as it was apparently copied from the git-clone bash script:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind; let's refactor this later

EnvVarName: "KBC_GIT_CLONE_TARGET_BRANCH",
TypeKind: reflect.String,
DefaultValue: "main",
Usage: "The target branch to merge into the revision (if mergeTargetBranch is true).",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: since we're not making the params camelCase anymore, seeing mergeTargetBranch here looks a little odd. There are other params that have the same issue in their usage notes.

Name: "subdirectory",
EnvVarName: "KBC_GIT_CLONE_SUBDIRECTORY",
TypeKind: reflect.String,
DefaultValue: "source",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not using a subdirectory by default is a more intuitive use case here, although I'm looking this from the perspective of a generic user, not only our current build pipelines use case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, coupled with my other comment about delete-existing, I'd make it: not use a subdirectory by default, but also to not clean up the checkout dir by default.

Name: "delete-existing",
EnvVarName: "KBC_GIT_CLONE_DELETE_EXISTING",
TypeKind: reflect.Bool,
DefaultValue: "true",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also probably a dangerous default. If I used it on my home dir with a subdirectory "" , then I would be very sad.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, there seems a logic problem. It defaults to true, but there is no way to disable it.

DefaultValue: "",
Usage: "Opt out of proxying HTTP/HTTPS requests.",
},
"verbose": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this param being used for anything? Do we plan to use it for anything?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I'd keep only loglevel, if you want verbose, set debug level

@brunoapimentel
Copy link
Copy Markdown
Contributor

Commit message for eca8c2e:

git-clone: Sparse checkout
If enabled, checkout only repositories that are provided

I think you meant s/repositories/directories/.

Copy link
Copy Markdown
Contributor

@tkdchen tkdchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go format formats these files:

	modified:   pkg/commands/git_clone/git_clone_test.go
	modified:   pkg/commands/git_clone/mocks_test.go
	modified:   pkg/commands/git_clone/results.go

Comment on lines +46 to +52
"depth": {
Name: "depth",
EnvVarName: "KBC_GIT_CLONE_DEPTH",
TypeKind: reflect.Int,
DefaultValue: "1",
Usage: "Perform a shallow clone, fetching only the most recent N commits.",
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 can be passed to parameter depth, it causes a clone with full commit history. It would be good to describe the behavior explicitly.

Name: "delete-existing",
EnvVarName: "KBC_GIT_CLONE_DELETE_EXISTING",
TypeKind: reflect.Bool,
DefaultValue: "true",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, there seems a logic problem. It defaults to true, but there is no way to disable it.

l "github.com/konflux-ci/konflux-build-cli/pkg/logger"
)

func (c *GitClone) checkSymlinks(checkoutDir string) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not reference anything from GitClone. Let's make it a common function. I'm also think it should be possible to reuse it for SearchDockerfile.

Comment on lines +43 to +50
err = c.CliWrappers.GitCli.ConfigLocal(c.getCheckoutDir(), "user.email", "[email protected]")
if err != nil {
return err
}
err = c.CliWrappers.GitCli.ConfigLocal(c.getCheckoutDir(), "user.name", "Tekton Git Clone Task")
if err != nil {
return err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be good to have a different name to replace tekton in the email and name. Probably "Konflux CI Build". I'm not sure honestly. Any idea?

Nitpick: make constants for the email and user name.

Comment on lines +60 to +66
"ssl-verify": {
Name: "ssl-verify",
EnvVarName: "KBC_GIT_CLONE_SSL_VERIFY",
TypeKind: reflect.Bool,
DefaultValue: "true",
Usage: "Set the `http.sslVerify` global git config. Setting this to `false` is not advised unless you are sure that you trust your git remote.",
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage shows:

      --ssl-verify http.sslVerify            Set the http.sslVerify global git config. Setting this to `false` is not advised unless you are sure that you trust your git remote. (default true)

But other boolean argument like --submodules does not have the type info:

      --submodules                           Initialize and fetch git submodules. (default true)

Another issue is, when I set --ssl-verify false, Params.SslVerify is still true:

INFO[0000] [param] SSL verify: true

}

// setupGitConfig configures git settings for SSL verification and CA bundle.
func (c *GitClone) setupGitConfig() error {
Copy link
Copy Markdown
Contributor

@tkdchen tkdchen Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setupGitConfig has side-effect by modifying the git config in host. ConfigLocal calls git config --local.

If the current working directory is not a git repository, git-clone will fail, for example of disabling http.sslVerify:

INFO[0000] Disabling SSL verification (http.sslVerify=false) 
FATA[0000] failed to configure http.sslVerify: git config failed with exit code 128: exit status 128 (stderr: fatal: --local can only be used inside a git repository
) 

git-clone supports passing config via --config, https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---configkeyvalue

Please write a test to ensure git-clone does not modify the git config in host, or only makes the config available in the target directory.

I made the following manual change for this testing due to the issue mentioned in #45 (comment).

 // setupGitConfig configures git settings for SSL verification and CA bundle.
 func (c *GitClone) setupGitConfig() error {
+       c.Params.SslVerify = false

TypeKind: reflect.String,
DefaultValue: "/tekton/home",
Usage: "Absolute path to the user's home directory. Set this explicitly if you are running the image as a non-root user.",
},
Copy link
Copy Markdown
Contributor

@tkdchen tkdchen Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Adam's comment, I think we even do not need a user-home. That makes it possible for git-clone to run in host rather than only in container.

git supports reading credentials from custom location, e.g. https://git-scm.com/docs/git-credential-store#Documentation/git-credential-store.txt---filepath.

It is worth to have a dedicated directory to hold credentials and config without having side-effects of writing execution data into the standard git locations in host.

Related comment #45 (comment)

Also ensure this behavior in tests that do not run in container.

Copy link
Copy Markdown

@MartinBasti MartinBasti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO recursive submodules aren't working

@tnevrlka tnevrlka force-pushed the git-clone branch 7 times, most recently from 97a539a to 8a8f719 Compare February 10, 2026 15:32
@tnevrlka
Copy link
Copy Markdown
Contributor Author

/agentic_review

1 similar comment
@tnevrlka
Copy link
Copy Markdown
Contributor Author

/agentic_review

@tnevrlka
Copy link
Copy Markdown
Contributor Author

/agentic_review

1 similar comment
@chmeliik
Copy link
Copy Markdown
Contributor

/agentic_review

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Mar 18, 2026

Persistent review updated to latest commit 9a3ae16

Comment on lines +204 to +229
// Validate subdirectory for path traversal
if c.Params.Subdirectory != "" {
if filepath.IsAbs(c.Params.Subdirectory) {
return fmt.Errorf("subdirectory must be a relative path, got absolute path: %s", c.Params.Subdirectory)
}
if strings.Contains(c.Params.Subdirectory, "..") {
return fmt.Errorf("subdirectory must not contain path traversal (..): %s", c.Params.Subdirectory)
}
// Verify the resolved path stays within OutputDir.
// When OutputDir is empty, filepath.Abs resolves relative to cwd,
// which is the correct base for validation.
baseDir := c.Params.OutputDir
if baseDir == "" {
baseDir = "."
}
absOutput, err := filepath.Abs(baseDir)
if err != nil {
return fmt.Errorf("failed to resolve output dir: %w", err)
}
absCheckout, err := filepath.Abs(filepath.Join(baseDir, c.Params.Subdirectory))
if err != nil {
return fmt.Errorf("failed to resolve checkout dir: %w", err)
}
if absCheckout != absOutput && !strings.HasPrefix(absCheckout, absOutput+string(filepath.Separator)) {
return fmt.Errorf("subdirectory must not escape output directory")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Subdir symlink escape 🐞 Bug ⛨ Security

validateParams() verifies the checkout path stays within OutputDir using filepath.Abs(), but does
not resolve symlinks, so an existing symlinked subdirectory (e.g., <output>/source -> /etc) can
cause cleanCheckoutDir() and performClone() to operate outside the intended workspace. This can
delete/init arbitrary filesystem locations before the later in-repo symlink check runs.
Agent Prompt
### Issue description
`validateParams()` ensures `Subdirectory` doesn't escape `OutputDir` using `filepath.Abs()` + `HasPrefix`, but this can be bypassed when any path component under `OutputDir` is an existing symlink (e.g. `OutputDir/source -> /etc`). Because `Run()` may call `cleanCheckoutDir()` (deleting files) and `performClone()` (git init/fetch/checkout) **before** `common.CheckSymlinks()`, a symlinked checkout path can cause operations outside the workspace.

### Issue Context
This is a checkout-directory *path* containment problem; the existing repo symlink scan (`common.CheckSymlinks`) validates symlinks *inside* the checked out tree and occurs later.

### Fix Focus Areas
- pkg/commands/git_clone/git_clone.go[190-229]
- pkg/commands/git_clone/git_clone.go[59-113]
- pkg/commands/git_clone/setup.go[18-47]
- pkg/commands/git_clone/clone.go[13-24]

### What to change
1. Add a robust containment check that resolves symlinks:
   - Resolve `OutputDir` with `Abs` + `EvalSymlinks`.
   - After ensuring/creating the checkout directory (or at least its parent), resolve `checkoutDir` with `Abs` + `EvalSymlinks` and confirm it is within the resolved output dir.
2. Before `cleanCheckoutDir()` removes entries, ensure the resolved checkout dir is within the resolved output dir and that the checkout dir itself is not a symlink (`os.Lstat`).
3. If `EvalSymlinks` can’t be applied because the path doesn’t exist yet, create the directory first (safely) and then resolve, or validate each existing path component with `Lstat` to ensure no symlink components are traversed.

### Acceptance criteria
- A pre-existing symlinked `Subdirectory` under `OutputDir` is rejected with a clear error.
- No filesystem deletion or git init can occur outside `OutputDir` due to symlink traversal.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Copy link
Copy Markdown
Contributor Author

@tnevrlka tnevrlka Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm not sure if this would be an issue when in Konflux... since this is technically a standalone CLI that can be ran on baremetal, I guess it can be addressed though

}

var minGitVersion = [3]int{2, 25, 0}
var gitVersionRegex = regexp.MustCompile(`git version (\d+)\.(\d+)\.(\d+)`)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: move closer to usage.

@tnevrlka tnevrlka force-pushed the git-clone branch 2 times, most recently from b3c3f16 to 570e4df Compare March 31, 2026 14:14
c = &GitClone{
CliWrappers: CliWrappers{GitCli: _mockGitCli},
Params: &Params{
URL: "https://github.com/user/repo.git",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have the URL to non existing service unless strictly needed.

defer func() { _ = os.RemoveAll(tempDir) }()

gitClone, err := newGitClone(&gitclone.Params{
URL: "https://github.com/kelseyhightower/nocode",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid using a repository we don't own. We might have tests failed very unexpectedly.


// Clone only specific directories using sparse checkout
gitClone, err := newGitClone(&gitclone.Params{
URL: "https://github.com/konflux-ci/build-definitions",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day it will be deprecated. Will it cause problems in the future?


// Test_GitClone_DoesNotWriteToHomeDirectory verifies that git-clone does not write
// .git-credentials, .gitconfig, or .ssh/ to the user's $HOME directory.
func Test_GitClone_DoesNotWriteToHomeDirectory(t *testing.T) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test fails, it might mess up your real credentials.

@@ -0,0 +1,776 @@
package integration_tests
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplicity of the approach for integration tests buys, but we have several issues with it:

  • Very likely a containerized version of the clone command will be needed to do integration test of the whole pipeline
  • Some tests (for example Test_GitClone_DoesNotWriteToHomeDirectory) might be dangerous to run on host.
  • Usage of github.com and/or not owber repositories might bring problems, especially in air gapped environments.

In general, we have at least three options:

  1. Run git operations on host (current)
  2. Run git operations in a container like other integration tests do (safe, but still requires external resources).
  3. Setup local git server (e.g. Forgejo) and run tests in container (safest, but most heavy).

I lean towards option 3, but would like to hear your opinion @konflux-ci/build-maintainers .

Copy link
Copy Markdown
Contributor Author

@tnevrlka tnevrlka Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setup local git server (e.g. Forgejo)

I guess it shouldn't be any harder than the Zot registry ¯\_(ツ)_/¯

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least option 2 would be good. Option 3 seems like it could be cleanly added onto option 2 later, so if it's not required at the moment, I don't mind the external dependency on github for now.

I think option 3 would be needed if we want to set up git repos in very specific states (like that case where git fetch --tags failed when running the first time but worked the second time)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 3 should not be much harder, there is code that creates commits. Also, we might to do it only once for all tests.

Copy link
Copy Markdown
Member

@mmorhun mmorhun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving code part, but integration tests must be reworked in a follow up PR.

tnevrlka added 10 commits April 2, 2026 11:31
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Clean existing checkout directory contents before cloning.

Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Reject symlinks pointing outside the cloned repository.

Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Support basic auth (git credentials or K8s secrets) and SSH keys.

Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
Assisted-by: Claude Opus 4.5
Signed-off-by: Tomáš Nevrlka <[email protected]>
c.internalDir = internalDir

// Save and restore all environment variables that may be modified during Run().
restoreEnv := createEnvRestoreFunc("GIT_CONFIG_GLOBAL", "GIT_SSH_COMMAND", "GIT_SSL_NO_VERIFY")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to define constants for these environment variables in order to not repeat these strings in the code.

c.internalDir = internalDir

// Save and restore all environment variables that may be modified during Run().
restoreEnv := createEnvRestoreFunc("GIT_CONFIG_GLOBAL", "GIT_SSH_COMMAND", "GIT_SSL_NO_VERIFY")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing these environment variables via Cmd.Env would save code, i.e. createEnvRestoreFunc, setup functions and tests. It does not modify host's environment variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants