git: shallow clone via ls-remote for sanity, simplify mirror logic #27

Closed
gastmaier wants to merge 13 commits into analogdevicesinc:main from gastmaier:cim-shallow


@gastmaier gastmaier commented Apr 14, 2026

Reworks the git operations layer to assume shallow clone.

Minor change:

  • ls_remote now returns (sha, refname) tuples instead of only the refname, which allows resolving a symbolic ref to its exact object without a local clone.
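
As a rough illustration of the (sha, refname) shape, here is a minimal sketch that parses the text printed by `git ls-remote`, where each line is a SHA, a tab, and a ref name. The helper names `parse_ls_remote` and `sha_for_ref` are hypothetical and are not taken from this PR.

```rust
/// Parse the text printed by `git ls-remote <remote>` into (sha, refname) pairs.
/// Each output line has the form "<sha>\t<refname>"; malformed lines are skipped.
/// Hypothetical helper, not the PR's actual implementation.
fn parse_ls_remote(output: &str) -> Vec<(String, String)> {
    output
        .lines()
        .filter_map(|line| {
            // The SHA comes first, then the ref name, separated by whitespace.
            let mut parts = line.split_whitespace();
            let sha = parts.next()?;
            let refname = parts.next()?;
            Some((sha.to_string(), refname.to_string()))
        })
        .collect()
}

/// Look up the object a ref points to, using only the ls-remote pairs.
fn sha_for_ref<'a>(pairs: &'a [(String, String)], refname: &str) -> Option<&'a str> {
    pairs
        .iter()
        .find(|(_, name)| name.as_str() == refname)
        .map(|(sha, _)| sha.as_str())
}
```

With the pairs in hand, matching a branch or tag to its SHA (or the reverse) is a plain lookup and needs no local copy of the repository.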

New:

  • resolve_fetch_refspec maps a reference string to a (fetch_refspec, update_ref_name, sha) tuple. Supports refs/tags/*, refs/heads/*, a commit SHA, HEAD, and implicit (short) tag and branch names.
  • fetch_ref fetches a single reference
  • cat_file checks whether an object is already present
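
The resolution order can be sketched as follows, using the precedence given in the commit message (tag, then branch, then commit SHA). The signature is an assumption, and HEAD is not treated specially in this sketch; the real function in the PR may differ on both points.

```rust
/// Sketch of the resolution order: tag first, then branch, then commit SHA.
/// `pairs` holds (sha, refname) tuples as returned by ls_remote.
/// The signature is an assumption made for illustration.
fn resolve_fetch_refspec(
    reference: &str,
    pairs: &[(String, String)],
) -> (String, String, Option<String>) {
    // Fully qualified names are looked up as given; short names are tried
    // as a tag first and as a branch second.
    let candidates: Vec<String> = if reference.starts_with("refs/") {
        vec![reference.to_string()]
    } else {
        vec![
            format!("refs/tags/{}", reference),
            format!("refs/heads/{}", reference),
        ]
    };
    for candidate in candidates {
        if let Some((sha, _)) = pairs.iter().find(|(_, name)| *name == candidate) {
            return (candidate.clone(), candidate, Some(sha.clone()));
        }
    }
    // Nothing matched: treat the input as a commit SHA and park it on the
    // placeholder branch named in the commit message (refs/heads/trunk).
    (reference.to_string(), "refs/heads/trunk".to_string(), None)
}
```

Trying the tag before the branch means a name that exists as both resolves to the tag, which matches the stated precedence.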

Main changes:

  • clone_repo now runs init, remote add, fetch, and checkout FETCH_HEAD, so that an arbitrary SHA can be checked out.
  • Windows hacks have been removed.
  • execute_git_clone threaded clone has been removed (it was not included in the mirror path, so it was dead code by default).
  • removed is_branch_reference, get_latest_commit_for_branch, and get_latest_commit_for_remote_branch; use ls_remote instead.
  • update_git has been removed (dead code).
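
The init, remote add, fetch, checkout FETCH_HEAD sequence can be sketched as a list of git argument vectors. This only builds the commands and does not run them; the function names and the remote name "origin" are illustrative assumptions. Whether a server accepts a fetch of an arbitrary SHA depends on its configuration.

```rust
/// Convert borrowed arguments into owned strings.
fn owned(args: &[&str]) -> Vec<String> {
    args.iter().map(|a| a.to_string()).collect()
}

/// Build the git invocations for a shallow clone of one reference.
/// Each inner Vec is the argument list for one `git` process, in order.
/// Hypothetical helper: it plans the commands, it does not execute them.
fn shallow_clone_plan(url: &str, dest: &str, refspec: &str, depth: u32) -> Vec<Vec<String>> {
    let depth_arg = format!("--depth={}", depth);
    vec![
        // Create an empty repository at the destination.
        owned(&["init", dest]),
        // Register the upstream URL under the remote name "origin".
        owned(&["-C", dest, "remote", "add", "origin", url]),
        // Fetch only the requested ref (branch, tag, or SHA) at the given depth.
        owned(&["-C", dest, "fetch", depth_arg.as_str(), "origin", refspec]),
        // Check out whatever the fetch just retrieved.
        owned(&["-C", dest, "checkout", "FETCH_HEAD"]),
    ]
}
```

Because the checkout targets FETCH_HEAD rather than a named branch, the same four steps apply whether the refspec is a branch, a tag, or a bare SHA.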

Simplification:

The _no_mirror variants of clone_repo_to_workspace, handle_existing_workspace_repo, update_workspace_repos, and update_workspace_repos_with_result are merged into their counterparts, which gain an added no_mirror: bool / Option<&Path> parameter.
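
A minimal sketch of the idea behind the merge: one function takes an optional mirror path instead of having a separate _no_mirror twin. The function name and return type are made up for illustration.

```rust
use std::path::Path;

/// Pick where a workspace repository is cloned from.
/// With `Some(mirror)` the local mirror is the source; with `None` the
/// upstream URL is used directly. Name and return type are illustrative.
fn clone_source(upstream_url: &str, mirror: Option<&Path>) -> String {
    match mirror {
        Some(path) => path.display().to_string(),
        None => upstream_url.to_string(),
    }
}
```

The caller decides once whether a mirror exists, and the rest of the code path is shared.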

manifests$ time cim init --target jupiter-sdr  --source  .

Initializing workspace at: /home/me/dsdk-jupiter-sdr
Updating mirror repositories...
✓ build (cloned)
✓ trusted-firmware-a (cloned)
✓ u-boot (cloned)
✓ linux (cloned)

Initializing workspace repositories...
✓ trusted-firmware-a (cloned and checked out jupiter to latest: 4683880e)
✓ build (cloned and checked out zynqmp-boot-bin to latest: 2a57beea)
✓ u-boot (cloned and checked out jupiter-sdr to latest: b768132d)
✓ linux (cloned and checked out main to latest: 9b0170df)
✓ Workspace initialized successfully at: /home/me/dsdk-jupiter-sdr

real    1m29.011s
user    0m21.238s
sys     0m11.412s

dsdk-jupiter-sdr$ time cim update
Workspace: /home/me/dsdk-jupiter-sdr
Updating mirror repositories...
✓ trusted-firmware-a
✓ u-boot
✓ build
✓ linux

Updating workspace repositories...
✓ trusted-firmware-a (updated jupiter to latest: 4683880e)
✓ u-boot (updated jupiter-sdr to latest: b768132d)
✓ build (updated zynqmp-boot-bin to latest: 2a57beea)
✓ linux (updated main to latest: 9b0170df)

real    0m2.884s
user    0m1.123s
sys     0m0.334s

Notes:

Since the mirror is a local copy, a better alternative is to use worktrees and to bring the mirror feature back in a future iteration as a real mirror, on network (Ethernet) or read-only storage.
A new scoped thread makes it possible to drop many .clone() calls in the source code.
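
For context, a small sketch of why scoped threads remove the need for cloning: `std::thread::scope` guarantees that spawned threads finish before it returns, so they may borrow local data. The function below is a toy example and is unrelated to the actual cim code.

```rust
use std::thread;

/// Compute the length of each repository name on its own thread.
/// The threads borrow `names` directly; no `.clone()` into each closure
/// is needed because the scope joins every thread before returning.
fn name_lengths(names: &[String]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| scope.spawn(move || name.len()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With a plain `thread::spawn`, each closure would need owned data with a `'static` lifetime, which is where the extra clones typically come from.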

The method is not used anywhere; remove it to simplify maintenance.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
A functional git instance is a prerequisite; falling back on error, only
on Windows, is not useful for subsequent commands.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
The sha of the ls-remote command was being discarded, but it is useful
to match a sha to a branch or tag, and vice versa.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Method to fetch a single reference (sha, branch, tag, ...) with a
specified depth (usually 1).

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Remove the shallow methods, in favor of a refactored clone_repo with
shallow clone capability.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Add a method to check whether an object, such as a commit sha, exists.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Determine the fetch refspec, update-ref target, and resolved SHA from
ls-remote pairs.

Returns (fetch_refspec, update_ref_name, sha), in order of precedence:

- Tag v1.0   : (refs/tags/v1.0, refs/tags/v1.0, Some(sha))
- Branch foo : (refs/heads/foo, refs/heads/foo, Some(sha))
- Commit SHA : (sha, refs/heads/trunk, None)

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Test corner cases of the resolution method.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
The method was only called for the no-mirror or mirror-not-found
condition; in the default case with a mirror, the original full clone
happens without any feedback. Clean up the method for now.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Now works uniformly for branches (refs/heads/*), tags (refs/tags/*),
commit SHAs, and HEAD. Uses init -> remote add -> fetch -> checkout
FETCH_HEAD to achieve this. Drops the reference argument (not useful on
the same host with write access), and adds a depth argument to set the
fetch depth.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Use the tuple returned by git ls-remote for branch and tag detection,
instead of relying on the local copy.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Remove the methods that rely on a full clone of the repository:
- is_branch_reference
- get_latest_commit_for_branch
- get_latest_commit_for_remote_branch

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
There was a fair amount of duplication due to _no_mirror variants of
the same methods. Merge them with a no_mirror boolean parameter instead.

Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Comment on lines +699 to +708
if !git_operations::cat_file(&mirror_repo_path, &target) {
    let fetch_ok =
        git_operations::fetch_ref(&mirror_repo_path, "origin", &fetch_refspec, 1)
            .is_ok_and(|r| r.is_success());
    if fetch_ok {
        let _ = git_operations::update_ref(
            &mirror_repo_path,
            &update_ref_name,
            "FETCH_HEAD",
        );
This is the part that becomes unnecessary if we use worktrees: all mirror paths vanish.
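
The control flow in the commented lines (check for the object, fetch only if it is missing) can be sketched without running git. The closures below stand in for cat_file and for fetch_ref followed by update_ref; `ensure_object` is a hypothetical name and not part of the PR.

```rust
/// Make sure `target` exists in a repository, fetching it only when missing.
/// `has_object` stands in for cat_file; `fetch` stands in for fetch_ref plus
/// update_ref. Both are closures so the flow can be exercised without git.
/// Returns true when the object is available afterwards.
fn ensure_object<H, F>(target: &str, has_object: H, mut fetch: F) -> bool
where
    H: Fn(&str) -> bool,
    F: FnMut(&str) -> bool,
{
    if has_object(target) {
        // Already present locally: no network access needed.
        return true;
    }
    // Missing: try a single fetch and report whether it succeeded.
    fetch(target)
}
```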

@gastmaier gastmaier marked this pull request as ready for review April 15, 2026 12:21

joabech commented Apr 20, 2026

I have now spent a good amount of time on this PR, and here is my summary. First, I wanted to compare time and size at different stages, so I used a script to do three things: set up from scratch (empty mirrors), set up with the mirrors primed, and then run "git remote update" in each git repository. This is the "jupiter-sdr" setup, which involves Linux, U-Boot and TF-A. Here are the stats (cim is current upstream and cim-pr27 is the binary compiled with the changes in this PR):

Metric                                                     cim            cim-pr27
----------------------------------------------------------------------------------
Cold init time                                          584.2s               84.0s
Warm init time                                            8.1s               16.3s
Foreach time                                              1.7s              857.9s
Mirror trusted-firmware-a (cold)                       44.0 MB              7.8 MB
Mirror trusted-firmware-a (warm)                       44.0 MB              7.8 MB
Mirror u-boot (cold)                                  307.5 MB             27.8 MB
Mirror u-boot (warm)                                  307.5 MB             27.8 MB
Mirror linux (cold)                                     4.5 GB            268.2 MB
Mirror linux (warm)                                     4.5 GB            268.2 MB
Workspace after cold init                               1.6 GB              1.9 GB
Workspace after warm init                               1.6 GB              1.9 GB
Workspace after foreach                                 1.6 GB              6.5 GB

Some reflections:

  • This PR is much faster from cold init.
  • This PR is slower from warm init.
  • git remote update (listed as Foreach time above) is much slower with this PR. The reason is that both the mirror and the workspace are set up with --depth 1, which forces us to fetch the entire history on git remote update.
  • This PR has much smaller mirrors overall.
  • This PR produces larger workspaces, and much larger workspaces as soon as you have used git remote update.
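
To put rough numbers on the reflections, the three timing rows of the table can be turned into ratios. The helper names below are illustrative; the values are copied from the table.

```rust
/// Ratio of two timings: how many times longer `slow` took than `fast`.
fn ratio(slow: f64, fast: f64) -> f64 {
    slow / fast
}

/// The three timing rows from the table as (label, cim seconds, cim-pr27 seconds).
fn timings() -> [(&'static str, f64, f64); 3] {
    [
        ("Cold init", 584.2, 84.0),
        ("Warm init", 8.1, 16.3),
        ("Foreach", 1.7, 857.9),
    ]
}
```

Dividing the larger value by the smaller one in each row gives roughly 7x faster for cold init, roughly 2x slower for warm init, and roughly 500x slower for the git remote update pass.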

Mirror
I think that the mirror, with the changes in this PR, loses its purpose, since it will no longer serve as a real mirror. My intention with the mirror has been that it should have the majority of the source code cached, so that we only have to get deltas from the internet. Also, as we can see from the numbers above, once the mirror is primed, subsequent runs are very fast. So, I want to keep the current behavior.

In relation to this, I've also been thinking about whether worktrees would work or not. Technically they would, but in practice I believe it would be problematic. Perhaps not if you as an individual are the only user of the mirror, but if the mirror is stored as a shared mirror, then we would have different users contributing different branches etc.

Workspace
I kind of like the idea of being able to use --depth=1, but I would like that to be optional. I.e., we could extend the command so that we can optionally call cim init ... --depth=1. That would still not affect the mirrors, but would affect the workspace cloning. Perhaps that could save some (clone) time and disk space during CI/CD etc.
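
One way the optional depth could flow into the fetch step is sketched below. The `cim init --depth` flag is only a proposal in this thread, and `fetch_args` is a hypothetical helper; `git fetch --depth=<n>` itself is a standard git option.

```rust
/// Build `git fetch` arguments with an optional depth limit.
/// `None` keeps the full-history behavior; `Some(n)` adds `--depth=n`.
/// Hypothetical helper for a flag that is only proposed in this thread.
fn fetch_args(remote: &str, refspec: &str, depth: Option<u32>) -> Vec<String> {
    let mut args = vec!["fetch".to_string()];
    if let Some(n) = depth {
        args.push(format!("--depth={}", n));
    }
    args.push(remote.to_string());
    args.push(refspec.to_string());
    args
}
```

Keeping the depth as an `Option` leaves the default path untouched, so existing workspaces and mirrors would behave as they do today unless the flag is passed.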

PR continuation
As said, I'd like to keep the current mirror behavior, but I think there are good improvements in the patch series: the Windows hack removals, and perhaps also the ls-remote changes.

This patch https://github.com/adi-innersource/sdk-manager/commit/95f35cdcc1e9888fae38be23c01c8d65e268cf11 could be worth mentioning as well; it was an attempt to ensure that the mirror finds and fetches updates from the upstream trees. We had an issue where branches that were created weren't found in the mirror, and then the workspace setup failed.

Do you think you can rework this PR to a) skip things touching the mirror, b) implement a cim init --depth=x flag, c) keep the Windows hack removal, and d) see if we can still leverage ls-remote without ending up with the issues seen in the patch above? It is perhaps even worth doing these things in different PRs.


joabech commented Apr 21, 2026

I have taken most patches in this series and merged them to main. The mirror feature has been left out, i.e., the old mirror feature is as is. I took the liberty of changing the git commit messages a bit. I think we can close this PR now.


gastmaier commented Apr 23, 2026

Acked, closing it! We definitely need a broader approach than this PR took for the shallow improvements.

Some ideas for the future:
-> workspace is a worktree of the mirror
-> real mirror is the --reference of the first clone step
-> libgit2 instead of subprocess: allows us to make better use of the pipe methods without 'undefined' (or guessed) return values
-> ls_remote will return in a future PR

Noted on git remote update; no comment atm because I need to look into the 'why' first.

@gastmaier gastmaier closed this Apr 23, 2026