
Presumably corrupted git-annex branches #132

@adswa

This re-posts an issue I previously created at https://gin.g-node.org/G-Node/Info/issues/62.

Hi! First and foremost a huge thank you for Gin! It is an immeasurably useful infrastructure for science.

I've recently noticed what I presume to be a corruption of the git-annex branch after pushing to Gin, and reported it originally at datalad/datalad-gooey#349.

Describe the bug

The issue presents as follows:
At the moment, pushing a DataLad dataset/git-annex repo severs the history of the git-annex branch, so that my local git-annex branch and the remote git-annex branch on Gin diverge completely. This happens with datasets I previously pushed successfully (small datasets I often use for demonstrations or ad-hoc testing).

An example is this dataset (you might see different Gin repos in the errors below because I tried to pin this down to parametrization or operating system, but the errors were identical across scenarios). It originally comes from https://github.com/datalad-datasets/machinelearning-books and contains PDFs that have a web special remote registered (i.e., the files came from a git annex addurl call). If I add a new Gin repository as a remote and push to it using datalad push, the push succeeds for the default branch but fails with a non-fast-forward error for the git-annex branch, similar to the one below:

*	refs/heads/master:refs/heads/master	[new branch]
!	refs/heads/git-annex:refs/heads/git-annex	[rejected] (non-fast-forward)
Done'] [err: 'Delta compression using up to 16 threads
Total 422 (delta 198), reused 149 (delta 33), pack-reused 0
error: failed to push some refs to 'gin.g-node.org:/adswa/ml-books-only-ssh.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.']

Inspecting the remote git-annex branch on Gin suggests that the branch has been re-created from scratch by a committer named "Gogs": https://gin.g-node.org/adswa/mlbooksmoretests/src/git-annex.
The local git-annex branch contains commits indicating that the branch was rewritten or otherwise substantially changed:

(gooyey) C:\Users\adina\Desktop\ml-books2>git log git-annex
commit 4e226892a69de8989b56cef5f41c49f138aee09e (git-annex)
Author: Adina Wagner <[email protected]>
Date:   Fri Oct 14 09:22:57 2022 +0200

    continuing transition ["forget git history"]

commit 38be5a7d07b019e2a7e42c8dff0734926c276f7d
Author: Adina Wagner <[email protected]>
Date:   Fri Oct 14 09:17:56 2022 +0200

    update

commit 72cd967f9648209aab5c55aebf5b60f1aea41099 (origin/git-annex)
Author: Adina Wagner <[email protected]>
Date:   Tue Apr 19 13:29:07 2022 +0200

    update
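
To see who committed to a given git-annex branch (for example, to spot commits made by "Gogs" rather than by me), the committer names can be tallied with plain git. The following is a minimal sketch; the function name `committers_on` and the throwaway demo repository are illustrative assumptions, not part of the affected dataset.

```shell
#!/bin/sh
# Sketch: count commits per committer name on a ref.
committers_on() {
    # $1 = repository path, $2 = ref to inspect (e.g. git-annex or gin/git-annex)
    git -C "$1" log --format='%cn' "$2" | sort | uniq -c | sort -rn
}

# Self-contained demo in a throwaway repository (hypothetical data):
# create one empty commit whose committer is named "Gogs".
committer_demo=$(mktemp -d)
git -C "$committer_demo" init -q
env GIT_AUTHOR_NAME="Gogs" GIT_AUTHOR_EMAIL="gogs@example.com" \
    GIT_COMMITTER_NAME="Gogs" GIT_COMMITTER_EMAIL="gogs@example.com" \
    git -C "$committer_demo" commit -q --allow-empty -m "placeholder commit"

committers_on "$committer_demo" HEAD
```

In the affected dataset, the same function could be pointed at the local git-annex branch and at the fetched remote branch to compare the two.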

A manual pull fails locally:

❱ git pull gin git-annex
From https://gin.g-node.org/adswa/mlbooksmoretests
 * branch            git-annex  -> FETCH_HEAD
fatal: refusing to merge unrelated histories
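
The "unrelated histories" message means the two branches share no common ancestor. This can be checked directly with `git merge-base`, which exits non-zero when no common ancestor exists. The sketch below is illustrative only: the helper name `history_relation` and the scratch repository with its branch names are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: report whether two refs share any common ancestor.
history_relation() {
    # $1 = repository path, $2 and $3 = refs to compare
    if git -C "$1" merge-base "$2" "$3" >/dev/null 2>&1; then
        echo "related"
    else
        echo "unrelated"
    fi
}

# Self-contained demo in a throwaway repository (hypothetical branch names).
relation_demo=$(mktemp -d)
git -C "$relation_demo" init -q
git -C "$relation_demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first history"
git -C "$relation_demo" branch original
git -C "$relation_demo" branch copy
# An orphan branch starts a new root commit, emulating a re-created branch.
git -C "$relation_demo" checkout -q --orphan recreated
git -C "$relation_demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second history"

history_relation "$relation_demo" original copy
history_relation "$relation_demo" original recreated
```

For the dataset in this report, the refs to compare would be the local git-annex branch and the git-annex branch fetched from the Gin remote.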

And annexed data that should be readily available from the web special remote can't be retrieved after cloning the repository.

(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git-annex whereis A.Shashua-Introduction_to_Machine_Learning.pdf          1 !
whereis A.Shashua-Introduction_to_Machine_Learning.pdf (0 copies) failed
whereis: 1 failed
(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master

❱ git annex get A.Shashua-Introduction_to_Machine_Learning.pdf            130 !
get A.Shashua-Introduction_to_Machine_Learning.pdf (not available) 
  No other repository is known to contain the file.
failed
get: 1 failed
(gooey) adina@mun

I have seen this on Linux and Windows with different versions of git-annex, both when using DataLad and when using only plain git push and git annex sync commands. I also reproduced it with several datasets I had previously pushed successfully, whose data was available from web special remotes, from other types of special remotes, or locally. The only datasets I could push successfully had in common that they were created on my local computer rather than cloned, and thus had all file content available locally and no special remotes. We could not pin the cause down with certainty, though. Can you advise what might be wrong?

To reproduce

  • Clone https://github.com/datalad-datasets/machinelearning-books (datalad clone https://github.com/datalad-datasets/machinelearning-books)
  • Create a sibling with datalad (datalad create-sibling-gin somereponame). Or create a new repo and add it manually as a remote (git remote add gin [email protected]:/<user>/somereponame.git)
  • Run datalad push --to gin, or perform a manual git push and git annex sync
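
The steps above can be sketched as a small script. It is a sketch under assumptions: the repository name `somereponame` and the local clone path are placeholders, the `-d` option is used to point DataLad at the clone instead of changing directories, and the sibling created by `datalad create-sibling-gin` is assumed to be named `gin`. By default the script only prints the commands (dry run); set `DRY_RUN=0` to execute them, which requires DataLad and Gin credentials.

```shell
#!/bin/sh
# Sketch of the reproduction steps. With DRY_RUN=1 (the default here) the
# commands are only printed; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # $1 = name of the new Gin repository (placeholder)
    # $2 = local path for the clone (placeholder)
    reponame=$1
    clone_path=$2
    run datalad clone https://github.com/datalad-datasets/machinelearning-books "$clone_path" || return 1
    # Assumption: the new sibling is registered under the name "gin".
    run datalad create-sibling-gin -d "$clone_path" "$reponame" || return 1
    run datalad push -d "$clone_path" --to gin || return 1
}

reproduce somereponame ./machinelearning-books
```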
