index.write() can cause index corruption if tree extension/tree cache is not taken into account #2421

@special-bread

Description

Current behavior 😯

When entries in the index are updated and saved with index.write(), the entries themselves are written correctly, but the tree extension is not updated. From what I can tell, gitoxide does not yet support updating the tree extension as part of the index.write() call.

As a result, the written index is in an inconsistent state. External git clients can report no changes in git status (since they query the tree extension), so modified files can be completely invisible, and a commit can include content that is not in the index entries because it comes from the stale tree cache.

You can work around this by deleting and recreating the index with git reset followed by git add *

Expected behavior 🤔

It's not clear what the best short-term solution is. In the long term, the answer is to update the tree extension when writing the index.

But we have some options:

  1. Update the documentation to note that you must remove the tree first, and call it like so until tree updating is implemented:
    index.remove_tree();
    index.write(gix::index::write::Options::default())?;
  2. Call index.remove_tree() internally as part of writing the index, behind an option that defaults to true, until the tree update itself is implemented. This makes the default behaviour workable: it creates a valid index, just without the tree extension.
  3. Implement the actual update to the tree extension in the index. This is of course desirable but requires actual work :>
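Option 2 could look roughly like the following. This is a self-contained sketch with made-up types: SketchIndex and SketchWriteOptions are hypothetical and are not gix::index::File or gix::index::write::Options. Only the idea of a default-on flag comes from the proposal above; "TREE" is the signature Git uses for the tree extension in the index file.

```rust
/// Hypothetical write options for illustration; NOT `gix::index::write::Options`.
#[derive(Debug, Clone, Copy)]
pub struct SketchWriteOptions {
    /// Drop the tree extension before writing, until it can be updated properly.
    pub remove_stale_tree: bool,
}

impl Default for SketchWriteOptions {
    fn default() -> Self {
        // Defaulting to `true` keeps the written index valid out of the box.
        Self { remove_stale_tree: true }
    }
}

/// Hypothetical in-memory index: entries plus an optional tree extension payload.
#[derive(Debug, Default)]
pub struct SketchIndex {
    pub entries: Vec<(String, String)>,
    pub tree_extension: Option<Vec<u8>>,
}

impl SketchIndex {
    /// Pretend to write the index and return the names of the sections
    /// that would be serialized.
    pub fn write(&mut self, opts: SketchWriteOptions) -> Vec<&'static str> {
        if opts.remove_stale_tree {
            // A missing extension is valid; a stale one is not.
            self.tree_extension = None;
        }
        let mut sections = vec!["entries"];
        if self.tree_extension.is_some() {
            sections.push("TREE");
        }
        sections
    }
}
```

With the default options the extension is dropped before writing, so readers have to rebuild it from the entries. Callers that keep the extension in sync themselves could opt out by setting the flag to false.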

Git behavior

git updates the tree extension, and libgit2 also does this:
https://github.com/libgit2/libgit2/blob/main/src/libgit2/index.c#L3174

I'm not sure if this is correct, but it looks like the tree cache code is here:
https://github.com/libgit2/libgit2/blob/main/src/libgit2/tree-cache.c
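One piece of keeping a tree cache in sync is marking the cached directories along a changed path as invalid, so they get recomputed later. Below is a toy sketch of that idea. It relies on the fact that Git's index format represents an invalidated tree cache node with a negative entry count. CacheNode and invalidate_path are names made up here; this is not a port of Git or libgit2 code and it ignores details such as a file replacing a directory.

```rust
/// Toy tree cache node, illustrative only. In Git's on-disk format a
/// negative entry count marks a node as invalidated.
#[derive(Debug, Default)]
pub struct CacheNode {
    pub name: String,
    pub entry_count: i32,
    pub children: Vec<CacheNode>,
}

impl CacheNode {
    /// Mark this node and every cached directory along `path` as invalid.
    /// `path` is a slash-separated file path relative to this node.
    pub fn invalidate_path(&mut self, path: &str) {
        self.entry_count = -1;
        // Descend only while there is another directory component left.
        if let Some((dir, rest)) = path.split_once('/') {
            if let Some(child) = self.children.iter_mut().find(|c| c.name == dir) {
                child.invalidate_path(rest);
            }
        }
    }
}
```

Sibling directories that do not contain the changed path keep their cached state, which is what makes a partial update cheaper than dropping the whole extension.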

Steps to reproduce 🕹

  1. Find a repo with an index that uses the tree extension. In my experience most of my repos do by default.
  2. The entries you want to update can be some modified files in a commit. To make reproduction easier I used a commit that modified 2 files, and then committed a revert of that commit, so that you know which files should be modified.
  3. It's easy to corrupt the index, but SEEING that is a bit more difficult. If I recall correctly, the steps to visualize the issue are to first do a soft reset of the commit with gitoxide, and then to discard one or more of the modified files with gitoxide. The entry is then updated in the index, but the tree cache is stale and still contains the old committed files from before they were discarded.
  4. If you then commit the remaining undiscarded file(s), you will notice that the actual commit contains more than your status showed. The commit uses the tree extension to write the blobs from the tree, but uses the index entries to see if anything changed, causing a mismatch between what is committed and what was shown as changed.
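Steps 3 and 4 can be replayed on a toy model to show where status and commit diverge. Everything here is an assumption made for illustration: ToyIndex, its methods, and the file names are invented, the "tree cache" is just a snapshot of the entries, and none of it is gitoxide's or Git's actual code.

```rust
use std::collections::BTreeMap;

/// Path -> blob id.
type Tree = BTreeMap<String, String>;

/// Toy index, purely illustrative; not gitoxide's or Git's data structures.
#[derive(Debug, Clone)]
pub struct ToyIndex {
    pub entries: Tree,
    /// Snapshot standing in for the tree extension (tree cache).
    pub cached_tree: Option<Tree>,
}

fn tree(pairs: &[(&str, &str)]) -> Tree {
    pairs.iter().map(|(p, b)| (p.to_string(), b.to_string())).collect()
}

/// Paths in `candidate` whose blob differs from the one in `head`.
fn changed_paths(candidate: &Tree, head: &Tree) -> Vec<String> {
    candidate
        .iter()
        .filter(|(path, blob)| head.get(*path) != Some(*blob))
        .map(|(path, _)| path.clone())
        .collect()
}

impl ToyIndex {
    /// Status as judged from the index entries alone.
    pub fn status(&self, head: &Tree) -> Vec<String> {
        changed_paths(&self.entries, head)
    }

    /// Tree a commit would record if the cache is trusted whenever present.
    pub fn tree_for_commit(&self) -> Tree {
        self.cached_tree.clone().unwrap_or_else(|| self.entries.clone())
    }

    /// Discard a staged change by restoring HEAD's blob, leaving the cache stale.
    pub fn discard_keep_cache(&mut self, head: &Tree, path: &str) {
        if let Some(blob) = head.get(path) {
            self.entries.insert(path.to_string(), blob.clone());
        }
    }
}

/// Returns (paths shown by status, paths that actually change in the commit).
pub fn replay() -> (Vec<String>, Vec<String>) {
    // After the soft reset: HEAD is the parent, the index holds the commit's files.
    let head = tree(&[("a.txt", "old-a"), ("b.txt", "old-b")]);
    let staged = tree(&[("a.txt", "new-a"), ("b.txt", "new-b")]);
    let mut index = ToyIndex { entries: staged.clone(), cached_tree: Some(staged) };

    // Step 3: discard b.txt; the entry is updated, the cache is not.
    index.discard_keep_cache(&head, "b.txt");

    // Step 4: status looks at entries, the commit is built from the stale cache.
    let shown = index.status(&head);
    let committed = changed_paths(&index.tree_for_commit(), &head);
    (shown, committed)
}
```

In this model status lists only a.txt, while the commit built from the stale cache also changes b.txt. Setting cached_tree to None before committing, which mirrors calling index.remove_tree() before the write, makes the two agree.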

Honestly, I'm not entirely sure what the exact reproduction steps are, as I fixed the issue on my end a few weeks ago. But it involves updating the index entries without updating the tree extension (which is what gitoxide does by default) and then creating a commit to see what happens.

Metadata

Labels: acknowledged (an issue is accepted as shortcoming to be fixed), help wanted (Extra attention is needed)
