
LocateRepository still fails with Found invalid data while decoding on a v1 reftable in SDK 10.0.203 (incomplete fix for #1470?) #1674

@jk-atbas

Description


Summary

After the reftable implementation in #1498 shipped in SDK 10.0.200, Microsoft.Build.Tasks.Git.LocateRepository still fails on our repository when it uses the reftable reference backend. The error is Found invalid data while decoding (the message thrown by, for example, System.IO.Compression.InflaterManaged), which suggests that the new parser invokes the inflater on data that is not zlib-compressed.

Per the reftable spec, v1 ref blocks and ref index blocks are not zlib-compressed; only log blocks are. Since #1498's description states "Only implements support for ref blocks and ref index blocks", the inflater should never run against this data. The failure therefore most likely indicates either a block-offset miscalculation or accidental traversal into the log-block region.
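To make the expected behavior concrete, here is a minimal Python sketch (not SourceLink's actual C# code) of a reader that refuses to inflate anything other than a log block. It assumes the spec's 4-byte block header: a one-byte type tag ('r' ref, 'i' index, 'o' obj, 'g' log) followed by a big-endian uint24 block_len. The names read_block_header and inflate_log_block are hypothetical.

```python
import zlib

# One-byte block type tags from the reftable spec.
LOG_BLOCK = "g"                 # the only zlib-compressed block type
UNCOMPRESSED = {"r", "i", "o"}  # ref, index and obj blocks are stored raw


def read_block_header(buf: bytes, offset: int) -> tuple[str, int]:
    """Parse a 4-byte block header: type tag plus big-endian uint24 length."""
    block_type = chr(buf[offset])
    block_len = int.from_bytes(buf[offset + 1 : offset + 4], "big")
    return block_type, block_len


def inflate_log_block(buf: bytes, offset: int) -> bytes:
    """Inflate the body of a log block; refuse every other block type."""
    block_type, _block_len = read_block_header(buf, offset)
    if block_type in UNCOMPRESSED:
        # Handing a ref/index/obj block to the inflater is the suspected bug.
        raise ValueError(f"block type {block_type!r} is not compressed")
    if block_type != LOG_BLOCK:
        raise ValueError(f"unknown block type {block_type!r}")
    # decompressobj stops at the end of the zlib stream and keeps any trailing
    # bytes (padding, next block) in .unused_data instead of raising.
    inflater = zlib.decompressobj()
    return inflater.decompress(buf[offset + 4 :])
```

The point of the type guard is that a misplaced offset surfaces as a clear "wrong block type" error rather than as a generic decoding failure deep inside the inflater.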

The original Unsupported repository extension 'refstorage' error from #1470 is gone (so #1469 / #1498 are clearly active in this SDK), but the parser fails one layer deeper.

Environment

  • .NET SDK: 10.0.203
  • Assembly verified via binlog: C:\Program Files\dotnet\sdk\10.0.203\Sdks\Microsoft.Build.Tasks.Git\tools\net\Microsoft.Build.Tasks.Git.dll (SDK-bundled, no PackageReference override) (see binlog_excerpt)
  • Git: 2.54.0.windows.1
  • OS: Windows 11 25H2
  • Project setup: Solution with ~190 .NET projects (.NET 8 / .NET Standard 2.0 mix), Central Package Management, no Microsoft.SourceLink.* or Microsoft.Build.Tasks.Git PackageReferences anywhere

Reproduction (at least what I did in my repo)

  1. In a multi-project solution, switch to the reftable backend:
    git refs migrate --ref-format=reftable
    
  2. Run dotnet build on the solution.
    Result for every project that triggers InitializeSourceControlInformationFromSourceControlManager:
    C:\Program Files\dotnet\sdk\10.0.203\Sdks\Microsoft.Build.Tasks.Git\build\Microsoft.Build.Tasks.Git.targets(25,5):
    error : Found invalid data while decoding.

In a large solution, only the leaf projects (those without a <ProjectReference>) appear in the failure list, because MSBuild stops the topological build at the first failing layer. All other projects are silently skipped through the broken dependency chain. This was confirmed by:

  • Building a non-leaf project directly (e.g. dotnet build Some.Project.csproj) reproduces the failure on its leaf dependency, not on the consumer itself
  • Setting <EnableSourceControlManagerQueries>false</EnableSourceControlManagerQueries> makes the entire solution build cleanly. This shows the failure is tied to the source-control query path rather than to any particular project; every project would fail the same way if reached
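The leaf-only failure pattern follows from how a dependency-ordered build propagates failure. The Python sketch below is a deliberately simplified model of the behavior described above, not MSBuild's actual scheduler. The project names and simulate_build are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def simulate_build(
    refs: dict[str, set[str]], fails: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Build projects in dependency order. A project whose reference failed
    or was skipped is itself skipped and never attempted."""
    failed: list[str] = []
    skipped: list[str] = []
    broken: set[str] = set()
    # TopologicalSorter treats refs[p] as the predecessors of p,
    # so leaf projects (no references) are yielded first.
    for project in TopologicalSorter(refs).static_order():
        if refs.get(project, set()) & broken:
            skipped.append(project)
            broken.add(project)
        elif fails(project):
            failed.append(project)
            broken.add(project)
    return failed, skipped


# Every project would fail LocateRepository, but only leaves are ever reached.
refs = {"App": {"Lib"}, "Lib": {"Core"}, "Tests": {"App", "Core"}, "Util": set()}
failed, skipped = simulate_build(refs, lambda project: True)
print(sorted(failed), sorted(skipped))  # ['Core', 'Util'] ['App', 'Lib', 'Tests']
```

In this model, the failure list reflects build order rather than which projects are actually affected. This matches the observation that building a consumer directly reports the error on its leaf dependency.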

Reftable file details

A freshly migrated, minimal reftable produced by:

    git refs migrate --ref-format=files
    git refs migrate --ref-format=reftable

  • Single table file: 0x000000000001-0x0000000000b8-65296b80.ref, 37 KB
  • Header (first 5 bytes via Get-Content -AsByteStream -TotalCount 5):
    Magic:   REFT
    Version: 1
    
  • min_update_index = 0x1, max_update_index = 0xb8 (184 update indices spread across 375 refs, i.e. multiple refs per update transaction)
  • tables.list contains exactly one entry
  • Repository scale: 394,461 objects, 16,984 commits
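The header and file-name facts above can also be checked programmatically. Below is a minimal Python sketch. It assumes the field layout from the reftable spec, all big-endian: 4-byte magic REFT, 1-byte version, uint24 block_size, then uint64 min_update_index and max_update_index, with v2 appending a 4-byte hash_id. It also assumes table files are named 0x<min>-0x<max>-<suffix>.ref, as in the file above. Both function names are hypothetical, and the demo header uses a made-up block size.

```python
import struct


def parse_reftable_header(data: bytes) -> dict:
    """Decode the fixed reftable file header (24 bytes in v1, 28 in v2)."""
    if data[:4] != b"REFT":
        raise ValueError("bad magic, not a reftable file")
    version = data[4]
    if version not in (1, 2):
        raise ValueError(f"unknown reftable version {version}")
    header_size = 24 if version == 1 else 28  # v2 appends a 4-byte hash_id
    if len(data) < header_size:
        raise ValueError("truncated header")
    block_size = int.from_bytes(data[5:8], "big")       # uint24
    min_idx, max_idx = struct.unpack(">QQ", data[8:24])  # two uint64
    return {
        "version": version,
        "header_size": header_size,
        "block_size": block_size,
        "min_update_index": min_idx,
        "max_update_index": max_idx,
    }


def parse_table_name(name: str) -> tuple[int, int]:
    """Split '0x<min>-0x<max>-<suffix>.ref' into (min, max) update indices."""
    min_hex, max_hex, _suffix = name.removesuffix(".ref").split("-")
    return int(min_hex, 16), int(max_hex, 16)  # base 16 accepts the 0x prefix


# Synthetic v1 header matching the reported update-index range.
header = b"REFT" + bytes([1]) + (4096).to_bytes(3, "big") + struct.pack(">QQ", 0x1, 0xB8)
info = parse_reftable_header(header)
lo, hi = parse_table_name("0x000000000001-0x0000000000b8-65296b80.ref")
print(info["header_size"], hi - lo + 1)  # 24 184
```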

Git integrity checks

git fsck --full reports zero errors against the same reftable file that SourceLink fails to parse:

Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
Checking objects: 100% (394461/394461), done.
Verifying commits in commit graph: 100% (16984/16984), done.
Verifying OID order in multi-pack-index: 100% (393257/393257), done.
Verifying object offsets: 100% (393258/393258), done.

git for-each-ref enumerates all 375 refs without error. The reftable file is therefore spec-conformant from Git's perspective, and the parsing failure is isolated to SourceLink's implementation.

Diagnostics performed

To narrow this down before reporting, the following were verified:

  1. Correct DLL loaded. The binlog shows the SDK-bundled Microsoft.Build.Tasks.Git.dll from dotnet\sdk\10.0.203\Sdks\…, with no NuGet override.
  2. No PackageReference contamination. A repo-wide search for Microsoft.SourceLink.*, Microsoft.Build.Tasks.Git, and IncludeSourceRevisionInInformationalVersion in *.csproj, Directory.Build.props, Directory.Build.targets, and Directory.Packages.props returned no relevant matches.
  3. Reftable freshness. A round-trip migration (reftable -> files -> reftable) yields a single, minimal table, and the bug still reproduces. This rules out compaction-history artifacts.
  4. Reftable format version. Verified version 1 (24-byte header, no hash_id field), not version 2.
  5. Concurrency. dotnet build -m:1 produces the identical failure set, which rules out cross-node race conditions in the SourceLink repository cache.
  6. IncludeSourceRevisionInInformationalVersion. Setting this to false globally does not change behavior. The gate for invoking LocateRepository is EnableSourceControlManagerQueries, which in our case is explicitly set to true as a workaround for dotnet/sdk#36666.
  7. Reftable validity from Git's perspective. git fsck --full reports zero errors, and git for-each-ref enumerates all 375 refs cleanly.

My two cents after looking into the System.IO.Compression assembly

InvalidDataException("Found invalid data while decoding.") corresponds to the SR.GenericInvalidData resource string in System.IO.Compression. Without a stack trace I cannot pinpoint the exact source, but the message is produced by at least two distinct paths:

  1. InflaterManaged throws internally when zlib/DEFLATE decoding fails on invalid input data (e.g. malformed block headers or invalid Huffman tables).
  2. DeflateStream.ThrowGenericInvalidData (a defensive check in ReadCore) throws when the source Stream returns more bytes than the read buffer can hold. This is typically only reachable through a misbehaving wrapper Stream.
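Path 1 is easy to reproduce outside .NET by handing uncompressed bytes to an inflater. This Python sketch uses the standard zlib module as a stand-in for System.IO.Compression. The exception type and message differ from .NET's InvalidDataException, and try_inflate is a hypothetical helper.

```python
import zlib


def try_inflate(data: bytes) -> str:
    """Return 'ok' if data is a complete zlib stream, else the inflater's error."""
    try:
        zlib.decompress(data)
    except zlib.error as exc:
        return f"error: {exc}"
    return "ok"


# Uncompressed bytes, like the start of a reftable file or a raw ref block.
raw = b"REFT\x01" + bytes(19)
print(try_inflate(raw))                 # prints an "error: ..." line
print(try_inflate(zlib.compress(raw)))  # ok
```

The same bytes decode fine once they are actually zlib-compressed. An inflater error on reftable data therefore points at what was passed to the inflater, not at the inflater itself.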

Current Workaround

In Directory.Build.props, suppress SCM queries only outside CI. CI builds use the files backend on a fresh actions/checkout, so the bug doesn't apply there:

<EnableSourceControlManagerQueries Condition="'$(ContinuousIntegrationBuild)' != 'true'">false</EnableSourceControlManagerQueries>

This preserves SourceLink behavior in CI/pack scenarios while keeping local development unblocked.

Attachments

binlog_excerpt.txt
reftable.zip (contains the complete .git\reftable folder)
