Fix NullReferenceException in object store scan during slot migration when tombstones are present #1596
Draft
…ombstones Remove includeTombstones: true from MigrateOperation.Scan. Tombstone records in the object store have null IGarnetObject values, which caused a crash in ObjectStoreScan.SingleReader when passed to ClusterSession.Expired(ref value). Tombstones represent deleted keys with no data to migrate, so excluding them is both correct and safe. Add regression test ClusterMigrateSlotsWithObjectTombstones. Co-authored-by: vazois <96085550+vazois@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix automatic slot migrator failure in garnet" to "Fix NullReferenceException in object store scan during slot migration when tombstones are present" on Mar 3, 2026
Automatic slot migration crashes with `NullReferenceException` in `ObjectStoreScan.SingleReader` when object store keys with TTLs or explicit deletes create tombstone records during migration. The tombstone's `IGarnetObject` value is `null`, and `ClusterSession.Expired(ref value)` dereferences it unconditionally.

Root cause
`MigrateOperation.Scan` passed `includeTombstones: true` to both `IterateMainStore` and `IterateObjectStore`. Object store tombstones have `value = null` (set by `ConcurrentDeleter` / default-initialized in `CreateNewRecordDelete`). When the scan handed these records to `ObjectStoreScan.SingleReader`, the crash occurred.

Tombstones were never actually migrated (they would have been skipped as `NOTFOUND` during transmission), so `includeTombstones: true` had no effect other than causing crashes.

Changes
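As a rough model of the root cause and of the change in this PR, the sketch below reproduces the pattern in plain Python. It is not Garnet code: `Record`, `ObjectValue`, `scan`, `expired`, and `keys_to_migrate` are hypothetical stand-ins, and `None` plays the role of the tombstone's `null` value. The only point is that an expiry check that dereferences the value unconditionally fails as soon as the scan yields tombstones, and that skipping tombstones avoids the failure without changing which keys are migrated.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class ObjectValue:
    """Hypothetical stand-in for a live object store value."""
    expiration: int  # 0 means "no TTL" in this model


@dataclass
class Record:
    key: str
    value: Optional[ObjectValue]  # None models a tombstone's null value
    tombstone: bool = False


def expired(value: ObjectValue, now: int) -> bool:
    # Dereferences `value` unconditionally, so a None value raises here.
    return value.expiration != 0 and value.expiration < now


def scan(log: List[Record], include_tombstones: bool = False) -> Iterator[Record]:
    # Tombstones are skipped unless the caller explicitly asks for them.
    for record in log:
        if record.tombstone and not include_tombstones:
            continue
        yield record


def keys_to_migrate(log: List[Record], now: int,
                    include_tombstones: bool = False) -> List[str]:
    # Collect keys whose values are still live at time `now`.
    return [r.key for r in scan(log, include_tombstones)
            if not expired(r.value, now)]


log = [
    Record("live-zset", ObjectValue(expiration=0)),
    Record("stale-zset", ObjectValue(expiration=50)),
    Record("deleted-zset", None, tombstone=True),
]
```

In this model, `keys_to_migrate(log, now=100)` returns only the live key, while passing `include_tombstones=True` raises an `AttributeError` on the tombstone record, which is the Python analogue of the `NullReferenceException` described above.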
- `libs/cluster/Server/Migration/MigrateOperation.cs`: Remove `includeTombstones: true` from both `IterateMainStore` and `IterateObjectStore` calls in `Scan()`, reverting to the default `false`. Deleted keys have no data to migrate; excluding tombstones is both correct and safe.
- `test/Garnet.test.cluster/ClusterMigrateTests.cs`: Add regression test `ClusterMigrateSlotsWithObjectTombstones`. The test writes string keys to the source node's main store first (required because the object store scan range is bounded by `storeTailAddress`, the main store's tail address, so tombstones must fall within that range to trigger the bug), then creates a sorted set, deletes it to produce an object store tombstone, migrates the slot, and asserts the live key arrives on the target.

Original prompt
This section details on the original issue you should resolve
<issue_title>Automatic Slot Migrator Failure (`System.NullReferenceException` in `CreateAndRunMigrateTasks` and `MigrateSession.RecoverFromFailure` failed to make slots STABLE)</issue_title><issue_description>### Describe the bug
When running garnet, the Automatic Slot Migration fails.
Cluster: A two-node (2 shards, no replicas) cluster. The cluster is initialized with all slots on one node holding 5-10 thousand keys, and then we try to migrate some hash slots from one primary/shard to the other while continuously writing data and deleting keys (both manually and using TTLs). Some short-TTL keys are being deleted and modified constantly.
Garnet Version: `1.0.94`
Network: IPv6 with TLS
Steps to reproduce the bug
Command:
When the above command is executed, the sender says that it's migrating slots to the target, but it throws an error in the logs, and `cluster_slots_ok` and `cluster_slots_assigned` are both decreased (by the number of slots being migrated).

When you look at `CLUSTER MTASKS`, it says that it's `0`. It should say `1` during the migration.

Sender's Logs: