Orphan segments in Ceph #18788

@poriniki

Affected Version

Druid 33.0.0.

Description

We have identified a potential issue with orphaned segments in our deployment, which utilizes Ceph as deep storage and Postgres as the metadata store.

Several anomalies have been observed:

  • A significant number of segments are present in Ceph but missing from Postgres metadata.
  • Some of these segments are very old, have exceeded their retention period, and were never cleaned up.
  • A subset of segments had never been loaded by the cluster because they did not exist in Postgres at all, implying they were unknown to the coordinator.
  • After manually deleting these segments from Ceph, there were no related errors or recovery attempts from the cluster, and Ceph disk usage dropped noticeably, confirming they were unused and orphaned.
  • The steps we took to remove them were:
    • List the segment keys in Ceph.
    • List the segment keys known to Postgres, read from the payload field of the druid_segments table (payload → loadSpec → key).
    • Compare the two lists and delete the keys that existed in Ceph but not in Postgres.
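
The comparison step above can be sketched roughly as follows. This is a minimal illustration, not the exact script we ran: the function names `metadata_keys` and `find_orphans`, the sample object keys, and the `s3_zip` loadSpec type are assumptions made for the example. It expects the Ceph key listing and the `payload` values from `druid_segments` to have been exported already, and it only reports candidates without deleting anything.

```python
import json
from typing import Iterable, List, Set, Union


def metadata_keys(payloads: Iterable[Union[str, bytes]]) -> Set[str]:
    """Collect payload -> loadSpec -> key from each druid_segments payload."""
    keys: Set[str] = set()
    for raw in payloads:
        # json.loads accepts both str and UTF-8 encoded bytes.
        load_spec = json.loads(raw).get("loadSpec", {})
        key = load_spec.get("key")
        if key:
            keys.add(key)
    return keys


def find_orphans(
    ceph_keys: Iterable[str], payloads: Iterable[Union[str, bytes]]
) -> List[str]:
    """Return keys present in deep storage but absent from the metadata store."""
    return sorted(set(ceph_keys) - metadata_keys(payloads))


# Hypothetical sample data standing in for the real exports.
ceph_keys = [
    "druid/segments/wiki/2024-01-01/0/index.zip",
    "druid/segments/wiki/2023-01-01/0/index.zip",
]
payloads = [
    json.dumps(
        {
            "dataSource": "wiki",
            "loadSpec": {
                "type": "s3_zip",  # assumed loadSpec type for this sketch
                "bucket": "druid",
                "key": "druid/segments/wiki/2024-01-01/0/index.zip",
            },
        }
    ),
]

orphans = find_orphans(ceph_keys, payloads)
for key in orphans:
    print(key)
```

A listing of the bucket may also contain objects that are not segment files, so the resulting list is best treated as a set of candidates to review before any deletion.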

Additional context:

  • These segments appear to be completely unmanaged by Druid since their metadata entries never existed or were removed prematurely.
  • Manual deletion did not cause any segment load/unload events, coordinator log warnings, or missing segment alerts.
