Affected Version
Druid 33.0.0
Description
We have identified a potential issue with orphaned segments in our deployment, which utilizes Ceph as deep storage and Postgres as the metadata store.
Several anomalies have been observed:
- A significant number of segments are present in Ceph but missing from Postgres metadata.
- Some of these segments are very old, have exceeded their retention period, and were never cleaned up.
- A subset of segments had never been loaded by the cluster because they did not exist in Postgres at all, implying they were unknown to the coordinator.
- After manually deleting these segments from Ceph, there were no related errors or recovery attempts from the cluster, and Ceph disk usage dropped noticeably, confirming they were unused and orphaned.
- The steps we took to remove them were:
  - List the segment keys in Ceph.
  - List the segment keys known to Postgres, read from the payload field of the druid_segments table (payload -> loadSpec -> key).
  - Compare the two lists and delete the keys that existed in Ceph but not in PostgreSQL.
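
For reference, the comparison step can be sketched roughly as follows. This is a minimal illustration, not the exact script we ran: the function names `keys_from_payloads` and `find_orphans` and the sample keys are hypothetical, and it assumes the Ceph listing and the `payload` column values have already been fetched (for example with an S3-compatible client and a Postgres client) and that each payload is a JSON document with a `loadSpec.key` entry.

```python
import json
from typing import Iterable, Set


def keys_from_payloads(payloads: Iterable[str]) -> Set[str]:
    """Collect payload -> loadSpec -> key from druid_segments payload JSON."""
    keys = set()
    for raw in payloads:
        load_spec = json.loads(raw).get("loadSpec", {})
        key = load_spec.get("key")
        if key is not None:  # skip payloads without a deep-storage key
            keys.add(key)
    return keys


def find_orphans(ceph_keys: Iterable[str], payloads: Iterable[str]) -> Set[str]:
    """Return keys that exist in Ceph but are not referenced in Postgres."""
    return set(ceph_keys) - keys_from_payloads(payloads)


# Hypothetical sample data, for illustration only.
ceph_keys = [
    "druid/segments/a/index.zip",
    "druid/segments/b/index.zip",
]
payloads = [
    '{"loadSpec": {"type": "s3_zip", "bucket": "druid", '
    '"key": "druid/segments/a/index.zip"}}',
]

print(sorted(find_orphans(ceph_keys, payloads)))
# prints ['druid/segments/b/index.zip']
```

Only the keys returned by `find_orphans` were candidates for deletion; anything still referenced by a row in druid_segments was left untouched.
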
Additional context:
- These segments appear to be completely unmanaged by Druid since their metadata entries never existed or were removed prematurely.
- Manual deletion did not cause any segment load/unload events, coordinator log warnings, or missing segment alerts.