
Cleanup temp datasets #1631

Draft
ilongin wants to merge 1 commit into main from ilongin/1630-cleanup-temp-datasets

@ilongin (Contributor) commented Mar 10, 2026

Add cleanup for orphaned temporary (session_*) datasets left behind when a worker is killed before session cleanup runs.

  • metastore.get_temp_datasets_to_clean() finds session_* dataset versions whose job is in a final state (COMPLETE/FAILED/CANCELED).
    Versions without a job_id are skipped.
  • catalog.cleanup_temp_datasets() removes those versions (and the parent dataset if it was the only version).
  • datachain gc now includes this cleanup step.
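
The selection and removal rules described above can be illustrated with a minimal, self-contained sketch. It does not use the real datachain metastore or catalog; `DatasetVersion`, `get_temp_versions_to_clean`, `cleanup_temp_versions`, and the plain `job_status` dictionary are hypothetical stand-ins chosen for illustration, and the only behavior taken from the description is the `session_` prefix, the final states, the skip for a missing `job_id`, and the removal of a dataset once its last version is gone.

```python
from dataclasses import dataclass
from typing import Optional

# Final job states named in the PR description.
FINAL_JOB_STATES = {"COMPLETE", "FAILED", "CANCELED"}
TEMP_PREFIX = "session_"


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical in-memory stand-in for a dataset version record."""
    dataset: str
    version: str
    job_id: Optional[str] = None


def get_temp_versions_to_clean(versions, job_status):
    """Return session_* versions whose job has reached a final state."""
    result = []
    for v in versions:
        if not v.dataset.startswith(TEMP_PREFIX):
            continue  # not a temporary dataset
        if v.job_id is None:
            continue  # versions without a job_id are skipped
        if job_status.get(v.job_id) in FINAL_JOB_STATES:
            result.append(v)
    return result


def cleanup_temp_versions(versions, job_status):
    """Drop cleanable versions.

    Returns the remaining versions and the names of datasets that lost
    their last version (and would therefore be removed entirely).
    """
    to_clean = set(get_temp_versions_to_clean(versions, job_status))
    remaining = [v for v in versions if v not in to_clean]
    still_present = {v.dataset for v in remaining}
    removed_datasets = sorted({v.dataset for v in to_clean} - still_present)
    return remaining, removed_datasets


versions = [
    DatasetVersion("session_a", "1.0.0", job_id="j1"),  # job finished: removed
    DatasetVersion("session_b", "1.0.0", job_id="j2"),  # job still running: kept
    DatasetVersion("session_c", "1.0.0"),               # no job_id: skipped
    DatasetVersion("cats", "1.0.0", job_id="j1"),       # not a temp dataset: kept
]
job_status = {"j1": "FAILED", "j2": "RUNNING"}

remaining, removed = cleanup_temp_versions(versions, job_status)
```

In this sketch a job that is missing from `job_status` is treated as not final, so its versions are kept; whether the actual implementation behaves the same way is not stated in the description.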

@ilongin marked this pull request as draft March 10, 2026 15:15
@ilongin linked an issue Mar 10, 2026 that may be closed by this pull request
@cloudflare-workers-and-pages

Deploying datachain with Cloudflare Pages

Latest commit: b0b17a7
Status: ✅  Deploy successful!
Preview URL: https://eb4ef68f.datachain-2g6.pages.dev
Branch Preview URL: https://ilongin-1630-cleanup-temp-da.datachain-2g6.pages.dev


Development

Successfully merging this pull request may close these issues.

Clean up orphaned temporary (session_*) datasets and stale jobs
