Better cleanup of output buckets after failed datums #36

@emk

This is a follow-on to fixing #33.

From the source:

            // Remove `OutputFile` records for this datum, so we can upload the
            // same output files again.
            //
            // TODO: Unfortunately, there's an issue here. It takes one of two
            // forms:
            //
            // 1. Workers use deterministic file names. In this case, we
            //    _should_ be fine, because we'll just overwrite any files we
            //    did manage to upload.
            // 2. Workers use random filenames. Here, there are two subcases:
            //    a. We have successfully created an `OutputFile` record.
            //    b. We have yet to create an `OutputFile` record.
            //
            // We need to fix (2b) by pre-creating all our `OutputFile` records
            // _before_ uploading, and then updating them later to show that the
            // output succeeded. That turns them into case (2a). And then we can
            // fix (2a) by deleting any S3/GCS files corresponding to
            // `OutputFile::uri`.
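
A rough sketch of the ordering proposed in that comment, using in-memory stand-ins for the database table and the bucket. Everything here except the `OutputFile` name and its `uri` field is hypothetical (`Status`, `World`, `datum_id`, and the method names are assumptions for illustration, not the project's actual schema or API). The point is only that if the record is written before the upload starts, every object that might exist in the bucket has a record pointing at it, so cleanup can find it.

```rust
use std::collections::BTreeSet;

/// Upload state of one output file (hypothetical; not from the source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Uploading,
    Done,
}

/// Stand-in for an `OutputFile` record. Only `uri` is named in the source.
#[derive(Debug, Clone)]
struct OutputFile {
    datum_id: u32,
    uri: String,
    status: Status,
}

/// In-memory stand-ins for the records table and the S3/GCS bucket.
#[derive(Default)]
struct World {
    records: Vec<OutputFile>,
    bucket: BTreeSet<String>,
}

impl World {
    /// Step 1: record the intended URI before any bytes are uploaded.
    fn precreate(&mut self, datum_id: u32, uri: &str) {
        self.records.push(OutputFile {
            datum_id,
            uri: uri.to_owned(),
            status: Status::Uploading,
        });
    }

    /// Step 2: the upload itself (here, just inserting a key).
    fn upload(&mut self, uri: &str) {
        self.bucket.insert(uri.to_owned());
    }

    /// Step 3: update the record to show that the output succeeded.
    fn mark_done(&mut self, uri: &str) {
        for r in self.records.iter_mut().filter(|r| r.uri == uri) {
            r.status = Status::Done;
        }
    }

    /// Cleanup for a failed datum: delete every object named by one of its
    /// records (whatever its status), then drop the records themselves.
    fn cleanup_failed_datum(&mut self, datum_id: u32) {
        let (failed, kept): (Vec<_>, Vec<_>) = std::mem::take(&mut self.records)
            .into_iter()
            .partition(|r| r.datum_id == datum_id);
        for r in &failed {
            self.bucket.remove(&r.uri);
        }
        self.records = kept;
    }

    /// URIs of records whose upload was confirmed.
    fn done_uris(&self) -> Vec<String> {
        self.records
            .iter()
            .filter(|r| r.status == Status::Done)
            .map(|r| r.uri.clone())
            .collect()
    }
}

/// Datum 1 succeeds. Datum 2 uploads two randomly named files and fails
/// before the second record is marked done (what used to be case 2b).
fn demo() -> (Vec<String>, Vec<String>) {
    let mut w = World::default();

    w.precreate(1, "out/a-7f3e.csv");
    w.upload("out/a-7f3e.csv");
    w.mark_done("out/a-7f3e.csv");

    w.precreate(2, "out/b-91c2.csv");
    w.upload("out/b-91c2.csv");
    w.mark_done("out/b-91c2.csv");
    w.precreate(2, "out/c-04aa.csv");
    w.upload("out/c-04aa.csv");
    // Worker dies here, so `mark_done` never runs for the second file.

    w.cleanup_failed_datum(2);
    (w.bucket.iter().cloned().collect(), w.done_uris())
}
```

After `demo()`, only datum 1's file is left in the bucket, and no orphaned object survives from datum 2. In a real implementation the delete against S3/GCS can itself fail, so the records should probably only be removed once the corresponding deletes have succeeded.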
