Blob purging is horribly slow

From prod, this query has terrible performance on some deployments:

```sql
DELETE FROM blobs
  WHERE id IN (
    SELECT b.id
      FROM blobs AS b
      LEFT JOIN client_audits          AS ca ON ca."blobId"    = b.id
      LEFT JOIN submission_attachments AS sa ON sa."blobId"    = b.id
      LEFT JOIN form_attachments       AS fa ON fa."blobId"    = b.id
      LEFT JOIN form_defs              AS fd ON fd."xlsBlobId" = b.id
    WHERE ca."blobId"    IS NULL
      AND sa."blobId"    IS NULL
      AND fa."blobId"    IS NULL
      AND fd."xlsBlobId" IS NULL
      AND b.s3_status = 'uploaded'
    LIMIT 1
)
  RETURNING blobs.id, blobs.sha
```

Call stack:

* `Blobs._purgeOneUnattachedUploaded()`
* `Blobs.purgeUnattached()`
* `purgeTask()`
* `lib/bin/purge.js`
* `purge.sh`

# Queries

* what does the query plan look like on servers where this is slow?
  * A. it is much faster without the `LIMIT` clause (~100x)
  * A. it has many materialization steps
  * A. there are seq scans for all the joins, so perhaps indexes are missing on `blobId` columns?
  * A. perhaps index is missing on `s3_status` column?
* how often are blobs de-duplicated across different types (e.g. client audits vs submission attachments vs form attachments vs form definitions)?  It seems unlikely that there is significant space saved by merging all of these

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Blob purging is horribly slow #1443

Queries

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Blob purging is horribly slow #1443

Description

Queries

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions