Perform database dump anonymisation using a local, in-cluster database #3760

@AP-Hunt

Description

User Need

As a platform engineer
I want the database backup anonymisation process to happen in isolation from the real database
so that I can perform the process in production without risking the production database


Context

The current process restores the dump into a fresh database inside the same instance, anonymises it, then swaps and deletes the databases. We don't do it in production, because we don't want that to happen to the production database.

In this story, don't worry about how to make the process run in production at all, or only in production. Just alter the existing anonymisation process.

Consider looking at how Postgres and MySQL can be configured to speed up database restoration times.
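As a starting point for that exploration, here is a minimal sketch of server settings that trade durability for restore speed. It assumes the local database is a throwaway instance that is discarded after the anonymised dump is written, so losing data on a crash is acceptable. The setting names are real Postgres and MySQL options, but the values (for example `1GB` and `4GB`) and the helper names `postgres_server_args` and `pg_restore_args` are hypothetical and would need tuning. Note that `pg_restore --jobs` only works with custom or directory format dumps, not plain SQL.

```python
# Sketch only: durability-off settings for a THROWAWAY restore target.
# Never apply these to a database whose data matters.
POSTGRES_FAST_RESTORE_SETTINGS = {
    "fsync": "off",                  # do not force writes to disk
    "synchronous_commit": "off",     # do not wait for WAL flush on commit
    "full_page_writes": "off",       # skip full-page images in WAL
    "autovacuum": "off",             # avoid background work during restore
    "maintenance_work_mem": "1GB",   # assumption: speeds up index builds
    "max_wal_size": "4GB",           # assumption: fewer checkpoints
}

# Equivalent idea for mysqld (server start-up flags).
MYSQL_FAST_RESTORE_FLAGS = [
    "--innodb-flush-log-at-trx-commit=0",
    "--skip-innodb-doublewrite",
    "--skip-log-bin",
]


def postgres_server_args(settings=None):
    """Build a `postgres` command line with each setting passed via -c."""
    settings = POSTGRES_FAST_RESTORE_SETTINGS if settings is None else settings
    args = ["postgres"]
    for name, value in settings.items():
        args += ["-c", f"{name}={value}"]
    return args


def pg_restore_args(dump_path, dbname, jobs=4):
    """Build a parallel pg_restore command (custom/directory format only)."""
    return [
        "pg_restore",
        f"--jobs={jobs}",
        "--no-owner",
        f"--dbname={dbname}",
        dump_path,
    ]
```

These settings are only reasonable because the restore target is isolated and disposable, which is what this story asks for.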


What’s Needed

  • Read the dump file from S3 onto disk in Kubernetes
  • Explore ways to speed up the dump process (if reasonable)
  • Restore the dump file into a locally running database
  • Explore ways to speed up the restore process (again if reasonable)
  • Anonymise it
  • Dump it back out to the right place
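The steps above could be sketched roughly as follows for the Postgres case. This is an illustration under stated assumptions, not the existing process: it assumes the AWS CLI and Postgres client tools are available in the container, that a local database is already running and reachable at `local_db_url`, and that the anonymisation logic lives in a SQL file. The function name `anonymise_dump`, the file names, and all parameters are hypothetical.

```python
import subprocess
from pathlib import Path


def anonymise_dump(source_uri, dest_uri, workdir, local_db_url,
                   anonymise_sql, run=None):
    """Download a dump, restore it locally, anonymise it, and upload the result.

    `run` executes one command (a list of strings). It defaults to
    subprocess.run with check=True, so any failing step aborts the pipeline.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)

    workdir = Path(workdir)
    raw = workdir / "raw.dump"                # hypothetical file names
    anonymised = workdir / "anonymised.dump"

    steps = [
        # 1. Read the dump file from S3 onto local disk.
        ["aws", "s3", "cp", source_uri, str(raw)],
        # 2. Restore it into the locally running database.
        ["pg_restore", "--no-owner", f"--dbname={local_db_url}", str(raw)],
        # 3. Anonymise it; stop on the first SQL error.
        ["psql", f"--dbname={local_db_url}", "--set=ON_ERROR_STOP=1",
         f"--file={anonymise_sql}"],
        # 4. Dump the anonymised database back out.
        ["pg_dump", "--format=custom", f"--file={anonymised}",
         f"--dbname={local_db_url}"],
        # 5. Upload it to the destination (assumed here to be S3).
        ["aws", "s3", "cp", str(anonymised), dest_uri],
    ]
    for cmd in steps:
        run(cmd)
    return anonymised
```

Because every command targets only the local database, the real database is never touched during anonymisation. Passing a custom `run` makes it possible to dry-run the pipeline and inspect the commands without executing them.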

Acceptance Criteria

  • The anonymisation process happens in isolation from the real database
  • The real database in staging and integration is replaced with the content of the anonymised database
  • The production database is not affected

Notes

  • This work is done in a way that functions independently from existing applications/containers/processes etc.

Metadata

Assignees

Labels

  • Refined: A ticket that's been parsed by the forum of people in backlog refinement
  • urgency discussed: The urgency of this item has been discussed (probably in Backlog Refinement)