Skip to content

Rewrite StandardEditionMigrator [WHIT-3270]#11562

Draft
ChrisBAshton wants to merge 12 commits into
mainfrom
refactor-standard-edition-migrator
Draft

Rewrite StandardEditionMigrator [WHIT-3270]#11562
ChrisBAshton wants to merge 12 commits into
mainfrom
refactor-standard-edition-migrator

Conversation

@ChrisBAshton

@ChrisBAshton ChrisBAshton commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

What

This PR rewrites StandardEditionMigrator to simplify its API and make it easier to work with. It makes it far easier to write and iterate recipes.

Given a legacy record and a recipe, we can 'preview' the migration, seeing the before-and-after payloads as well as a normalised diff summary.

StandardEditionMigrator.preview_migration(
  TopicalEvent.find_by(slug: "g8-2013"),
  StandardEditionMigrator::TopicalEventRecipe,
)

If you're happy with the preview, you can then 'perform' the migration by one of two methods - migrate_existing_document and create_new_document - which take the same arguments.

  • migrate_existing_document overwrites an existing Document and all of its Editions. It is intended to be called for legacy edition types such as Publication, Speech, etc. It is not compatible with other content types.
  • create_new_document creates a new Document instance and one 'published' Edition. It is intended to be called for legacy content types that don't follow the Document/Edition model, e.g. TopicalEvent, Organisation etc. It could in theory support legacy edition types too, so that we have an alternative to overwriting-in-place (could instead duplicate a given Document and then delete the old one) but that's outside of the scope of this PR, and in any case is 'lossy' vs the overwrite approach above.
StandardEditionMigrator.create_new_document(
  TopicalEvent.find_by(slug: "g8-2013"),
  StandardEditionMigrator::TopicalEventRecipe,
)

There's also an optional raise_if_payloads_differ keyword arg, which defaults to true - it raises an exception if there is any diff between the normalised payloads.

StandardEditionMigrator.create_new_document(
  TopicalEvent.find_by(slug: "g8-2013"),
  StandardEditionMigrator::TopicalEventRecipe,
  raise_if_payloads_differ: true,
)

Finally, you can also enqueue the migration of an array of elements (creating a StandardEditionMigratorJob for each):

StandardEditionMigrator.bulk_enqueue_migration(
  TopicalEvent.reject { |te| te.topical_event_about_page },
  StandardEditionMigrator::TopicalEventRecipe,
  migration_method: "create_new_document",
  raise_if_payloads_differ: true,
)

Why

We need to migrate legacy topical events to StandardEdition. In trying to write the recipe, I found the existing StandardEditionMigrator too cumbersome to work with - so have rebuilt it from the ground up.

This PR removes the responsibility for translations management from StandardEditionMigratorJob and makes it the responsibility of the Recipe. It's also removed the options for auto-publishing to Publishing API post migration, as this adds complexity and ultimately wasn't used for the NewsArticle migration due to the race conditions it introduced. Finally, it massively simplifies the 'job' and keeps the logic in the StandardEditionMigrator itself.

In the next PR we will write a TopicalEventRecipe that puts this StandardEditionMigrator service to the test.

Jira: https://gov-uk.atlassian.net/browse/WHIT-3270


⚠️ This repo is Continuously Deployed: make sure you follow the guidance ⚠️

This application is owned by the Whitehall Experience team. Please let us know in #govuk-whitehall-experience-tech when you raise any PRs.

Follow these steps if you are doing a Rails upgrade.

We'll rebuild the StandardEditionMigrator over the next few
commits. Having spiked the approach in PR 11530, the changes are
so numerous that attempting to do them commit by commit makes little
sense - we're better off killing what we have and starting from
the ground up, as so much of the behaviour will have changed.

StandardEditionMigrator started life as an Editionable-only migrator
for News Articles. It then underwent some modest modification
for CaseStudies, before then having some more significant prep work
done on it in advance of Topical Event migration. That prep work,
to support non-Editionable source documents (topical events), is
untested, and what we've ended up with is a migrator service that
is quite difficult to reason about. Additional methods were added
for defining titles and summaries (because implementations differ
across content types), but worse than that, there is an attempt
to add generic support for translations where 1) all editionable
content types support translations, 2) some non-editionable content
types support translations (e.g. Organisations) and 3) some
non-editionable content types don't support translations (e.g.
Topical Events). The resulting API made it difficult to build and
iterate a recipe.

Similarly, there was a half-baked 'preview' method which had little
use. There was a 're-present to Publishing API' option which was
ultimately not used for news articles due to race conditions.
There was an awkward coupling of logic between the Job and the
Migrator itself.

I'm hoping the new StandardEditionMigrator will be simpler to
work with, and will scale to all content types. The intention is
to use it to migrate topical events shortly.
@ChrisBAshton ChrisBAshton force-pushed the refactor-standard-edition-migrator branch 5 times, most recently from 850e3b2 to ba7aab3 Compare June 15, 2026 17:09
This is a much more straightforward API than before. We will now
be able to pass the legacy record - of any type - and a given
recipe (uninitialized), and let the StandardEditionMigrator
service figure out how to compare the before-and-after state of
the given record.

I'll implement the `compare_payloads` method in the next commit.
We call the recipe's `build_edition` method, which should build
a StandardEdition instance in memory, and we then compare the
content/links payloads with those of the legacy presenter on the
legacy record.

The actual 'diff' generation will happen in the next commit, as
it is complex enough to require testing in isolation.
The recipes are about to get more complex (when it comes to saving
artefacts associated with the edition built in memory). The test
recipes would also end up duplicating more and more definitions. Extracting
the common logic into one base recipe that future recipes can
inherit from should be a useful starting point.
Hypothetically speaking, we want the payload of a content item to
be the same after migration as before. But in reality, there are
often new content properties and links on StandardEditionPresenter
that did not exist on the legacy content item. Likewise, there are
often content properties or links on the legacy content item that
are no longer relevant and should not be carried over to their
StandardEdition equivalent.

We therefore have the following methods defined for normalising
the diff:

- ignore_legacy_content_fields
- ignore_new_content_fields
- ignore_legacy_links
- ignore_new_links
As introduced in draft PR #11431. Sometimes we just want a generic
Whitehall error, or a way of catching multiple Whitehall errors,
without accidentally swallowing unexpected errors like NoMethodError
etc.
This method takes a legacy record and a recipe, and creates a new
Document (and a 'published' Edition, though this is down to the
recipe - in theory the Edition could be persisted in a different
state if the dev wishes), with the same content ID
as before (allowing us to overwrite the old content in-place in
Publishing API).

This is where a `save_artefacts!` method becomes necessary on the
recipe; whilst the StandardEditionMigrator can handle the saving
of the Document and the Edition, it has no knowledge of any
related records created within the recipe (e.g. associations with
organisations, featurings, images etc). It is up to the recipe to
store a reference to these in-memory associations under an array
called `@artefacts_to_save`, and the StandardEditionMigrator
iterates over said artefacts to save them.

We also have a two-pass save system, where all records are saved
without validation initially, as some may be transiently invalid
whilst waiting for dependent records to be saved. Once all records
are persisted, we expect that all records should be considered
valid, so we test this by re-saving with `validate: true`.
This takes an existing Document and all of its Editions (including
superseded and deleted ones) and migrates all of them in-place to
StandardEdition.
We want the security of being able to dynamically compare our
normalised payloads across all editions being migrated, BEFORE
committing to saving those changes. If there are any unexpected
differences in the payloads, we want to know about them and raise
an exception so that we can investigate and iterate the recipe.
We want to leave some sort of note to the publisher, and to our
future selves, that the migrated document originated from a legacy
document.
This method queues up StandardEditionMigratorJobs for every legacy
record in scope.
@ChrisBAshton ChrisBAshton force-pushed the refactor-standard-edition-migrator branch from ba7aab3 to a3a8cc2 Compare June 15, 2026 17:14
@ChrisBAshton ChrisBAshton changed the title Rewrite StandardEditionMigrator Rewrite StandardEditionMigrator [WHIT-3270] Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant