Rewrite StandardEditionMigrator [WHIT-3270]#11562
Draft
ChrisBAshton wants to merge 12 commits into
Draft
Conversation
We'll rebuild the StandardEditionMigrator over the next few commits. Having spiked the approach in PR 11530, the changes are so numerous that attempting to do them commit by commit makes little sense - we're better off killing what we have and starting from the ground up, as so much of the behaviour will have changed. StandardEditionMigrator started life as an Editionable-only migrator for News Articles. It then underwent some modest modification for CaseStudies, before then having some more significant prep work done on it in advance of Topical Event migration. That prep work, to support non-Editionable source documents (topical events), is untested, and what we've ended up with is a migrator service that is quite difficult to reason about. Additional methods were added for defining titles and summaries (because implementations differ across content types), but worse than that, there is an attempt to add generic support for translations where 1) all editionable content types support translations, 2) some non-editionable content types support translations (e.g. Organisations) and 3) some non-editionable content types don't support translations (e.g. Topical Events). The resulting API made it difficult to build and iterate a recipe. Similarly, there was a half-baked 'preview' method which had little use. There was a 're-present to Publishing API' option which was ultimately not used for news articles due to race conditions. There was an awkward coupling of logic between the Job and the Migrator itself. I'm hoping the new StandardEditionMigrator will be simpler to work with, and will scale to all content types. The intention is to use it to migrate topical events shortly.
850e3b2 to
ba7aab3
Compare
This is a much more straightforward API than before. We will now be able to pass the legacy record - of any type - and a given recipe (uninitialized), and let the StandardEditionMigrator service figure out how to compare the before-and-after state of the given record. I'll implement the `compare_payloads` method in the next commit.
We call the recipe's `build_edition` method, which should build a StandardEdition instance in memory, and we then compare the content/links payloads with those of the legacy presenter on the legacy record. The actual 'diff' generation will happen in the next commit, as it is complex enough to require testing in isolation.
The recipes are about to get more complex (when it comes to saving artefacts associated with the edition built in memory). The test recipes would also end up duplicating more and more definitions. Extracting the common logic into one base recipe that future recipes can inherit from should be a useful starting point.
Hypothetically speaking, we want the payload of a content item to be the same after migration as before. But in reality, there are often new content properties and links on StandardEditionPresenter that did not exist on the legacy content item. Likewise, there are often content properties or links on the legacy content item that are no longer relevant and should not be carried over to their StandardEdition equivalent. We therefore have the following methods defined for normalising the diff: - ignore_legacy_content_fields - ignore_new_content_fields - ignore_legacy_links - ignore_new_links
As introduced in draft PR #11431. Sometimes we just want a generic Whitehall error, or a way of catching multiple Whitehall errors, without accidentally swallowing unexpected errors like NoMethodError etc.
This method takes a legacy record and a recipe, and creates a new Document (and a 'published' Edition, though this is down to the recipe - in theory the Edition could be persisted in a different state if the dev wishes), with the same content ID as before (allowing us to overwrite the old content in-place in Publishing API). This is where a `save_artefacts!` method becomes necessary on the recipe; whilst the StandardEditionMigrator can handle the saving of the Document and the Edition, it has no knowledge of any related records created within the recipe (e.g. associations with organisations, featurings, images etc). It is up to the recipe to store a reference to these in-memory associations under an array called `@artefacts_to_save`, and the StandardEditionMigrator iterates over said artefacts to save them. We also have a two-pass save system, where all records are saved without validation initially, as some may be transiently invalid whilst waiting for dependent records to be saved. Once all records are persisted, we expect that all records should be considered valid, so we test this by re-saving with `validate: true`.
This takes an existing Document and all of its Editions (including superseded and deleted ones) and migrates all of them in-place to StandardEdition.
We want the security of being able to dynamically compare our normalised payloads across all editions being migrated, BEFORE committing to saving those changes. If there are any unexpected differences in the payloads, we want to know about them and raise an exception so that we can investigate and iterate the recipe.
We want to leave some sort of note to the publisher, and to our future selves, that the migrated document originated from a legacy document.
This method queues up StandardEditionMigratorJobs for every legacy record in scope.
ba7aab3 to
a3a8cc2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
This PR rewrites StandardEditionMigrator to simplify its API and make it easier to work with. It makes it far easier to write and iterate recipes.
Given a legacy record and a recipe, we can 'preview' the migration, seeing the before-and-after payloads as well as a normalised diff summary.
If you're happy with the preview, you can then 'perform' the migration by one of two methods -
migrate_existing_documentandcreate_new_document- which take the same arguments.migrate_existing_documentoverwrites an existing Document and all of its Editions. It is intended to be called for legacy edition types such as Publication, Speech, etc. It is not compatible with other content types.create_new_documentcreates a new Document instance and one 'published' Edition. It is intended to be called for legacy content types that don't follow the Document/Edition model, e.g. TopicalEvent, Organisation etc. It could in theory support legacy edition types too, so that we have an alternative to overwriting-in-place (could instead duplicate a given Document and then delete the old one) but that's outside of the scope of this PR, and in any case is 'lossy' vs the overwrite approach above.There's also an optional
raise_if_payloads_differkeyword arg, which defaults totrue- it raises an exception if there is any diff between the normalised payloads.Finally, you can also enqueue the migration of an array of elements (creating a
StandardEditionMigratorJobfor each):Why
We need to migrate legacy topical events to StandardEdition. In trying to write the recipe, I found the existing StandardEditionMigrator too cumbersome to work with - so have rebuilt it from the ground up.
This PR removes the responsibility for translations management from StandardEditionMigratorJob and makes it the responsibility of the Recipe. It's also removed the options for auto-publishing to Publishing API post migration, as this adds complexity and ultimately wasn't used for the NewsArticle migration due to the race conditions it introduced. Finally, it massively simplifies the 'job' and keeps the logic in the StandardEditionMigrator itself.
In the next PR we will write a TopicalEventRecipe that puts this StandardEditionMigrator service to the test.
Jira: https://gov-uk.atlassian.net/browse/WHIT-3270
This application is owned by the Whitehall Experience team. Please let us know in #govuk-whitehall-experience-tech when you raise any PRs.
Follow these steps if you are doing a Rails upgrade.