Replace usage of in_array() in MigrateExecutable::handleMissingSourceRows #5765

Open
@mdolnik

Describe the bug
The use of in_array() in MigrateExecutable::handleMissingSourceRows() is proving to be very inefficient for migrations with a very large number of rows.

To Reproduce
Run any migration with a very large number of rows (e.g. 10,000+).
While the actual migration has a progress bar and lets you know when it's finished, the logic in handleMissingSourceRows() runs afterwards with no output, so the process seems frozen for an indeterminate amount of time.

Actual behavior
Running a migration with many rows (in my case over 300,000 for upgrade_d7_file_private) takes roughly 20-30 minutes for the actual migration, but then hangs in MigrateExecutable::handleMissingSourceRows() for multiple hours, until the process has to be stopped manually.

Using in_array() can be very inefficient because it has to compare array values one by one until it finds a match. On top of that, the current logic is searching for an array within an array of arrays, and it repeats that search for every row being checked.
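To illustrate the cost, here is a minimal standalone sketch of that kind of lookup. It is not the module's actual code: the `fid` key and the generated data are assumptions made up for the example.

```php
<?php
// Hypothetical stand-in for the $allSourceIdValues property:
// a plain list where each entry is one row's source ID values.
$allSourceIdValues = [];
for ($i = 0; $i < 1000; $i++) {
  $allSourceIdValues[] = ['fid' => (string) $i];
}

// in_array() walks the list and compares the needle array against each
// entry in turn. A hit near the end, or a miss, touches every entry.
$found = in_array(['fid' => '999'], $allSourceIdValues);
$missing = in_array(['fid' => '1000'], $allSourceIdValues);

var_dump($found, $missing);
```

With N collected source rows and M rows to check, this pattern performs on the order of N × M array comparisons, which is why the time grows so sharply with row count.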

Workaround
Instead of using in_array(), the $allSourceIdValues property should be keyed by a unique ID so that the lookup can use isset().

A dedicated method that builds the key from the source ID values can then be used both when writing to the $allSourceIdValues property in MigrateExecutable::onPrepareRow() and when reading it in handleMissingSourceRows().

With this change applied to the 300k-row example above, the post-migration logic finishes within a few minutes instead of multiple hours.
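A rough sketch of the keyed approach, written as standalone PHP rather than as a patch. The helper name `buildSourceIdKey()`, the use of serialize() for the key, and the `fid` sample data are all assumptions for illustration; the actual change may build the key differently.

```php
<?php
/**
 * Hypothetical helper: builds a string key from a row's source ID values.
 */
function buildSourceIdKey(array $sourceIdValues): string {
  // serialize() preserves keys, values and types. Any stable encoding that
  // is unique per set of ID values would work here.
  return serialize($sourceIdValues);
}

// Hypothetical stand-in for the $allSourceIdValues property, now keyed.
$allSourceIdValues = [];

// Writing: what onPrepareRow() would do for each row seen in the source.
foreach ([['fid' => '1'], ['fid' => '2']] as $ids) {
  $allSourceIdValues[buildSourceIdKey($ids)] = TRUE;
}

// Reading: what handleMissingSourceRows() would do for each mapped row.
$stillInSource = isset($allSourceIdValues[buildSourceIdKey(['fid' => '2'])]);
$removedFromSource = isset($allSourceIdValues[buildSourceIdKey(['fid' => '3'])]);

var_dump($stillInSource, $removedFromSource);
```

One caveat with this particular key: serialize() is type-sensitive, so `['fid' => 1]` and `['fid' => '1']` produce different keys, whereas in_array() without strict mode treats them as equal. The values would need to be normalized the same way on both the write and read sides.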
