Replace usage of in_array() in MigrateExecutable::handleMissingSourceRows #5765

Open
@mdolnik

Describe the bug
The use of in_array() in MigrateExecutable::handleMissingSourceRows() is very inefficient for migrations with a very large number of rows.

To Reproduce
Run any migration with a very large number of rows (e.g. 10,000+).
The migration itself shows a progress bar and reports when it has finished, but the logic in handleMissingSourceRows() then makes the process appear frozen for an indeterminate amount of time.

Actual behavior
Running a migration with many rows (in my case over 300,000 for upgrade_d7_file_private) took roughly 20-30 minutes for the migration itself, but then hung in MigrateExecutable::handleMissingSourceRows() for multiple hours until I had to stop the process manually.

in_array() can be very inefficient because it has to compare array values one by one until it finds a match. On top of that, the current logic is searching for an array within an array of arrays.
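To make the cost concrete, here is a rough sketch that counts how many comparisons a linear search performs when every ID map row is looked up in a list of source ID arrays. The data, the `fid` key, and the `countingInArray()` helper are all made up for illustration; the helper is only a stand-in for what in_array() does internally and is not the module's actual code.

```php
<?php
// Hypothetical stand-in for in_array(): linear scan that counts comparisons.
function countingInArray(array $needle, array $haystack, int &$comparisons): bool {
    foreach ($haystack as $candidate) {
        $comparisons++;
        // Each comparison is itself an array-to-array comparison.
        if ($needle === $candidate) {
            return TRUE;
        }
    }
    return FALSE;
}

// Assumed shape: one array of source ID values per migrated row.
$rows = 1000;
$allSourceIdValues = [];
for ($i = 0; $i < $rows; $i++) {
    $allSourceIdValues[] = ['fid' => (string) $i];
}

// Look up every row once, as a post-migration check would.
$comparisons = 0;
$missing = 0;
foreach ($allSourceIdValues as $mapSourceId) {
    if (!countingInArray($mapSourceId, $allSourceIdValues, $comparisons)) {
        $missing++;
    }
}

// 1 + 2 + ... + 1000 comparisons in total.
echo $comparisons, PHP_EOL; // 500500
```

Under the same assumptions (every row present, same ordering), 300,000 rows would need 300,000 × 300,001 / 2 ≈ 4.5 × 10^10 array comparisons, which is consistent with the check running for hours.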

Workaround
Instead of using in_array(), the $allSourceIdValues property should be keyed by a unique ID so that isset() can be used for the lookup.

A dedicated method that builds the key from the source ID values could be used both when writing to the $allSourceIdValues property in MigrateExecutable::onPrepareRow() and when reading it in handleMissingSourceRows().
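A minimal sketch of that idea, outside of any class, might look like the following. The function name `buildSourceIdKey()`, the use of serialize(), and the sample `fid` values are assumptions for illustration, not the code from the actual change. Values are cast to strings on the assumption that IDs read back from the ID map may be strings while the source plugin yields integers.

```php
<?php
// Hypothetical helper: turn a row's source ID values into one string key.
function buildSourceIdKey(array $sourceIdValues): string {
    // Cast to strings so 1 and '1' produce the same key; serialize() keeps
    // multi-value IDs such as ['1', '2'] and ['12'] distinct.
    return serialize(array_map('strval', array_values($sourceIdValues)));
}

// Writing side (what onPrepareRow() would do for each source row).
$allSourceIdValues = [];
foreach ([['fid' => 1], ['fid' => 2], ['fid' => 3]] as $sourceIdValues) {
    $allSourceIdValues[buildSourceIdKey($sourceIdValues)] = TRUE;
}

// Reading side (what handleMissingSourceRows() would do for each map row).
$idMapRows = [['fid' => '1'], ['fid' => '2'], ['fid' => '4']];
$missing = [];
foreach ($idMapRows as $mapSourceId) {
    // isset() on a keyed array is a hash lookup, not a linear scan.
    if (!isset($allSourceIdValues[buildSourceIdKey($mapSourceId)])) {
        $missing[] = $mapSourceId;
    }
}

print_r($missing); // only the row with fid '4' is reported as missing
```

Using the same helper on both the write and read side is the important part: if the two sides built their keys differently, every row would look missing.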

Making this change for the example above with 300k rows brought the post-migration logic down to a few minutes instead of multiple hours.
