@adam-vessey adam-vessey commented Dec 9, 2025

When performing a migration with rows that update existing entities, leaving properties out of the source could lead to the corresponding destination fields on those entities being cleared of their values.

Let's more directly track the properties based on whether they might have had a value available, such that:

  • if a property is absent, we should avoid changing the destination field
  • if a property is present, but with an empty/null value, we will clear the destination field
  • if a property is present with a value, we will migrate it
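
The three cases above can be sketched in plain PHP. This is only an illustration of the intended rule, not the code in this PR: `dgi_example_update_action()` is a hypothetical helper, the source row is modelled as a flat array, and the set of values treated as "empty" (`NULL`, `''`, `[]`) is an assumption.

```php
<?php

/**
 * Hypothetical helper: decide what an update should do with one field.
 *
 * @param array $source
 *   Source row values keyed by property name.
 * @param string $property
 *   The source property to inspect.
 *
 * @return string
 *   One of 'leave', 'clear' or 'migrate'.
 */
function dgi_example_update_action(array $source, string $property): string {
  // array_key_exists() distinguishes "absent" from "present but NULL";
  // isset() would treat both the same way.
  if (!array_key_exists($property, $source)) {
    return 'leave';
  }
  $value = $source[$property];
  // Assumption: these three values count as "empty" for this sketch.
  if ($value === NULL || $value === '' || $value === []) {
    return 'clear';
  }
  return 'migrate';
}

$row = ['title' => 'New title', 'description' => ''];
echo dgi_example_update_action($row, 'title'), PHP_EOL;       // migrate
echo dgi_example_update_action($row, 'description'), PHP_EOL; // clear
echo dgi_example_update_action($row, 'subject'), PHP_EOL;     // leave
```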

More specifically, this should be useful in processes such as islandora_spreadsheet_ingest, should a sheet bearing a subset of the fields be provided while also mapping entity IDs (such that it's possible to try to "update" existing entities in the first place).

Summary by CodeRabbit

  • Documentation

    • Added guidance for Search API direct/immediate indexing and detailed entity update process, including partial-source handling and tracking configuration via environment variables.
  • New Features

    • Added optional tracking of which source properties populated each destination property during migrations; configurable enable/disable and used to preserve existing destination fields.
    • Standardized revision messages to include migration context.
  • Tests

    • Added unit tests covering tracking behavior across migration scenarios.

@adam-vessey adam-vessey added the minor Added functionality that is backwards compatible. label Dec 9, 2025

coderabbitai bot commented Dec 9, 2025

Walkthrough

Adds a new TrackingGet migrate process plugin that wraps the existing Get plugin to track source→destination property mappings during migrations; conditionally enables it via env var and integrates filtering into DgiRevisionedEntity. Documentation and unit tests accompany the feature.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: README.md | Adds Configuration sections: "Search API, direct/immediate indexing" and "Entity update process" documenting env vars DGI_MIGRATE_SUPPRESS_DIRECT_INDEXING_DURING_MIGRATIONS and DGI_MIGRATE_TRACKING_GET_DISABLED, and describing partial source handling and tracking behavior. |
| Module wiring: dgi_migrate.module | Imports TrackingGet, conditionally preserves original get definition as dgi_migrate_original_get, and replaces the get plugin class with TrackingGet unless DGI_MIGRATE_TRACKING_GET_DISABLED is 'true'. |
| TrackingGet plugin: src/Plugin/migrate/process/TrackingGet.php | New process plugin wrapping Get to build/propagate a tracking map of destination properties to originating source properties; includes factory create(), setWrappedPlugin(), transform() tracking logic, filterRow() helper, and an any() helper. |
| Destination integration: src/Plugin/migrate/destination/DgiRevisionedEntity.php | Adds null-safe migration id retrieval, new doGetEntity() helper that uses TrackingGet::filterRow() to optionally filter rows and swap destination properties (respecting DGI_MIGRATE_TRACKING_GET_DISABLED), and a generateRevisionMessage() helper. |
| Unit tests: tests/src/Unit/TrackingGetTest.php | New test class covering TrackingGet scenarios: present/absent/empty source handling, transitive resolution, and filter behavior across six test methods with mocks and helpers. |

Sequence Diagram

sequenceDiagram
    autonumber
    participant Migrator as Migration process
    participant TG as TrackingGet (wrapper)
    participant Get as Original Get plugin
    participant Row as Row
    participant Entity as DgiRevisionedEntity

    Migrator->>TG: transform(value, migrate_executable, row, dest_prop)
    activate TG
    TG->>Row: read current destination properties & tracking map
    alt mapping resolution
        TG->>Get: delegate transform(value,...)
        activate Get
        Get-->>TG: transformed value
        deactivate Get
        TG->>Row: update tracking map for dest_prop
    end
    TG-->>Migrator: return transformed value
    deactivate TG

    Migrator->>Entity: import(row)
    activate Entity
    alt tracking enabled
        Entity->>TG: TrackingGet::filterRow(row)
        activate TG
        TG->>Row: clone & remove/mark missing destination props per tracker
        TG-->>Entity: filtered row
        deactivate TG
        Entity->>Entity: swap filtered destination props back to original row
    end
    Entity->>Entity: getEntity(filtered/original row)
    deactivate Entity

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • TrackingGet's transform() transitive resolution and edge cases for empty vs missing source values.
    • filterRow() cloning and correct propagation/removal of tracking metadata.
    • Interaction/contract between DgiRevisionedEntity::doGetEntity() and TrackingGet::filterRow().
    • Tests correctness and completeness relative to intended tracking semantics.

Suggested reviewers

  • nchiasson-dgi

Poem

🐇 I watched the fields of data trace,
Source to dest in tidy grace,
A little map, a careful hop,
No orphaned bits, no sudden drop,
The rabbit hums: migrations ace! 🎋

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing property filtering based on source availability to prevent unintended field clearing during entity updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5413680 and b250f7f.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: PHPUnit / Drupal 10.3 | PHP 8.2
  • GitHub Check: PHPUnit / Drupal 10.4 | PHP 8.4
  • GitHub Check: PHPUnit / Drupal 10.4 | PHP 8.3
  • GitHub Check: PHPUnit / Drupal 10.5 | PHP 8.3
  • GitHub Check: PHPUnit / Drupal 10.5 | PHP 8.4

Was an artifact of earlier implementation, where the wrapped/overridden plugin
was being instantiated closer to when it was used.
Given our testing environment installs the polyfills, we could get into a
weird place where tests pass, but runtime environments (without the "dev"
requirements) would fail.
@adam-vessey adam-vessey force-pushed the feature/update-entities-via-migration branch from 7364164 to 5511278 Compare December 9, 2025 20:06
…g the `get` plugin.

For example, should a property's definition just do a `dgi_migrate.process.entity_query` using
`conditions`, which reads from the row but does _not_ make use of the `get` plugin.
@adam-vessey adam-vessey marked this pull request as ready for review December 16, 2025 15:52
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/src/Unit/TrackingGetTest.php (1)

51-54: Consider adding a test for multiple source properties.

The TrackingGet plugin supports array sources (see line 72 in TrackingGet.php: $properties = is_string($source) ? [$source] : $source). A test with ['source' => ['a', 'b']] where one exists and one doesn't would validate the any() logic returns true if at least one source exists.

src/Plugin/migrate/process/TrackingGet.php (1)

71-72: Consider adding validation for the source configuration.

If source is neither a string nor an array (e.g., null or missing), the is_string($source) ? [$source] : $source expression would pass a non-array to any(), potentially causing issues. While the wrapped Get plugin likely validates this, defensive validation here could provide clearer error messages.

     $source = $this->configuration['source'];
+    if (!isset($source)) {
+      throw new MigrateException('The "source" configuration is required.');
+    }
     $properties = is_string($source) ? [$source] : $source;
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 923e309 and 5413680.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • dgi_migrate.module (2 hunks)
  • src/Plugin/migrate/destination/DgiRevisionedEntity.php (2 hunks)
  • src/Plugin/migrate/process/TrackingGet.php (1 hunks)
  • tests/src/Unit/TrackingGetTest.php (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/src/Unit/TrackingGetTest.php (1)
src/Plugin/migrate/process/TrackingGet.php (3)
  • TrackingGet (21-160)
  • setWrappedPlugin (58-61)
  • filterRow (134-158)
src/Plugin/migrate/destination/DgiRevisionedEntity.php (1)
src/Plugin/migrate/process/TrackingGet.php (2)
  • TrackingGet (21-160)
  • filterRow (134-158)
src/Plugin/migrate/process/TrackingGet.php (2)
src/Plugin/migrate/destination/DgiRevisionedEntity.php (1)
  • create (47-53)
src/MigrateBatchExecutable.php (1)
  • getIdMap (545-547)
🔇 Additional comments (10)
README.md (1)

82-91: Clear and comprehensive documentation for the new entity update process.

The documentation effectively explains:

  • The purpose of partial source handling to avoid erasing existing fields
  • How the tracking works via the get plugin
  • The environment variable toggle for disabling
  • The known limitation about plugins that bypass get
dgi_migrate.module (1)

37-46: Well-structured plugin wrapping pattern.

The implementation correctly:

  • Preserves the original get plugin definition as dgi_migrate_original_get (which TrackingGet::create() references)
  • Conditionally wraps based on the environment variable
  • Follows the same pattern established for LockingMigrationLookup
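
A minimal sketch of that wrapping pattern, assuming the definitions arrive as a plain array keyed by plugin ID. `dgi_example_wrap_get()` is a hypothetical stand-in rather than the function in dgi_migrate.module, and the TrackingGet class name is inferred from the file path listed in the walkthrough.

```php
<?php

/**
 * Hypothetical stand-in for the alter logic described above.
 *
 * @param array $definitions
 *   Process plugin definitions keyed by plugin ID, altered in place.
 */
function dgi_example_wrap_get(array &$definitions): void {
  // Opt-out switch named in the walkthrough; only the literal string 'true'
  // disables the wrapper.
  if (getenv('DGI_MIGRATE_TRACKING_GET_DISABLED') === 'true') {
    return;
  }
  if (!isset($definitions['get'])) {
    return;
  }
  // Keep the untouched definition under a new ID so the wrapper can still
  // build the original plugin.
  $definitions['dgi_migrate_original_get'] = $definitions['get'];
  // Assumed class name, inferred from src/Plugin/migrate/process/TrackingGet.php.
  $definitions['get']['class'] = 'Drupal\dgi_migrate\Plugin\migrate\process\TrackingGet';
}

$definitions = [
  'get' => ['id' => 'get', 'class' => 'Drupal\migrate\Plugin\migrate\process\Get'],
];
dgi_example_wrap_get($definitions);
print_r(array_keys($definitions));
```

Copying the definition before swapping the class matters: if the order were reversed, dgi_migrate_original_get would point back at the wrapper.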
src/Plugin/migrate/destination/DgiRevisionedEntity.php (3)

68-86: Solid implementation of the tracking-aware entity retrieval.

The method correctly:

  • Bypasses tracking when disabled via environment variable
  • Filters the row through TrackingGet::filterRow() to apply tracking logic
  • Syncs destination properties back to the original row to preserve the reference
  • Updates the ID map from the filtered row

The property sync pattern (remove all, then copy from filtered) ensures a clean state.


51-51: Good defensive coding with null-safe fallback.

Using $migration?->id() ?? '(unknown; not provided)' handles edge cases where migration may be null.
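
For readers less familiar with the operator pair, a small self-contained illustration follows. `ExampleMigration` and `dgi_example_migration_label()` are hypothetical stand-ins; only the `?->` / `??` expression itself comes from the reviewed code.

```php
<?php

// Hypothetical stand-in for a migration object; only id() matters here.
final class ExampleMigration {

  public function __construct(private string $id) {}

  public function id(): string {
    return $this->id;
  }

}

function dgi_example_migration_label(?ExampleMigration $migration): string {
  // ?-> short-circuits to NULL when $migration is NULL, and ?? then
  // supplies the fallback text.
  return $migration?->id() ?? '(unknown; not provided)';
}

echo dgi_example_migration_label(new ExampleMigration('demo')), PHP_EOL; // demo
echo dgi_example_migration_label(NULL), PHP_EOL; // (unknown; not provided)
```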


129-131: Clean helper for revision messaging.

The generateRevisionMessage() method provides a consistent, traceable revision log entry.

tests/src/Unit/TrackingGetTest.php (2)

59-147: Comprehensive test coverage for core tracking scenarios.

Tests appropriately cover:

  • Present source → tracked as true, not marked empty after filtering
  • Present empty source with skip → tracked as true, marked empty after filtering
  • Present empty source with pass-through → tracked as true, value preserved
  • Absent source → tracked as false, filtered out entirely

These align well with the PR objectives for handling present vs. absent vs. empty source properties.


152-201: Good coverage of transitive property resolution.

The transitive tests verify that when a destination property references another destination property (via @ prefix), the tracking correctly propagates:

  • If the referenced property's source existed → transitive is tracked as true
  • If the referenced property's source was absent → transitive is tracked as false
src/Plugin/migrate/process/TrackingGet.php (3)

66-95: Well-designed tracking logic with proper source/destination handling.

The implementation correctly:

  • Retrieves or initializes the tracker from the destination property
  • Handles both string and array source configurations
  • Parses the @ prefix using the same logic as Drupal core's Row class (with proper attribution)
  • For source properties: checks $row->hasSourceProperty()
  • For destination properties: checks the tracker first, then falls back to $row->hasDestinationProperty()

The any() semantics correctly implement "at least one source property exists."
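
The presence check described above might look roughly like the following when reduced to plain arrays. This is a sketch under stated assumptions, not the plugin's code: `dgi_example_source_available()` is hypothetical, rows are flat arrays instead of Row objects, nested property paths are ignored, and only a single leading `@` is handled (the `@@` escaping that Drupal core supports is left out).

```php
<?php

/**
 * Hypothetical, array-based sketch of the presence check.
 *
 * @param string|array $source
 *   The plugin's "source" configuration.
 * @param array $source_values
 *   Source row values keyed by property name.
 * @param array $destination_values
 *   Destination values built so far.
 * @param array $tracker
 *   Destination property => whether its source existed.
 */
function dgi_example_source_available(string|array $source, array $source_values, array $destination_values, array $tracker): bool {
  $properties = is_string($source) ? [$source] : $source;
  foreach ($properties as $property) {
    if (str_starts_with($property, '@')) {
      // Destination reference: trust the tracker first, then fall back to
      // whether the destination value exists at all.
      $name = substr($property, 1);
      $exists = $tracker[$name] ?? array_key_exists($name, $destination_values);
    }
    else {
      $exists = array_key_exists($property, $source_values);
    }
    if ($exists) {
      // "At least one source property exists" is enough.
      return TRUE;
    }
  }
  return FALSE;
}

$source_values = ['title' => 'A'];
$tracker = ['label' => TRUE, 'missing' => FALSE];
var_dump(dgi_example_source_available('title', $source_values, [], $tracker));            // bool(true)
var_dump(dgi_example_source_available('@label', $source_values, [], $tracker));           // bool(true)
var_dump(dgi_example_source_available('@missing', $source_values, [], $tracker));         // bool(false)
var_dump(dgi_example_source_available(['nope', 'title'], $source_values, [], $tracker));  // bool(true)
```

Checking the tracker before the destination values is what makes the transitive case work: a destination property that exists only because of a default still reports its source as absent.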


110-123: Good forward-compatible polyfill for array_any().

The implementation:

  • Uses native array_any() when available (PHP 8.4+)
  • Falls back to a manual implementation with proper attribution to Symfony's polyfill
  • Correctly passes both value and key to the callback
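
A minimal sketch of that fallback shape, assuming a free function rather than the plugin's method; `dgi_example_any()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical helper mirroring the described any() fallback.
 */
function dgi_example_any(array $array, callable $callback): bool {
  // Prefer the native function where the runtime provides it (PHP 8.4+).
  if (function_exists('array_any')) {
    return array_any($array, $callback);
  }
  // Manual fallback: stop at the first element the callback accepts,
  // passing both value and key like the native function does.
  foreach ($array as $key => $value) {
    if ($callback($value, $key)) {
      return TRUE;
    }
  }
  return FALSE;
}

var_dump(dgi_example_any([1, 2, 3], fn ($value) => $value > 2)); // bool(true)
var_dump(dgi_example_any([], fn ($value) => TRUE));              // bool(false)
```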

134-158: Clean implementation of row filtering.

The filterRow() method correctly:

  • Returns early if no tracker exists (no-op for non-tracked rows)
  • Clones the row to avoid mutating the original
  • Copies all destination properties to the clone
  • Marks properties as empty only when the tracker indicates source was present but no destination value exists
  • Removes the tracking property from the output

The logic at lines 148-152 ensures that if a source column was present (array_filter($tracker) returns truthy entries), but the destination property wasn't set, it marks it as explicitly empty—correctly implementing the "empty source clears field" behavior.
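
To make the filtering rule concrete, here is a rough array-based sketch. It is not the filterRow() implementation: `dgi_example_filter()` is hypothetical, the tracker is passed separately instead of living on the row, and letting untracked destination properties pass through unchanged is an assumption of this sketch.

```php
<?php

/**
 * Hypothetical, array-based sketch of the filtering step.
 *
 * @param array $destination
 *   Destination values keyed by property name.
 * @param array|null $tracker
 *   Destination property => whether its source existed, or NULL if untracked.
 *
 * @return array
 *   'values' holds the properties to write; 'empty' lists the properties to
 *   clear explicitly.
 */
function dgi_example_filter(array $destination, ?array $tracker): array {
  // No tracker: nothing to filter.
  if ($tracker === NULL) {
    return ['values' => $destination, 'empty' => []];
  }
  $values = [];
  $empty = [];
  foreach ($destination as $property => $value) {
    // Drop values whose source was absent. Assumption: properties the
    // tracker never saw pass through unchanged.
    if ($tracker[$property] ?? TRUE) {
      $values[$property] = $value;
    }
  }
  // Source was present but produced no destination value: clear the field.
  foreach (array_keys(array_filter($tracker)) as $property) {
    if (!array_key_exists($property, $destination)) {
      $empty[] = $property;
    }
  }
  return ['values' => $values, 'empty' => $empty];
}

$result = dgi_example_filter(
  ['title' => 'A', 'note' => 'stale default'],
  ['title' => TRUE, 'description' => TRUE, 'note' => FALSE]
);
print_r($result['values']); // only 'title' remains
print_r($result['empty']);  // only 'description' is listed
```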
