EDTF Full Date Processor #141

Open
wants to merge 3 commits into base: 2.x


francisayyad03

GitHub Issue: #140

What does this Pull Request do?

This PR introduces the updated EDTF Date Processor code to index full EDTF dates (including partial or multiple dates) in Solr, rather than just extracting the year. It consolidates date logic within the controlled_access_terms module, enabling more precise date-based sorting and filtering in Drupal’s Search API.

What's new?

Adds a new EDTFDateProcessor class that:

  • Converts complete or incomplete single EDTF dates (e.g., 2012, 2012-05, 2012-05-03) into a complete date for Solr (YYYY-01-01T00:00:00Z, YYYY-MM-01T00:00:00Z, YYYY-MM-DDT00:00:00Z, etc.).
  • Supports multiple dates (e.g., {2012-01-01, 2012-04-15}) by splitting them into separate Solr date values.
  • Filters out dates outside configured open interval years (open_start_year, open_end_year) if specified.

This change does not introduce new dependencies but does require updating the search index configuration to use this processor.

Existing code dealing strictly with “year-only” fields remains unaffected.
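
The padding rule for single dates can be sketched as follows. This is an illustration in Python rather than the module's PHP, and `to_solr_date` is a hypothetical name, not a function from the PR.

```python
import re

# Hypothetical helper; the PR's PHP class may be organised differently.
def to_solr_date(edtf: str) -> str:
    """Pad YYYY, YYYY-MM, or YYYY-MM-DD to a full Solr timestamp."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", edtf.strip())
    if match is None:
        raise ValueError(f"Not a simple EDTF date: {edtf!r}")
    year = match.group(1)
    month = match.group(2) or "01"  # missing month defaults to January
    day = match.group(3) or "01"    # missing day defaults to the 1st
    return f"{year}-{month}-{day}T00:00:00Z"


print(to_solr_date("2012"))        # year only
print(to_solr_date("2012-05"))     # year and month
print(to_solr_date("2012-05-03"))  # complete date
```

The sketch only pads missing components; it does not check that the month and day are real calendar values.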

How should this be tested?

  1. Enable EDTF Date Processor

    • In Drupal, go to Extend
    • Locate the Controlled Access Terms module (which now contains the EDTF Date Processor).
    • Enable it if it’s not already enabled.
  2. Configure the Processor Settings

    • Navigate to Search API → Your Index → Processors
    • Enable the EDTF Date Processor.
    • Scroll to the EDTF Date Processor settings form near the bottom of the page.
    • Choose your preferred configuration options (similar to how you set up the EDTFYear processor)
  3. Add the edtf_dates Field to Your Index

    • Still under Search API → Your Index, click Fields
    • Click Add fields and look for EDTF Dates (listed under EDTF Date Processor).
    • Check the box next to edtf_dates and save.
    • This ensures Solr (or whichever backend) knows to store and index these date values.
  4. Enter Test Content with Various EDTF Formats

    • Go to any content type that uses an EDTF field (as configured in step 2).

    • Add or edit content, then set the EDTF field to one of the following:

      1. Single Complete Date: 2012-12-12
        • Should index as 2012-12-12T00:00:00Z in edtf_dates.
      2. Missing Day: 2012-12
        • Should index as 2012-12-01T00:00:00Z.
      3. Missing Month and Day: 2012
        • Should index as 2012-01-01T00:00:00Z.
      4. Multiple Dates: {2013-12-12, 2014-09-09}
        • Should split into two entries:
          • 2013-12-12T00:00:00Z
          • 2014-09-09T00:00:00Z
        • Both values should appear in ascending order in edtf_dates.
  5. Check the Indexed Results in Solr

    • After saving your content, run a re-index of your Search API index (if necessary).
    • In the Solr Admin UI or via a Solr query, verify the edtf_dates field for the content item.
    • You should see all indexed dates, sorted in ascending order.
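
The expected values in step 4 can be written down as a small check. The sketch below is Python, not the module's PHP, and `set_to_solr_dates` is a hypothetical helper that mirrors the described behaviour (split a set, pad each date, sort ascending).

```python
import re

# Hypothetical helper mirroring the behaviour described in the test steps.
def set_to_solr_dates(edtf: str) -> list[str]:
    """Split an EDTF set such as {A, B} and return padded, sorted Solr dates."""
    parts = [p.strip() for p in edtf.strip().strip("{}").split(",")]
    dates = []
    for part in parts:
        m = re.fullmatch(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", part)
        if m is None:
            continue  # this sketch silently ignores anything it cannot parse
        dates.append(f"{m.group(1)}-{m.group(2) or '01'}-{m.group(3) or '01'}T00:00:00Z")
    # ISO-style strings with four-digit years sort chronologically as text.
    return sorted(dates)


print(set_to_solr_dates("2012-12"))
print(set_to_solr_dates("{2014-09-09, 2013-12-12}"))
```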

Additional Notes:

  • Sets for Multiple Dates:
    If you wish to allow multiple dates (e.g., {2013-12-12, 2014-09-09}), ensure your EDTF field is configured to accept sets. The EDTF Date Processor will then index each date separately into the edtf_dates array.

  • Merged Dates from Multiple Fields:
    For each field you configure in the EDTF Date Processor settings, all valid dates found are added to edtf_dates, sorted in ascending order. This means if multiple EDTF fields are selected, the resulting edtf_dates field will contain dates from all of them.

  • Sorting by edtf_dates:
    If you configure your index or views to sort by edtf_dates, the earliest date (i.e., the first element in the sorted edtf_dates array) determines the sort order.

  • Open Interval Settings:
    If you specify a start year (open_start_year) and end year (open_end_year), any dates falling outside that range (and not ignored by your configuration) will be excluded from edtf_dates. This can help narrow the indexed date range if you only need a specific window of time.
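
A rough sketch of the merging, filtering, and sorting described in these notes, written in Python for illustration. The field names are made up, and treating the year bounds as inclusive is an assumption; the PR text does not say whether they are.

```python
from typing import Optional

def merge_and_filter(
    field_values: dict[str, list[str]],
    open_start_year: Optional[int] = None,
    open_end_year: Optional[int] = None,
) -> list[str]:
    """Merge Solr dates from several fields, drop out-of-range years, sort ascending."""
    merged = sorted(d for values in field_values.values() for d in values)

    def in_range(date: str) -> bool:
        year = int(date[:4])  # assumes four-digit positive years
        if open_start_year is not None and year < open_start_year:
            return False
        if open_end_year is not None and year > open_end_year:
            return False
        return True

    return [d for d in merged if in_range(d)]


# Hypothetical field names, for illustration only.
values = {
    "field_edtf_date_created": ["2014-09-09T00:00:00Z", "1850-01-01T00:00:00Z"],
    "field_edtf_date_issued": ["2013-12-12T00:00:00Z"],
}
edtf_dates = merge_and_filter(values, open_start_year=1900, open_end_year=2100)
print(edtf_dates)
print("sort key:", edtf_dates[0])  # the earliest remaining date drives sorting
```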

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers
@Islandora/committers

@francisayyad03 francisayyad03 changed the title from "working full date processor" to "EDTF Full Date Processor" on Mar 13, 2025
@joshdentremont
Contributor

This has been on my TODO for ages. Very excited to see this!

At a quick glance, it looks like it doesn't handle Xs in the date, or seasonal dates.

I think any Xs in the year could just be converted to 0s. For example, should 199X be transformed to 1990 for sorting purposes?
For XX as the month or day, it could convert to 01, e.g., 1990-03-XX could become 1990-03-01. It would also need to support dates like 1990-XX-07, which could probably just map to 1990-01-07.
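
The substitution suggested here could look roughly like this. It is a Python sketch, not the PR's PHP; `replace_unspecified` is a hypothetical name, and the sketch assumes positive years and only handles a fully unspecified month or day (XX).

```python
def replace_unspecified(edtf: str) -> str:
    """Replace X in the year with 0 and an XX month or day with 01."""
    parts = edtf.split("-")  # assumes no leading minus sign on the year
    year = parts[0].replace("X", "0")
    rest = ["01" if part == "XX" else part for part in parts[1:]]
    return "-".join([year] + rest)


print(replace_unspecified("199X"))
print(replace_unspecified("1990-03-XX"))
print(replace_unspecified("1990-XX-07"))
```

Partially unspecified components such as a month of 1X are left untouched by this sketch and would need their own rule.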

For seasons, I think a mapping to a single day would probably work best. From https://www.loc.gov/standards/datetime/
21 Spring (independent of location)
22 Summer (independent of location)
23 Autumn (independent of location)
24 Winter (independent of location)
25 Spring - Northern Hemisphere
26 Summer - Northern Hemisphere
27 Autumn - Northern Hemisphere
28 Winter - Northern Hemisphere
29 Spring - Southern Hemisphere
30 Summer - Southern Hemisphere
31 Autumn - Southern Hemisphere
32 Winter - Southern Hemisphere
33 Quarter 1 (3 months in duration)
34 Quarter 2 (3 months in duration)
35 Quarter 3 (3 months in duration)
36 Quarter 4 (3 months in duration)
37 Quadrimester 1 (4 months in duration)
38 Quadrimester 2 (4 months in duration)
39 Quadrimester 3 (4 months in duration)
40 Semestral 1 (6 months in duration)
41 Semestral 2 (6 months in duration)

I think some of these could be a simple map, for example, 1990-25 could probably map to 1990-04-01 rather than determining the first day of spring for the given year, which should be close enough for sorting purposes. This would likely be better for winter as well, since winter 2025 should sort to the start of 2025, not the end.
For the ones where the hemisphere is not given you could maybe determine it from the site's country or timezone settings, or have another option in the configuration form to select your hemisphere (this would probably work best since sites might not have their country set).
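
A simple map with a hemisphere setting might be sketched as below (Python, for illustration). Only the 1990-25 to 1990-04-01 mapping and "winter sorts to the start of the year" come from this comment; every other month value, the `hemisphere` parameter, and the function name are placeholders.

```python
SEASONS = ["spring", "summer", "autumn", "winter"]

# Placeholder start months; only northern spring (04) and winter (01)
# follow the suggestion in this comment.
NORTH = {"spring": "04", "summer": "07", "autumn": "10", "winter": "01"}
SOUTH = {"spring": "10", "summer": "01", "autumn": "04", "winter": "07"}

def season_to_date(edtf: str, hemisphere: str = "north") -> str:
    """Map YYYY-SS (season codes 21 to 32) to the first day of a month."""
    year, code_text = edtf.split("-")
    code = int(code_text)
    if not 21 <= code <= 32:
        raise ValueError(f"Not a season code handled here: {code}")
    name = SEASONS[(code - 21) % 4]
    if code >= 29:      # 29-32: Southern Hemisphere
        table = SOUTH
    elif code >= 25:    # 25-28: Northern Hemisphere
        table = NORTH
    else:               # 21-24: no hemisphere given, use the site setting
        table = NORTH if hemisphere == "north" else SOUTH
    return f"{year}-{table[name]}-01"


print(season_to_date("1990-25"))
print(season_to_date("2025-24", hemisphere="south"))
```

Codes 33 to 41 (quarters, quadrimesters, semestrals) are left out of this sketch.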

It also needs to support dates with one or more ~, ?, and/or %. But for the purposes of sorting, we could just strip those out before running the code you have now. Something like this should work
return (bool) preg_match('/^\d{4}(-\d{2}(-\d{2})?)?$/', str_replace(array("~", "?", "%"), "", $value));
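
For comparison, the same check sketched in Python. `is_simple_date` is a hypothetical name; note that `re.fullmatch` is slightly stricter than the PHP pattern, which would also accept a trailing newline.

```python
import re

# Translation table that deletes the EDTF qualifier characters.
QUALIFIERS = str.maketrans("", "", "~?%")

def is_simple_date(value: str) -> bool:
    """True if the value is YYYY, YYYY-MM, or YYYY-MM-DD once qualifiers are removed."""
    stripped = value.translate(QUALIFIERS)
    return re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", stripped) is not None


print(is_simple_date("2012-05~"))
print(is_simple_date("2012?-05-03%"))
print(is_simple_date("2012-5"))
```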

@francisayyad03
Author

I’ve implemented all the requested changes:

  • Added support for dates with Xs, converting them appropriately for sorting (e.g., 199X → 1990, 1990-XX-07 → 1990-01-07).
  • Implemented seasonal date mapping using predefined values (e.g., 1990-25 → 1990-04-01 for spring).
  • Stripped ~, %, and ? from dates before processing to ensure correct sorting.

@joshdentremont
Contributor

Awesome, thanks. I will try to test this as soon as possible

@joshdentremont
Contributor

Actually, I wonder if this code is doing the same thing as

public static function iso8601Value(string $edtf) {

If it is, maybe we can simplify this PR by reusing that existing code?

@francisayyad03
Author

I have just looked into this functionality and note that a lot of code overlaps. However, before proceeding, I want to clarify a couple of differences:

  1. From what I see in this function, if the month or day are missing, no 01-01 is appended—potentially resulting in incomplete dates (e.g., YYYY instead of YYYY-01-01).
  2. When handling multiple dates, the line
    $dates = preg_split('/(,|\.\.)/', trim($edtf, '{}[]'));

only selects the first date in a set rather than processing all values. My current implementation, however, stores all dates and sorts on the earliest one.
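
To illustrate the difference, here is the same split sketched in Python (not the module's PHP). The split itself yields every date; keeping only the first element is what discards the rest. `split_edtf` is a hypothetical name, and the extra whitespace stripping is an addition that the quoted PHP line does not perform.

```python
import re

def split_edtf(edtf: str) -> list[str]:
    """Trim set/interval brackets and split on ',' or '..'."""
    return [part.strip() for part in re.split(r",|\.\.", edtf.strip("{}[]"))]


parts = split_edtf("{2013-12-12, 2014-09-09}")
print(parts[0])  # what a first-value-only approach would keep
print(parts)     # what indexing every value would keep
```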

How would you like me to proceed with this?

@joshdentremont
Contributor

We have a tech call tomorrow. It's open to everyone, if you would like to join, but I'm also happy to bring this to the group and report back on how to tackle this. You can find the link in the Islandora Slack if you would like to join us.

My thoughts are that maybe we can implement your code into the existing functionality as long as that doesn't break anything, and then in this PR we could just call the existing function (with your updates) to generate the date for Solr. I also want to ask about the season mapping they are using, because for sorting purposes I don't think it will handle winter properly (#142)

@seth-shaw-asu seth-shaw-asu self-assigned this on Mar 19, 2025
@seth-shaw-asu
Member

seth-shaw-asu commented Mar 19, 2025

Talking about winter sorting... the EDTF spec itself is ambiguous. The ISO standard (that adopted the EDTF spec) states

3.1.3.5
winter
season (3.1.3.1) following autumn (3.1.3.4) and preceding spring (3.1.3.2)

Therefore, by my reading:

  1. Autumn 2025
  2. Winter 2025
  3. Spring 2026

which is the current sort behavior, not the proposed sorting

  1. Autumn 2024
  2. Winter 2025
  3. Spring 2025

However, I admit that the spec is still rather ambiguous.
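
The two orderings differ only in which month winter is mapped to. A small Python sketch makes that concrete; the month values (spring 04, autumn 10, winter 12 or 01) are assumptions chosen to reproduce the two lists above, not values taken from the module.

```python
def sort_seasons(values: list[str], winter_month: str) -> list[str]:
    """Sort YYYY-SS season values using an assumed month for each season code."""
    months = {"21": "04", "23": "10", "24": winter_month}  # spring, autumn, winter

    def key(value: str) -> str:
        year, code = value.split("-")
        return f"{year}-{months[code]}"

    return sorted(values, key=key)


# Winter late in the year: Autumn 2025, Winter 2025, Spring 2026.
print(sort_seasons(["2025-24", "2026-21", "2025-23"], winter_month="12"))
# Winter at the start of the year: Autumn 2024, Winter 2025, Spring 2025.
print(sort_seasons(["2025-21", "2025-24", "2024-23"], winter_month="01"))
```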

@joshdentremont
Copy link
Contributor

Rather than us having to make an assumption about what date a season should be mapped to, maybe we can make it an option in the processor configuration? In the same way there are options for the EDTF year processor, maybe we just have 4 options for the seasons. They could default to the mapping we currently have, but then people could customize them if they wanted to.

@seth-shaw-asu
Member

I think giving an option to customize what the season matches to is a reasonable option. Granted, it might be a bit much to require @francisayyad03 to add into the PR. Perhaps we focus on this as it is and then you could do a follow-up PR, @joshdentremont? Or you could fork @francisayyad03's branch and add in the code, which would supersede this one? We would credit both of you on the commit.

That said, this PR makes some interesting conversions of valid EDTF dates into the Solr DatePoint format:

EDTF                       Solr result (current)
1900                       1900-01-01T00:00:00Z
1900-01                    1900-01-01T00:00:00Z
1900-01-02                 1900-01-02T00:00:00Z
190X                       1900-01-01T00:00:00Z
1900-XX                    1900-01-01T00:00:00Z
1900-91                    1900-91-01T00:00:00Z
1900-91-01                 1900-91-01T00:00:00Z
1900-3X                    1900-12-01T00:00:00Z
1900-31                    1900-03-01T00:00:00Z
190X-5X-8X                 1900-50-80T00:00:00Z
19000                      19000T00:00:00Z
Y19000                     Y19000T00:00:00Z
190u                       190uT00:00:00Z
190                        190T00:00:00Z
190-99-52                  190-99-52T00:00:00Z
1900-01-02T                1900-01-02TT00:00:00Z
1900-01-02T1:1:1           1900-01-02T1:1:1T00:00:00Z
1900-01-02T01:22:33        1900-01-02T01:22:33T00:00:00Z
1900-01-02T01:22:33Z       1900-01-02T01:22:33ZT00:00:00Z
1900-01-02T01:22:33+       1900-01-02T01:22:33+T00:00:00Z
1900-01-02T01:22:33+05:00  1900-01-02T01:22:33+05:00T00:00:00Z

Granted, most of the weird ones aren't valid EDTF values, but a few are. (This list was taken from the ISO 8601 tests.) So, what should the code do with invalid EDTF strings? Give a weird value as it does or should we be throwing warnings? At least the 'Y' prefixed year one should be fixed.
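
One way to catch the odd outputs before they reach Solr is to check each generated string against the plain YYYY-MM-DDThh:mm:ssZ shape and a real calendar date. The Python sketch below is deliberately conservative and is not code from the PR; Solr itself accepts a somewhat wider syntax than this check allows.

```python
import re
from datetime import datetime

def is_valid_solr_date(value: str) -> bool:
    """True only for a well-formed YYYY-MM-DDThh:mm:ssZ string naming a real date."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:  # e.g. month 91 or day 80
        return False
    return True


for candidate in ["1900-01-02T00:00:00Z", "1900-91-01T00:00:00Z", "Y19000T00:00:00Z"]:
    print(candidate, is_valid_solr_date(candidate))
```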

Note, I wanted to respond this morning, but I was hit with emergency system stuff until now. Unfortunately, I will be out tomorrow through next week for surgery/recovery; so I won't be able to review until I get back.

@joshdentremont
Contributor

@seth-shaw-asu @francisayyad03 Sounds good to me. If we push this through without the option to customize seasons, I can definitely add it in later in another PR.

@joshdentremont
Contributor

@seth-shaw-asu I think it's probably fine if the dates have weird mappings if they are dates that the edtf widget won't accept, but I agree that we should fix any that are valid edtf. Do you get similar results when using the ISO function?

That being said, if it's simple to piggy back off the way the widget detects whether the edtf is valid, maybe it would make sense to set all invalid edtf to some date in the distant past, or the distant future?

@francisayyad03
Author

Are we expected to support all EDTF-valid dates, or is supporting Level 0 and Level 1 sufficient? For context, Level 2 features like S (significant digits) are currently not handled. I also noticed that Y (for extended years), which is a Level 1 feature, is not currently supported in my implementation, but I could add support for it unless we decide to rely entirely on the existing ISO function. I am also unsure about mapping invalid dates to a distant date, since they would then consistently appear first or last in sorting (depending on whether the date is in the distant past or future). However, I am unsure how else to handle these invalid dates.

@seth-shaw-asu
Member

Invalid dates (or even valid EDTF dates that aren't supported) can probably be skipped if you generate a warning in the Drupal logger. I would expect at least levels 0 & 1 to be supported; but in any case the plugin configuration form's text should be very clear about what is not supported.

@joshdentremont
Contributor

@francisayyad03 we just discussed this again at the tech call and this is what came up:

  • If you can update the code so that it passes the code review tests and implements levels 0 and 1 then we are OK to merge this.
  • If you want to use the existing ISO function you can, but if you would rather keep it all in your own module, that is also fine.
  • For dates that are invalid, please log a message noting that the date was invalid and may not convert properly. Please also note what is supported in the config form for the plugin.
  • I will attempt to add some options later to allow site admins to adjust what dates the seasons map to, unless you want to add that yourself. No pressure to put this in this PR if you would rather leave it for me to tackle later.
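
The "log and move on" behaviour requested above might be sketched like this. It is Python using the standard `logging` module as a stand-in for the Drupal logger, and the logger name and `index_dates` function are hypothetical. For brevity the sketch skips invalid values outright and handles only plain Level 0 dates.

```python
import logging
import re

logger = logging.getLogger("edtf_date_processor")  # hypothetical channel name

# Year, optional month 01-12, optional day 01-31 (no per-month day check).
PATTERN = re.compile(r"(\d{4})(?:-(0[1-9]|1[0-2])(?:-(0[1-9]|[12]\d|3[01]))?)?")

def index_dates(values: list[str]) -> list[str]:
    """Convert supported values to Solr dates; warn about and skip the rest."""
    indexed = []
    for value in values:
        m = PATTERN.fullmatch(value.strip())
        if m is None:
            logger.warning("Skipping unsupported or invalid EDTF value: %s", value)
            continue
        indexed.append(f"{m.group(1)}-{m.group(2) or '01'}-{m.group(3) or '01'}T00:00:00Z")
    return sorted(indexed)


print(index_dates(["2012", "190u", "1900-91", "2011-05"]))
```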

Thanks again

3 participants