Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

454 migrate resources from old cms to aiod platform #469

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

antonis96
Copy link
Collaborator

@antonis96 antonis96 commented Mar 4, 2025

Change

Fetch organizations and events from ai4europe CMS, convert them to the AIoD schema, and add periodic updates.

How to Test

  1. Run the command:
    docker compose up --build cms-connector
  2. Wait about 5 minutes for the container to complete its process.
  3. Verify that the database has been populated with around 260 organisations and about 200 events.

Checklist

  • Tests have been added or updated to reflect the changes, or their absence is explicitly explained.
  • Documentation has been added or updated to reflect the changes, or their absence is explicitly explained.
  • A self-review has been conducted checking:
    • No unintended changes have been committed.
    • The changes in isolation seem reasonable.
    • Anything that may be odd or unintuitive is provided with a GitHub comment explaining it (but consider if this should not be a code comment or in the documentation instead).
  • All CI checks pass before pinging a reviewer, or provide an explanation if they do not.

Related Issues

Closes #454

@antonis96 antonis96 linked an issue Mar 4, 2025 that may be closed by this pull request
@antonis96 antonis96 requested a review from PGijsbers March 4, 2025 15:26
@antonis96 antonis96 marked this pull request as draft March 4, 2025 15:39
@antonis96 antonis96 marked this pull request as ready for review March 10, 2025 12:38
@antonis96 antonis96 requested review from PGijsbers and removed request for PGijsbers March 10, 2025 12:38
@antonis96 antonis96 self-assigned this Mar 10, 2025
@PGijsbers
Copy link
Collaborator

Hi Antonis, just want to let you know I've seen the request and I will get to soonish (this week).

@antonis96 antonis96 changed the title 454 migrate organizations from old cms to aiod platform 454 migrate resources from old cms to aiod platform Mar 12, 2025
Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I think it's probably mostly fine, though I only tried out the Event Connector so far. Please make sure all connectors work from the branch without any local changes (e.g., for the events connector I noticed already the directory seemed incorrect) and address the other concerns in this review (Event feedback largely also applies to the other connectors). After those concerns are addressed, I'll have another look. Please let me know if you have any questions.

if event.get("platform_resource_identifier") is not None
else None
),
platform=event["platform"] if event.get("platform") is not None else None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
platform=event["platform"] if event.get("platform") is not None else None,
platform=event.get("platform")

Safely retrieving the value for a key and otherwise returning None is exactly what dict.get does, so you can replace this pattern with just a call to dict.get.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Platform and Platform Identifier are specifically for connectors and are limited to exactly those platforms we have connectors for. In this case I think the platform should simply be PlatformName.ai4europe_cms with the identifier of the asset in the ai4europe_cms catalogue.

Comment on lines 65 to 72
registration_link=event["registration_link"]
if event.get("registration_link") is not None
and len(event.get("registration_link")) <= 256
else None,
mode=event["mode"] if event.get("mode") is not None else None,
scientific_domain=[sd for sd in event.get("scientific_domain")]
if event.get("scientific_domain") is not None
else [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these cases you may also need to use the second parameter which specifies the value to return if the key is not present in the dictionary, e.g.:

event.get("scientific_domain", [])
event["scientific_domain"] if event.get("scientific_comain") is not None else []

are equivalent.

yield RecordError(identifier=None, error=e)
return

for event in events:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you are fetching -all- events from ai4eu every hour. This is very wasteful and imposes unneeded strain on both servers and databases, and more prone to connection errors. If you do not fix this from the ai4eu side, please consider at least updating the connector. For example, you could make sure the date_published exceeds the last added entry. You can use the state dictionary that's passed as an argument to this function for this. The dictionary is saved to a file every x items, and automatically loaded and passed through on the next invocation of the connector.

E.g., if state.get("last_entry_published", "2000-01-01T00:00:00") > event["last_published"]: continue at the start of the loop body and set state["last_entry_published"] = event["date_published"] after the yield statement should work and at least avoid trying to insert many events which already exist in the database.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more "clear" way to resolve this is by adding the ai4eu identifier into the response of the API. This way we could utilize ResourceConnectorById class since we know that every new asset (organisation / news / event) will get an identifier that is higher than the currest highest identifier.
@AlexJoom do you have any thoughts on this one? Can we add the identifier into the ai4eu API response?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonis96 the ai4europe api response has a platform_resource_identifier property.

For example page https://ai4europe.eu/node/2420
has platform_resource_identifier: "node-2420"

Is this sufficient?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexJoom Yes, this is totally fine. I will proceed with the necessary changes.

@antonis96
Copy link
Collaborator Author

@PGijsbers Connector is now updated into inheriting from ResourceConnectorById .

@PGijsbers PGijsbers moved this to In Progress in AIoD API Apr 1, 2025
@PGijsbers
Copy link
Collaborator

Sorry, I missed the ping somehow. Added this to my list.

@PGijsbers PGijsbers moved this from In Progress to Review in AIoD API Apr 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

Migrate Resources from Old CMS to AIoD Platform
3 participants