-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
454 migrate resources from old cms to aiod platform #469
base: develop
Are you sure you want to change the base?
454 migrate resources from old cms to aiod platform #469
Conversation
Hi Antonis, just want to let you know I've seen the request and I will get to soonish (this week). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I think it's probably mostly fine, though I only tried out the Event Connector so far. Please make sure all connectors work from the branch without any local changes (e.g., for the events connector I noticed already the directory seemed incorrect) and address the other concerns in this review (Event feedback largely also applies to the other connectors). After those concerns are addressed, I'll have another look. Please let me know if you have any questions.
if event.get("platform_resource_identifier") is not None | ||
else None | ||
), | ||
platform=event["platform"] if event.get("platform") is not None else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
platform=event["platform"] if event.get("platform") is not None else None, | |
platform=event.get("platform") |
Safely retrieving the value for a key and otherwise returning None is exactly what dict.get
does, so you can replace this pattern with just a call to dict.get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Platform and Platform Identifier are specifically for connectors and are limited to exactly those platforms we have connectors for. In this case I think the platform should simply be PlatformName.ai4europe_cms
with the identifier of the asset in the ai4europe_cms catalogue.
registration_link=event["registration_link"] | ||
if event.get("registration_link") is not None | ||
and len(event.get("registration_link")) <= 256 | ||
else None, | ||
mode=event["mode"] if event.get("mode") is not None else None, | ||
scientific_domain=[sd for sd in event.get("scientific_domain")] | ||
if event.get("scientific_domain") is not None | ||
else [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these cases you may also need to use the second parameter which specifies the value to return if the key is not present in the dictionary, e.g.:
event.get("scientific_domain", [])
event["scientific_domain"] if event.get("scientific_comain") is not None else []
are equivalent.
yield RecordError(identifier=None, error=e) | ||
return | ||
|
||
for event in events: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you are fetching -all- events from ai4eu every hour. This is very wasteful and imposes unneeded strain on both servers and databases, and more prone to connection errors. If you do not fix this from the ai4eu side, please consider at least updating the connector. For example, you could make sure the date_published
exceeds the last added entry. You can use the state
dictionary that's passed as an argument to this function for this. The dictionary is saved to a file every x items, and automatically loaded and passed through on the next invocation of the connector.
E.g., if state.get("last_entry_published", "2000-01-01T00:00:00") > event["last_published"]: continue
at the start of the loop body and set state["last_entry_published"] = event["date_published"]
after the yield statement should work and at least avoid trying to insert many events which already exist in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the more "clear" way to resolve this is by adding the ai4eu identifier into the response of the API. This way we could utilize ResourceConnectorById
class since we know that every new asset (organisation / news / event) will get an identifier that is higher than the currest highest identifier.
@AlexJoom do you have any thoughts on this one? Can we add the identifier into the ai4eu API response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antonis96 the ai4europe api response has a platform_resource_identifier
property.
For example page https://ai4europe.eu/node/2420
has platform_resource_identifier
: "node-2420"
Is this sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexJoom Yes, this is totally fine. I will proceed with the necessary changes.
src/tests/connectors/ai4europe_cms/test_ai4europe_cms_event_connector.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Pieter Gijsbers <[email protected]>
…nnector.py Co-authored-by: Pieter Gijsbers <[email protected]>
@PGijsbers Connector is now updated into inheriting from |
Sorry, I missed the ping somehow. Added this to my list. |
Change
Fetch organizations and events from ai4europe CMS, convert them to the AIoD schema, and add periodic updates.
How to Test
docker compose up --build cms-connector
Checklist
Related Issues
Closes #454