@lpi-tn (Collaborator) commented Feb 3, 2026

This pull request adds support for the FAO Open Knowledge data source: new data models, a URL collector, and unit tests. It also adds utility functions for serializing dataclass instances and uses them in the document collection workflow. Together, these changes let the system ingest, validate, and process FAO Open Knowledge documents, with error cases handled explicitly and covered by tests.

FAO Open Knowledge integration:

  • Added new data models in fao_open_knowledge.py to represent FAO Open Knowledge items, bundles, bitstreams, and related metadata, enabling structured parsing and validation of API responses.
  • Implemented FAOOpenKnowledgeURLCollector in fao_open_knowledge_collector.py to fetch and construct WeLearnDocument objects from the FAO Open Knowledge API, supporting automated document discovery and ingestion.
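As a rough illustration of what structured parsing of an item with bundles and bitstreams can look like, here is a minimal sketch using standard-library dataclasses. The class names, field names, and payload keys (`sizeBytes`, `withdrawn`, and so on) are assumptions made for this example; they are not taken from fao_open_knowledge.py or from the FAO Open Knowledge API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Bitstream:
    # Hypothetical fields; the models in the PR may differ.
    name: str
    size_bytes: int


@dataclass
class Bundle:
    name: str
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    uuid: str
    title: str
    withdrawn: bool = False
    bundles: list[Bundle] = field(default_factory=list)


def parse_item(payload: dict[str, Any]) -> Item:
    """Build an Item from decoded JSON; a missing required key raises KeyError."""
    bundles = [
        Bundle(
            name=raw_bundle["name"],
            bitstreams=[
                Bitstream(name=raw["name"], size_bytes=int(raw["sizeBytes"]))
                for raw in raw_bundle.get("bitstreams", [])
            ],
        )
        for raw_bundle in payload.get("bundles", [])
    ]
    return Item(
        uuid=payload["uuid"],
        title=payload["title"],
        withdrawn=bool(payload.get("withdrawn", False)),
        bundles=bundles,
    )


# Made-up payload, shaped only for this example.
sample = {
    "uuid": "0000-example",
    "title": "Example report",
    "bundles": [
        {"name": "ORIGINAL", "bitstreams": [{"name": "report.pdf", "sizeBytes": "1024"}]}
    ],
}
item = parse_item(sample)
```

Nesting typed objects this way means a malformed response fails at parse time rather than later in the workflow.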

Testing and validation:

  • Added unit tests for both the FAO Open Knowledge plugin and its data models, covering embargoed, withdrawn, unauthorized, and error cases.

Dataclass serialization utilities:

  • Introduced utility functions is_dataclass_instance, _inner_serialize_dataclass, and serialize_dataclass_instance in computed_metadata.py to recursively serialize dataclass instances into plain data structures for downstream processing and storage.
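For readers unfamiliar with the pattern, recursive dataclass serialization could be sketched as follows. The three function names are the ones listed above, but the bodies are an assumption about how such helpers might work, not the code in computed_metadata.py; the `Author` and `Details` classes are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def is_dataclass_instance(obj: Any) -> bool:
    # dataclasses.is_dataclass is also True for the class itself, so exclude types.
    return is_dataclass(obj) and not isinstance(obj, type)


def _inner_serialize_dataclass(obj: Any) -> Any:
    """Recursively convert dataclass instances, dicts, lists, and tuples to plain data."""
    if is_dataclass_instance(obj):
        return {f.name: _inner_serialize_dataclass(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, dict):
        return {key: _inner_serialize_dataclass(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_inner_serialize_dataclass(value) for value in obj]
    return obj


def serialize_dataclass_instance(obj: Any) -> dict:
    """Entry point: accept only dataclass instances and return a plain dict."""
    if not is_dataclass_instance(obj):
        raise TypeError(f"expected a dataclass instance, got {type(obj).__name__}")
    return _inner_serialize_dataclass(obj)


# Hypothetical classes used only to exercise the helpers.
@dataclass
class Author:
    name: str


@dataclass
class Details:
    title: str
    authors: list[Author]


result = serialize_dataclass_instance(Details("Report", [Author("A. Writer")]))
```

The standard library's `dataclasses.asdict` performs a similar conversion; a hand-written helper is mainly useful when the caller wants control over how particular field types are handled.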

Workflow integration:

  • Integrated the new dataclass serialization utility into the main document collection workflow in document_collector.py, so that document details are serialized before database insertion.
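To show why this step matters, the sketch below assumes document details end up JSON-encoded before storage; the PR text does not state this. It uses `dataclasses.asdict` as a stand-in for the PR's utility, and `License` and `prepare_details` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class License:
    # Hypothetical metadata object stored inside a document's details.
    url: str


def prepare_details(details: dict[str, Any]) -> dict[str, Any]:
    """Replace top-level dataclass values with plain dicts (asdict is a stand-in)."""
    return {
        key: asdict(value) if is_dataclass(value) and not isinstance(value, type) else value
        for key, value in details.items()
    }


details = {"title": "Report", "license": License(url="https://example.org/license")}

# The standard JSON encoder cannot handle a dataclass instance directly.
try:
    json.dumps(details)
    raw_is_serializable = True
except TypeError:
    raw_is_serializable = False

# After conversion, the same details encode without error.
encoded = json.dumps(prepare_details(details))
```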

Copilot AI (Contributor) left a comment

Pull request overview

This PR integrates the FAO Open Knowledge data source into the system, enabling automated document discovery, validation, and ingestion from the FAO Open Knowledge repository.

Changes:

  • Added comprehensive FAO Open Knowledge integration including data models, URL collector, and document collector plugin
  • Implemented dataclass serialization utilities to properly handle structured metadata before database storage
  • Added extensive unit tests covering various edge cases (embargoed, withdrawn, unauthorized documents, HTTP errors)

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `welearn_datastack/plugins/rest_requesters/fao_open_knowledge.py` | New collector plugin that fetches and processes FAO Open Knowledge documents |
| `welearn_datastack/data/source_models/fao_open_knowledge.py` | Pydantic models for FAO Open Knowledge API responses |
| `welearn_datastack/collectors/fao_open_knowledge_collector.py` | URL collector for discovering FAO Open Knowledge documents |
| `welearn_datastack/modules/computed_metadata.py` | Added dataclass serialization utilities |
| `welearn_datastack/nodes_workflow/DocumentHubCollector/document_collector.py` | Integrated dataclass serialization into workflow |
| `welearn_datastack/plugins/rest_requesters/__init__.py` | Registered FAO collector plugin |
| `welearn_datastack/nodes_workflow/URLCollectors/node_fao_open_knowledge_collect.py` | Workflow node for FAO URL collection |
| `tests/source_models/test_fao_open_knownledge.py` | Tests for FAO data models |
| `tests/document_collector_hub/plugins_test/test_fao_open_knowledge.py` | Tests for FAO collector plugin |
| `welearn_datastack/plugins/rest_requesters/open_alex.py` | Removed blank line |
