
Add saint.tech (IMG Saxony-Anhalt) image provider DAG #5576

Open
wprashed wants to merge 26 commits into WordPress:main from wprashed:main

Conversation

@wprashed

Resolves #5571

This PR adds a new Provider API DAG for SAiNT (IMG Saxony-Anhalt) to ingest CC-licensed images of touristic points of interest, tours, and events in Saxony-Anhalt.

Changes Made

  • Provider Details: Registered saint_tech as a default provider in catalog/dags/common/loader/provider_details.py and set its default image category to PHOTOGRAPH.
  • API Ingester Script: Created catalog/dags/providers/provider_api_scripts/saint.py containing the SaintDataIngester class. The script paginates via the page and pageSize parameters and passes an API key fetched from the Airflow Variable API_KEY_SAINT. It extracts the PrimaryImage attributes (URL, width, height), the POI title, and the foreign identifier, and evaluates the attached licenses to ensure compatibility.
  • Workflows: Registered SaintDataIngester in catalog/dags/providers/provider_workflows.py to run on a monthly schedule.
  • Unit Tests: Added unit tests in catalog/tests/dags/providers/provider_api_scripts/test_saint.py for fetching parameters, batch data extraction, and correct parsing of records.
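
For reviewers who want a quick picture of the pagination and parsing flow described above, here is a minimal standalone sketch. It is not the code in saint.py: the JSON keys (Id, Title, Url, Width, Height, License), the PAGE_SIZE value, and the helper names are assumptions for illustration, and the API key handling is left out because how the key is sent is not covered here.

```python
import re
from typing import Optional

PAGE_SIZE = 100  # assumed batch size; the real value may differ

# Matches URLs such as https://creativecommons.org/licenses/by-sa/4.0/
CC_LICENSE_RE = re.compile(
    r"creativecommons\.org/licenses/(?P<license>[a-z-]+)/(?P<version>\d\.\d)"
)


def get_next_query_params(prev: Optional[dict]) -> dict:
    """Start at page 1, then advance one page per request."""
    if prev is None:
        return {"page": 1, "pageSize": PAGE_SIZE}
    return {**prev, "page": prev["page"] + 1}


def parse_license(url: Optional[str]) -> Optional[tuple]:
    """Return (license, version) for a CC license URL, else None."""
    match = CC_LICENSE_RE.search(url or "")
    return (match["license"], match["version"]) if match else None


def get_record_data(record: dict) -> Optional[dict]:
    """Map one POI record to image fields; skip it if anything required is missing."""
    image = record.get("PrimaryImage") or {}
    foreign_id = record.get("Id")  # hypothetical key name
    image_url = image.get("Url")  # hypothetical key name
    license_info = parse_license(image.get("License"))  # hypothetical key name
    if not (foreign_id and image_url and license_info):
        return None
    return {
        "foreign_identifier": str(foreign_id),
        "url": image_url,
        "width": image.get("Width"),
        "height": image.get("Height"),
        "title": record.get("Title"),
        "license": license_info[0],
        "license_version": license_info[1],
    }
```

Returning None for records without an identifier, image URL, or recognizable CC license is one way to keep incompatible items out of the batch; the actual ingester may apply different rules.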

Testing Instructions

  1. You must set an Airflow variable for the API key to test the script:
    airflow variables set API_KEY_SAINT "<your-saint-tech-api-key>"
  2. Run the unit tests locally to verify the mocked data parsing:
    just catalog/test tests/dags/providers/provider_api_scripts/test_saint.py
  3. Optionally, trigger a manual test run of the saint_tech_workflow DAG to verify real-world data ingestion works as expected.
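
As a rough illustration of what step 2 exercises, the sketch below shows the mocking pattern in isolation: a stand-in for the HTTP call returns canned pages, so no API key or network access is needed. The ingest_all helper and the fake records are hypothetical and are not taken from test_saint.py.

```python
from unittest.mock import Mock


def ingest_all(fetch_page, page_size=2):
    """Request pages until one comes back empty; return every record seen."""
    records, page = [], 1
    while True:
        batch = fetch_page({"page": page, "pageSize": page_size})
        if not batch:
            return records
        records.extend(batch)
        page += 1


# Stand-in for the HTTP call: two pages of data, then an empty page.
fake_fetch = Mock(side_effect=[[{"Id": 1}, {"Id": 2}], [{"Id": 3}], []])
records = ingest_all(fake_fetch)
```

Because the mock records every call, a test can also check that the page parameter advanced as expected, in addition to checking the collected records.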

@wprashed wprashed requested a review from a team as a code owner May 11, 2026 05:25
@wprashed wprashed requested review from krysal and obulat and removed request for a team May 11, 2026 05:25
@openverse-bot openverse-bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository labels May 11, 2026
@openverse-bot openverse-bot moved this to 👀 Needs Review in Openverse PRs May 11, 2026
@wprashed wprashed requested a review from a team as a code owner May 11, 2026 06:29
@wprashed wprashed requested a review from a team as a code owner May 11, 2026 10:59

Labels

💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs

Projects

Status: 👀 Needs Review

Development

Successfully merging this pull request may close these issues.

saint.tech

2 participants