A lightweight, LinkML‑based data catalog that can be generated and validated automatically via Copier templates and GitHub Actions.
- Adding Content to the Catalog
- Managing Repository Permissions
- What Happens After a Push?
- Development Workflow (for reference)
- Customization
All catalog metadata lives in data-catalog/data-catalog.yaml.
The file follows a LinkML schema (see simple_data_catalog_model).

    datasets:
      - identifier: ex:myDataset
        title: My Dataset
        publisher:
          name: My Organization
        contactPoint:
          hasEmail: contact@myorg.org
        status: draft
        theme:
          - ex:energy
          - ex:climate
        license:
          title: CC‑BY 4.0
        temporal:
          hasBeginning: "2024-01-01"
          hasEnd: "2024-12-31"
        inSeries: ex:mySeries
        distribution: ex:myDist

- `identifier`: must be a globally unique CURIE (e.g., `ex:myDataset`).
- `distribution`: points to an entry in the `distributions` section.
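Because `distribution` only holds a reference, a typo in the identifier can leave a dataset pointing at nothing. The following is a minimal Python sketch of such a reference check, not a description of what the schema validation itself covers. It assumes the catalog has already been loaded into plain dictionaries (for example with a YAML parser); the function name `find_dangling_distributions` and the second dataset are hypothetical.

```python
def find_dangling_distributions(catalog):
    """Return (dataset_id, distribution_id) pairs whose target is missing."""
    known = {d["identifier"] for d in catalog.get("distributions", [])}
    dangling = []
    for dataset in catalog.get("datasets", []):
        ref = dataset.get("distribution")
        if ref is None:
            refs = []
        elif isinstance(ref, list):
            # Assumption: a dataset may also list several distributions.
            refs = ref
        else:
            refs = [ref]
        for distribution_id in refs:
            if distribution_id not in known:
                dangling.append((dataset["identifier"], distribution_id))
    return dangling


# In-memory stand-in for data-catalog.yaml (illustrative values only).
catalog = {
    "datasets": [
        {"identifier": "ex:myDataset", "distribution": "ex:myDist"},
        {"identifier": "ex:otherDataset", "distribution": "ex:missingDist"},
    ],
    "distributions": [{"identifier": "ex:myDist", "format": "csv"}],
}

print(find_dangling_distributions(catalog))
# -> [('ex:otherDataset', 'ex:missingDist')]
```

An empty result means every dataset reference resolves to a declared distribution.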
Distributions and data services are declared under their own top‑level keys:

    distributions:
      - identifier: ex:myDist
        format: csv
        accessURL: https://example.com/data/my-dataset.csv
        issued: "2024-11-01"

    dataServices:
      - identifier: ex:myService
        title: Example API
        servesDataset:
          - ex:myDataset
        description: REST endpoint exposing the dataset
        endpointURL: https://api.example.com/v1/datasets/my-dataset

Add entries under the respective top‑level keys (`concepts`, `series`, `metrics`, `qualityMeasurements`) following the same pattern as above.
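Since every entry is addressed by its identifier, it can help to check identifiers for shape and uniqueness before pushing. The sketch below is one way to do that in Python, assuming the catalog is already loaded as dictionaries. The regular expression is a simplified approximation of a CURIE (prefix, colon, local part), not the full W3C grammar, and `check_identifiers` is a hypothetical helper name.

```python
import re
from collections import Counter

# Simplified CURIE shape (assumption): prefix, colon, local part without spaces.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:[^\s:]+$")

# Top-level keys named in this README.
SECTIONS = (
    "datasets", "distributions", "dataServices",
    "concepts", "series", "metrics", "qualityMeasurements",
)


def check_identifiers(catalog):
    """Return (malformed, duplicated) identifier lists across all sections."""
    ids = [
        entry.get("identifier", "")
        for section in SECTIONS
        for entry in catalog.get(section, [])
    ]
    malformed = [i for i in ids if not CURIE_PATTERN.match(i)]
    duplicated = sorted(i for i, count in Counter(ids).items() if count > 1)
    return malformed, duplicated


# Illustrative data: one duplicated identifier and one without a prefix.
catalog = {
    "datasets": [{"identifier": "ex:myDataset"}],
    "distributions": [{"identifier": "ex:myDist"}, {"identifier": "ex:myDist"}],
    "dataServices": [{"identifier": "myService"}],
}

print(check_identifiers(catalog))
# -> (['myService'], ['ex:myDist'])
```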
Tip: Keep the YAML tidy by using the existing sections as templates. The `generate-datacatalog.sh` script (triggered by CI) automatically renders the YAML into the final TTL file as well as the published catalog.
Repository owners can control who can commit, push, and merge changes to the catalog.
| Permission | GitHub Setting | Recommended Role |
|---|---|---|
| Read | Read access | All collaborators |
| Write | Write access | Contributors who add/modify catalog entries |
| Admin | Admin access | Project maintainers |

- Branch protection
  - Go to Settings → Branches → Add rule (e.g., `main`).
  - Enable Require pull request reviews before merging.
  - Optionally require status checks (see next section).
- Team permissions (if using an organization)
  - In Settings → Collaborators & teams, create a team (e.g., `catalog-editors`).
  - Grant the team Write permission.
- Code owners (optional)
  - Add a `CODEOWNERS` file at the repository root:

        # CODEOWNERS
        # Users listed here must approve PRs that modify the catalog
        *.yaml @catalog-editors

    This forces review by the designated owners.
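To get a feel for which changes the `*.yaml` rule would cover, the pattern can be approximated in Python. This is only a sketch: it assumes a slash‑free pattern, which is matched against the file name in any directory, and it does not reproduce the full CODEOWNERS matching rules. The helper `needs_owner_review` and the list of changed paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def needs_owner_review(changed_paths, pattern="*.yaml"):
    """Return the changed paths whose file name matches a slash-free pattern."""
    return [
        path for path in changed_paths
        if fnmatchcase(PurePosixPath(path).name, pattern)
    ]


changed = [
    "data-catalog/data-catalog.yaml",
    "README.md",
    ".github/workflows/catalog.yml",
]

print(needs_owner_review(changed))
# -> ['data-catalog/data-catalog.yaml']
```

Note that files ending in `.yml`, such as the workflow file, do not match `*.yaml` and would need their own rule if they should be reviewed by the same owners.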
- The GitHub Actions workflow (`.github/workflows/catalog.yml`) runs automatically on every push to `main` (or on pull‑request merges).
- The workflow executes the following steps:
  - Validate the updated `data-catalog.yaml` using `linkml-validate`.
  - Render the YAML file (`data-catalog.yaml`) into a TTL file via `linkml-convert`.
  - Commit the generated `data-catalog.ttl` back to the repository (if the workflow is configured to do so).
  - Publish the artifact (e.g., upload to a static site or an S3 bucket) so downstream users can fetch the latest catalog.
- If validation fails, the workflow aborts and the CI status on the PR becomes failed. The YAML must be fixed before the PR can be merged.
- Successful runs result in a green checkmark on the PR/commit, indicating the catalog is up‑to‑date and syntactically correct.
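The validate‑then‑render order, with an abort on the first failure, can be sketched in Python as follows. This illustrates the control flow only and is not the actual workflow definition. The schema path is taken from the manual validation command in this README; the output path and the `linkml-convert` flags (`-s`, `-t`, `-o`) are assumptions that should be checked against `linkml-convert --help`, and some schemas may also need a target class.

```python
import subprocess

SCHEMA = "simple_data_catalog_model/src/simple_data_catalog_model/data-catalog.yaml"
CATALOG = "data-catalog/data-catalog.yaml"
OUTPUT = "data-catalog/data-catalog.ttl"  # assumed output location


def build_steps(schema=SCHEMA, catalog=CATALOG, output=OUTPUT):
    """Return the validate and convert commands as argument lists."""
    validate = ["linkml-validate", "-s", schema, catalog]
    # Assumed flags: schema, output format and output file.
    convert = ["linkml-convert", "-s", schema, "-t", "ttl", "-o", output, catalog]
    return [validate, convert]


def run_pipeline(run=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code."""
    for command in build_steps():
        result = run(command)
        if result.returncode != 0:
            # Validation (or conversion) failed: later steps are skipped.
            return result.returncode
    return 0
```

Calling `run_pipeline()` with both tools installed returns `0` on success; a non‑zero return value corresponds to the failed CI status described above. The `run` parameter exists only so the control flow can be exercised without invoking the real tools.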

For reference, the development workflow can be run locally:

    # 1. Install dependencies (once)
    uv sync

    # 2. Run the generation script locally (optional)
    bash template/scripts/generate-datacatalog.sh

    # 3. Validate manually
    uv run linkml-validate -s simple_data_catalog_model/src/simple_data_catalog_model/data-catalog.yaml data-catalog/data-catalog.yaml

If you wish to customize the look of the catalog, you can do so in the `supplemental-ui` directory. For instance, to replace the logo in the top‑left corner, overwrite `supplemental-ui/img/logo.svg` with your own file, keeping the name `logo.svg`.