feat: move schemas and iwc workflows script to python package (#540) #552

Open, wants to merge 10 commits into base: main

2 changes: 1 addition & 1 deletion .github/workflows/run-checks.yml
@@ -33,7 +33,7 @@ jobs:
run: pip install -r ./catalog/build/py/requirements.txt
- name: Run linkml-lint
# Run linting on the LinkML schemas, to enforce conventions such as in naming, and to catch simple errors.
-run: linkml-lint ./catalog/schema --validate --verbose
+run: npm run lint-schema
- name: Test LinkML Python generation
# Generate Python code from the main LinkML schemas, discarding the output; this will catch more subtle errors such as references to nonexistent elements.
run: npm run test-gen-python
2 changes: 1 addition & 1 deletion README.md
@@ -75,7 +75,7 @@ These values will be substituted with assembly-specific values at runtime.

## Editing the LinkML schemas

-If the LinkML schemas in `catalog/schema` are edited, the derived JSON schemas and TypeScript definitions should be
+If the LinkML schemas in `catalog/py_package/catalog_build/schema` are edited, the derived JSON schemas and TypeScript definitions should be
updated
as follows:

3 changes: 2 additions & 1 deletion catalog/README.md
@@ -7,5 +7,6 @@ This directory provides the catalog data (information on genome assemblies, orga
- `py` - Python scripts.
- `ts` - Typescript scripts.
- `output` - JSON files output by the catalog build process, to be consumed by the app.
-- `schema` - LinkML schemas for source files.
+- `py_package` - Python package used to share catalog features, such as the schemas and build process, with other projects.
+- `schema` - Schema-related scripts and derived models.
- `source` - YAML files providing data used as input for building the catalog.
@@ -1,4 +1,4 @@
-from package.catalog_build import build_files
+from ...py_package.catalog_build import build_files

ASSEMBLIES_PATH = "catalog/source/assemblies.yml"
ORGANISMS_PATH = "catalog/source/organisms.yml"
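The new `...` prefix makes this a relative import: each dot after the first climbs one package level, so the import only resolves when the file is loaded as part of a package at least three levels deep (for example via `python -m`), not when it is run as a standalone script. The sketch below uses `importlib.util.resolve_name` to show how the name resolves; the importer's package name `catalog.build.py` is an assumption, since the file's path is not visible in this view.

```python
from importlib.util import resolve_name

# Hypothetical package of the importing script; the real file path is not
# visible in this diff view.
IMPORTER_PACKAGE = "catalog.build.py"

# "." -> catalog.build.py, ".." -> catalog.build, "..." -> catalog
target = resolve_name("...py_package.catalog_build", IMPORTER_PACKAGE)
print(target)  # catalog.py_package.catalog_build


def try_resolve(name: str, package: str) -> str:
    """Return the absolute module name, or a short error description."""
    try:
        return resolve_name(name, package)
    except (ImportError, ValueError) as exc:  # ValueError on Python < 3.9
        return f"error: {exc}"


# Run as a plain script, there is no parent package to anchor the import.
print(try_resolve("...py_package.catalog_build", ""))
# A package only two levels deep cannot go up two levels.
print(try_resolve("...py_package.catalog_build", "build.py"))
```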
@@ -44,9 +44,9 @@ def __contains__(self, key: str) -> bool:

linkml_meta = LinkMLMeta(
{
-"default_prefix": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/schema.yaml#",
+"default_prefix": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/schema.yaml#",
"description": "Combined source data schemas.",
-"id": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/schema.yaml#",
+"id": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/schema.yaml#",
"imports": [
"./assemblies",
"./organisms",
@@ -61,7 +61,7 @@ def __contains__(self, key: str) -> bool:
"prefix_reference": "https://w3id.org/linkml/",
}
},
-"source_file": "./catalog/schema/schema.yaml",
+"source_file": "/Users/hunter/git-repos/brc-analytics/catalog/py_package/catalog_build/schema_utils/../schema/schema.yaml",

Copilot AI Jun 2, 2025

The source_file metadata uses an absolute, user-specific path; it should reference the relative schema location within the package or be regenerated to use a portable path.

Suggested change
-"source_file": "/Users/hunter/git-repos/brc-analytics/catalog/py_package/catalog_build/schema_utils/../schema/schema.yaml",
+"source_file": str(Path(__file__).parent.parent / "schema" / "schema.yaml"),

}
)
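Copilot's point is that the regenerated `source_file` now embeds a path from the author's machine, whereas the previous value (`./catalog/schema/schema.yaml`) was repository-relative, which suggests the generator was earlier invoked from the repository root with a relative path. If regenerating that way is inconvenient, one option is to normalize the value afterwards. A minimal sketch, assuming POSIX-style paths; the helper `portable_source_file` is hypothetical and not part of LinkML or this PR:

```python
import posixpath


def portable_source_file(source_file: str, repo_root: str) -> str:
    """Rewrite an absolute schema path as a "./"-prefixed repository-relative path.

    Hypothetical helper: assumes POSIX-style paths and that source_file lies
    inside repo_root. The normalization is purely lexical (no symlink lookup).
    """
    # Collapse segments such as "schema_utils/../schema".
    normalized = posixpath.normpath(source_file)
    relative = posixpath.relpath(normalized, repo_root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{source_file!r} is not inside {repo_root!r}")
    return "./" + relative


portable = portable_source_file(
    "/Users/hunter/git-repos/brc-analytics/catalog/py_package/"
    "catalog_build/schema_utils/../schema/schema.yaml",
    "/Users/hunter/git-repos/brc-analytics",
)
print(portable)  # ./catalog/py_package/catalog_build/schema/schema.yaml
```

The result follows the same `./catalog/...` convention as the value this diff replaces, so it stays identical across checkouts.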

@@ -146,7 +146,7 @@ class Assemblies(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/assemblies.yaml#",
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/assemblies.yaml#",
"tree_root": True,
}
)
@@ -167,7 +167,7 @@ class Assembly(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/assemblies.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/assemblies.yaml#"
}
)

@@ -187,7 +187,7 @@ class Organisms(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/organisms.yaml#",
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/organisms.yaml#",
"tree_root": True,
}
)
@@ -208,7 +208,7 @@ class Organism(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/organisms.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/organisms.yaml#"
}
)

@@ -238,7 +238,7 @@ class Outbreaks(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/outbreaks.yaml#",
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/outbreaks.yaml#",
"tree_root": True,
}
)
@@ -259,7 +259,7 @@ class Outbreak(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/outbreaks.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/outbreaks.yaml#"
}
)

@@ -333,7 +333,7 @@ class OutbreakResource(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/outbreaks.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/outbreaks.yaml#"
}
)

@@ -370,7 +370,7 @@ class MarkdownFileReference(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/outbreaks.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/outbreaks.yaml#"
}
)

@@ -402,7 +402,7 @@ class WorkflowCategories(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflow_categories.yaml#",
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflow_categories.yaml#",
"tree_root": True,
}
)
@@ -426,7 +426,7 @@ class WorkflowCategory(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflow_categories.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflow_categories.yaml#"
}
)

@@ -476,7 +476,7 @@ class Workflows(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflows.yaml#",
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflows.yaml#",
"tree_root": True,
}
)
@@ -497,7 +497,7 @@ class Workflow(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflows.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflows.yaml#"
}
)

@@ -569,7 +569,7 @@ class WorkflowParameter(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflows.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflows.yaml#"
}
)

@@ -610,7 +610,7 @@ class WorkflowUrlSpec(ConfiguredBaseModel):

linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta(
{
-"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflows.yaml#"
+"from_schema": "https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflows.yaml#"
}
)

@@ -6,7 +6,8 @@

import requests
import yaml
-from generated_schema.schema import (
+
+from .generated_schema.schema import (
Workflow,
WorkflowCategoryId,
WorkflowParameter,
@@ -15,7 +16,6 @@
)

URL = "https://iwc.galaxyproject.org/workflow_manifest.json"
-WORKFLOWS_PATH = "catalog/source/workflows.yml"
DOCKSTORE_COLLECTION_TO_CATEGORY = {
"Variant Calling": WorkflowCategoryId.VARIANT_CALLING,
"Transcriptomics": WorkflowCategoryId.TRANSCRIPTOMICS,
@@ -31,9 +31,9 @@
)


-def read_existing_yaml():
-if os.path.exists(WORKFLOWS_PATH):
-with open(WORKFLOWS_PATH) as fh:
+def read_existing_yaml(workflows_path):
+if os.path.exists(workflows_path):
+with open(workflows_path) as fh:
workflows = Workflows.model_validate(yaml.safe_load(fh)).workflows
else:
# start from scratch
@@ -116,8 +116,8 @@ def generate_current_workflows():
return by_trs_id


-def merge_into_existing():
-existing = read_existing_yaml()
+def merge_into_existing(workflows_path):
+existing = read_existing_yaml(workflows_path)
current = generate_current_workflows()
merged: Dict[str, Workflow] = {}
for versionless_trs_id, current_workflow_input in current.items():
@@ -144,8 +144,8 @@ def merge_into_existing():
return merged


-def to_workflows_yaml(exclude_other: bool):
-by_trs_id = merge_into_existing()
+def to_workflows_yaml(workflows_path: str, exclude_other: bool):
+by_trs_id = merge_into_existing(workflows_path)
# sort by trs id, should play nicer with git diffs
sorted_workflows = list(dict(sorted(by_trs_id.items())).values())
if exclude_other:
@@ -160,25 +160,28 @@ def to_workflows_yaml(exclude_other: bool):
final_workflows.append(workflow)
sorted_workflows = final_workflows
final_workflows = sorted_workflows
-with open(WORKFLOWS_PATH, "w") as out:
+with open(workflows_path, "w") as out:
yaml.safe_dump(
Workflows(workflows=final_workflows).model_dump(exclude_none=True),
out,
allow_unicode=True,
sort_keys=False,
)
# Turns out the YAML style prettier likes is really hard to create in python ...
-subprocess.run(["npx", "prettier", "--write", WORKFLOWS_PATH])
+subprocess.run(["npx", "prettier", "--write", workflows_path])


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Build workflows.yaml file from latest IWC JSON manifest."
)
+parser.add_argument(
+"workflows_path", help="Path of workflows YAML file to read/write."
+)
parser.add_argument(
"--exclude-other",
action="store_true",
help="Exclude other items from processing.",
)
args = parser.parse_args()
-to_workflows_yaml(exclude_other=args.exclude_other)
+to_workflows_yaml(args.workflows_path, exclude_other=args.exclude_other)
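With the hard-coded `WORKFLOWS_PATH` constant removed, the file location becomes a required positional argument. A minimal sketch of the resulting command-line contract, copying the parser definition from the diff and passing the previously hard-coded path explicitly; the `build_parser` wrapper is illustrative, and because the script's file name is not visible here, argument lists are passed directly instead of from a shell:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same arguments as the parser in the diff above.
    parser = argparse.ArgumentParser(
        description="Build workflows.yaml file from latest IWC JSON manifest."
    )
    parser.add_argument(
        "workflows_path", help="Path of workflows YAML file to read/write."
    )
    parser.add_argument(
        "--exclude-other",
        action="store_true",
        help="Exclude other items from processing.",
    )
    return parser


# The path that used to be a module constant now has to be supplied.
args = build_parser().parse_args(["catalog/source/workflows.yml", "--exclude-other"])
print(args.workflows_path, args.exclude_other)  # catalog/source/workflows.yml True

# Omitting the positional argument makes argparse exit with status 2.
try:
    build_parser().parse_args([])
except SystemExit as exc:
    print("exit status", exc.code)  # exit status 2
```

Because the positional argument has no default, any existing invocation that relied on the removed constant presumably needs to pass the path explicitly.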
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/assemblies.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/assemblies.yaml#
name: assemblies
description: Schema for defining genomic assemblies available in the BRC Analytics platform.

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/organism_ploidy.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/organism_ploidy.yaml#
name: enums_organism_ploidy

enums:
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/outbreak_priority.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/outbreak_priority.yaml#
name: enums_outbreak_priority

enums:
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/outbreak_resource_type.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/outbreak_resource_type.yaml#
name: enums_outbreak_resource_type

enums:
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/workflow_type.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/workflow_type.yaml#
name: enums_workflow_category_id

Copilot AI Jun 2, 2025

The schema id references workflow_type.yaml but this is the workflow_category_id.yaml file; update the URL to point to workflow_category_id.yaml#.

Suggested change
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/workflow_type.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/workflow_category_id.yaml#

description: Definition of the workflow category ID enum.
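The mismatch Copilot points out (an `id` naming `workflow_type.yaml` in what it identifies as the `workflow_category_id.yaml` file) is easy to introduce during a bulk path rewrite like this one, and a small consistency check could catch it. A minimal sketch, assuming every schema `id` should be the base URL used throughout this PR, followed by the file's path relative to the `schema` directory and a trailing `#`. The function names are hypothetical; in practice the declared ids would be read from each schema file (for example with PyYAML's `yaml.safe_load`, which the IWC workflows script already uses):

```python
import posixpath
from typing import Dict, List

# Base URL shared by every rewritten schema id in this PR.
SCHEMA_BASE = (
    "https://github.com/galaxyproject/brc-analytics/blob/main/"
    "catalog/py_package/catalog_build/schema"
)


def expected_schema_id(relative_path: str) -> str:
    """Return the id a schema file at relative_path (under schema/) should declare."""
    return f"{SCHEMA_BASE}/{posixpath.normpath(relative_path)}#"


def find_id_mismatches(declared_ids: Dict[str, str]) -> List[str]:
    """Return the relative paths whose declared id does not match their location."""
    return [
        path
        for path, schema_id in sorted(declared_ids.items())
        if schema_id != expected_schema_id(path)
    ]


# Two ids taken from the diff; the second is the one Copilot flagged.
declared = {
    "assemblies.yaml": SCHEMA_BASE + "/assemblies.yaml#",
    "enums/workflow_category_id.yaml": SCHEMA_BASE + "/enums/workflow_type.yaml#",
}
print(find_id_mismatches(declared))  # ['enums/workflow_category_id.yaml']
```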

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/workflow_parameter_variable.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/workflow_parameter_variable.yaml#
name: enums_workflow_parameter_variable

enums:
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/enums/workflow_ploidy.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/enums/workflow_ploidy.yaml#
name: enums_workflow_ploidy

enums:
@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/organisms.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/organisms.yaml#
name: organisms
description: Schema for defining source organism information used in the BRC Analytics platform.

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/outbreaks.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/outbreaks.yaml#
name: outbreaks
description: Schema for defining outbreak and pathogen information used in the BRC Analytics platform.

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/schema.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/schema.yaml#
name: schema
description: Combined source data schemas.

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflow_categories.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflow_categories.yaml#
name: workflow_categories
description: Schema for defining workflow categories used to organize Galaxy workflows in the BRC Analytics platform.

@@ -1,4 +1,4 @@
-id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/schema/workflows.yaml#
+id: https://github.com/galaxyproject/brc-analytics/blob/main/catalog/py_package/catalog_build/schema/workflows.yaml#
name: workflows
description: Schema for defining Galaxy workflows available in the BRC Analytics platform.
