Requirements:
- Git
- UV
- Pre-commit
Setup:
- install tools (`brew install git uv pre-commit`)
- clone and set up the project [1]
[1]

```
% git clone https://gitlab.data.bas.ac.uk/uk-pdc/metadata-infrastructure/metadata-library.git
% cd metadata-library/
% pre-commit install
% uv sync --all-groups
```

Taskipy is used to define development tasks, such as running tests and rebuilding distribution schemas. These tasks are akin to NPM scripts or similar concepts.
Run `task --list` (or `uv run task --list`) for available commands.
Run `task [task]` (or `uv run task [task]`) to run a specific task.
See Adding development tasks for how to add new tasks.
Tip
If offline, use `uv run --offline task ...` to avoid lookup errors when resolving the unconstrained build system
requirements in `pyproject.toml`, which is a Known Issue within UV.
All changes except minor tweaks (typos, comments, etc.) MUST:
- be associated with an issue (either directly or by reference)
- be included in `CHANGELOG.md`
Note
This task requires significant work.
To add a new standard:
- create a new module under `bas_metadata_library.standards`, e.g. `bas_metadata_library.standards.foo_v1/__init__.py`
- in this module, overload the `Namespaces`, `MetadataRecordConfig` and `MetadataRecord` classes as needed
- version the `MetadataRecordConfig` class, e.g. `MetadataRecordConfigV1`
- create a suitable metadata configuration JSON schema in `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json`
- update the `generate_schemas` method in the Test App to generate distribution schemas
- update the `validate_schemas` method in the Test App to check source and distribution schemas
- add a script line to the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to publish the distribution schema within the BAS Metadata Standards website
- define a series of test configurations (e.g. minimal, typical and complete) for generating test records in `tests.resources.configs`, e.g. `tests.resources.configs.foo_v1_standard`
- add a route in the Test App for generating test records for the new standard
- update the `capture_test_records` method in the Test App to generate and save test records
- add relevant Tests with methods to test each metadata element class and test records
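The first few steps above can be sketched as a module skeleton. This is a hypothetical outline for an invented 'foo' standard — the base classes here are minimal inline stand-ins so the sketch is self-contained, not the real `bas_metadata_library` base classes:

```python
# Hypothetical skeleton of bas_metadata_library/standards/foo_v1/__init__.py.
# Stand-in base classes (in the real library these would be imported from
# bas_metadata_library rather than defined here).
class Namespaces:
    """Maps XML namespace prefixes to URIs for a standard."""
    namespaces: dict = {}

class MetadataRecordConfig:
    """Holds a record configuration (a plain dict) and its JSON schema."""
    def __init__(self, **kwargs):
        self.config = kwargs
        self.schema = None

class MetadataRecord:
    """Encodes a configuration to XML and decodes XML back to a configuration."""
    def __init__(self, configuration=None):
        self.attributes = configuration.config if configuration else {}

# Overloaded, versioned classes for the new standard
class FooNamespaces(Namespaces):
    namespaces = {"foo": "https://example.com/ns/foo"}  # invented namespace

class MetadataRecordConfigV1(MetadataRecordConfig):
    """Versioned configuration class for the hypothetical 'foo' standard."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # in the real library, the distribution JSON schema would be loaded here
        self.schema = {"$id": "foo_v1"}
```

The versioned class name (`MetadataRecordConfigV1`) is what later allows a `V2` class to be added alongside it during schema upgrades.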
Note
These instructions are specific to the ISO 19115 metadata standards family.
- amend configuration schema:
  - new or changed properties should be added to the configuration for the relevant standard (e.g. ISO 19115-1)
  - typically, this involves adding new elements to the `definitions` property and referencing these in the relevant parent element (e.g. to the `identification` property)
  - generate distribution schemas
- amend test configs:
  - new or changed properties should be made to the relevant test record configurations in `tests.resources.configs`
  - there are different levels of configuration, from minimal to complete, which should, where possible, build on each other (e.g. the complete record should include all the properties and values of the minimal record)
  - the `minimum` configuration should not be changed, as all mandatory elements are already implemented
  - the `base_simple` configuration should contain elements used most of the time, that use free-text values
  - the `base_complex` configuration should contain elements used most of the time, that use URL or other identifier values
  - the `complete` configuration should contain examples of all supported elements, providing this still produces a valid record, in order to ensure high test coverage
  - where possible, configurations should be internally consistent, but this can be ignored if needed
  - values used for identifiers and other external references should use the correct form/structure but do not need to exist or relate to the resource described by each configuration (e.g. DOIs should be valid URLs but could be a DOI for another resource)
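The layering of test configurations described above can be illustrated with plain dicts (the property names here are invented for illustration, not taken from a real standard):

```python
# Invented properties for illustration only.
minimal_record = {
    "hierarchy_level": "dataset",
    "date_stamp": "2024-01-01",
}

# base_simple builds on minimal, adding commonly used free-text values
base_simple_record = {
    **minimal_record,
    "abstract": "A free-text summary of the resource.",
}

# base_complex builds on base_simple, adding URL/identifier based values;
# the DOI is structurally valid but need not relate to this resource
base_complex_record = {
    **base_simple_record,
    "identifier": {"href": "https://doi.org/10.5285/example", "scheme": "doi"},
}

# larger configurations include everything from the smaller ones
assert all(item in base_complex_record.items() for item in minimal_record.items())
```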
- add relevant element class:
  - new or changed elements should be added to the relevant package for each standard
  - for the ISO 19115 family of standards, element classes should be added to the `iso_19115_common` package
  - the exact module to use within this package will depend on the nature of the element being added, but in general, elements should be added to the module of their parent element (e.g. `data_identification.py` for elements under the `identification` record configuration property); elements used across a range of elements should be added to the `common_elements.py` module
  - remember to include references to the new element class in the parent element class (in both the `make_element` and `make_config` methods)
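The parent/child element pattern can be sketched as below. This is a simplified stand-in using the standard library's `xml.etree.ElementTree` rather than the library's real base classes, with invented element names:

```python
import xml.etree.ElementTree as ET

class Citation:
    """Hypothetical child element class."""
    def __init__(self, record: ET.Element, attributes: dict):
        self.record = record
        self.attributes = attributes

    def make_element(self) -> None:
        # encode: write the configuration property into the XML record
        ET.SubElement(self.record, "citation").text = self.attributes.get("citation")

    def make_config(self) -> dict:
        # decode: read the property back out of the XML record
        element = self.record.find("citation")
        return {"citation": element.text} if element is not None else {}

class DataIdentification:
    """Hypothetical parent element: references the child in both methods."""
    def __init__(self, record: ET.Element, attributes: dict):
        self.record = record
        self.attributes = attributes

    def make_element(self) -> None:
        Citation(self.record, self.attributes).make_element()

    def make_config(self) -> dict:
        return Citation(self.record, self.attributes).make_config()
```

Forgetting the child reference in either method is a common source of one-way (encode-only or decode-only) elements, which the lossless conversion tests are designed to catch.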
- capture test records:
  - initially this acts as a good way to check new or changed element classes encode configuration properties correctly
  - check the git status of these test records to confirm existing records have changed how you expect (and haven't changed things you didn't intend to, for example)
- capture test JSON configurations:
  - check the git status of these test configs to confirm they are encoded correctly from Python (i.e. dates)
- add tests:
  - new test cases should be added, or existing test cases updated, in the relevant module within `tests/bas_metadata_library/`
  - for the ISO 19115 family of standards, this should be `test_standard_iso_19115_1.py`, unless the element is only part of the ISO 19115-2 standard
  - providing there are enough test configurations to test all the ways a new element can be used (e.g. with a simple text string or anchor element), adding a test case for each element is typically enough to ensure sufficient test coverage
  - where this isn't the case, it's suggested to add one or more 'edge case' test cases to test remaining code paths explicitly
- check test coverage:
  - for missing coverage, consider adding edge case test cases where applicable
  - coverage exemptions should be avoided wherever feasible and all exemptions must be discussed before they are added
  - where exemptions are added, they should be documented as an issue with information on how they will be addressed in the longer term
- update `README.md` examples if a common element:
  - this is probably best done before releasing a new version
- update `CHANGELOG.md`
- if needed, add name to `authors` property in `pyproject.toml`
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
Note
This task requires significant work. It should be reserved for breaking or major changes to a schema.
Tip
In these instructions, v1 refers to the current/previous configuration version. v2 refers to the new version.
First, create a new configuration version that is identical to the current/previous version, but that sets up the schema, objects, methods, tests and documentation needed for the new configuration, and to convert between the old and new configurations.
- create an issue summarising, and referencing specific issues for, changes to be made in the new schema version
- copy the current/previous metadata configuration JSON schema from `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json` to `bas_metadata_library.schemas.src.foo_v2.json`
  - change the version in:
    - the `$id` property
    - the `title` property
    - the `description` property
- duplicate the configuration classes for the standard in `bas_metadata_library.standards`
  - i.e. in `bas_metadata_library.standards.foo_v1/__init__.py`, copy `MetadataRecordConfigV1` to `MetadataRecordConfigV2`
- in the new configuration class, add `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods
  - the `upgrade_from_v1_config()` method should accept a current/previous configuration class
  - the `downgrade_to_v1_config()` method should return a current/previous configuration class
- change the signature of the `MetadataRecord` class to use the new configuration class
- change the `make_config()` method of the `MetadataRecord` class to return the new configuration class
- update the `_generate_schemas()` method in the Test App to generate distribution schemas for the new schema version
- update the `_validate_schemas()` method in the Test App to validate distribution schemas for the new schema version
- Generate configuration schemas
- add a line to the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to publish the distribution schema for the new schema version within the BAS Metadata Standards website
- define a series of test configurations (e.g. minimal, typical and complete) for generating test records in `tests.resources.configs/`, e.g. `tests.resources.configs.foo_v1_standard`
  - note that the version in these file names is for the version of the standard, not the configuration
  - new config objects will be made within this file that relate to the new configuration version
- update the `_capture_json_test_configs()` method in the Test App to generate JSON versions of each test configuration
- Capture test JSON record configurations
- update the route for the standard in the Test App (e.g. `standard_foo_v1`) to:
  - upgrade configs for the old/current version of the standard (as the old/current `MetadataRecordConfig` class will now be incompatible with the updated `MetadataRecord` class)
  - include configs for the new config version of the standard
- update the `capture_test_records()` method in the Test App to capture test records for the new test configurations
- Capture test XML records
- add test cases for the new `MetadataRecordConfig` class in the relevant module in `tests.bas_metadata_library`:
  - `test_invalid_configuration_v2`
  - `test_configuration_v2_from_json_file`
  - `test_configuration_v2_from_json_string`
  - `test_configuration_v2_to_json_file`
  - `test_configuration_v2_to_json_string`
  - `test_configuration_v2_json_round_trip`
  - `test_parse_existing_record_v2`
  - `test_lossless_conversion_v2`
- change all test cases to target record configurations for the new version
- update the `test_record_schema_validation_valid` and `test_record_schema_validation_invalid` test cases, which test the XML/XSD schema for the standard, not the configuration JSON schema
- update the existing `test_lossless_conversion_v1` test case to upgrade v1 configurations to v2, as the `MetadataRecord` class will no longer be compatible with the `MetadataRecordConfigV1` class
- update the Supported configuration versions section of the README:
  - add the new schema version, with a status of 'alpha'
- update the encode/decode subsections in the Usage section of the README to use the new `RecordConfig` class and `$schema` URI
- if the lead standard (ISO 19115) is being updated, also update these Usage subsections
- add a subsection to the Usage section of the README explaining how to upgrade and downgrade a configuration between the old and new versions
- update the change log to reference the creation of the new schema version, referencing the summary issue
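The upgrade/downgrade methods can be sketched as below. The class and method names follow the convention described above, but the property mapping (a `title` string becoming a structured object in v2) is invented for illustration:

```python
class MetadataRecordConfigV1:
    """Previous configuration version (stand-in)."""
    def __init__(self, **kwargs):
        self.config = kwargs

class MetadataRecordConfigV2:
    """New configuration version with conversion methods."""
    def __init__(self, **kwargs):
        self.config = kwargs

    def upgrade_from_v1_config(self, v1_config: MetadataRecordConfigV1) -> None:
        """Accepts a previous-version configuration and maps it into this instance."""
        config = dict(v1_config.config)
        # hypothetical change: 'title' becomes a structured object in v2
        if "title" in config:
            config["title"] = {"value": config["title"]}
        self.config = config

    def downgrade_to_v1_config(self) -> MetadataRecordConfigV1:
        """Returns a previous-version configuration, flattening v2-only structure."""
        config = dict(self.config)
        if isinstance(config.get("title"), dict):
            config["title"] = config["title"]["value"]
        return MetadataRecordConfigV1(**config)
```

A round trip (upgrade then downgrade) should recover the original v1 configuration wherever no information is lost, which is what the lossless conversion test cases verify.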
Second, iteratively introduce changes to the new configuration, adding logic to convert between the old and new configurations as needed. This logic will likely be messy and may target specific known use-cases. This is acceptable on the basis these methods will be relatively short-lived.
- as changes are made, add notes and caveats to the upgrade/downgrade methods in code, and summarise any significant points in the Usage instructions as needed (e.g. in the 'Information that will be lost when downgrading:' section)
- if changes are made to the minimal record configuration, update examples in the README
- in circumstances where data can't be mapped between schemas, consider raising an exception in these methods, requiring manual conversion
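Raising for unmappable data might look like this (the `permissions` property and the error class are hypothetical examples, not part of the library):

```python
class UnmappableConfigError(ValueError):
    """Hypothetical error for configuration data that can't be converted."""

def downgrade_to_v1_config(v2_config: dict) -> dict:
    """Downgrade a v2 configuration dict, refusing data with no v1 equivalent."""
    config = dict(v2_config)
    # hypothetical v2-only property with no v1 representation
    if "permissions" in config:
        raise UnmappableConfigError(
            "'permissions' cannot be represented in a v1 configuration; convert manually."
        )
    return config
```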
Caution
This subsection is a Work in Progress (WIP) and may not be complete/accurate.
... release the new configuration version as experimental for the standard ...
- update the Supported configuration versions section of the README:
  - add the new/current schema version with a status of 'experimental'
Once confirmed as working, and the new schema version is agreed for release:
- update the `README.md` Supported Configuration Versions section to:
  - update the new/current schema version with a status of 'stable'
  - update the old/previous schema version with a status of 'deprecated'
- create a new production Release of the package
- create an issue for retiring the old schema version
- delete the previous metadata configuration JSON schema from `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json`
- delete the configuration classes for the standard in `bas_metadata_library.standards`
  - i.e. in `bas_metadata_library.standards.foo_v1/__init__.py`, delete `MetadataRecordConfigV1`
- in the new/current configuration class, remove the `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods
- delete the `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods from the standard's `utils` module
- delete the test configurations from `tests.resources.configs` (`minimal_record_v1`, etc. in `foo_v1.py`)
- delete corresponding JSON configurations from `tests.resources.configs` (e.g. in `tests.resources.configs.foo_v1`)
- delete corresponding test records from `tests.resources.records` (e.g. in `tests.resources.records.foo_v1`)
- update the relevant `_generate_record_*()` method in the Test App
- update the `_generate_schemas()` method in the Test App to remove the old schema version
- update the `_validate_schemas()` method in the Test App to remove the old schema version
- update the `_capture_json_test_configs()` method in the Test App to remove the old schema version
- update the `_capture_test_records()` method in the Test App to remove the old schema version
- update the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to remove the old schema version
- remove test cases for the old `MetadataRecordConfig` class in the relevant module in `tests.bas_metadata_library`:
  - `test_invalid_configuration_v1`
  - `test_configuration_v1_from_json_file`
  - `test_configuration_v1_from_json_string`
  - `test_configuration_v1_to_json_file`
  - `test_configuration_v1_to_json_string`
  - `test_configuration_v1_json_round_trip`
  - `test_parse_existing_record_v1`
  - `test_lossless_conversion_v1`
- if applicable, remove any edge case tests for converting from the old to the new/current schema version
- update the Supported configuration versions section of the README:
  - update the old schema version with a status of 'retired'
- remove the subsection in the Usage section of the README for how to upgrade and downgrade a configuration between the old and new/current versions
- update the change log to reference the removal of the old schema version, referencing the summary issue, as a breaking change
Tip
See 33b7509c 🛡️ for an example of removing a schema version.
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
See #250 🛡️ for an example of adding a profile.
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
Tip
In these instructions, v1 refers to the current/previous profile version. v2 refers to the new version.
First, create a new profile version that is identical to the current/previous version, but that sets up the schema, tests, documentation and other information needed for the new profile.
- create an issue referencing the development of the new profile version (i.e. agreed abstract changes)
- copy the current/previous profile JSON schema from `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json` to `bas_metadata_library.schemas.src.foo_v2.json`
  - change the version in:
    - the `$id` property
    - the `title` property
    - the `description` property
- update the `_generate_schemas()` method in the Test App to generate distribution schemas for the new profile version
- update the `_validate_schemas()` method in the Test App to validate distribution schemas for the new profile version
- Generate configuration schemas
- add a line to the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to publish the distribution schema for the new schema version within the BAS Metadata Standards website
- update the profile's standards validate method to support the new profile version
- define a series of test configurations (e.g. minimal, typical and complete) for generating test records in `tests.resources.configs/`, e.g. `tests.resources.configs.foo_profile`
  - new config objects will be made within this file that relate to the new profile version
- update the relevant profile in the `profiles` dict in the `_capture_json_test_configs()` method in the Test App to generate JSON test configurations for both the v1 and v2 profile versions
- Capture test JSON record configurations
- update the route for the profile in the Test App (e.g. `_generate_record_foo`) to target configurations for both the v1 and v2 profile versions
- update the relevant profile in the `profiles` dict in the `capture_test_records()` method in the Test App to capture test records for both the v1 and v2 profile versions
- Capture test XML records
- copy the tests module for the v1 profile to create a v2 profile under `tests.bas_metadata_library`
- update the configuration import and tests to target the v2 profile version
- add the new profile version to the `README.md` Supported Profiles section
- update the `README.md` Supported Configuration Versions section to:
  - add the new profile version, with a status of 'experimental'
- update the change log to reference the creation of the new profile version, referencing the summary issue
Second, implement the changes from the agreed new abstract profile into the new schema.
- Generate configuration schemas when the schema is updated
- Update the test configurations to comply with and demonstrate the new schema
- Re-capture test JSON record configurations
- Re-capture test XML records
Tip
The generated records (XML) and record configurations (JSON) SHOULD be used as examples in the abstract profile.
- merge the new profile changes into `main` and create a Prerelease for testing
- test the package to check it works as expected
To generate distribution schemas from source schemas, run the `generate-schemas` Development Task, which uses the internal Flask Test App.
Tip
jsonref is used to resolve any references in source schemas.
To add a schema for a new standard/profile:
- adjust the `schemas` list in the `_generate_schemas()` method in the Flask Test App
  - this list should contain dictionaries with keys for the common name of the schema (based on the common file name of the schema JSON file), and whether the source schema should be resolved (`true`) or simply copied (`false`)
  - resolving should be `true` by default; copying is only relevant for schemas that do not contain any references, as these will cause an error if resolved
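Entries in the `schemas` list might look like this (the schema names are illustrative, not an exhaustive or verified list):

```python
# Illustrative entries for the schemas list in _generate_schemas().
# 'name' is the common file name of the schema JSON file; 'resolve' controls
# whether $ref references are resolved (True) or the file is simply copied (False).
schemas = [
    {"name": "iso_19115_1_v2", "resolve": True},
    {"name": "iso_19115_2_v2", "resolve": True},
    {"name": "foo_v1", "resolve": False},  # hypothetical schema with no $refs
]
```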
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
To check source and distribution JSON Schemas comply with the JSON Schema specification, run the `validate-schemas`
Development Task, which uses the internal Flask Test App.
To add a schema for a new standard/profile:
- adjust the `schemas` list in the `_validate_schemas()` method in the Flask Test App
  - this list should contain dictionaries with a key for the common name of the schema (based on the common file name of the schema JSON file)
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
See #266 🛡️ for an example of removing a standard.
See the Taskipy documentation.
The minimum Python version is 3.9 for compatibility with older BAS IT base images.
The Safety package checks dependencies for known vulnerabilities.
Warning
As with all security tools, Safety is an aid for spotting common mistakes, not a guarantee of secure code. In particular this is using the free vulnerability database, which is updated less frequently than paid options.
Checks are run automatically in Continuous Integration.
Tip
To check locally run the safety Development Task.
- create an issue and switch to branch
- run `uv tree --outdated --depth=1` to list outdated packages
- follow https://docs.astral.sh/uv/concepts/projects/sync/#upgrading-locked-package-versions
- note upgrades in the issue
- review any major/breaking upgrades
- run Tests manually
- commit changes
Ruff is used to lint and format Python files. Specific checks and config options are
set in pyproject.toml. Linting checks are run automatically in
Continuous Integration and the Pre-Commit Hook.
Tip
To check linting manually run the lint Development Task, for formatting run the format task.
Ruff is configured to run Bandit, a static analysis tool for Python.
Warning
As with all security tools, Bandit is an aid for spotting common mistakes, not a guarantee of secure code. In particular this tool can't check for issues that are only detectable when running code.
PyMarkdown is used to lint Markdown files. Specific checks and config
options are set in pyproject.toml. Linting checks are run automatically in
Continuous Integration and the Pre-Commit Hook.
Tip
To check linting manually run the markdown Development Task.
Wide tables will fail rule MD013 (max line length). Wrap such tables with pragma disable/enable exceptions:

<!-- pyml disable md013 -->
| Header | Header |
|--------|--------|
| Value  | Value  |
<!-- pyml enable md013 -->

Stacked admonitions will fail rule MD028 (blank lines in blockquote), as it's ambiguous whether a new blockquote has started where another element isn't in between. Wrap such instances with pragma disable/enable exceptions:
<!-- pyml disable md028 -->
> [!NOTE]
> ...
> [!NOTE]
> ...
<!-- pyml enable md028 -->

For consistency, it's strongly recommended to configure your IDE or other editor to use the
EditorConfig settings defined in .editorconfig.
A Pre-Commit hook is configured in .pre-commit-config.yaml.
To update Pre-Commit and configured hooks:
% pre-commit autoupdate

Tip
To run pre-commit checks against all files manually run the pre-commit Development Task.
Important
This library does not, and cannot, support all possible elements and variations within each standard. Its tests are therefore not exhaustive and reflect the subset of each standard needed for use-cases within BAS.
pytest with a number of plugins is used for testing the application. Config options are set
in pyproject.toml. Tests are defined in the tests package and use an internal Flask App.
Tests are run automatically in Continuous Integration.
Tip
To run tests manually run the test Development Task.
Tip
To run a specific test:
% uv run pytest tests/path/to/test_module.py::<class>.<method>

If a test run fails with a NotImplementedError exception, run the test-reset Development Task.
This occurs where:
- a test fails and the failed test is then renamed or its parameterised options changed
- the reference to the previously failed test has been cached to enable the `--failed-first` runtime option
- the cached reference no longer exists, triggering an error which isn't handled by the `pytest-random-order` plugin
Running this task clears Pytest's cache and re-runs all tests, skipping the `--failed-first` option.
Fixtures SHOULD be defined in tests.conftest, prefixed with fx_ to indicate they are a fixture when used in tests.
E.g.:
```python
import pytest

@pytest.fixture()
def fx_foo() -> str:
    """Example of a test fixture."""
    return 'foo'
```

pytest-cov checks test coverage. We aim for 100% coverage but exemptions are fine with good justification:

- `# pragma: no cover` - for general exemptions
- `# pragma: no branch` - where a conditional branch can never be called
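Applied as trailing comments, these exemptions look like this (the functions are hypothetical examples):

```python
def windows_only_path() -> str:  # pragma: no cover
    """General exemption: this function never runs on the CI platform."""
    return "C:\\data"

def clamp_non_negative(value: int) -> int:
    """Branch exemption: in practice the negative case is unreachable."""
    if value >= 0:  # pragma: no branch
        return value
    raise ValueError(value)
```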
Continuous Integration will check coverage automatically.
Tip
To check coverage manually run the test-cov Development Task.
Tip
To run tests for a specific module locally:
% uv run pytest --cov=lantern.some.module --cov-report=html tests/lantern_tests/some/module

Where tests are added to ensure coverage, use the cov mark, e.g.:
```python
import pytest

@pytest.mark.cov()
def test_foo():
    assert 'foo' == 'foo'
```

An internal Flask app (tests.app) is used to generate, validate and capture test records, record configurations and
record configuration schemas. It consists of:
- routes for:
- calling the Metadata Library to generate records from a given configuration for a standard
- CLI commands to:
- generate schemas for standards
- capture record configurations as JSON
- capture records as XML
Available routes and commands can be listed using:

% uv run flask --app tests.app --help

Pytest will automatically trigger Configuration Schemas Validation.
Test methods check individual elements are formed correctly. Comparisons are also made against static test records to
check that whole records for each standard are also formed correctly. These records, from minimal through
to 'complete' usage (against our supported subsets of standards), are defined in tests.resources.configs/.
To generate test records for standards encoded as JSON files in tests.resources.records run the capture-test-records
Development Task, which uses the internal Flask Test App.
Important
These records check element classes dump/load (encode/decode) information from/to records correctly. They MUST be manually verified as accurate.
It is intended that this command will update pre-existing records, with differences reviewed in version control to aid in this manual verification.
To generate and update test configurations for standards encoded as JSON files in tests.resources.configs, run the
capture-json-test-configs Development Task, which uses the internal
Flask Test App.
Important
These records check element classes dump/load (encode/decode) information from/to records correctly. They MUST be manually verified as accurate.
It is intended that this command will update pre-existing records, with differences reviewed in version control to aid in this manual verification.
A set of test keys are used for signing and encrypting administrative metadata within test configurations. These keys
were generated using tests.resources.keys.make_keys() [1] and are intended as insecure shared values.
Warning
Changing these keys MAY require changes to downstream projects.
For downstream projects needing to set these keys as environment variables, use:
# signing key (public)
FOO="{\"kty\":\"EC\",\"kid\":\"bas_metadata_testing_signing_key\",\"alg\":\"ES256\",\"crv\":\"P-256\",\"x\":\"FzxBM1ZPO5W2bYlhT9AjZUKz5_oH5vIh4_k4aEZ64rM\",\"y\":\"vmK5PWOoIA9eO0ntLh37AMpVODyj0NWf842FwoN-GRs\"}"
# signing key (private)
BAR="{\"kty\":\"EC\",\"kid\":\"bas_metadata_testing_signing_key\",\"alg\":\"ES256\",\"crv\":\"P-256\",\"x\":\"FzxBM1ZPO5W2bYlhT9AjZUKz5_oH5vIh4_k4aEZ64rM\",\"y\":\"vmK5PWOoIA9eO0ntLh37AMpVODyj0NWf842FwoN-GRs\",\"d\":\"2lBuUtJK2TcV_b4B-bDCPnRVAqMnYvnLZ41IUguprs8\"}"
# encryption key (private)
BAZ="{\"kty\":\"EC\",\"kid\":\"bas_metadata_testing_encryption_key\",\"alg\":\"ECDH-ES+A128KW\",\"crv\":\"P-256\",\"x\":\"kYiwq6MW8lGN6PB2csVMuMRcISVk5eNUpGkjM-mm8QY\",\"y\":\"raOTT2xAQhHFKhPHy338L8Ql0hvgsDtHwtEc8pCOf2Q\",\"d\":\"2lBuUtJK2TcV_b4B-bDCPnRVAqMnYvnLZ41IUguprs8\"}"

[1]

% python -c "from tests.resources.keys import make_keys; make_keys()"

All commits will trigger Continuous Integration using GitLab's CI/CD platform, configured in .gitlab-ci.yml.
See README for available releases.
Create a release issue 🛡️ and follow its instructions.
Creating a tag will automatically trigger a Deployment, which will trigger a GitLab Release, including:
- a milestone link
- the change log extract taken from `CHANGELOG.md`
- a package artefact link
- a `README.md` link at the relevant tag
Caution
This section is a Work in Progress (WIP) and may not be complete/accurate.
To create an initial pre-release for an upcoming minor version:
% uv version --bump minor --bump alpha
Then follow the Release Workflow using uv version --bump alpha to create additional pre-releases.
Note
Do not reflect pre-releases in the change log. Do track changes they contain in the floating 'unreleased' release.
Configuration schemas are distributed via the
metadata-resources.data.bas.ac.uk static site via Continuous Deployment.
This project is distributed as a Python (Pip) package available from PyPi via Continuous Deployment.
Tip
To build the package manually run the build Development Task.
Follow the Release Workflow to trigger a deployment, which will:
- build the Python package
- upload it to PyPi
Tagged commits created for Releases will trigger Continuous Deployment using GitLab's
CI/CD platform configured in .gitlab-ci.yml.
Terraform is used to provision IAM resources needed to host Configuration schemas for external access.
Access to the BAS AWS Account 🛡️ is needed to provision these resources.
Note
This provisioning should have already been performed (and applies globally). Any changes only need to be applied once.
# start terraform inside a docker container if not installed locally
$ cd provisioning/terraform
$ docker compose run terraform
# setup terraform
$ terraform init
# apply changes
$ terraform validate
$ terraform fmt
$ terraform apply
# exit container
$ exit
$ docker compose down

State information for this project is stored remotely using a Backend.
Specifically the AWS S3 backend as part of the BAS Terraform Remote State 🛡️ project.
Remote state storage will be automatically initialised when running terraform init. Any changes to remote state will
be automatically saved to the remote backend; there is no need to push or pull changes.
Permission to read and/or write remote state information for this project is restricted to authorised users. Contact the BAS Web & Applications Team to request access.
See the BAS Terraform Remote State 🛡️ project for how these permissions to remote state are enforced.