Skip to content

Conversation

@marc-white
Copy link
Contributor

Closes #363 .

This PR is my first pass at trying to extract out as much of the Gadi dependency as I can from the main source code. I've also included a quick write-up of what I think would be required to get the code working on a different HPC system.

Doubtless I've missed something, so I'm marking this as a draft for others to look over.

@codecov
Copy link

codecov bot commented Mar 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (e47cf40) to head (27c4563).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #371   +/-   ##
=======================================
  Coverage   99.03%   99.04%           
=======================================
  Files          15       15           
  Lines        1353     1361    +8     
=======================================
+ Hits         1340     1348    +8     
  Misses         13       13           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@charles-turner-1 charles-turner-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, somehow never got round to reviewing this.

I think we can use environment variables to make quite a bit more straightforward to port. Otherwise looks good!


__version__ = _version.get_versions()["version"]

CATALOG_LOCATION = "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a trick we use for managing config files which is common in Django/Web Development which might be relevant/useful here:

import os

CATALOG_LOCATION = os.environ.get("CATALOG_LOCATION", "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml")

Basically, it lets us override the default with an environment variable, if found, falling back to the hardcoded default. It strikes me that it might be useful here, although we'd probably want to prefix all the environment variable names as a namespacing strategy, eg.:

CATALOG_LOCATION = os.environ.get(
    "ACCESS_NRI_INTAKE_CATALOG_LOCATION", "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
)

CATALOG_LOCATION = "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
"""Location for 'live' master catalog YAML."""

USER_CATALOG_LOCATION = str(Path.home() / ".access_nri_intake_catalog/catalog.yaml")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
USER_CATALOG_LOCATION = str(Path.home() / ".access_nri_intake_catalog/catalog.yaml")
USER_CATALOG_LOCATION = str(Path.home() / ".access_nri_intake_catalog/catalog.yaml")
USER_CATALOG_LOCATION = os.environ.get(
"ACCESS_NRI_INTAKE_USER_CAT_LOCATION", str(Path.home() / ".access_nri_intake_catalog/catalog.yaml")
)

USER_CATALOG_LOCATION = str(Path.home() / ".access_nri_intake_catalog/catalog.yaml")
"""Location where user can place a master catalog YAML to override standard 'live' version."""

STORAGE_ROOT = "/g/data"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
STORAGE_ROOT = "/g/data"
STORAGE_ROOT = os.environ.get(
"ACCESS_NRI_INTAKE_STORAGE_ROOT", "/g/data"
)

STORAGE_ROOT = "/g/data"
"""Root storage location for catalog experiments"""

STORAGE_FLAG_PATTERN = r"gdata/[a-z]{1,2}[0-9]{1,2}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
STORAGE_FLAG_PATTERN = r"gdata/[a-z]{1,2}[0-9]{1,2}"
STORAGE_FLAG_PATTERN = = os.environ.get(
"ACCESS_NRI_INTAKE_STORAGE_FLAG_PATTERN", r"gdata/[a-z]{1,2}[0-9]{1,2}"
)

STORAGE_FLAG_PATTERN = r"gdata/[a-z]{1,2}[0-9]{1,2}"
"""Pattern for matching 'storage flags' - related to Gadi file access system"""

STORAGE_LOCATION_REGEX = r"^/g/data/(?P<proj>[a-z]{1,2}[0-9]{1,2})/.*?$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
STORAGE_LOCATION_REGEX = r"^/g/data/(?P<proj>[a-z]{1,2}[0-9]{1,2})/.*?$"
STORAGE_LOCATION_REGEX = os.environ.get(
"ACCESS_NRI_INTAKE_STORAGE_LOCATION_REGEX", r"^/g/data/(?P<proj>[a-z]{1,2}[0-9]{1,2})/.*?$"

contents of :code:`config/metadata-sources` are archival copies of live experiment :ref:`metadata`;
you will not need to replace these on your system.)

3. The command-line scripts in :code:`bin/` contain PBS commands and file paths specific to Gadi. You will need
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should then be able to just update all this to reflect that the defaults can be overridden with environment variables.

@charles-turner-1
Copy link
Collaborator

As noted by @marc-white in #437, this PR will also require decoupling the hard requirement on the telemetry package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Portability

3 participants