feat: symlink overridden paths to canonical default locations on startup#147

Open
gipert wants to merge 2 commits into main from feat/link-external-paths

@gipert
Member

@gipert gipert commented May 22, 2026

Summary

  • Adds link_external_paths() to legenddataflow/methods/paths.py, called in onstart via paths.link_external_paths(config, workflow.basedir, logger=logger)
  • When a paths.<key> in dataflow-config.yaml points outside the current production tree, a relative symlink is created at the canonical default location (e.g. generated/tier/dsp -> /other/prod/generated/tier/dsp) so the generated/ layout stays consistent regardless of overrides
  • Stale symlinks at default locations are cleaned up automatically when overrides are removed
  • Real directories at default locations are never modified
  • Mirrors the identical strategy in legend-simflows (legendsimflow.utils.link_external_paths)
  • Candidate keys: all leaf-level tier (tier_*, tier_raw_blind) and par (par_*) paths, plus plt and metadata; parent dirs tier and par are excluded to avoid parent/child symlink conflicts
  • Validity-file writes for par catalogs are skipped when the corresponding par_<tier> path is redirected outside the current production tree, to avoid writing into shared/read-only external directories
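
For readers who have not seen the legend-simflows version, the linking rules listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the helper name `link_external_path`, its signature, and its return labels are hypothetical, and it handles a single default location instead of iterating over the config keys.

```python
import logging
import os
from pathlib import Path
from typing import Optional

log = logging.getLogger(__name__)


def link_external_path(default: Path, target: Optional[Path] = None) -> str:
    """Reconcile one default location with an optional external override.

    Hypothetical helper: `target` is the overridden path, or None when the
    key is left at its default. Returns a label describing the action taken.
    """
    default = Path(default)

    if target is None:
        # Override removed: clean up a stale symlink, leave anything else alone.
        if default.is_symlink():
            default.unlink()
            return "removed-stale"
        return "untouched"

    # A real directory (or file) at the default location is never modified.
    if default.exists() and not default.is_symlink():
        return "kept-real-dir"

    # Replace any existing (possibly broken) symlink.
    if default.is_symlink():
        default.unlink()

    if not Path(target).exists():
        log.warning("%s does not exist, creating a broken symlink", target)

    default.parent.mkdir(parents=True, exist_ok=True)
    # Store the link target relative to the directory that holds the link.
    rel = os.path.relpath(target, start=default.parent)
    default.symlink_to(rel, target_is_directory=True)
    return "linked"
```

Checking `exists()` together with `is_symlink()` is what separates a real directory from a dangling link: `exists()` follows symlinks, so a broken link reports `False` there while `is_symlink()` still reports `True`.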

Test plan

  • Run workflow with all paths at defaults — no symlinks created, no errors
  • Override one path (e.g. tier_dsp) to an external directory — symlink appears at generated/tier/dsp
  • Remove the override — stale symlink is cleaned up on next run
  • Override a path to a non-existent directory — warning logged, broken symlink created
  • Override par_pht — validity file is not written to the external par directory
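
The non-existent-directory item in the test plan could be backed by an automated check along these lines. This is a sketch under assumptions: `make_dangling_link` is a hypothetical stand-in that only reproduces the end state (a relative symlink whose target is missing), it does not call the function added in this PR.

```python
import os
from pathlib import Path


def make_dangling_link(workdir):
    """Create generated/tier/dsp under workdir as a symlink to a missing dir.

    Hypothetical helper for illustration. Returns (is_symlink, exists).
    """
    workdir = Path(workdir)
    link = workdir / "generated" / "tier" / "dsp"
    link.parent.mkdir(parents=True, exist_ok=True)

    missing = workdir / "does-not-exist"  # intentionally never created
    link.symlink_to(os.path.relpath(missing, start=link.parent))

    # is_symlink() inspects the link itself, exists() follows it.
    return link.is_symlink(), link.exists()
```

A test would then assert that the first value is true and the second is false, which is the "broken symlink created" state described in the plan.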

When paths.<key> in dataflow-config.yaml is set to an external location
(e.g. reusing tier_dsp from another production), link_external_paths()
now creates a relative symlink at the default generated/ location so the
production tree maintains a consistent layout. Mirrors the same strategy
already in place in legend-simflows.
@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 16.27907% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.60%. Comparing base (0b88ccf) to head (a2f490a).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| workflow/src/legenddataflow/methods/paths.py | 16.27% | 36 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
- Coverage   64.70%   61.60%   -3.10%     
==========================================
  Files           7        7              
  Lines         629      672      +43     
==========================================
+ Hits          407      414       +7     
- Misses        222      258      +36     

@gipert gipert requested a review from ggmarshall May 28, 2026 10:06
@gipert
Member Author

gipert commented May 28, 2026

@ggmarshall I noticed that when par dirs are taken from outside, the dataflow still creates a par dir with a validity file inside it in the current cycle. Why? Is this needed?

@ggmarshall
Collaborator

ggmarshall commented May 28, 2026

Looks like it is written but not used
e.g.

pht_par_cat_file = Path(paths.pars_path(config)) / "pht" / "validity.yaml"
if pht_par_cat_file.is_file():
    pht_par_cat_file.unlink()
try:
    Path(pht_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
    pht_par_catalog.write_to(pht_par_cat_file)
except NameError:
    logger.warning("No pht parameter catalog found")

can add a check if in prod cycle and not write in that case

@gipert
Member Author

gipert commented May 28, 2026

I can then look into how to skip this. Do you know where to look?

@ggmarshall
Collaborator

main snakefile

@gipert
Member Author

gipert commented May 29, 2026

i think the last commit is ugly, but i think there is no other way

@ggmarshall
Collaborator

Can we not merge into a loop over ["dsp", "hit", "psp", "pht"] so it's a bit cleaner?

When par_<tier> is overridden to an external location, writing the
validity file there would pollute or fail on a shared/read-only directory.
Guard each write with a path comparison so it only runs for tiers whose
par path is within the current production tree.
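
One possible shape for the guarded writes, folded into the loop over tiers suggested in review, is sketched below. It is an illustration under assumptions and not the code in the commit: `is_within`, `write_validity_files`, the `par_paths` mapping, and the plain-text catalogs are hypothetical stand-ins (the Snakefile works with catalog objects and `write_to`).

```python
from pathlib import Path

TIERS = ["dsp", "hit", "psp", "pht"]


def is_within(path, root) -> bool:
    """True if `path` is `root` itself or lies below it (after resolving)."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents


def write_validity_files(catalogs, par_paths, prod_root):
    """Write validity.yaml only for tiers whose par dir is inside prod_root.

    catalogs:  tier -> catalog text (stand-in for the real catalog objects)
    par_paths: tier -> par directory for that tier (hypothetical mapping)
    Returns the list of tiers that were written.
    """
    written = []
    for tier in TIERS:
        par_dir = Path(par_paths[tier])
        if not is_within(par_dir, prod_root):
            # Redirected outside the production tree: never write there.
            continue
        if tier not in catalogs:
            # No catalog was built for this tier.
            continue
        out = par_dir / "validity.yaml"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(catalogs[tier])
        written.append(tier)
    return written
```

Resolving both paths before comparing means a par directory that is reached through a symlink to an external production is also treated as external.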
@gipert gipert force-pushed the feat/link-external-paths branch from 4330835 to a2f490a Compare May 30, 2026 08:22

2 participants