Commit 40c390b

Merge pull request #250 from Daylily-Informatics/codex/dyec-fsx-dra-mounts
Implement DRA FSx mount lifecycle
2 parents ac12736 + d0a56e0 commit 40c390b

63 files changed

Lines changed: 7396 additions & 2536 deletions

README.md

Lines changed: 101 additions & 105 deletions
@@ -2,27 +2,27 @@
22

33
[![Latest release](https://img.shields.io/badge/dynamic/yaml?url=https%3A%2F%2Fraw.githubusercontent.com%2FDaylily-Informatics%2Fdaylily-ephemeral-cluster%2Fmain%2Fconfig%2Fdaylily_cli_global.yaml&query=%24.daylily.git_ephemeral_cluster_repo_release_tag&label=latest%20release&cacheSeconds=300&color=teal)](https://github.com/Daylily-Informatics/daylily-ephemeral-cluster/releases) [![Latest tag](https://img.shields.io/badge/dynamic/yaml?url=https%3A%2F%2Fraw.githubusercontent.com%2FDaylily-Informatics%2Fdaylily-ephemeral-cluster%2Fmain%2Fconfig%2Fdaylily_cli_global.yaml&query=%24.daylily.git_ephemeral_cluster_repo_tag&label=latest%20tag&color=pink&cacheSeconds=300)](https://github.com/Daylily-Informatics/daylily-ephemeral-cluster/tags)
44

5-
Daylily stands up a short-lived AWS ParallelCluster, finishes the headnode configuration after `pcluster` itself reports success, gives the operator a validated Session Manager login shell as `ubuntu`, stages laptop-side inputs into the FSx-backed data plane, launches the workflow repo in tmux, exports results back to the backing S3 repository, and then tears the cluster down when the run is complete.
5+
DayEC is the operator control plane for short-lived AWS ParallelCluster environments that run Daylily analysis workloads on FSx for Lustre. The current data plane is DRA-first: the cluster starts with reference data mounted at `/fsx/data`, run folders are attached only when needed under `/fsx/run_dir_mounts/<mount_id>`, workflow outputs stay under `/fsx/analysis_results/...`, and selected results are exported through a temporary `/fsx/exports/<export_id>` DRA to a chosen S3 analysis bucket.
66

7-
> The bucket is durable. The cluster is ephemeral. Export before delete.
7+
The cluster is ephemeral. S3 buckets are durable. Verify the export receipt before deleting the cluster.
88

99
## Supported Operator Contract
1010

11-
The supported path is:
11+
Use the checkout environment and the CLI, not historical helper-script paths:
1212

1313
1. `source ./activate`
14-
2. `daylily-ec preflight`
15-
3. `daylily-ec create`
16-
4. `daylily-ec headnode connect`
17-
5. `daylily-ec samples stage`
18-
6. `daylily-ec workflow launch`
19-
7. `daylily-ec export --target-uri analysis_results/ubuntu`
20-
8. `daylily-ec delete --dry-run`
21-
9. `daylily-ec delete`
22-
23-
Supported remote access is AWS Systems Manager Session Manager landing directly in the `ubuntu` login shell. The repo hard-checks the Session Manager document and the effective remote user before supported command payloads run.
24-
25-
> A cluster is not "ready" when CloudFormation or ParallelCluster first says the infrastructure exists. The supported readiness point is when `daylily-ec create` returns successfully after the post-create headnode configuration and bootstrap validation steps complete.
14+
2. `dyec preflight`
15+
3. `dyec create`
16+
4. `dyec headnode connect`
17+
5. `dyec samples stage` for sample-manifest inputs, or `dyec mounts create` for run-folder inputs
18+
6. `dyec workflow launch`
19+
7. copy selected outputs into `/fsx/exports/<export_id>/...`
20+
8. `dyec export --export-id <id> --source-path /exports/<id>/... --destination-s3-uri s3://...`
21+
9. inspect `fsx_export.yaml`
22+
10. `dyec delete --dry-run`
23+
11. `dyec delete`
24+
25+
`daylily-ec` and `dyec` are the same entrypoint. The shorter `dyec` form is used in examples.
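
In the lifecycle example in this README, outputs are copied to `/fsx/exports/$EXPORT_ID/...` on the headnode, while `dyec export` receives `--source-path "/exports/$EXPORT_ID/..."`. The two values differ only by the `/fsx` mount prefix. A minimal shell sketch of that mapping follows, assuming the relationship holds for every export path. The helper name `export_source_path` is hypothetical and is not part of the CLI.

```shell
# Hypothetical helper, not shipped by the repo: derive the --source-path
# value from a headnode path by dropping the leading /fsx mount prefix.
# Assumes every export source lives under /fsx/exports/.
export_source_path() {
  case "$1" in
    /fsx/exports/*)
      # Strip only the leading "/fsx"; the rest of the path is unchanged.
      printf '%s\n' "${1#/fsx}"
      ;;
    *)
      printf 'not under /fsx/exports: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

Rejecting anything outside `/fsx/exports/` keeps paths such as `/fsx/run_dir_mounts/<mount_id>`, which the README says is not an export source, from being passed along by mistake.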
2626

2727
## One Copy-Pasteable Lifecycle
2828

@@ -35,149 +35,145 @@ export REGION_AZ=us-west-2d
3535
export CLUSTER_NAME=day-demo-$(date +%Y%m%d%H%M%S)
3636
export DAY_EX_CFG="$HOME/.config/daylily/daylily_ephemeral_cluster.yaml"
3737
export REF_BUCKET=s3://lsmc-dayoa-omics-analysis-us-west-2
38+
export ANALYSIS_BUCKET=s3://lsmc-dayoa-analysis-results-us-west-2
3839
export ANALYSIS_SAMPLES=etc/analysis_samples_template.tsv
3940
export STAGE_CFG_DIR="$PWD/tmp-stage-config/$CLUSTER_NAME"
4041
export EXPORT_DIR="$PWD/tmp-export/$CLUSTER_NAME"
42+
export EXPORT_ID="${CLUSTER_NAME}-results"
43+
export EXPORT_S3_URI="$ANALYSIS_BUCKET/$EXPORT_ID/"
4144

42-
daylily-ec preflight \
45+
dyec preflight \
4346
--profile "$AWS_PROFILE" \
4447
--region-az "$REGION_AZ" \
4548
--config "$DAY_EX_CFG"
4649

47-
daylily-ec create \
50+
dyec create \
4851
--profile "$AWS_PROFILE" \
4952
--region-az "$REGION_AZ" \
5053
--config "$DAY_EX_CFG"
5154

52-
daylily-ec headnode connect \
55+
dyec headnode connect \
5356
--profile "$AWS_PROFILE" \
5457
--region "$REGION" \
5558
--cluster "$CLUSTER_NAME"
5659

57-
daylily-ec samples stage \
58-
"$ANALYSIS_SAMPLES" \
60+
dyec samples stage "$ANALYSIS_SAMPLES" \
5961
--profile "$AWS_PROFILE" \
6062
--region "$REGION" \
6163
--reference-bucket "$REF_BUCKET" \
6264
--config-dir "$STAGE_CFG_DIR"
6365

64-
# The manifest is row-oriented and multi-modality:
65-
# - legacy Illumina rows can still use R1_FQ/R2_FQ
66-
# - aligned inputs can be supplied directly through ULTIMA_CRAM, ONT_CRAM,
67-
# PB_BAM, ONT_BAM, or ROCHE_BAM columns
68-
# - ONT_FASTQ_PREFIX stages one S3 fastq_pass/<tag>/ prefix into ONT_R1_PATH,
69-
# with ONT_R2_PATH=na; set ONT_FLOWCELL_ID when the prefix has multiple flowcells
70-
# - hybrid units populate multiple source groups on one row
71-
# - add --run-metric-staging RUN_UID:PLATFORM:FOFN to copy run-level metric
72-
# sidecars under runs/<RUN_UID>/ in the same remote stage
73-
74-
# Use the "Remote FSx stage directory" printed by the staging helper.
75-
daylily-ec workflow launch \
66+
dyec workflow launch \
7667
--profile "$AWS_PROFILE" \
7768
--region "$REGION" \
7869
--cluster "$CLUSTER_NAME" \
7970
--stage-dir "/fsx/data/staged_sample_data/remote_stage_<timestamp>" \
8071
--destination "<analysis-run-id>" \
81-
--git-tag main \
82-
--aligners sent \
83-
--dedupers dmd \
84-
--snv-callers sentd
85-
86-
# Or use a catalog command to stage and launch in one CLI call.
87-
daylily-ec samples run \
88-
"$ANALYSIS_SAMPLES" \
89-
--command-id complete_genomics_mgi_snv_concordance \
72+
--git-tag 1.0.7
73+
74+
# For run-folder work, attach only the S3 prefix you need.
75+
dyec --json mounts create \
9076
--profile "$AWS_PROFILE" \
9177
--region "$REGION" \
9278
--cluster "$CLUSTER_NAME" \
93-
--reference-bucket "$REF_BUCKET" \
94-
--destination "<analysis-run-id>" \
95-
--dry-run
79+
--s3-uri "s3://sequencer-run-bucket/runs/RUN123/" \
80+
--mount-id RUN123 \
81+
--run-id RUN123 \
82+
--platform ILMN \
83+
--read-only \
84+
--wait
85+
86+
dyec --json mounts verify \
87+
--profile "$AWS_PROFILE" \
88+
--region "$REGION" \
89+
--cluster "$CLUSTER_NAME" \
90+
--mount-id RUN123
91+
92+
dyec workflow launch \
93+
--profile "$AWS_PROFILE" \
94+
--region "$REGION" \
95+
--cluster "$CLUSTER_NAME" \
96+
--run-context-file ./runs.tsv \
97+
--destination "<run-analysis-id>" \
98+
--git-tag 1.0.7 \
99+
--dy-command "bin/day_run produce_illumina_run_qc --config run_context_file=config/runs.tsv -p -j 5 -k"
100+
101+
# On the headnode, copy only the outputs you want to export:
102+
# mkdir -p /fsx/exports/$EXPORT_ID/analysis_results/ubuntu/
103+
# cp -a /fsx/analysis_results/ubuntu/<analysis-run-id>/ /fsx/exports/$EXPORT_ID/analysis_results/ubuntu/
96104

97-
daylily-ec export \
105+
dyec export \
98106
--profile "$AWS_PROFILE" \
99107
--region "$REGION" \
100-
--cluster-name "$CLUSTER_NAME" \
101-
--target-uri analysis_results/ubuntu \
108+
--cluster "$CLUSTER_NAME" \
109+
--export-id "$EXPORT_ID" \
110+
--source-path "/exports/$EXPORT_ID/analysis_results/ubuntu/" \
111+
--destination-s3-uri "$EXPORT_S3_URI" \
102112
--output-dir "$EXPORT_DIR"
103113

104114
cat "$EXPORT_DIR/fsx_export.yaml"
105115

106-
daylily-ec delete --dry-run \
116+
dyec delete --dry-run \
107117
--profile "$AWS_PROFILE" \
108118
--region "$REGION" \
109-
--cluster-name "$CLUSTER_NAME"
119+
--cluster "$CLUSTER_NAME"
110120

111-
daylily-ec delete \
121+
dyec delete \
112122
--profile "$AWS_PROFILE" \
113123
--region "$REGION" \
114-
--cluster-name "$CLUSTER_NAME"
124+
--cluster "$CLUSTER_NAME"
115125
```
116126

117-
`fsx_export.yaml` is the machine-readable export receipt. A successful run writes `status: success` and the resolved S3 destination.
118-
119127
## Architecture At A Glance
120128

121-
1. `daylily-ec` is the control-plane CLI, with `dyec` installed as a shorter alias for the same entrypoint. It handles AWS readiness validation, preflight, create, cluster inspection, export, delete, environment introspection, runtime checks, and pricing snapshots.
122-
2. The create flow renders the cluster configuration, calls ParallelCluster, then runs Daylily headnode configuration over Session Manager.
123-
3. The durable data plane is the S3 bucket plus the FSx for Lustre filesystem attached to the cluster. Laptop-side staging writes into the bucket-backed FSx namespace.
124-
4. The supported connect path is `daylily-ec headnode connect`, which opens Session Manager into the `ubuntu` login shell.
125-
5. Workflow launch happens from the operator machine through `daylily-ec workflow launch`, which creates a run directory at `/home/ubuntu/daylily-runs/<session>/`, writes `launch.sh`, `tmux.log`, and `status.json`, and starts the run inside tmux.
126-
6. Export uses the FSx data repository task API and writes `fsx_export.yaml` locally so the operator has a concrete export receipt before teardown.
127-
128-
## What This Repo Ships
129-
130-
- `environment.yaml` plus `pyproject.toml`: the `DAY-EC` environment contract
131-
- `activate`: checkout bootstrap that creates or repairs `DAY-EC`, installs the repo editable, and validates the local toolchain
132-
- `daylily-ec headnode connect`: interactive Session Manager shell launcher with `ubuntu`-only validation
133-
- `daylily-ec headnode configure`: explicit headnode configuration helper for repair or manual reruns
134-
- `daylily-ec headnode info`: full `pcluster describe-cluster` output for one cluster
135-
- `daylily-ec headnode jobs`: Slurm queue output using the same format as the headnode `sq` alias
136-
- `daylily-ec aws validate permissions|quotas|all`: read-only AWS readiness validation with optional admin gap reports
137-
- `daylily-ec cluster list/describe/wait`: ParallelCluster inspection helpers
138-
- `daylily-ec samples stage`: translator and staging helper that turns a multi-modality `analysis_samples.tsv` into workflow-ready `samples.tsv` and `units.tsv`, with optional run-metric sidecar staging
139-
- `daylily-ec workflow launch/status/logs`: remote launcher and run-state inspection helpers
140-
- `daylily-ec state list/show`: local state-file inspection helpers
141-
- `daylily_ec/ssh_to_ssm_e2e_runner.py`: AWS-backed end-to-end runner that exercises the supported lifecycle through the repo CLI/helpers
142-
- `bin/utils/ilmn/extract_undetermined_indexes`: Illumina Undetermined/Unclassified FASTQ index triage utility for ranking or splitting observed dual-index pairs from local paths, S3 URIs, or presigned URLs
129+
```mermaid
130+
flowchart LR
131+
Ref["S3 reference bucket /data/"] -->|reference-data DRA| Data["/fsx/data"]
132+
Run["S3 run prefix"] -->|ephemeral run DRA| Mount["/fsx/run_dir_mounts/<mount_id>"]
133+
Data --> Workflow["DayOA workflow"]
134+
Mount --> Workflow
135+
Workflow --> Results["/fsx/analysis_results/..."]
136+
Results --> Copy["copy selected outputs"]
137+
Copy --> Export["/fsx/exports/<export_id>"]
138+
Export -->|EXPORT_TO_REPOSITORY| Analysis["S3 analysis bucket/prefix"]
139+
```
143140

144-
## AWS And Local Prerequisites
141+
Key rules:
145142

146-
At minimum, the operator account needs:
143+
- `/fsx/data` is the reference-data DRA created with the cluster.
144+
- `/fsx/run_dir_mounts/<mount_id>` is for read-oriented run inputs and is not an export source.
145+
- `/fsx/analysis_results/...` is where workflow checkouts and outputs live.
146+
- `/fsx/exports/<export_id>` is the temporary export namespace for selected outputs.
147+
- `fsx_export.yaml` is the export receipt to keep before teardown.
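
The receipt rule can be turned into a teardown gate. The sketch below assumes the receipt carries a top-level `status: success` line, which is how the previous README text described a successful export; confirm the field against a real `fsx_export.yaml` before relying on it. The function name `receipt_ok` is hypothetical.

```shell
# Hypothetical helper: return 0 only when the export receipt exists and
# reports success. Assumes a top-level "status: success" line in the YAML.
receipt_ok() {
  [ -f "$1" ] && grep -Eq '^status:[[:space:]]*success[[:space:]]*$' "$1"
}

# Usage sketch (commented out so the block has no side effects):
# if receipt_ok "$EXPORT_DIR/fsx_export.yaml"; then
#   dyec delete --dry-run \
#     --profile "$AWS_PROFILE" --region "$REGION" --cluster "$CLUSTER_NAME"
# else
#   echo "export receipt missing or not successful; not deleting" >&2
# fi
```

A plain `grep` keeps the check free of extra tooling; `yq`, if available locally, would be a stricter way to read the same field.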
147148

148-
- a working named AWS profile
149-
- permission for STS identity lookup, IAM inspection/bootstrap, Service Quotas reads, S3 bucket discovery/access, EC2/VPC inspection, FSx, SSM, and ParallelCluster operations
150-
- a reference bucket in the target region that will back the cluster FSx filesystem
151-
- Session Manager document `SSM-SessionManagerRunShell` configured to run shell sessions as `ubuntu` in `/home/ubuntu` and source a login shell
152-
- enough regional quota for the requested cluster shape
149+
## Pipeline Catalog
153150

154-
Local toolchain for the supported path:
151+
`config/daylily_available_repositories.yaml` is the source of truth for repositories and blessed launch profiles. The packaged copy under `daylily_ec/resources/payload/config/` must match it.
155152

156-
- Conda
157-
- `daylily-ec` or its short alias `dyec`
158-
- `aws`
159-
- `pcluster`
160-
- `session-manager-plugin`
161-
- `jq`, `yq`, `rclone`, `node`, and the rest of the `DAY-EC` Conda layer
153+
The current DayOA pin is `1.0.7` for the repository default and every DayOA command. Catalog v2 separates:
162154

163-
If any of this is missing, cluster creation will fail in annoying ways. Run `daylily-ec aws validate all --profile "$AWS_PROFILE" --region-az "$REGION_AZ" --gap-analysis aws_gap.md` before account handoff, then run `daylily-ec preflight` before create.
155+
- `sample_analysis`: uses `analysis_samples.tsv`, stages inputs, and writes `samples.tsv` / `units.tsv`.
156+
- `run_analysis`: uses `runs.tsv`, requires a run DRA, and launches run-folder workflows such as Illumina run QC and BCL Convert.
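
As a rough illustration of the split, a wrapper script could pick the input manifest from the catalog kind. This is a sketch under the assumption that the two kind names above map one-to-one to the manifest file names; `manifest_for_kind` is a hypothetical helper and does not read the catalog file.

```shell
# Hypothetical helper: map a catalog kind to the manifest it consumes.
# The mapping is taken from the two bullets above, nothing more.
manifest_for_kind() {
  case "$1" in
    sample_analysis) printf 'analysis_samples.tsv\n' ;;
    run_analysis)    printf 'runs.tsv\n' ;;
    *)
      printf 'unknown catalog kind: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```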
164157

165-
## Cost, Time, And Failure Notes
158+
## What This Repo Ships
166159

167-
- `daylily-ec create` can take a long time. The ParallelCluster build alone can take tens of minutes, and Daylily still has headnode bootstrap work to finish after that.
168-
- The cluster is disposable; the export target is not. Do not delete until you have checked `fsx_export.yaml`.
169-
- The supported remote user is `ubuntu`. Any path that would land you as another user is a defect, not a supported fallback.
170-
- Session Manager misconfiguration is a hard stop. The repo does not tell operators to connect first and then switch users manually.
160+
- `source ./activate`: creates or repairs the `DAY-EC` environment and installs the checkout editable
161+
- `dyec` / `daylily-ec`: preflight, create, headnode, sample, workflow, mount, export, delete, state, repository, pricing, and AWS validation commands
162+
- DRA-backed ParallelCluster templates under `config/day_cluster/`
163+
- packaged resources under `daylily_ec/resources/payload/`
164+
- `day-clone` for headnode repository checkouts
165+
- tests that guard the catalog, packaged resources, SSM behavior, DRA mounts, export receipts, and environment contract
171166

172167
## Read This Next
173168

174-
- [docs/ultra_rapid_start.md](docs/ultra_rapid_start.md): the shortest happy path
175-
- [docs/quickest_start.md](docs/quickest_start.md): a guided walkthrough with sanity checks
176-
- [docs/operations.md](docs/operations.md): connect, stage, run, monitor, export, and delete
177-
- [docs/aws_setup.md](docs/aws_setup.md): AWS prerequisites, IAM expectations, quotas, and Session Manager requirements
178-
- [docs/cli_reference.md](docs/cli_reference.md): command reference grounded in current `--help` output
179-
- [docs/testing_and_debugging.md](docs/testing_and_debugging.md): test commands, E2E runner usage, and failure triage
180-
- [docs/monitoring_and_troubleshooting.md](docs/monitoring_and_troubleshooting.md): runtime and operational debugging
181-
- [docs/DAY_EC_ENVIRONMENT.md](docs/DAY_EC_ENVIRONMENT.md): `DAY-EC` checkout environment contract
182-
- [docs/pip_install.md](docs/pip_install.md): pip-install path and external prerequisites
183-
- [docs/archive/README.md](docs/archive/README.md): historical material, pre-rewrite snapshot, and unsupported legacy appendix
169+
- [docs/dra_fsx_strategy.md](docs/dra_fsx_strategy.md): current DRA-enabled FSx strategy and diagrams
170+
- [docs/ultra_rapid_start.md](docs/ultra_rapid_start.md): shortest current run path
171+
- [docs/quickest_start.md](docs/quickest_start.md): guided walkthrough with checks
172+
- [docs/operations.md](docs/operations.md): day-2 operations
173+
- [docs/cli_reference.md](docs/cli_reference.md): command reference
174+
- [docs/aws_setup.md](docs/aws_setup.md): AWS prerequisites
175+
- [docs/monitoring_and_troubleshooting.md](docs/monitoring_and_troubleshooting.md): failure triage
176+
- [docs/testing_and_debugging.md](docs/testing_and_debugging.md): local and AWS-backed validation
177+
- [docs/DAY_EC_ENVIRONMENT.md](docs/DAY_EC_ENVIRONMENT.md): environment contract
178+
- [docs/pip_install.md](docs/pip_install.md): pip install path
179+
- [docs/archive/README.md](docs/archive/README.md): historical material only
