
Commit f13b7ee

Add cloud report portal proof
1 parent 3feede6 commit f13b7ee

10 files changed

Lines changed: 435 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 10 additions & 0 deletions
@@ -17,6 +17,10 @@ jobs:
           distribution: temurin
           java-version: 21

+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
       - name: Install Nextflow
         run: |
           curl -s https://get.nextflow.io | bash
@@ -43,3 +47,9 @@ jobs:

       - name: Verify outputs
         run: python scripts/validate_outputs.py results
+
+      - name: Test report portal
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r cloud/report-portal/requirements.txt
+          pytest cloud/report-portal/tests

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@ results/
 data/
 genome/
 *.sra
+*.db
+*.sqlite
 __pycache__/
+*.egg-info/
+.pytest_cache/
+.venv/
 *.pyc
 .DS_Store

README.md

Lines changed: 12 additions & 0 deletions
@@ -13,6 +13,7 @@ Designed around the [Himes et al. (2014)](https://doi.org/10.1371/journal.pone.0

 - Full synthetic smoke test in GitHub Actions, including containerised FastQC, fastp, HISAT2, samtools, featureCounts, DESeq2 and MultiQC.
 - Docker, Singularity and AWS Batch profiles in `nextflow.config`.
+- Containerised FastAPI report portal under `cloud/report-portal/` for S3-hosted reports and Postgres run metadata.
 - `nextflow_schema.json` for parameter discovery in Seqera Platform and other launch tooling.
 - Nextflow execution report, timeline, trace and DAG written to `results/pipeline_info/` on every run.
 - `scripts/validate_outputs.py` checks count matrices, DESeq2 output, plots, MultiQC and run metadata in CI.
@@ -122,6 +123,16 @@ nextflow run Ekin-Kahraman/rnaseq-nextflow-pipeline \
   --outdir s3://my-rnaseq-bucket/results/airway
 ```

+### Report portal
+
+The optional [cloud report portal](cloud/report-portal/) registers cloud runs and returns signed S3 URLs for Nextflow reports, timelines, traces, DAGs and MultiQC output. It is a small FastAPI service backed by Postgres in production and SQLite for local testing.
+
+```bash
+cd cloud/report-portal
+pip install -r requirements.txt
+uvicorn app.main:app --reload --port 8000
+```
+
 ## Parameters

 | Parameter | Default | Description |
@@ -157,6 +168,7 @@ results/
 - **BioContainers** — published containers from the Bioconda ecosystem. No custom Dockerfiles to maintain.
 - **Docker and Singularity** — `-profile docker` for local, `-profile singularity` for HPC where Docker is typically unavailable.
 - **AWS Batch profile** — `-profile awsbatch` runs the same containerised workflow on managed cloud compute with S3 work and output paths.
+- **Report portal separated from compute** — Nextflow stays responsible for execution; the FastAPI portal only stores run metadata and signs S3 artefact links, which keeps the cloud proof small and auditable.
 - **Run metadata by default** — Nextflow report, timeline, trace and DAG are emitted on every run so failures and performance can be audited after the fact.
 - **Reverse-stranded default** — `--strandedness 2` because the airway dataset (and most modern Illumina dUTP protocols) produces reverse-stranded libraries. Users with older unstranded preps should set `--strandedness 0`.
 - **Configurable contrast** — `--ref_condition` sets the DESeq2 reference level. Defaults to "untreated" for the airway dataset.
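
The README text in this hunk describes the portal as backed by Postgres in production and SQLite for local testing. As a rough illustration of how a single `DATABASE_URL` might select between the two, the sketch below mirrors the `engine_options` helper added in `cloud/report-portal/app/main.py` in this commit. `describe_backend` is a hypothetical name introduced only for this example, and the Postgres URL is the example value from the portal README, not a real deployment.

```python
import os

# Same development default as app/main.py in this commit.
DEFAULT_DATABASE_URL = "sqlite:///./runs.db"


def engine_options(database_url):
    """Return keyword arguments for sqlalchemy.create_engine, chosen by URL scheme."""
    if database_url.startswith("sqlite"):
        # Lets the SQLite connection be used from threads other than its creator.
        return {"connect_args": {"check_same_thread": False}}
    # For server databases, check pooled connections for liveness before use.
    return {"pool_pre_ping": True}


def describe_backend(environ=None):
    """Hypothetical helper: resolve the URL and options without creating an engine."""
    env = os.environ if environ is None else environ
    url = env.get("DATABASE_URL", DEFAULT_DATABASE_URL)
    return url, engine_options(url)


if __name__ == "__main__":
    print(describe_backend({}))
    print(describe_backend({"DATABASE_URL": "postgresql+psycopg://rnaseq:change_me@db:5432/rnaseq"}))
```

Keeping the choice in one small function means the test suite can pass a SQLite URL to `create_app` while a deployment only has to set an environment variable.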

cloud/report-portal/Dockerfile

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+FROM python:3.12-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY app ./app
+
+EXPOSE 8000
+
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

cloud/report-portal/README.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# RNA-seq Report Portal
+
+Small FastAPI service for registering cloud RNA-seq runs and serving signed links to reports stored in S3.
+
+This is intentionally separate from the Nextflow pipeline. The pipeline remains responsible for compute and published artefacts; the portal gives reviewers and collaborators a minimal cloud-facing surface for run status and report access.
+
+## Architecture
+
+```text
+Nextflow AWS Batch run
+  -> s3://bucket/results/<run>/
+       -> pipeline_info/report.html
+       -> pipeline_info/timeline.html
+       -> pipeline_info/trace.txt
+       -> pipeline_info/dag.dot
+       -> multiqc/multiqc_report.html
+  -> POST /runs into this service
+  -> collaborators request signed report URLs from /runs/{id}/artifacts/{artifact}/presign
+```
+
+## Configuration
+
+| Variable | Example | Purpose |
+| --- | --- | --- |
+| `DATABASE_URL` | `postgresql+psycopg://rnaseq:change_me@db:5432/rnaseq` | Metadata database. Defaults to local SQLite for development. |
+| `AWS_REGION` | `eu-west-2` | Region used by the AWS SDK. |
+| AWS credentials | IAM role, env vars, or workload identity | Required only for signed S3 URLs. |
+
+## Local Smoke Run
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+uvicorn app.main:app --reload --port 8000
+```
+
+Register a completed run:
+
+```bash
+curl -X POST http://localhost:8000/runs \
+  -H 'content-type: application/json' \
+  -d '{
+    "run_id": "airway-test-001",
+    "name": "Synthetic CI airway test",
+    "status": "succeeded",
+    "s3_prefix": "s3://my-rnaseq-bucket/results/airway-test-001"
+  }'
+```
+
+Get a signed report URL:
+
+```bash
+curl http://localhost:8000/runs/airway-test-001/artifacts/report/presign
+```
+
+## Container
+
+```bash
+docker build -t rnaseq-report-portal .
+docker run --rm -p 8000:8000 \
+  -e DATABASE_URL=postgresql+psycopg://rnaseq:change_me@host.docker.internal:5432/rnaseq \
+  rnaseq-report-portal
+```
+
+## Tests
+
+```bash
+pip install -r requirements.txt
+pytest tests
+```
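
The two curl calls in the portal README above could also be driven from Python. The following is a minimal sketch using only the standard library, assuming the portal is reachable at the local smoke-run address. The helper names (`build_register_request`, `presign_endpoint`, `send`) are hypothetical and are not part of the commit; the `expires` query parameter comes from the `presign_artifact` endpoint in `app/main.py`.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: the local smoke-run address


def build_register_request(run_id, name, s3_prefix, status="succeeded", base_url=BASE_URL):
    """Build (without sending) the POST /runs request from the curl example."""
    payload = {"run_id": run_id, "name": name, "status": status, "s3_prefix": s3_prefix}
    return urllib.request.Request(
        f"{base_url}/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def presign_endpoint(run_id, artifact="report", expires=3600, base_url=BASE_URL):
    """Return the GET URL that asks the portal for a signed artefact link."""
    path = f"/runs/{quote(run_id, safe='')}/artifacts/{quote(artifact, safe='')}/presign"
    return f"{base_url}{path}?{urlencode({'expires': expires})}"


def send(request_or_url, timeout=10):
    """Send a prepared request or URL and decode the JSON reply (needs a running portal)."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_register_request(
        "airway-test-001",
        "Synthetic CI airway test",
        "s3://my-rnaseq-bucket/results/airway-test-001",
    )
    print(request.get_method(), request.full_url)
    print(presign_endpoint("airway-test-001"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything touches the network; `send(request)` would perform the actual call once the service is up.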
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""FastAPI report portal for RNA-seq pipeline runs."""

cloud/report-portal/app/main.py

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
+from __future__ import annotations
+
+import os
+from datetime import datetime, timezone
+from typing import Literal
+from urllib.parse import urlparse
+from uuid import uuid4
+
+from fastapi import Depends, FastAPI, HTTPException, Query, status
+from pydantic import BaseModel, Field
+from sqlalchemy import Column, DateTime, String, create_engine, select, text
+from sqlalchemy.orm import Session, declarative_base, sessionmaker
+
+
+DEFAULT_DATABASE_URL = "sqlite:///./runs.db"
+RunStatus = Literal["submitted", "running", "succeeded", "failed", "unknown"]
+ArtifactName = Literal["report", "timeline", "trace", "dag", "multiqc"]
+
+Base = declarative_base()
+
+
+def utc_now() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+class RunModel(Base):
+    __tablename__ = "runs"
+
+    id = Column(String, primary_key=True)
+    name = Column(String, nullable=False)
+    status = Column(String, nullable=False, default="submitted")
+    s3_prefix = Column(String, nullable=True)
+    nextflow_report = Column(String, nullable=False, default="pipeline_info/report.html")
+    timeline = Column(String, nullable=False, default="pipeline_info/timeline.html")
+    trace = Column(String, nullable=False, default="pipeline_info/trace.txt")
+    dag = Column(String, nullable=False, default="pipeline_info/dag.dot")
+    multiqc_report = Column(String, nullable=False, default="multiqc/multiqc_report.html")
+    created_at = Column(DateTime(timezone=True), nullable=False, default=utc_now)
+    updated_at = Column(DateTime(timezone=True), nullable=False, default=utc_now, onupdate=utc_now)
+
+
+class RunCreate(BaseModel):
+    run_id: str | None = Field(default=None, description="Stable external run id. Defaults to a UUID.")
+    name: str = Field(min_length=1)
+    status: RunStatus = "submitted"
+    s3_prefix: str | None = Field(default=None, description="S3 prefix containing published Nextflow results.")
+    nextflow_report: str = "pipeline_info/report.html"
+    timeline: str = "pipeline_info/timeline.html"
+    trace: str = "pipeline_info/trace.txt"
+    dag: str = "pipeline_info/dag.dot"
+    multiqc_report: str = "multiqc/multiqc_report.html"
+
+
+class RunUpdate(BaseModel):
+    name: str | None = Field(default=None, min_length=1)
+    status: RunStatus | None = None
+    s3_prefix: str | None = None
+    nextflow_report: str | None = None
+    timeline: str | None = None
+    trace: str | None = None
+    dag: str | None = None
+    multiqc_report: str | None = None
+
+
+class RunOut(BaseModel):
+    id: str
+    name: str
+    status: str
+    s3_prefix: str | None
+    nextflow_report: str
+    timeline: str
+    trace: str
+    dag: str
+    multiqc_report: str
+    created_at: datetime
+    updated_at: datetime
+
+    model_config = {"from_attributes": True}
+
+
+def model_data(model: BaseModel, *, exclude_unset: bool = False) -> dict:
+    if hasattr(model, "model_dump"):
+        return model.model_dump(exclude_unset=exclude_unset)
+    return model.dict(exclude_unset=exclude_unset)
+
+
+def engine_options(database_url: str) -> dict:
+    if database_url.startswith("sqlite"):
+        return {"connect_args": {"check_same_thread": False}}
+    return {"pool_pre_ping": True}
+
+
+def parse_s3_uri(uri: str) -> tuple[str, str]:
+    parsed = urlparse(uri)
+    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
+        raise HTTPException(status_code=400, detail=f"Invalid S3 URI: {uri}")
+    return parsed.netloc, parsed.path.lstrip("/")
+
+
+def resolve_artifact_uri(run: RunModel, artifact: ArtifactName) -> str:
+    attr = "nextflow_report" if artifact == "report" else "multiqc_report" if artifact == "multiqc" else artifact
+    value = getattr(run, attr)
+    if value.startswith("s3://"):
+        return value
+    if not run.s3_prefix:
+        raise HTTPException(status_code=404, detail="Run has no s3_prefix for relative artefact paths")
+    return f"{run.s3_prefix.rstrip('/')}/{value.lstrip('/')}"
+
+
+def get_s3_client():
+    import boto3
+
+    return boto3.client("s3", region_name=os.getenv("AWS_REGION"))
+
+
+def create_app(database_url: str | None = None) -> FastAPI:
+    db_url = database_url or os.getenv("DATABASE_URL", DEFAULT_DATABASE_URL)
+    engine = create_engine(db_url, **engine_options(db_url))
+    session_local = sessionmaker(bind=engine, autoflush=False, autocommit=False)
+    Base.metadata.create_all(bind=engine)
+
+    app = FastAPI(
+        title="RNA-seq Report Portal",
+        version="0.1.0",
+        description="Registers Nextflow RNA-seq runs and signs S3 report artefact URLs.",
+    )
+
+    def get_db():
+        db = session_local()
+        try:
+            yield db
+        finally:
+            db.close()
+
+    @app.get("/health")
+    def health(db: Session = Depends(get_db)) -> dict:
+        db.execute(text("SELECT 1"))
+        return {"status": "ok"}
+
+    @app.post("/runs", response_model=RunOut, status_code=status.HTTP_201_CREATED)
+    def create_run(payload: RunCreate, db: Session = Depends(get_db)) -> RunModel:
+        run_id = payload.run_id or str(uuid4())
+        if db.get(RunModel, run_id):
+            raise HTTPException(status_code=409, detail=f"Run already exists: {run_id}")
+        data = model_data(payload)
+        data.pop("run_id", None)
+        run = RunModel(id=run_id, **data)
+        db.add(run)
+        db.commit()
+        db.refresh(run)
+        return run
+
+    @app.get("/runs", response_model=list[RunOut])
+    def list_runs(status_filter: RunStatus | None = Query(default=None, alias="status"), db: Session = Depends(get_db)):
+        stmt = select(RunModel).order_by(RunModel.created_at.desc())
+        if status_filter:
+            stmt = stmt.where(RunModel.status == status_filter)
+        return list(db.scalars(stmt))
+
+    @app.get("/runs/{run_id}", response_model=RunOut)
+    def get_run(run_id: str, db: Session = Depends(get_db)) -> RunModel:
+        run = db.get(RunModel, run_id)
+        if not run:
+            raise HTTPException(status_code=404, detail=f"Unknown run: {run_id}")
+        return run
+
+    @app.patch("/runs/{run_id}", response_model=RunOut)
+    def update_run(run_id: str, payload: RunUpdate, db: Session = Depends(get_db)) -> RunModel:
+        run = db.get(RunModel, run_id)
+        if not run:
+            raise HTTPException(status_code=404, detail=f"Unknown run: {run_id}")
+        for key, value in model_data(payload, exclude_unset=True).items():
+            setattr(run, key, value)
+        run.updated_at = utc_now()
+        db.add(run)
+        db.commit()
+        db.refresh(run)
+        return run
+
+    @app.get("/runs/{run_id}/artifacts/{artifact}/presign")
+    def presign_artifact(
+        run_id: str,
+        artifact: ArtifactName,
+        expires: int = Query(default=3600, ge=60, le=604800),
+        db: Session = Depends(get_db),
+        s3_client=Depends(get_s3_client),
+    ) -> dict:
+        run = db.get(RunModel, run_id)
+        if not run:
+            raise HTTPException(status_code=404, detail=f"Unknown run: {run_id}")
+        s3_uri = resolve_artifact_uri(run, artifact)
+        bucket, key = parse_s3_uri(s3_uri)
+        url = s3_client.generate_presigned_url(
+            "get_object",
+            Params={"Bucket": bucket, "Key": key},
+            ExpiresIn=expires,
+        )
+        return {"artifact": artifact, "s3_uri": s3_uri, "url": url, "expires_in": expires}
+
+    return app
+
+
+app = create_app()
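
To make the presign path in `main.py` easier to follow, here is a standalone sketch of how an artefact name becomes the bucket and key that get signed. It mirrors `resolve_artifact_uri` and `parse_s3_uri` from the file above, but raises `ValueError` instead of `HTTPException` so that it runs without FastAPI. The names `DEFAULT_PATHS`, `resolve` and `split_s3_uri` and the `overrides` argument are assumptions made for this illustration; in the service the per-run paths live on `RunModel` columns.

```python
from urllib.parse import urlparse

# Default relative paths, copied from the RunModel column defaults.
DEFAULT_PATHS = {
    "report": "pipeline_info/report.html",
    "timeline": "pipeline_info/timeline.html",
    "trace": "pipeline_info/trace.txt",
    "dag": "pipeline_info/dag.dot",
    "multiqc": "multiqc/multiqc_report.html",
}


def resolve(s3_prefix, artifact, overrides=None):
    """Join prefix and relative path, unless the stored path is already an s3:// URI."""
    value = (overrides or {}).get(artifact, DEFAULT_PATHS[artifact])
    if value.startswith("s3://"):
        return value
    if not s3_prefix:
        raise ValueError("run has no s3_prefix for relative artefact paths")
    return f"{s3_prefix.rstrip('/')}/{value.lstrip('/')}"


def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key); reject URIs without a bucket or key."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Invalid S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    uri = resolve("s3://my-rnaseq-bucket/results/airway-test-001", "report")
    print(uri, split_s3_uri(uri))
```

A relative path is always joined onto the run's `s3_prefix`, while a path stored as a full `s3://` URI bypasses the prefix entirely, which is how a single artefact can live in a different bucket from the rest of the run.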
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+fastapi>=0.110
+uvicorn[standard]>=0.27
+sqlalchemy>=2.0
+psycopg[binary]>=3.1
+boto3>=1.34
+pytest>=8.0
+httpx>=0.27
