idea: Integrated Project Triage Dashboard: Jira + Synapse + GitHub #25

Created with AI assistance from Claude

Prompt: Draft a new issue with the same format. For this new issue, we want to integrate data from Jira (specifically, the NF-OSI project service tickets), Synapse.org (to check if data has been uploaded, a validation schema besides the default schema has been bound, and if the data annotations are passing validation), and GitHub (specifically, the dcc-site repository to check data sharing plans available in this folder https://github.com/nf-osi/dcc-site/tree/main/json/dsp). The goal is to integrate the data to compile a table of projects to triage which ones need work and how urgently. This is important because project deadlines can change based on conversations in Jira but do not get updated in GitHub or on Synapse. This will be an improvement on an existing tool known as the "risk-report-manager". [provided README for nf-osi/risk-report-manager GitHub repository]

## Problem
Project deadlines and priorities are managed across three disconnected systems that do not stay in sync: **Jira service tickets** (NF-OSI project), **Synapse.org** (data upload status, validation schemas, annotation compliance), and **GitHub** (data sharing plans in the `dcc-site` repository). Deadlines negotiated in Jira conversations don't propagate to GitHub or Synapse metadata. Data managers must manually cross-reference all three platforms to decide which projects need urgent attention, which leads to missed deadlines, duplicate work, and stale risk assessments in the existing `risk-report-manager` tool.

## Goal
Build an **integrated triage dashboard** that:
1. **Fetches real-time data** from Jira (service tickets, updated deadlines, SLA status), Synapse (upload status, schema binding, annotation validation), and GitHub (data sharing plans).
2. **Joins data** across platforms using project identifiers (Synapse ID, Jira ticket key, GitHub file mapping).
3. **Generates a unified triage table** ranking projects by urgency, completeness, and blockers.
4. **Replaces/enhances** the existing `risk-report-manager` with a single source of truth that reflects live conversations and technical status.

## User stories
- **Data manager**: "Which projects have overdue Jira tickets AND missing Synapse data?" → dashboard shows top 10 projects sorted by combined urgency score, with direct links to Jira, Synapse, and GitHub.
- **Program coordinator**: "A PI just renegotiated their deadline in Jira—does the triage table reflect this?" → yes, the next automated run (or manual refresh) pulls updated due dates from Jira and re-ranks projects.
- **Compliance officer**: "Show me all projects where data is uploaded but failing validation" → filter to `data_uploaded=True` AND `validation_passing=False`, export as CSV.
- **Curator**: "Flag projects missing a data sharing plan in GitHub" → dashboard highlights projects without a corresponding JSON file in `dcc-site/json/dsp/`.

## Scope (MVP)

### 1. Data sources & fields

| Source | What we fetch | Key fields |
|--------|---------------|------------|
| **Jira (NF-OSI project)** | Service tickets for data management | `key`, `summary`, `status`, `priority`, `duedate`, `updated`, `assignee`, `customfield_synapse_id` (placeholder name; Jira exposes custom fields as `customfield_<numeric id>`), `resolution`, `SLA` metrics (if available) |
| **Synapse.org** | Project metadata, file upload status, schema/annotation compliance | `id` (Synapse project ID), `projectStatus`, `dataStatus`, `fileCount`, `hasCustomSchema` (boolean: non-default schema bound?), `annotationValidation` (passing/failing/not run), `lastModified` |
| **GitHub (dcc-site)** | Data sharing plan (DSP) JSON files | Filename → Synapse ID mapping (parse `dcc-site/json/dsp/*.json`), `dsp_exists` (boolean), `dsp_last_updated` |

### 2. Data integration logic

**Join key hierarchy:**
1. **Primary**: `Synapse ID` (links all three systems)
2. **Fallback**: Project name fuzzy matching (when Synapse ID missing in Jira or GitHub)
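
The name-matching fallback could be sketched with the standard library's `difflib`. The function name `match_by_name`, the 0.85 cutoff, and the shape of `synapse_names` are illustrative assumptions, not part of the existing tooling; the cutoff in particular would need tuning against real project names.

```python
from difflib import get_close_matches

def match_by_name(name, synapse_names, cutoff=0.85):
    """Return the Synapse ID whose project name best matches `name`, or None.

    `synapse_names` maps Synapse ID -> project name (hypothetical input shape).
    """
    # Compare case-insensitively and keep a lookup back to the Synapse ID.
    by_lower = {n.lower().strip(): sid for sid, n in synapse_names.items()}
    hits = get_close_matches(name.lower().strip(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None
```

Anything that returns `None` here would go to the manual curator queue rather than being guessed at with a lower cutoff.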

**Integration steps:**
1. Fetch Jira tickets where `customfield_synapse_id` is populated OR project name contains "NF-" pattern.
2. For each Synapse ID, query Synapse API for project annotations, file count, schema binding, and validation status.
3. Check if `{synapse_id}.json` exists in `dcc-site/json/dsp/` (clone repo or use GitHub API).
4. Join all three datasets into a single table keyed by Synapse ID.
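
The join in step 4 might look like the following minimal sketch, which uses plain dictionaries so it stays independent of any particular data-frame library. The function name `join_sources` and the record shapes are assumptions made for illustration.

```python
def join_sources(jira_tickets, synapse_projects, dsp_by_id):
    """Left-join Jira tickets to Synapse and DSP records by Synapse ID.

    jira_tickets: list of dicts with at least "key" and "synapse_id".
    synapse_projects, dsp_by_id: dicts keyed by Synapse ID.
    Tickets without a Synapse ID are returned separately for the fallback.
    """
    joined, unmatched = {}, []
    for ticket in jira_tickets:
        sid = ticket.get("synapse_id")
        if not sid:
            unmatched.append(ticket)
            continue
        # If several tickets share a Synapse ID, the last one wins here.
        joined[sid] = {
            "synapse_id": sid,
            "jira": ticket,
            "synapse": synapse_projects.get(sid),  # None if the lookup failed
            "dsp": dsp_by_id.get(sid),             # None if no DSP file was found
        }
    return joined, unmatched
```

Keeping missing Synapse or DSP data as `None` rather than dropping the row means the gaps themselves stay visible to the scoring step.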

### 3. Triage scoring & prioritization

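**Urgency score** (0–100, higher = more urgent). As drafted, the weights reach at most 90 (Jira 45, Synapse 30, GitHub 15), so they need rebalancing if the full range is intended:

```python
from datetime import date, timedelta

def urgency_score(p, today=None):
    """Return the urgency score for one joined project record (a dict)."""
    today = today or date.today()
    score = 0

    # Jira factors (these weights sum to 45)
    if p["jira_status"] in ("In Progress", "Reopened"): score += 10
    if p["jira_priority"] == "Highest": score += 15
    elif p["jira_priority"] == "High": score += 10
    days = p["days_until_jira_due"]  # None when the ticket has no due date
    if days is not None and days < 0: score += 15  # overdue
    elif days is not None and days < 30: score += 10  # due soon
    if p["jira_sla_breached"]: score += 5

    # Synapse factors (at most 30: the two file-count rules are mutually exclusive)
    if p["synapse_data_status"] in ("Data Pending", "Under Embargo"): score += 5
    if p["synapse_file_count"] == 0 and p["synapse_project_status"] == "Active": score += 15  # active but no data
    if not p["has_custom_schema"] and p["synapse_file_count"] > 0: score += 10  # data but no schema
    if p["annotation_validation"] == "Failing": score += 10

    # GitHub factors (at most 15: a missing DSP cannot also be stale)
    if not p["dsp_exists"]: score += 15  # missing data sharing plan
    elif p["dsp_last_updated"] and p["dsp_last_updated"] < today - timedelta(days=365): score += 5  # stale DSP

    return score
```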

**Risk categories** (similar to existing `risk-report-manager`, but enhanced):
- **Critical**: Overdue Jira + no Synapse data + no DSP
- **High**: Jira due soon + data uploaded but failing validation
- **Medium**: Missing custom schema OR stale DSP
- **Low**: All green, monitoring only
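
One way to read these four rules is as an ordered cascade, sketched below. The field names, the 30-day "due soon" window (borrowed from the scoring rules), and the precomputed `dsp_stale` flag are assumptions for illustration.

```python
def risk_category(rec):
    """Classify one joined record using the four rules, checked in order."""
    days = rec["days_until_jira_due"]  # None when the ticket has no due date
    has_data = rec["synapse_file_count"] > 0
    overdue = days is not None and days < 0
    due_soon = days is not None and 0 <= days < 30  # assumed window
    if overdue and not has_data and not rec["dsp_exists"]:
        return "Critical"
    if due_soon and has_data and rec["annotation_validation"] == "Failing":
        return "High"
    if not rec["has_custom_schema"] or rec["dsp_stale"]:
        return "Medium"
    return "Low"
```

Read literally, the rules leave gaps: an overdue ticket whose data is uploaded but failing validation matches neither Critical nor High and falls through to Medium or Low, which may not be the intent.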

### 4. Output: Triage table

**Columns:**
- `Synapse ID` (link)
- `Project Name`
- `Funder` (CTF/GFF/NTAP/Unknown, from Jira labels or Synapse annotations)
- `Jira Ticket` (link, status, priority, due date)
- `Jira Status` (Open, In Progress, Resolved, etc.)
- `Days Until Jira Due` (negative = overdue)
- `Synapse Status` (Active, Completed, etc.)
- `Data Uploaded?` (boolean, based on file count)
- `Custom Schema Bound?` (boolean)
- `Annotations Passing?` (Pass/Fail/Not Run)
- `DSP Exists?` (boolean, link to GitHub file)
- `DSP Last Updated` (date)
- `Urgency Score` (0–100)
- `Risk Category` (Critical/High/Medium/Low)
- `Action Items` (auto-generated list, e.g., "Upload data", "Fix annotations", "Create DSP")
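
The `Action Items` column could be derived from the same gap checks that feed the other columns. This is a minimal sketch; the field names and the exact wording of each item are assumptions.

```python
def action_items(rec):
    """Derive a list of action items from the gaps in one joined record."""
    items = []
    if rec["synapse_file_count"] == 0:
        items.append("Upload data")
    if not rec["has_custom_schema"]:
        items.append("Bind schema")
    if rec["annotation_validation"] == "Failing":
        items.append("Fix annotations")
    if not rec["dsp_exists"]:
        items.append("Create DSP")
    return items
```

An empty list means no gaps were detected, which is different from a project that could not be checked at all; the latter probably deserves its own item.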

**Sorting:** By urgency score (descending), then by Jira due date (ascending).

**Filtering:** By funder, risk category, Jira status, Synapse status, any boolean field.

**Export:** CSV, JSON, Markdown table.
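
The sorting and CSV export could be sketched with the standard library alone. The row keys (`urgency_score`, `jira_due`) and function names are illustrative assumptions; rows without a due date are placed last within the same score, which is a design choice rather than something specified above.

```python
import csv
import io
from datetime import date

def sort_for_triage(rows):
    """Sort by urgency score (descending), then Jira due date (ascending)."""
    return sorted(
        rows,
        key=lambda r: (-r["urgency_score"], r["jira_due"] is None, r["jira_due"] or date.max),
    )

def to_csv(rows, columns):
    """Serialize rows to CSV text, keeping only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Filtering by funder or risk category can then be a plain list comprehension over the sorted rows before export.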

### 5. Architecture

**Pipeline:**
```
┌─────────────────┐
│  Jira API       │──┐
│  (REST/JQL)     │  │
└─────────────────┘  │
                     ├──► Integration Script ──► Scoring Engine ──► Triage Table
┌─────────────────┐  │          (Python)     (calculate_urgency.py)     (CSV/MD/JSON)
│  Synapse API    │──┤
│  (Python client)│  │
└─────────────────┘  │
                     │
┌─────────────────┐  │
│  GitHub API     │──┘
│  (dcc-site repo)│
└─────────────────┘
```

**Key modules:**
- `fetch_jira_data.py`: Query NF-OSI Jira project, export tickets with Synapse IDs.
- `fetch_synapse_data.py`: For each Synapse ID, fetch annotations, file count, schema, validation status (reuse/refactor from `synapse_validator.py`).
- `fetch_github_dsp.py`: Clone or query `dcc-site` repo, build map of Synapse ID → DSP metadata.
- `integrate_data.py`: Join all three sources by Synapse ID; handle missing/mismatched IDs.
- `calculate_urgency.py`: Apply scoring rules, assign risk categories.
- `generate_triage_table.py`: Output sorted table(s), split by funder if needed.
- `run_triage_dashboard.sh`: Orchestrate all steps (replaces/extends `run_risk_report.sh`).
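
As an illustration of what `fetch_github_dsp.py` might do once it has a directory listing, the sketch below indexes DSP files by Synapse ID. It assumes the `{synapse_id}.json` naming convention (still an open question) and takes as input the parsed JSON that GitHub's repository contents endpoint returns for a directory: a list of objects with `name`, `path`, and `type`. The function name `dsp_index` is hypothetical.

```python
import re

# Assumed naming convention: syn<digits>.json
SYN_FILE = re.compile(r"^(syn\d+)\.json$")

def dsp_index(listing):
    """Map Synapse ID -> file path for DSP files in a directory listing.

    Files that do not follow the assumed naming are returned separately
    so they can be reviewed instead of silently ignored.
    """
    found, unrecognized = {}, []
    for entry in listing:
        if entry.get("type") != "file":
            continue  # skip subdirectories and other entry types
        match = SYN_FILE.match(entry["name"])
        if match:
            found[match.group(1)] = entry["path"]
        else:
            unrecognized.append(entry["name"])
    return found, unrecognized
```

A directory listing carries no modification date, so `dsp_last_updated` would need a separate source such as the file's commit history.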

**Scheduling:**
- **Daily**: Automated run (cron/GitHub Actions) at 6 AM ET.
- **On-demand**: Manual refresh via CLI or web UI.

### 6. Discrepancy detection & alerts

**Auto-detect mismatches:**
- Jira due date ≠ Synapse `embargoEndDate` annotation (flag for manual review).
- Jira status = "Resolved" but Synapse data status = "Data Pending" (data manager needs to upload).
- Synapse project status = "Completed" but Jira ticket still open (close ticket).
- DSP exists in GitHub but Synapse project has no `dataContributionStatus` annotation (sync issue).
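
The four mismatch rules translate fairly directly into checks on a joined record. In this sketch the field names and the set of Jira statuses treated as closed are assumptions; the real workflow's status names should replace them.

```python
# Assumed set of Jira statuses that count as "no longer open".
CLOSED_STATUSES = {"Resolved", "Closed", "Done"}

def find_discrepancies(rec):
    """Return human-readable flags for the mismatch rules listed above."""
    flags = []
    jira_due, embargo_end = rec.get("jira_due"), rec.get("embargo_end_date")
    if jira_due and embargo_end and jira_due != embargo_end:
        flags.append("Jira due date differs from Synapse embargoEndDate")
    if rec.get("jira_status") == "Resolved" and rec.get("synapse_data_status") == "Data Pending":
        flags.append("Jira resolved but Synapse data still pending")
    if (rec.get("synapse_project_status") == "Completed"
            and rec.get("jira_status") not in CLOSED_STATUSES):
        flags.append("Synapse project completed but Jira ticket still open")
    if rec.get("dsp_exists") and not rec.get("data_contribution_status"):
        flags.append("DSP exists but dataContributionStatus annotation is missing")
    return flags
```

The date comparison only fires when both dates are present, so a missing `embargoEndDate` is not reported as a mismatch.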

**Alert rules:**
- If urgency score ≥ 80: Slack/email notification to assigned data manager.
- If Jira ticket overdue by >30 days AND no Synapse activity: Escalate to PI/program manager.
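
The two alert rules could be evaluated per record as below. The thresholds are parameters because they are still an open question; `recent_synapse_activity` is a hypothetical precomputed boolean, and delivery (Slack, email) is left out of the sketch.

```python
def alerts_for(rec, score_threshold=80, overdue_days=30):
    """Return the alert actions triggered by one record (may be empty)."""
    alerts = []
    if rec["urgency_score"] >= score_threshold:
        alerts.append(("notify", rec["assignee"]))
    days = rec["days_until_jira_due"]  # negative means overdue
    if days is not None and days < -overdue_days and not rec["recent_synapse_activity"]:
        alerts.append(("escalate", "PI/program manager"))
    return alerts
```

Returning actions instead of sending them keeps the rule logic testable without touching any messaging service.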

### 7. UI/Output formats

**Primary output:** Markdown tables (like existing `risk_report_*.md` files).

**Enhanced outputs:**
- **Interactive HTML dashboard** (optional, using DataTables or similar):
  - Sortable, filterable columns.
  - Click row → expands to show detailed action items + links.
- **Synapse table**: Upload triage table to a Synapse project for sharing with non-technical users.
- **API endpoint** (future): REST API to query triage status programmatically.

## Acceptance criteria

1. ✅ Integration script successfully fetches data from all three sources (Jira, Synapse, GitHub) for ≥50 projects.
2. ✅ ≥90% of Synapse IDs from Jira tickets are successfully matched to Synapse projects.
3. ✅ Triage table includes all required columns with no null values for critical fields (Synapse ID, Jira ticket, urgency score).
4. ✅ Urgency scoring correctly ranks projects: at least 3 "Critical" projects appear in top 10 when manually verified against source data.
5. ✅ Discrepancy detection flags ≥5 real mismatches (e.g., Jira resolved but Synapse data missing) in test run.
6. ✅ Generated table is filterable by funder (CTF/GFF/NTAP) and exportable as CSV.
7. ✅ Automated run completes in ≤5 minutes for ~100 projects.
8. ✅ Dashboard detects when Jira due date changes (simulate by updating a test ticket) and re-ranks project in next run.
9. ✅ Action items list is auto-generated for ≥80% of projects based on detected gaps (e.g., "Upload data to Synapse", "Bind validation schema").
10. ✅ Comparison report generated: old `risk-report-manager` output vs. new integrated triage table, showing reduction in false positives and improved prioritization.

## Nice-to-haves (post-MVP)

- **Interactive web UI**: Sortable/filterable HTML dashboard hosted on GitHub Pages or internal server.
- **Historical trend tracking**: Store snapshots of triage table over time; generate "progress reports" (e.g., "10 projects moved from Critical to Low this month").
- **Automated Jira updates**: If Synapse data status changes to "Available", auto-transition Jira ticket to "Resolved" (requires Jira write permissions).
- **GitHub PR/issue integration**: If DSP is missing, auto-create a GitHub issue in `dcc-site` repo assigned to project PI.
- **Synapse schema validation details**: Drill down into which specific annotations are failing (not just pass/fail).
- **Slack bot**: `/triage <synapse_id>` command to fetch project status on demand.
- **Multi-funder dashboard**: Separate views/reports for each funder, similar to existing `risk_report_ctf.md` etc.
- **Predictive scoring**: ML model to predict which projects are at risk of missing deadlines based on historical patterns.

## Risks & mitigations

| Risk | Mitigation |
|------|-----------|
| **Jira API rate limits** | Cache Jira data; use JQL filters to fetch only updated tickets; batch requests. |
| **Synapse ID missing in Jira** | Fallback to project name fuzzy matching; manual curator queue for unmatched projects. |
| **GitHub DSP filenames don't match Synapse IDs** | Document naming convention; add mapping file in `dcc-site` if needed. |
| **Stale Jira due dates** | Include "Last Updated" timestamp in Jira data; flag tickets not touched in >90 days. |
| **Synapse schema validation not run** | Trigger validation API call if status is "Not Run"; document how to enable auto-validation. |
| **Scoring algorithm bias** | Validate weights with data managers; allow per-funder customization. |
| **Authentication fatigue** | Store Jira/Synapse/GitHub tokens in one place (e.g., OS keyring or a git-ignored `.env` file) so they are not re-entered on every run. |
| **Data privacy** | Ensure Jira custom fields don't leak PII; restrict triage table access to authorized users. |

## Open questions

1. **Which Jira custom field** stores the Synapse ID? (Need exact field name/ID.)
2. **What defines "custom schema"** in Synapse? (Any non-default schema, or specific to NF data models?)
3. **How is annotation validation status exposed?** (Synapse API endpoint, annotation field, or separate service?)
4. **Should we fetch Jira comments** to detect deadline negotiations, or only rely on `duedate` field updates?
5. **DSP filename convention**: Is it always `{synapse_id}.json`, or are there exceptions?
6. **Who owns the triage dashboard**? (Where should output be published—GitHub repo, Synapse project, internal wiki?)
7. **Integration with existing `risk-report-manager`**: Replace entirely, or run in parallel during transition period?
8. **Alert thresholds**: What urgency score warrants a Slack/email notification?
9. **Write-back permissions**: Do we have Jira API write access to auto-close tickets, or read-only?
10. **Funder classification**: Should we trust Jira labels, Synapse `fundingAgency` annotation, or GitHub DSP `funder` field as ground truth?

## Related issues / references

- **Existing tool**: [`risk-report-manager`](https://github.com/nf-osi/risk-report-manager)
- **Data sharing plans**: https://github.com/nf-osi/dcc-site/tree/main/json/dsp
- **Synapse Python client**: https://python-docs.synapse.org/
- **Jira REST API**: https://developer.atlassian.com/cloud/jira/platform/rest/v3/
- **Related issue**: [Automate Synapse‑based Checks for Data Availability & Annotation Status](https://github.com/nf-osi/risk-report-manager/issues/3)
- **Related issue**: [Translate risk label method to dcc-site when ready for release](https://github.com/nf-osi/risk-report-manager/issues/8)

## Example output (Markdown table)

Values below are illustrative; the urgency scores were not computed from the draft weights in section 3.

```markdown
# NF-OSI Project Triage Dashboard
*Generated: 2025-01-15 06:00 ET | Total projects: 87 | Critical: 12 | High: 23*

| Synapse ID | Project Name | Funder | Jira Ticket | Jira Status | Days Until Due | Data Uploaded? | Schema Bound? | Annotations OK? | DSP Exists? | Urgency Score | Risk | Action Items |
|------------|--------------|--------|-------------|-------------|----------------|----------------|---------------|-----------------|-------------|---------------|------|--------------|
| [syn12345678](https://synapse.org/syn12345678) | MPNST PDX Models | CTF | [NF-123](https://jira.../NF-123) | In Progress | **-15** 🔴 | ❌ | ❌ | N/A | ❌ | **95** | Critical | Upload data, Bind schema, Create DSP |
| [syn23456789](https://synapse.org/syn23456789) | NF2 Clinical Trial | GFF | [NF-456](https://jira.../NF-456) | Open | 5 🟡 | ✅ | ✅ | ❌ | ✅ | **75** | High | Fix annotation validation errors |
| [syn34567890](https://synapse.org/syn34567890) | Schwannoma scRNA-seq | NTAP | [NF-789](https://jira.../NF-789) | In Progress | 45 🟢 | ✅ | ❌ | ✅ | ✅ | **45** | Medium | Bind custom schema |
...
```

---

**Replaces/Enhances**: [`risk-report-manager`](https://github.com/nf-osi/risk-report-manager)
