Description
Created with AI assistance from Claude
Prompt: Draft a new issue with the same format. For this new issue, we want to integrate data from Jira (specifically, the NF-OSI project service tickets), Synapse.org (to check if data has been uploaded, a validation schema besides the default schema has been bound, and if the data annotations are passing validation), and GitHub (specifically, the dcc-site repository to check data sharing plans available in this folder https://github.com/nf-osi/dcc-site/tree/main/json/dsp). The goal is to integrate the data to compile a table of projects to triage which ones need work and how urgently. This is important because project deadlines can change based on conversations in Jira but do not get updated in GitHub or on Synapse. This will be an improvement on an existing tool known as the "risk-report-manager". [provided README for nf-osi/risk-report-manager GitHub repository]
Problem
Project deadlines and priorities are managed across three disconnected systems—Jira service tickets (NF-OSI project), Synapse.org (data upload status, validation schemas, annotation compliance), and GitHub (data sharing plans in dcc-site repository)—but none of them stay in sync. Deadlines negotiated in Jira conversations don't propagate to GitHub or Synapse metadata. Data managers must manually cross-reference all three platforms to triage which projects need urgent attention, leading to missed deadlines, duplicate work, and stale risk assessments in the existing risk-report-manager tool.
Goal
Build an integrated triage dashboard that:
- Fetches real-time data from Jira (service tickets, updated deadlines, SLA status), Synapse (upload status, schema binding, annotation validation), and GitHub (data sharing plans).
- Joins data across platforms using project identifiers (Synapse ID, Jira ticket key, GitHub file mapping).
- Generates a unified triage table ranking projects by urgency, completeness, and blockers.
- Replaces/enhances the existing `risk-report-manager` with a single source of truth that reflects live conversations and technical status.
User stories
- Data manager: "Which projects have overdue Jira tickets AND missing Synapse data?" → dashboard shows top 10 projects sorted by combined urgency score, with direct links to Jira, Synapse, and GitHub.
- Program coordinator: "A PI just renegotiated their deadline in Jira—does the triage table reflect this?" → yes, the next automated run (or manual refresh) pulls updated due dates from Jira and re-ranks projects.
- Compliance officer: "Show me all projects where data is uploaded but failing validation" → filter to `data_uploaded=True` AND `validation_passing=False`, export as CSV.
- Curator: "Flag projects missing a data sharing plan in GitHub" → dashboard highlights projects without a corresponding JSON file in `dcc-site/json/dsp/`.
Scope (MVP)
1. Data sources & fields
| Source | What we fetch | Key fields |
|---|---|---|
| Jira (NF-OSI project) | Service tickets for data management | key, summary, status, priority, duedate, updated, assignee, customfield_synapse_id, resolution, SLA metrics (if available) |
| Synapse.org | Project metadata, file upload status, schema/annotation compliance | id (Synapse project ID), projectStatus, dataStatus, fileCount, hasCustomSchema (boolean: non-default schema bound?), annotationValidation (passing/failing/not run), lastModified |
| GitHub (dcc-site) | Data sharing plan (DSP) JSON files | Filename → Synapse ID mapping (parse dcc-site/json/dsp/*.json), dsp_exists (boolean), dsp_last_updated |
2. Data integration logic
Join key hierarchy:
- Primary: `Synapse ID` (links all three systems)
- Fallback: Project name fuzzy matching (when Synapse ID missing in Jira or GitHub)
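The fuzzy-matching fallback could be sketched with the standard library's `difflib`. This is a minimal illustration, not a settled design: the function names, the input shape (Synapse ID mapped to project name), and the 0.85 similarity cutoff are all assumptions that would need tuning against real project names.

```python
from difflib import get_close_matches
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def match_synapse_id(
    project_name: str,
    synapse_names: dict[str, str],
    cutoff: float = 0.85,  # assumed threshold, not taken from the issue
) -> Optional[str]:
    """Return the Synapse ID whose project name best matches, or None.

    synapse_names maps Synapse ID -> project name (hypothetical input shape).
    """
    by_name = {normalize(name): syn_id for syn_id, name in synapse_names.items()}
    hits = get_close_matches(normalize(project_name), list(by_name), n=1, cutoff=cutoff)
    return by_name[hits[0]] if hits else None


# Illustrative data only, reusing the placeholder IDs from the example output.
synapse_names = {
    "syn12345678": "MPNST PDX Models",
    "syn23456789": "NF2 Clinical Trial",
}
print(match_synapse_id("MPNST PDX models ", synapse_names))
print(match_synapse_id("Unrelated study", synapse_names))
```

Returning `None` below the cutoff, rather than the nearest name, keeps weak matches out of the table so they can go to a manual curator queue.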
Integration steps:
- Fetch Jira tickets where `customfield_synapse_id` is populated OR project name contains "NF-" pattern.
- For each Synapse ID, query Synapse API for project annotations, file count, schema binding, and validation status.
- Check if `{synapse_id}.json` exists in `dcc-site/json/dsp/` (clone repo or use GitHub API).
- Join all three datasets into a single table keyed by Synapse ID.
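The join in the last step might look roughly like the sketch below, which works on already-fetched data and makes no API calls. The input shapes and the `integrate` name are hypothetical, and it assumes DSP files are named `{synapse_id}.json`, which is still an open question in this issue.

```python
def integrate(jira_tickets, synapse_projects, dsp_filenames):
    """Join the three sources into one row per Synapse ID (outer join).

    jira_tickets:     list of dicts with "key", "synapse_id", "duedate" (assumed shape)
    synapse_projects: dict of Synapse ID -> dict with "fileCount" (assumed shape)
    dsp_filenames:    iterable of filenames found in dcc-site/json/dsp/
    """
    suffix = ".json"
    # Assumes the {synapse_id}.json naming convention; other files are ignored.
    dsp_ids = {n[: -len(suffix)] for n in dsp_filenames if n.endswith(suffix)}

    # If several tickets share a Synapse ID, the last one wins in this sketch.
    jira_by_id = {t["synapse_id"]: t for t in jira_tickets if t.get("synapse_id")}

    table = {}
    for syn_id in sorted(set(jira_by_id) | set(synapse_projects) | dsp_ids):
        ticket = jira_by_id.get(syn_id, {})
        project = synapse_projects.get(syn_id, {})
        table[syn_id] = {
            "synapse_id": syn_id,
            "jira_key": ticket.get("key"),
            "jira_duedate": ticket.get("duedate"),
            "file_count": project.get("fileCount"),
            "dsp_exists": syn_id in dsp_ids,
        }
    return table


# Illustrative inputs only.
jira = [
    {"key": "NF-123", "synapse_id": "syn12345678", "duedate": "2025-01-01"},
    {"key": "NF-999", "synapse_id": None, "duedate": None},  # unmatched ticket
]
synapse = {"syn12345678": {"fileCount": 0}, "syn23456789": {"fileCount": 12}}
dsp = ["syn23456789.json", "README.md"]

table = integrate(jira, synapse, dsp)
for row in table.values():
    print(row)
```

Tickets without a Synapse ID are dropped here; in practice they would feed the name-matching fallback or a manual review queue.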
3. Triage scoring & prioritization
Urgency score (0–100, higher = more urgent):
score = 0
# Jira factors (45 points max: 10 + 15 + 15 + 5)
if jira_status in ["In Progress", "Reopened"]: score += 10
if jira_priority == "Highest": score += 15
elif jira_priority == "High": score += 10
if days_until_jira_due < 0: score += 15 # overdue
elif days_until_jira_due < 30: score += 10 # due soon
if jira_sla_breached: score += 5
# Synapse factors (40 points max)
if synapse_data_status in ["Data Pending", "Under Embargo"]: score += 5
if synapse_file_count == 0 and synapse_project_status == "Active": score += 15 # active but no data
if not has_custom_schema and synapse_file_count > 0: score += 10 # data but no schema
if annotation_validation == "Failing": score += 10
# GitHub factors (20 points max)
if not dsp_exists: score += 15 # missing data sharing plan
elif dsp_age_days > 365: score += 5 # stale DSP; dsp_age_days = days since dsp_last_updated
Risk categories (similar to existing risk-report-manager, but enhanced):
- Critical: Overdue Jira + no Synapse data + no DSP
- High: Jira due soon + data uploaded but failing validation
- Medium: Missing custom schema OR stale DSP
- Low: All green, monitoring only
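One possible reading of these categories as code is sketched below. The row keys and the `dsp_stale` flag are hypothetical names, "due soon" borrows the 30-day window from the scoring rules, and a row that matches none of the first three rules falls through to Low, which is an assumption the data managers should confirm.

```python
def risk_category(row: dict) -> str:
    """Assign a risk category by checking the rules from most to least severe."""
    days = row["days_until_jira_due"]  # None when the ticket has no due date
    has_data = row["file_count"] > 0
    overdue = days is not None and days < 0
    due_soon = days is not None and 0 <= days < 30  # assumed "due soon" window

    if overdue and not has_data and not row["dsp_exists"]:
        return "Critical"
    if due_soon and has_data and row["annotation_validation"] == "Failing":
        return "High"
    if not row["has_custom_schema"] or row["dsp_stale"]:
        return "Medium"
    return "Low"  # assumption: anything not matched above is monitoring-only


# Rows modeled on the three sample projects in the example output.
critical = {"days_until_jira_due": -15, "file_count": 0, "dsp_exists": False,
            "annotation_validation": "Not Run", "has_custom_schema": False,
            "dsp_stale": False}
high = {"days_until_jira_due": 5, "file_count": 10, "dsp_exists": True,
        "annotation_validation": "Failing", "has_custom_schema": True,
        "dsp_stale": False}
medium = {"days_until_jira_due": 45, "file_count": 10, "dsp_exists": True,
          "annotation_validation": "Passing", "has_custom_schema": False,
          "dsp_stale": False}

for row in (critical, high, medium):
    print(risk_category(row))
```

Because the rules are checked in order, a project that is both overdue with no data and missing a schema is reported as Critical rather than Medium.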
4. Output: Triage table
Columns:
- `Synapse ID` (link)
- `Project Name`
- `Funder` (CTF/GFF/NTAP/Unknown, from Jira labels or Synapse annotations)
- `Jira Ticket` (link, status, priority, due date)
- `Jira Status` (Open, In Progress, Resolved, etc.)
- `Days Until Jira Due` (negative = overdue)
- `Synapse Status` (Active, Completed, etc.)
- `Data Uploaded?` (boolean, based on file count)
- `Custom Schema Bound?` (boolean)
- `Annotations Passing?` (Pass/Fail/Not Run)
- `DSP Exists?` (boolean, link to GitHub file)
- `DSP Last Updated` (date)
- `Urgency Score` (0–100)
- `Risk Category` (Critical/High/Medium/Low)
- `Action Items` (auto-generated list, e.g., "Upload data", "Fix annotations", "Create DSP")
Sorting: By urgency score (descending), then by Jira due date (ascending).
Filtering: By funder, risk category, Jira status, Synapse status, any boolean field.
Export: CSV, JSON, Markdown table.
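As a rough sketch of the sorting, filtering, and CSV export described above, using only the standard library. The row keys are hypothetical, the sample rows are made up, and placing tickets without a due date last among equal scores is an assumption.

```python
import csv
import io
from datetime import date


def sort_rows(rows):
    """Sort by urgency score (descending), then Jira due date (ascending)."""
    # Rows without a due date sort after dated rows with the same score.
    return sorted(rows, key=lambda r: (-r["urgency_score"], r["jira_due"] or date.max))


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given criterion, e.g. funder="CTF"."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def to_csv(rows, columns):
    """Render rows as CSV text, writing only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up rows for illustration; IDs are placeholders.
rows = [
    {"synapse_id": "synA", "funder": "CTF", "urgency_score": 75, "jira_due": date(2025, 2, 1)},
    {"synapse_id": "synB", "funder": "GFF", "urgency_score": 95, "jira_due": date(2025, 3, 1)},
    {"synapse_id": "synC", "funder": "CTF", "urgency_score": 75, "jira_due": date(2025, 1, 20)},
    {"synapse_id": "synD", "funder": "NTAP", "urgency_score": 75, "jira_due": None},
]

ranked = sort_rows(rows)
print([r["synapse_id"] for r in ranked])
print(to_csv(filter_rows(ranked, funder="CTF"), ["synapse_id", "funder", "urgency_score"]))
```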
5. Architecture
Pipeline:
┌─────────────────┐
│ Jira API │──┐
│ (REST/JQL) │ │
└─────────────────┘ │
├──► Integration Script ──► Scoring Engine ──► Triage Table
┌─────────────────┐ │ (Python) (scoring.py) (CSV/MD/JSON)
│ Synapse API │──┤
│ (Python client)│ │
└─────────────────┘ │
│
┌─────────────────┐ │
│ GitHub API │──┘
│ (dcc-site repo)│
└─────────────────┘
Key modules:
- `fetch_jira_data.py`: Query NF-OSI Jira project, export tickets with Synapse IDs.
- `fetch_synapse_data.py`: For each Synapse ID, fetch annotations, file count, schema, validation status (reuse/refactor from `synapse_validator.py`).
- `fetch_github_dsp.py`: Clone or query `dcc-site` repo, build map of Synapse ID → DSP metadata.
- `integrate_data.py`: Join all three sources by Synapse ID; handle missing/mismatched IDs.
- `calculate_urgency.py`: Apply scoring rules, assign risk categories.
- `generate_triage_table.py`: Output sorted table(s), split by funder if needed.
- `run_triage_dashboard.sh`: Orchestrate all steps (replaces/extends `run_risk_report.sh`).
Scheduling:
- Daily: Automated run (cron/GitHub Actions) at 6 AM ET.
- On-demand: Manual refresh via CLI or web UI.
6. Discrepancy detection & alerts
Auto-detect mismatches:
- Jira due date ≠ Synapse `embargoEndDate` annotation (flag for manual review).
- Jira status = "Resolved" but Synapse data status = "Data Pending" (data manager needs to upload).
- Synapse project status = "Completed" but Jira ticket still open (close ticket).
- DSP exists in GitHub but Synapse project has no `dataContributionStatus` annotation (sync issue).
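The four mismatch checks could be expressed as one function over an integrated row, as in this sketch. The row keys are hypothetical, dates are compared as ISO strings, the date check only fires when both dates are present, and the set of statuses treated as "closed" is an assumption about the NF-OSI Jira workflow.

```python
CLOSED_STATUSES = {"Resolved", "Closed", "Done"}  # assumed closed Jira statuses


def find_discrepancies(row: dict) -> list[str]:
    """Return human-readable flags for cross-system mismatches in one row."""
    flags = []

    due, embargo = row.get("jira_duedate"), row.get("embargo_end_date")
    if due and embargo and due != embargo:
        flags.append("Jira due date differs from Synapse embargoEndDate: review manually")

    if row.get("jira_status") == "Resolved" and row.get("data_status") == "Data Pending":
        flags.append("Jira resolved but Synapse data pending: upload data")

    status = row.get("jira_status")
    if row.get("project_status") == "Completed" and status and status not in CLOSED_STATUSES:
        flags.append("Synapse project completed but Jira ticket open: close ticket")

    if row.get("dsp_exists") and not row.get("data_contribution_status"):
        flags.append("DSP exists but dataContributionStatus annotation missing: sync issue")

    return flags


# Illustrative row with three of the four mismatches.
row = {
    "jira_duedate": "2025-01-15",
    "embargo_end_date": "2025-06-01",
    "jira_status": "Resolved",
    "data_status": "Data Pending",
    "project_status": "Active",
    "dsp_exists": True,
    "data_contribution_status": None,
}
for flag in find_discrepancies(row):
    print(flag)
```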
Alert rules:
- If urgency score ≥ 80: Slack/email notification to assigned data manager.
- If Jira ticket overdue by >30 days AND no Synapse activity: Escalate to PI/program manager.
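The alert rules might be evaluated as below. This sketch only decides which alerts apply and sends nothing. "No Synapse activity" is not defined in this issue, so the code assumes it means no modification within the last 30 days (or no recorded modification at all); that definition, the row keys, and the function name are all hypothetical.

```python
def alerts_for(row: dict, score_threshold: int = 80,
               overdue_days: int = 30, inactivity_days: int = 30) -> list[tuple[str, str]]:
    """Return (action, recipient) pairs for one triage row."""
    actions = []

    if row["urgency_score"] >= score_threshold:
        actions.append(("notify", row.get("assignee") or "unassigned"))

    days = row.get("days_until_jira_due")           # negative = overdue
    idle = row.get("days_since_synapse_modified")   # None = no recorded activity
    overdue = days is not None and days < -overdue_days
    inactive = idle is None or idle > inactivity_days  # assumed definition
    if overdue and inactive:
        actions.append(("escalate", "PI/program manager"))

    return actions


# Illustrative row: high score, 45 days overdue, untouched for 60 days.
row = {
    "urgency_score": 85,
    "assignee": "data.manager@example.org",
    "days_until_jira_due": -45,
    "days_since_synapse_modified": 60,
}
print(alerts_for(row))
```

Keeping the decision separate from delivery means the thresholds can be tested and adjusted before any Slack or email integration exists.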
7. UI/Output formats
Primary output: Markdown tables (like existing risk_report_*.md files).
Enhanced outputs:
- Interactive HTML dashboard (optional, using DataTables or similar):
  - Sortable, filterable columns.
  - Click row → expands to show detailed action items + links.
- Synapse table: Upload triage table to a Synapse project for sharing with non-technical users.
- API endpoint (future): REST API to query triage status programmatically.
Acceptance criteria
- ✅ Integration script successfully fetches data from all three sources (Jira, Synapse, GitHub) for ≥50 projects.
- ✅ ≥90% of Synapse IDs from Jira tickets are successfully matched to Synapse projects.
- ✅ Triage table includes all required columns with no null values for critical fields (Synapse ID, Jira ticket, urgency score).
- ✅ Urgency scoring correctly ranks projects: at least 3 "Critical" projects appear in top 10 when manually verified against source data.
- ✅ Discrepancy detection flags ≥5 real mismatches (e.g., Jira resolved but Synapse data missing) in test run.
- ✅ Generated table is filterable by funder (CTF/GFF/NTAP) and exportable as CSV.
- ✅ Automated run completes in ≤5 minutes for ~100 projects.
- ✅ Dashboard detects when Jira due date changes (simulate by updating a test ticket) and re-ranks project in next run.
- ✅ Action items list is auto-generated for ≥80% of projects based on detected gaps (e.g., "Upload data to Synapse", "Bind validation schema").
- ✅ Comparison report generated: old `risk-report-manager` output vs. new integrated triage table, showing reduction in false positives and improved prioritization.
Nice-to-haves (post-MVP)
- Interactive web UI: Sortable/filterable HTML dashboard hosted on GitHub Pages or internal server.
- Historical trend tracking: Store snapshots of triage table over time; generate "progress reports" (e.g., "10 projects moved from Critical to Low this month").
- Automated Jira updates: If Synapse data status changes to "Available", auto-transition Jira ticket to "Resolved" (requires Jira write permissions).
- GitHub PR/issue integration: If DSP is missing, auto-create a GitHub issue in `dcc-site` repo assigned to project PI.
- Synapse schema validation details: Drill down into which specific annotations are failing (not just pass/fail).
- Slack bot: `/triage <synapse_id>` command to fetch project status on demand.
- Multi-funder dashboard: Separate views/reports for each funder, similar to existing `risk_report_ctf.md` etc.
- Predictive scoring: ML model to predict which projects are at risk of missing deadlines based on historical patterns.
Risks & mitigations
| Risk | Mitigation |
|---|---|
| Jira API rate limits | Cache Jira data; use JQL filters to fetch only updated tickets; batch requests. |
| Synapse ID missing in Jira | Fallback to project name fuzzy matching; manual curator queue for unmatched projects. |
| GitHub DSP filenames don't match Synapse IDs | Document naming convention; add mapping file in dcc-site if needed. |
| Stale Jira due dates | Include "Last Updated" timestamp in Jira data; flag tickets not touched in >90 days. |
| Synapse schema validation not run | Trigger validation API call if status is "Not Run"; document how to enable auto-validation. |
| Scoring algorithm bias | Validate weights with data managers; allow per-funder customization. |
| Authentication fatigue | Use credential manager (e.g., .env file, OS keyring) for Jira/Synapse/GitHub tokens. |
| Data privacy | Ensure Jira custom fields don't leak PII; restrict triage table access to authorized users. |
Open questions
- Which Jira custom field stores the Synapse ID? (Need exact field name/ID.)
- What defines "custom schema" in Synapse? (Any non-default schema, or specific to NF data models?)
- How is annotation validation status exposed? (Synapse API endpoint, annotation field, or separate service?)
- Should we fetch Jira comments to detect deadline negotiations, or only rely on `duedate` field updates?
- DSP filename convention: Is it always `{synapse_id}.json`, or are there exceptions?
- Who owns the triage dashboard? (Where should output be published: GitHub repo, Synapse project, internal wiki?)
- Integration with existing `risk-report-manager`: Replace entirely, or run in parallel during transition period?
- Alert thresholds: What urgency score warrants a Slack/email notification?
- Write-back permissions: Do we have Jira API write access to auto-close tickets, or read-only?
- Funder classification: Should we trust Jira labels, Synapse `fundingAgency` annotation, or GitHub DSP `funder` field as ground truth?
Related issues / references
- Existing tool: `risk-report-manager`
- Data sharing plans: https://github.com/nf-osi/dcc-site/tree/main/json/dsp
- Synapse Python client: https://python-docs.synapse.org/
- Jira REST API: https://developer.atlassian.com/cloud/jira/platform/rest/v3/
- Related issue: Automate Synapse‑based Checks for Data Availability & Annotation Status
- Related issue: Translate risk label method to dcc-site when ready for release
Example output (Markdown table)
# NF-OSI Project Triage Dashboard
*Generated: 2025-01-15 06:00 ET | Total projects: 87 | Critical: 12 | High: 23*
| Synapse ID | Project Name | Funder | Jira Ticket | Jira Status | Days Until Due | Data Uploaded? | Schema Bound? | Annotations OK? | DSP Exists? | Urgency Score | Risk | Action Items |
|------------|--------------|--------|-------------|-------------|----------------|----------------|---------------|-----------------|-------------|---------------|------|--------------|
| [syn12345678](https://synapse.org/syn12345678) | MPNST PDX Models | CTF | [NF-123](https://jira.../NF-123) | In Progress | **-15** 🔴 | ❌ | ❌ | N/A | ❌ | **95** | Critical | Upload data, Bind schema, Create DSP |
| [syn23456789](https://synapse.org/syn23456789) | NF2 Clinical Trial | GFF | [NF-456](https://jira.../NF-456) | Open | 5 🟡 | ✅ | ✅ | ❌ | ✅ | **75** | High | Fix annotation validation errors |
| [syn34567890](https://synapse.org/syn34567890) | Schwannoma scRNA-seq | NTAP | [NF-789](https://jira.../NF-789) | In Progress | 45 🟢 | ✅ | ❌ | ✅ | ✅ | **45** | Medium | Bind custom schema |
...

Replaces/Enhances: `risk-report-manager`