@mathiasbio
Collaborator

@mathiasbio mathiasbio commented Nov 17, 2025

Description

This PR aims to clarify the final analysis_status.txt, which reports the status of the analysis jobs. In production it can currently be difficult to tell which failures actually need attention. To address this, jobs that failed but later succeeded on rerun are now listed under a separate heading.

It also fixes an issue with scontrol, which on our cluster only has access to job statuses from the last 4 days; the script now uses sacct instead.
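A minimal sketch of how a status script might look up job states through sacct instead of scontrol, assuming Python and that the job IDs are already known (for example, from the numeric log file names). The sacct options used (--jobs, --format, --allocations, --noheader, --parsable2) are standard Slurm flags; the helper names and the "UNKNOWN" fallback are hypothetical and not taken from the actual Balsamic script.

```python
import subprocess
from typing import Dict, Iterable, List


def build_sacct_command(job_ids: Iterable[str]) -> List[str]:
    """Build a sacct call that prints one "JobID|State" line per job allocation.

    --allocations drops job steps such as "123.batch", --parsable2 separates
    fields with "|" (no trailing delimiter) and --noheader removes the header.
    """
    return [
        "sacct",
        f"--jobs={','.join(job_ids)}",
        "--format=JobID,State",
        "--allocations",
        "--noheader",
        "--parsable2",
    ]


def parse_sacct_output(output: str) -> Dict[str, str]:
    """Map job ID -> state from `sacct --parsable2` output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id, state = line.split("|", 1)
        # sacct reports e.g. "CANCELLED by 1234"; keep only the state keyword.
        # "UNKNOWN" is a hypothetical fallback for an empty state field.
        states[job_id] = state.split()[0] if state.strip() else "UNKNOWN"
    return states


def get_job_states(job_ids: Iterable[str]) -> Dict[str, str]:
    """Query Slurm accounting for the given jobs (needs sacct on PATH)."""
    result = subprocess.run(
        build_sacct_command(job_ids), capture_output=True, text=True, check=True
    )
    return parse_sacct_output(result.stdout)
```

Keeping the parsing separate from the subprocess call lets it be unit tested without a Slurm cluster. The sketch assumes Slurm accounting (slurmdbd) is enabled, since sacct reads from the accounting database rather than from slurmctld memory.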

Changed

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • [Document Name]

Tests

Feature Tests

  • Ran the updated script on a log directory containing failed jobs, including some that succeeded on rerun:
=== Job status check at 2025-11-17 13:20:03 ===
FAILED JOBS (no successful retry):
10717137         /full/path/CASE/logs/rule_all/10717137.log
10717187         /full/path/CASE/logs/rule_all/10717187.log

CANCELLED JOBS (no successful retry):
10717167        /full/path/CASE/logs/rule_all/10717167.log


NOTE:
Some jobs failed but succeeded on retry:
(jobid  log_path        original_state)
10714481        /full/path/CASE/logs/rule_samtools_fixmate/tumor_ACC12345/10714481.log      FAILED

Example output from a successful analysis:

=== Job status check at 2025-11-18 13:54:03 ===

SUCCESSFUL


NOTE:
Some jobs failed but succeeded on retry:
(jobid  log_path        original_state)
10714447        /home/proj/production/cancer/cases/credibleimpala/logs/rule_samtools_fixmate/tumor_ACC19308A2/10714447.log      FAILED
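The grouping shown in the example reports can be sketched as follows. This assumes, hypothetically, that a rerun writes its log to the same rule/sample directory as the original job, so jobs sharing a log directory are treated as attempts of the same rule instance, and that a failure counts as "succeeded on retry" if any attempt in that directory COMPLETED. The function names and exact layout are illustrative, modelled on the example output above, and are not the actual script.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# (job_id, log_path, state), e.g. collected from sacct and the log directory
Job = Tuple[str, str, str]


def classify_jobs(jobs: List[Job]) -> Dict[str, List[Job]]:
    """Split failed/cancelled jobs into unresolved ones and ones fixed by a retry.

    Assumption: all attempts of one rule instance log to the same directory.
    """
    by_dir: Dict[str, List[Job]] = defaultdict(list)
    for job in jobs:
        by_dir[str(PurePosixPath(job[1]).parent)].append(job)

    groups: Dict[str, List[Job]] = {"FAILED": [], "CANCELLED": [], "RETRIED": []}
    for attempts in by_dir.values():
        succeeded = any(state == "COMPLETED" for _, _, state in attempts)
        for job in attempts:
            if job[2] not in ("FAILED", "CANCELLED"):
                continue  # only unsuccessful jobs are reported
            groups["RETRIED" if succeeded else job[2]].append(job)
    return groups


def format_report(groups: Dict[str, List[Job]], now: datetime) -> str:
    """Render the groups in the layout of the example status file."""
    lines = [f"=== Job status check at {now:%Y-%m-%d %H:%M:%S} ==="]
    if not groups["FAILED"] and not groups["CANCELLED"]:
        lines += ["", "SUCCESSFUL"]
    for state in ("FAILED", "CANCELLED"):
        if groups[state]:
            lines.append(f"{state} JOBS (no successful retry):")
            lines += [f"{job_id}\t{log}" for job_id, log, _ in groups[state]]
            lines.append("")
    if groups["RETRIED"]:
        lines += ["", "NOTE:", "Some jobs failed but succeeded on retry:",
                  "(jobid\tlog_path\toriginal_state)"]
        lines += [f"{j}\t{log}\t{s}" for j, log, s in groups["RETRIED"]]
    return "\n".join(lines)
```

Grouping by log directory avoids needing Snakemake's own retry bookkeeping, but it relies on the log path layout; if reruns were logged elsewhere, a different key (for example rule name plus wildcards) would be needed.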

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

Panel of Normal specific criteria

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because [Justification].
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Validation criteria

Validation criteria to be added to validation report PR: [LINK-TO-VALIDATION-REPORT-PR from the validations repository]

Version specific criteria

  • Text here or N/A

Important

One of the validation checkboxes below needs to be checked.

  • Added version specific validation criteria to validation report
  • Changes validated in standard sections: [validation-section]
  • Validation criteria not necessary

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Validation criteria
    • Completed the validation criteria section of the template.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Validation criteria
    • The author has completed the validation criteria section of the template.
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@codecov

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.33%. Comparing base (7d529e6) to head (8d1600b).
⚠️ Report is 141 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1636      +/-   ##
===========================================
- Coverage    99.48%   99.33%   -0.15%     
===========================================
  Files           40       40              
  Lines         1932     1960      +28     
===========================================
+ Hits          1922     1947      +25     
- Misses          10       13       +3     
Flag Coverage Δ
unittests 99.33% <ø> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown.

@mathiasbio mathiasbio marked this pull request as ready for review November 17, 2025 12:56
@mathiasbio mathiasbio requested a review from a team as a code owner November 17, 2025 12:56
@mathiasbio mathiasbio self-assigned this Nov 17, 2025
@mathiasbio mathiasbio linked an issue Nov 17, 2025 that may be closed by this pull request
Contributor

@fevac fevac left a comment

I'm not sure if this is really needed, but it's cleaner for sure 🌟

Contributor

@fevac fevac left a comment

Has this been tested?

@mathiasbio
Collaborator Author

@fevac Yes, I have tested it :) Part of the change has also been updated slightly, and I'll update the PR description. Basically, I swapped out scontrol for sacct, because scontrol looks up job status in memory, and those records are apparently cleared every 4th day. Information in sacct is stored indefinitely, so this avoids getting unknown statuses.

@mathiasbio mathiasbio linked an issue Nov 18, 2025 that may be closed by this pull request
@sonarqubecloud

@mathiasbio mathiasbio merged commit 7df2302 into develop Nov 19, 2025
8 of 9 checks passed
@mathiasbio mathiasbio deleted the clarify_failjobs branch November 19, 2025 08:48
mathiasbio added a commit that referenced this pull request Nov 20, 2025
Changed:
^^^^^^^^
* moved default resource allocation to snakemake command #1632
* increased memory of samtools fixmate #1632
* increased runtime for rule all #1632
* no rerun for rule all #1632
* increased head-job runtime to 7 days #1632
* improved information on failed job status #1636
* scontrol replaced with sacct in jobstatus script #1636
* added attempt-based memory bump to vep_somatic_research_sv #1632

Removed:
^^^^^^^^
* exome argument panel bed callback function #1632
* removed -l flag in head-job sbatch script #1632

Development

Successfully merging this pull request may close these issues.

[Maintenance] Unknown job status
[Maintenance] Clean up / Clarify restarted jobs status in final report

3 participants