
BIG BUG: file handle leak on latest version of MinKNOW, "Script error" on sequencing, terminates after 18 hrs #80

@GabeAl

Environment: Ubuntu 24.04. Since upgrading to the latest MinKNOW version two weeks ago, we have been hitting a persistent error.

After ~18 hours of sequencing, highly multiplexed PromethION runs (e.g. 2x96) fail with "Script error." We see this with the rapid kit; we have not yet tested the native kit.

Digging in, the log files show a curious but telling error:

/var/log/minknow/mk_manager_svc_log-1.txt:298:                   std::terminate() called: Failed to create pod5_file: IOError: Failed to open local file '/var/lib/minknow/data/reads/tmp/P2S-03174-A/WP004_mk3/no_sample_id/20250805_1051_P2S-03174-A_PAY42344_f4ecd010/pod5/.5285fcae-479d-4408-b9b5-76351f0fc05b.tmp-reads'. Detail: [errno 24] Too many open files
/var/log/minknow/mk_manager_svc_log-1.txt:519:                   std::terminate() called: Failed to create pod5_file: IOError: Failed to open local file '/var/lib/minknow/data/reads/tmp/P2S-02548-A/WP005/WP005/20250807_1609_P2S-02548-A_PBE93237_fae0efa4/pod5/.1f2b1dd7-932a-4555-b854-c27531a01899.tmp-run-info'. Detail: [errno 24] Too many open files
/var/log/minknow/mk_manager_svc_log-2.txt:312:    detailed_error_info: Network transport error: Operation timed out after 60000 milliseconds with 0 bytes received: Timeout was reached

Indeed, after forcibly lifting the file descriptor limit on the services, monitoring the number of open file handles over the course of a run is damning:
[Image: open file handles over the course of a run]
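
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how open descriptors can be sampled from /proc on Linux. It is not necessarily how the data above was collected; the function names `count_fds` and `sample_fds` and the output file name are made up for illustration, and the process pattern is simply the one used in the workaround further down.

```bash
#!/usr/bin/env bash
# Count open file descriptors for one PID by listing /proc/<pid>/fd.
# Prints 0 if the process has exited or its fd directory is not readable.
count_fds() {
  local pid="$1"
  ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Print one tab-separated line per matching process:
# timestamp, PID, command name, number of open descriptors.
sample_fds() {
  local pattern="$1" p
  for p in $(pgrep -f "$pattern"); do
    printf '%s\t%s\t%s\t%s\n' \
      "$(date +%FT%T)" "$p" "$(cat "/proc/$p/comm" 2>/dev/null)" "$(count_fds "$p")"
  done
}

# Example (assumption: run as root so other users' /proc entries are readable):
# while sleep 60; do
#   sample_fds 'minknow|control_server|basecall_manager|dorado'
# done >> fd_counts.tsv
```

A count that climbs steadily with run time, rather than levelling off, is what a leak would look like in this output.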

Something in this version of MinKNOW is not closing its files properly. For small, short, or low-multiplex runs the leak rarely surfaces, but for long, heavily multiplexed runs it's a showstopper.

Please let me know what help I can provide to get this fixed. My temporary workaround has been to raise the limit in the .service file:
[Service]
LimitNOFILE=524288

and, for processes that are already running, in bash:

# bump limits for all relevant ONT processes
for p in $(pgrep -f 'minknow|control_server|basecall_manager|dorado'); do
  sudo prlimit --pid "$p" --nofile=524288:524288
done
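
To confirm the new limit actually took effect, the kernel's view can be read back from /proc/<pid>/limits. This is a small sketch only; `nofile_limits` and `report_limits` are hypothetical helper names, not part of any ONT tooling.

```bash
# Print the soft and hard "Max open files" limits for a PID,
# as reported by the kernel in /proc/<pid>/limits.
nofile_limits() {
  awk '/^Max open files/ {print $4, $5}' "/proc/$1/limits"
}

# Print "PID soft hard" for every process whose command line matches the pattern.
report_limits() {
  local pattern="$1" p
  for p in $(pgrep -f "$pattern"); do
    echo "$p $(nofile_limits "$p")"
  done
}

# Example:
# report_limits 'minknow|control_server|basecall_manager|dorado'
```

Note that limits set with prlimit apply only to the processes running at that moment; after a service restart, only the LimitNOFILE setting in the unit file applies.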

But this is a band-aid. The underlying problem appears to be in MinKNOW itself: it is not closing the files it creates. Let me know what further information I can provide to help track down and fix this for good. Files that are never properly closed also carry a real risk of corruption, e.g. if a deflate stream is never finalized.
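
One piece of extra information that might help narrow this down is which paths the leaked descriptors point at. The sketch below groups a process's open descriptors by directory (or by kind, for sockets and pipes); `fd_targets_by_dir` is a hypothetical helper name, and whether the leak shows up under the pod5 output directory is an assumption to be checked, not a finding.

```bash
# Summarize what a process's open descriptors point at.
# Regular files are grouped by their directory; non-file descriptors
# such as "socket:[123]" or "pipe:[456]" are grouped by kind.
fd_targets_by_dir() {
  local pid="$1" fd target
  for fd in /proc/"$pid"/fd/*; do
    # Skip descriptors that vanish or cannot be read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      /*) dirname "$target" ;;
      *)  echo "${target%%:*}" ;;
    esac
  done | sort | uniq -c | sort -rn
}

# Example (replace <pid> with the PID of the process to inspect):
# sudo bash -c "$(declare -f fd_targets_by_dir); fd_targets_by_dir <pid>" | head
```

The directories with the largest counts are the likeliest places to look for files that are opened and never closed.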
