Ubuntu 24.04. Since upgrading to the latest MinKNOW version two weeks ago, we have been hitting a persistent error.
After ~18 hours of sequencing, highly multiplexed PromethION runs (e.g. 2x96) fail with "Script error." We see this with the rapid kit; we have not yet tested the native kit.
Digging into the error, I see the log files mention a very curious but telling error:
/var/log/minknow/mk_manager_svc_log-1.txt:298: std::terminate() called: Failed to create pod5_file: IOError: Failed to open local file '/var/lib/minknow/data/reads/tmp/P2S-03174-A/WP004_mk3/no_sample_id/20250805_1051_P2S-03174-A_PAY42344_f4ecd010/pod5/.5285fcae-479d-4408-b9b5-76351f0fc05b.tmp-reads'. Detail: [errno 24] Too many open files
/var/log/minknow/mk_manager_svc_log-1.txt:519: std::terminate() called: Failed to create pod5_file: IOError: Failed to open local file '/var/lib/minknow/data/reads/tmp/P2S-02548-A/WP005/WP005/20250807_1609_P2S-02548-A_PBE93237_fae0efa4/pod5/.1f2b1dd7-932a-4555-b854-c27531a01899.tmp-run-info'. Detail: [errno 24] Too many open files
/var/log/minknow/mk_manager_svc_log-2.txt:312: detailed_error_info: Network transport error: Operation timed out after 60000 milliseconds with 0 bytes received: Timeout was reached
Indeed, after forcibly lifting the file descriptor limit on the services, the number of open file handles over the course of a run is damning:

Something is going wrong in this version of MinKNOW: it doesn't close its files properly. For small, short, or low-multiplex runs this rarely shows up, but for long, heavily multiplexed runs it's a showstopper.
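For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to log descriptor counts over time. It assumes Linux /proc and needs root to read other users' processes. The function names count_fds and sample_fds are made up for illustration, and this is not necessarily how the numbers above were collected.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of MinKNOW.

# Count open file descriptors for one PID by listing /proc/<pid>/fd.
# Prints 0 if the process has exited or the directory is not readable.
count_fds() {
    local pid="$1"
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

# Print one "epoch_seconds pid fd_count command" line per process whose
# command line matches the given extended regex.
sample_fds() {
    local pattern="$1" pid
    for pid in $(pgrep -f "$pattern"); do
        printf '%s %s %s %s\n' \
            "$(date +%s)" "$pid" "$(count_fds "$pid")" \
            "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done
}

# Example: sample once a minute and append to a log (run as root):
#   while sleep 60; do sample_fds 'minknow|control_server|basecall_manager|dorado'; done >> fd_counts.log
```

A count that climbs steadily for one PID while the run progresses points at a leak in that process, rather than at a limit that is simply too low.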
Please let me know what help I can provide to get this fixed. My temporary workaround has been to add the following to the .service file
[Service]
LimitNOFILE=524288
and in bash
# bump limits for all relevant ONT processes
for p in $(pgrep -f 'minknow|control_server|basecall_manager|dorado'); do
sudo prlimit --pid "$p" --nofile=524288:524288
done
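Note that prlimit only affects processes that are already running, while a changed LimitNOFILE only applies once systemd has reloaded the unit and restarted the service. A small sketch for checking which limit a process actually has, by reading /proc/<pid>/limits (the function name nofile_limits is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft and hard "Max open files" limits
# for one PID, as reported by the kernel in /proc/<pid>/limits.
nofile_limits() {
    local pid="$1"
    # Line format: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ {print $4, $5}' "/proc/$pid/limits"
}

# Example: show the limits for every matching process.
#   for p in $(pgrep -f 'minknow|control_server|basecall_manager|dorado'); do
#       echo "$p $(nofile_limits "$p")"
#   done
```

If a process still reports the old values, it was probably started before the change or is not covered by the edited unit.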
But it's a band-aid. The underlying problem appears to be in MinKNOW: it is not closing the files it creates. Let me know what further information I can provide to help find and fix the root cause. Files that are never closed properly carry a real risk of corruption, e.g. if a deflate stream is never finalized.
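In case it helps with triage, here is a rough sketch for summarising what a process's descriptors point at, so it is possible to see whether the growth is in the pod5 output directories or somewhere else (sockets, pipes). It assumes Linux /proc and root access, and the function name open_targets is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group a PID's open descriptors by target.
# Regular files are grouped by their parent directory; sockets and pipes
# are grouped by type. Output is "count target", largest count first.
open_targets() {
    local pid="$1" fd
    for fd in /proc/"$pid"/fd/*; do
        readlink "$fd"
    done 2>/dev/null |
        sed -E 's#/[^/]*$##; s/:\[[0-9]+\]$//' |
        sort | uniq -c | sort -rn
}

# Example (run as root, with the PID of the suspect process):
#   open_targets 12345 | head
```

A pod5 directory at the top of the list with thousands of entries would support the suspicion that temporary read files are being left open.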