[batch] Use earlyoom to terminate a process using too much memory #1402

Draft
wants to merge 1 commit into
base: master
1 change: 0 additions & 1 deletion deploy/.gitignore

This file was deleted.

10 changes: 2 additions & 8 deletions deploy/Dockerfile
@@ -1,23 +1,17 @@
 FROM python:3.11-slim-bookworm
 
-# Setup SSH
-RUN mkdir /root/.ssh
-COPY id_ed25519 /root/.ssh/id_ed25519
-COPY known_hosts /root/.ssh/known_hosts
-RUN chmod -R 0600 /root/.ssh
-
 # Install Git, Git LFS. How to silence apt-get commands:
 # https://peteris.rocks/blog/quiet-and-unattended-installation-with-apt-get/
 RUN apt-get update -qq
-RUN DEBIAN_FRONTEND=noninteractive apt-get install -qq -y git git-lfs < /dev/null > /dev/null
+RUN DEBIAN_FRONTEND=noninteractive apt-get install -qq -y git git-lfs earlyoom < /dev/null > /dev/null
 # Cleanup apt cache
 RUN rm -rf /var/lib/apt/lists/*
 # Configure git lfs
 RUN git lfs install
 
 # Clone the TLOModel repository and move in it
 ARG BRANCH_NAME="master"
-RUN git clone --branch "${BRANCH_NAME}" git@github.com:UCL/TLOmodel.git /TLOmodel
+RUN git clone --branch "${BRANCH_NAME}" https://github.com/UCL/TLOmodel.git /TLOmodel
 WORKDIR /TLOmodel
 RUN git gc --aggressive

1 change: 0 additions & 1 deletion deploy/id_ed25519.pub

This file was deleted.

1 change: 0 additions & 1 deletion deploy/known_hosts

This file was deleted.

19 changes: 19 additions & 0 deletions src/tlo/cli.py
@@ -234,6 +234,25 @@ def batch_submit(ctx, scenario_file, asserts_on, more_memory, keep_pool_alive, i
 task_dir = "${{AZ_BATCH_TASK_DIR}}"
 gzip_pattern_match = "{{txt,log}}"
 command = f"""
+# Logfile for `earlyoom`
+EARLYOOM_LOG=/tmp/earlyoom.log
+
+function terminate_and_log() {{{{
+# Terminate the `earlyoom` process
+kill "${{{{EARLYOOM_PID}}}}"
+# Print line(s) in the log file referencing terminated processes, if any.
+grep "^sending SIGTERM to process" "${{{{EARLYOOM_LOG}}}}"
Member Author

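This also catches SIGKILL. Note the `-E` flag: without it, grep treats `+` as a literal character rather than a quantifier.

Suggested change
-grep "^sending SIGTERM to process" "${{{{EARLYOOM_LOG}}}}"
+grep -E "^sending \w+ to process" "${{{{EARLYOOM_LOG}}}}"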
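
As a rough illustration of what the broader pattern matches, the sketch below applies the same regular expression in Python to a made-up log excerpt. The sample lines and the helper name `killed_signals` are assumptions for illustration only; the exact wording of real `earlyoom` log lines may differ between versions.

```python
import re

# Assumed log format: each kill is announced by a line starting with
# "sending <SIGNAL> to process". The sample lines below are invented.
KILL_LINE = re.compile(r"^sending (\w+) to process")


def killed_signals(log_text: str) -> list[str]:
    """Return the signal name from every line that reports a terminated process."""
    signals = []
    for line in log_text.splitlines():
        match = KILL_LINE.match(line)
        if match:
            signals.append(match.group(1))
    return signals


sample_log = "\n".join([
    "mem avail: 120 of 7976 MiB ( 1.50%), swap free: 0 of 0 MiB ( 0.00%)",
    "sending SIGTERM to process 4242",
    "sending SIGKILL to process 4242",
])

print(killed_signals(sample_log))  # ['SIGTERM', 'SIGKILL']
```

A pattern fixed to `SIGTERM` would report only the first of the two kill lines, which is the gap the suggestion closes.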

+}}}}
+
+# Add a trap at exit to terminate `earlyoom` and print relevant log lines.
+trap terminate_and_log EXIT
+
+# Delete existing log files, if any.
+rm -f "${{{{EARLYOOM_LOG}}}}"
+# Start the `earlyoom` daemon.
+nohup earlyoom -m 95 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
Comment on lines +252 to +253
Member Author

I got the `-m` argument backwards: it is the percentage of memory that must remain available, not the percentage used. It should be `-m 5`:

Suggested change
-# Start the `earlyoom` daemon.
-nohup earlyoom -m 95 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
+# Start the `earlyoom` daemon. Ignore swap usage, terminate a process if less than 5% of main memory is available.
+nohup earlyoom -m 5 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
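
To make the difference between the two settings concrete, here is a deliberately simplified Python model of the trigger condition. It assumes that `earlyoom` acts only when available memory and free swap are both at or below their configured minimums; the function name `would_terminate` and the example percentages are hypothetical, and the real daemon has further details this sketch ignores.

```python
def would_terminate(mem_avail_pct: float, swap_free_pct: float,
                    mem_min_pct: float, swap_min_pct: float) -> bool:
    """Simplified model: trigger only when BOTH available memory and
    free swap are at or below their configured minimum percentages."""
    return mem_avail_pct <= mem_min_pct and swap_free_pct <= swap_min_pct


# A lightly loaded node: 60% of memory still available, no swap configured.
print(would_terminate(60, 0, mem_min_pct=95, swap_min_pct=100))  # True
print(would_terminate(60, 0, mem_min_pct=5, swap_min_pct=100))   # False

# Only when memory is nearly exhausted does the corrected setting trigger.
print(would_terminate(4, 0, mem_min_pct=5, swap_min_pct=100))    # True
```

Under this model, `-m 95` fires as soon as more than 5% of memory is in use, whereas `-m 5` waits until less than 5% remains. Setting the swap minimum to 100 makes the swap condition always true, so only main memory decides.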

+EARLYOOM_PID="${{{{!}}}}"
+
 git fetch origin {commit.hexsha}
 git checkout {commit.hexsha}
 pip install -r requirements/base.txt
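
The quadrupled braces in the shell snippet are easier to follow with a small experiment. The sketch below assumes that the f-string result is later passed through a second `str.format()` call, which this hunk does not show; `commit_sha` is a hypothetical stand-in for `commit.hexsha`.

```python
commit_sha = "abc1234"  # hypothetical placeholder for commit.hexsha

# First pass: the f-string substitutes {commit_sha} and halves each brace
# pair, so "{{{{" becomes "{{" and "}}}}" becomes "}}".
command = f"""
kill "${{{{EARLYOOM_PID}}}}"
git checkout {commit_sha}
"""

# Second pass (assumed): a later str.format() call halves the braces again,
# leaving the plain shell expansion "${EARLYOOM_PID}".
final = command.format()

print(command)
print(final)
```

If only one formatting pass is applied, the shell would see `${{EARLYOOM_PID}}`, which is not a valid parameter expansion, so the number of brace pairs has to match the number of formatting passes.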