Description
Within 5-6 days of operation, my dashboard (2.4 installed with nvflare dashboard --cloud aws, Docker install) became unresponsive (HTTPS connection timeout). Unfortunately, I could not find anything in the logs, so I went ahead and created a monitoring script that restarts the container. Not ideal, but it works:
https://github.com/dirkpetersen/nvflare-cancer#about-2-restart-on-error
#!/bin/bash
url="https://myproject.mydomain.edu"   # Set the website URL
search_string='name="viewport"'        # Set the search string in the HTML source
timeout_duration=15                    # Set the timeout duration in seconds
date=$(date)                           # Get the current date for logging
# Check if the search string exists in the HTML source code
if curl -k -s -m "$timeout_duration" "$url" | grep -q "$search_string"; then
    echo "${date}: OK! The search string '$search_string' was found in the HTML source code of $url"
else
    echo "${date}: Error! The search string '$search_string' was not found in the HTML source code of $url or the connection timed out after $timeout_duration seconds"
    # Restart the dashboard if the search string is not found or the connection times out
    echo "${date}: Restarting NVFlare dashboard"
    "$HOME/.local/bin/nvflare" dashboard --stop
    sleep 3
    "$HOME/.local/bin/nvflare" dashboard --start -f ~
fi
Adding an hourly cron job:
(crontab -l 2>/dev/null; echo "59 * * * * \$HOME/monitor.sh >> /var/tmp/nvflare-monitor.log 2>&1") | crontab -
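Since the dashboard logs gave no hint, it may help if the monitor log at least records why a check failed. The following is only a minimal sketch of that idea, not part of the script above: the function names classify_probe and probe and the status labels are hypothetical. It relies on curl's documented exit code 28 for a timed-out operation and treats any other non-zero exit as a generic curl error.

```shell
#!/bin/bash
# Hypothetical helper: turn curl's exit status and the fetched page body
# into a short label that can be written to the monitor log.
classify_probe() {
    local curl_status="$1" body="$2" search_string="$3"
    if [ "$curl_status" -eq 28 ]; then
        echo "timeout"            # curl exit code 28: operation timed out
    elif [ "$curl_status" -ne 0 ]; then
        echo "curl-error"         # any other curl failure (DNS, refused connection, TLS, ...)
    elif printf '%s' "$body" | grep -qF -- "$search_string"; then
        echo "ok"                 # page was served and contains the expected string
    else
        echo "content-mismatch"   # page was served but the expected string is missing
    fi
}

# Hypothetical wrapper: fetch the URL once and print the classification.
probe() {
    local url="$1" search_string="$2" timeout_duration="$3" body status
    body=$(curl -k -s -m "$timeout_duration" "$url")
    status=$?
    classify_probe "$status" "$body" "$search_string"
}
```

In the monitoring script, result=$(probe "$url" "$search_string" "$timeout_duration") could replace the curl/grep pipeline, with the restart triggered whenever $result is not "ok" and the label echoed into the log line.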