-
Notifications
You must be signed in to change notification settings - Fork 28
Tweak milabench realtime event tracking #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
* Add job push button * Add status helper to avoid rsync when cache is good
515cfaf
to
c88db6d
Compare
c88db6d
to
556aa28
Compare
c13b037
to
c7919ef
Compare
c8f9224
to
8adb21c
Compare
953b4f0
to
5cb60da
Compare
5cb60da
to
031f1b4
Compare
9070279
to
417457d
Compare
8fb880e
to
0abd286
Compare
0abd286
to
4a57574
Compare
bb49359
to
2ae3c96
Compare
nonlocal process_registry | ||
|
||
if process_registry.get(hostname) is None: | ||
proc = subprocess.Popen(cmd) |
Check failure
Code scanning / CodeQL
Uncontrolled command line Critical
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 16 days ago
To fix this problem:
- Only allow well-formed hostnames to be passed to the SSH command, and reject or sanitize anything else.
- Make a whitelist validation before using the value in
reverse_ssh_tunnel
and subsequently in any subprocess call. - The recommended solution is to use a regular expression to restrict hostnames to valid DNS hostnames or IP addresses (rejecting whitespace, control characters, and metacharacters).
- The check should be added in the
/api/metric/<string:hostname>
route and possibly also in thereverse_ssh_tunnel
function for defense in depth. - Required: Add
import re
for regex validation.
Edits must be:
- Add an
import re
. - At the start of
open_reverse_ssh
, validate the hostname with a regex that only allows DNS hostnames or IP addresses. - If validation fails, return an error response (
400 Bad Request
).
-
Copy modified line R4 -
Copy modified lines R75-R80
@@ -1,6 +1,7 @@ | ||
import os | ||
import requests | ||
import subprocess | ||
import re | ||
from threading import Thread, Lock, Event | ||
import json | ||
|
||
@@ -71,6 +72,12 @@ | ||
|
||
@app.route('/api/metric/<string:hostname>') | ||
def open_reverse_ssh(hostname: str): | ||
# Validate hostname: allow only DNS hostnames or IPv4/IPv6 addresses, no spaces or metacharacters | ||
hostname_regex = r"^(?!-)[A-Za-z0-9.-]{1,253}(?<!-)$" | ||
ipv4_regex = r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$" | ||
ipv6_regex = r"^[0-9a-fA-F:]+$" | ||
if not (re.match(hostname_regex, hostname) or re.match(ipv4_regex, hostname) or re.match(ipv6_regex, hostname)): | ||
return {"status": "error", "message": "invalid hostname"}, 400 | ||
cmd = reverse_ssh_tunnel(hostname) | ||
|
||
nonlocal process_registry |
subprocess.check_call(scp_cmd, shell=True) | ||
# rsync the job to remote | ||
result = local_command("rsync", "-az", new_local_dir + "/", f"{SLURM_HOST}:{new_remote_dir}") | ||
if not result['success']: |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 16 days ago
To address the Uncontrolled data used in path expression
issue, the code should validate and normalize the constructed path to ensure that it cannot be manipulated to escape the intended cache directory. Specifically, we should:
- After constructing the full path used in the file write operation (
f"{new_local_dir}/cmd.sh"
), useos.path.normpath
to normalize the path. - Check that the normalized path starts with the intended root (
JOBRUNNER_LOCAL_CACHE
). - If it does not, abort the operation (e.g., raise an Exception or return an error response).
- Only attempt to open the file after validating the path.
This will ensure that no constructed path can escape the intended job cache directory, even if a crafted jr_job_id
attempts directory traversal.
Add any needed imports only if not present (in this case, os
is already imported).
Locate the code block starting at line 1060 and add normalization and validation before the with open(...)
call.
-
Copy modified lines R1060-R1065
@@ -1057,7 +1057,12 @@ | ||
|
||
new_cmd = "'" + old_cmd.replace(old_jr_job_id, new_jr_job_id) + "'" | ||
|
||
with open(f"{new_local_dir}/cmd.sh", "w") as fp: | ||
output_path = os.path.normpath(f"{new_local_dir}/cmd.sh") | ||
# Ensure path is within JOBRUNNER_LOCAL_CACHE | ||
cache_root = os.path.abspath(JOBRUNNER_LOCAL_CACHE) | ||
if not output_path.startswith(cache_root): | ||
return jsonify({'error': 'Invalid job id: path outside of cache root'}), 400 | ||
with open(output_path, "w") as fp: | ||
fp.write(new_cmd[1:-1]) | ||
fp.flush() | ||
|
return jsonify({'error': result['stderr']}), 500 | ||
|
||
@app.route('/api/slurm/jobs/<jr_job_id>/earlysync/<job_id>') | ||
def api_early_sync(jr_job_id, job_id): |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 16 days ago
To fix the information exposure issue, we should avoid returning the raw exception message in API responses visible to the end user. Instead, the exception (with or without its stack trace) should be logged on the server side using proper logging methods, while the client only receives a generic error message. This adjustment should be made in local_command
, replacing the user-visible stderr: str(e)
field with a fixed friendly message (e.g., "Internal error occurred" or just "Command failed"), and logging the actual error for server diagnostics.
- Edit
local_command
inmilabench/web/slurm.py
, changing the return value in the exception handler. - Add proper logging (preferably using the standard
logging
library) for the exception, including the stack trace. - All usages of
local_command
(as visible in the provided snippet) will now propagate the generic message, reducing information leak.
-
Copy modified line R15 -
Copy modified line R747 -
Copy modified line R749 -
Copy modified line R753
@@ -12,6 +12,7 @@ | ||
import uuid | ||
from functools import wraps | ||
import traceback | ||
import logging | ||
import time | ||
import threading | ||
from filelock import FileLock, Timeout | ||
@@ -743,12 +744,13 @@ | ||
'returncode': result.returncode | ||
} | ||
except Exception as e: | ||
import logging | ||
import traceback | ||
traceback.print_exc() | ||
logging.error("Exception in local_command: %s", traceback.format_exc()) | ||
return { | ||
'success': False, | ||
'stdout': '', | ||
'stderr': str(e), | ||
'stderr': 'An internal error occurred.', # Do not expose details | ||
'returncode': -1 | ||
} | ||
|
ecab6b6
to
3039a21
Compare
3039a21
to
df0cf52
Compare
de42aee
to
7e62f19
Compare
7e62f19
to
096141d
Compare
No description provided.