feat: add watch mode for continuous checkpoint evaluation#857
222a5b7 to 3e9acd5
3e9acd5 to 7f80004
25a57eb to b311bd6
agronskiy left a comment
A couple of small things; generally LGTM once those are fixed. Unblocking with a one-shot approve.
WATCH_STATE_FILE = os.getenv(
    "NEMO_EVALUATOR_WATCH_STATE_FILE",
    Path.home() / ".nemo-evaluator" / "watch-state" / "watch-state.v1.jsonl",
bug: when the env var is set, os.getenv returns a str, not a Path, so I think the call site at line 38 will crash, because str does not have those methods.
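One possible way to address this, sketched under the assumption that the call site only needs `pathlib.Path` methods: wrap the `os.getenv` result in `Path(...)` so that both branches yield a `Path`. The helper name `resolve_watch_state_file` and the constant `DEFAULT_WATCH_STATE_FILE` are hypothetical and not taken from the PR.

```python
import os
from pathlib import Path

# Default location as quoted in the diff above.
DEFAULT_WATCH_STATE_FILE = (
    Path.home() / ".nemo-evaluator" / "watch-state" / "watch-state.v1.jsonl"
)


def resolve_watch_state_file() -> Path:
    """Return the watch-state location as a Path, whether or not the env var is set."""
    # os.getenv returns a str when the variable is set and the given default
    # otherwise. Path(...) accepts both a str and a Path, so the result type
    # is the same in either case.
    return Path(os.getenv("NEMO_EVALUATOR_WATCH_STATE_FILE", DEFAULT_WATCH_STATE_FILE))
```

With this shape, methods such as `.parent` or `.exists()` are available at the call site regardless of how the path was configured.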
assert local_source.is_dir()
remote_destination_str = f"{username}@{hostname}:{remote_target}"
local_sources_str = " ".join(map(str, local_sources))
rsync_upload_command = f"rsync -qcaz {local_sources_str} {remote_destination_str}"
question: on Mac, local paths often contain spaces (yes, we Linux users should think about Mac people too). Wouldn't that break? I suggest building the command as a list of arguments; that way the run call below will pass them on properly tokenized.
This code was just moved from one place to another, so I assume it's functional, but better be on the safe side 👍 Done in 59dba76
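A minimal sketch of the list-based form suggested above. The function name `build_rsync_upload_command` is hypothetical, and the actual change in 59dba76 may look different; the sketch only covers local tokenization of the arguments.

```python
from pathlib import Path
from typing import Iterable, List, Union


def build_rsync_upload_command(
    local_sources: Iterable[Union[str, Path]],
    username: str,
    hostname: str,
    remote_target: str,
) -> List[str]:
    """Build the rsync argv as a list so each local path stays a single argument."""
    remote_destination = f"{username}@{hostname}:{remote_target}"
    # Each source becomes its own list element, so a space inside a path
    # is never interpreted as an argument separator.
    return ["rsync", "-qcaz", *map(str, local_sources), remote_destination]
```

The resulting list can be handed to `subprocess.run` directly, without `shlex.split` and without `shell=True`.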
"""
if socket is None:
    return
ssh_command = f"ssh -O exit -S {socket} {username}@{hostname}"
question: I would recommend checking that all commands are passed as lists; shlex.split will break later if the socket path on Mac contains spaces.
...and it's inconsistent with how we open it 👍 Fixed in 6fda968
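To illustrate the concern, here is a small sketch comparing the three ways of producing the argument list. The socket path, username, and hostname are made-up placeholders, not values from the PR.

```python
import shlex

# Hypothetical control-socket path containing a space.
socket = "/Users/me/Library/Application Support/ctl.sock"
username, hostname = "user", "host"  # placeholders

# Formatting into one string and splitting afterwards cuts the path in two.
as_string = f"ssh -O exit -S {socket} {username}@{hostname}"
split_args = shlex.split(as_string)

# Building the list directly keeps the socket path as a single argument.
list_args = ["ssh", "-O", "exit", "-S", socket, f"{username}@{hostname}"]

# If a string form is unavoidable, quoting each interpolated value first
# makes shlex.split round-trip to the same list.
quoted_string = f"ssh -O exit -S {shlex.quote(socket)} {username}@{hostname}"
quoted_args = shlex.split(quoted_string)
```

The list form is also the simplest to keep consistent between the code that opens the connection and the code that closes it.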
if socket is None:
    return
ssh_command = f"ssh -O exit -S {socket} {username}@{hostname}"
completed_process = subprocess.run(args=shlex.split(ssh_command))
bug(?): the stderr param defaults to None, so stderr is not captured; I think this would raise a second error later, when stderr.decode is called.
Actually, there's no reason for us to throw an error here. I replaced it with a log.error message similar to the one we have when opening fails (6fda968)
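A hedged sketch of the log-instead-of-raise pattern discussed in this thread. The helper name `run_and_log_failure` and the message wording are assumptions, not the code from 6fda968. The key point is that `completed.stderr` is `None` unless output is captured, so the sketch passes `capture_output=True` before decoding.

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger(__name__)


def run_and_log_failure(args: List[str]) -> int:
    """Run a command; on failure, log its stderr instead of raising."""
    # capture_output=True populates completed.stdout and completed.stderr
    # with bytes. Without it both attributes are None, and calling
    # .decode() on them would raise AttributeError.
    completed = subprocess.run(args, capture_output=True)
    if completed.returncode != 0:
        stderr_text = completed.stderr.decode(errors="replace").strip()
        logger.error(
            "Command %s failed with exit code %d: %s",
            args,
            completed.returncode,
            stderr_text,
        )
    return completed.returncode
```

Returning the exit code leaves it to the caller to decide whether a failure matters, which fits a best-effort cleanup step such as closing a connection.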
Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com> Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
- Add nel-watch script entry in pyproject.toml
- Remove watch subcommand from nel CLI (cli/main.py)
- Simplify cli/watch.py: remove --tmux, --env-file, _build_watch_command, _launch_in_tmux; add main() entrypoint; make CLI a thin 1:1 wrapper over the Python API
- Remove tmux/env-file/build_watch_command tests from test_watch.py
- Remove unused MagicMock import

Addresses team feedback: Unix philosophy (do one thing well), thin CLI wrapper, separate entrypoint for watch functionality.

Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
…n API) Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
…e execution.sbatch_dependency Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
bb467ab to ba136ad
Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
/ok to test bfed76c
Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
/ok to test 9df4cf9
Signed-off-by: Marta Stepniewska-Dziubinska <martas@nvidia.com>
/ok to test 25c436f
No description provided.