Add benchmark script facade #268
Conversation
Codecov Report ❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   56.88%   50.35%    -6.53%
==========================================
  Files          51       57        +6
  Lines        2099     2373      +274
==========================================
+ Hits         1194     1195        +1
- Misses        905     1178      +273
==========================================
```

View full report in Codecov by Sentry.
allenporter left a comment
Very good idea to make these scripts simpler!
script/benchmark
Outdated
```python
REPO_ROOT = pathlib.Path(__file__).resolve().parent.parent

# Add repo root to path so we can import project modules
sys.path.insert(0, str(REPO_ROOT))
```
Not sure about this. Seems like just running in a virtual environment should be able to solve this. I can include a script/setup if needed.
script/benchmark
Outdated
```python
_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS)
for _ds in ASSIST_FAMILY_DATASETS:
    for _lang in LANGUAGES:
        _ASSIST_FAMILY_SET.add(f"{_ds}-{_lang}")
```
Can we create this once without updating? E.g. something like this:

```diff
-_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS)
-for _ds in ASSIST_FAMILY_DATASETS:
-    for _lang in LANGUAGES:
-        _ASSIST_FAMILY_SET.add(f"{_ds}-{_lang}")
+_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS) | {
+    f"{_ds}-{_lang}"
+    for _ds in ASSIST_FAMILY_DATASETS
+    for _lang in LANGUAGES
+}
```
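For illustration, here is a runnable sketch of the one-shot construction using placeholder dataset and language values (the real `ASSIST_FAMILY_DATASETS` and `LANGUAGES` lists live in the project; these stand-ins are assumptions):

```python
# Hypothetical stand-ins for the project's ASSIST_FAMILY_DATASETS and LANGUAGES.
ASSIST_FAMILY_DATASETS = ["assist", "assist-mini"]
LANGUAGES = ["en", "de"]

# Build the full set in one expression: the base names, unioned with
# every "{dataset}-{language}" combination from a set comprehension.
ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS) | {
    f"{ds}-{lang}"
    for ds in ASSIST_FAMILY_DATASETS
    for lang in LANGUAGES
}

print(sorted(ASSIST_FAMILY_SET))
# → ['assist', 'assist-de', 'assist-en', 'assist-mini', 'assist-mini-de', 'assist-mini-en']
```

The union form avoids mutating the set after creation, which also makes it safe to declare at module scope.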
script/benchmark
Outdated
```python
def get_ha_version() -> str:
    """Get the installed Home Assistant version."""
    try:
        from importlib.metadata import version
```
Can this be at the top of the file?
script/benchmark
Outdated
```python
    raise SystemExit(1)  # unreachable, for type checker


def build_env(synthetic_home_dir: pathlib.Path) -> dict[str, str]:
```
I feel like I'd rather have this setup be a separate step/script that prepares the environment. It feels weird to me to muck with PYTHONPATH in the middle of this script.
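For context, a `build_env()` that prepends the synthetic home checkout to PYTHONPATH might look like the sketch below. This is an assumed shape based on the signature shown above, not the PR's actual implementation:

```python
import os
import pathlib


def build_env(synthetic_home_dir: pathlib.Path) -> dict[str, str]:
    """Sketch: copy the current environment and prepend the synthetic home
    checkout to PYTHONPATH so its custom component becomes importable in
    the spawned process. (Hypothetical; the real function may differ.)
    """
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    parts = [str(synthetic_home_dir)] + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


env = build_env(pathlib.Path("/tmp/synthetic-home"))
print(env["PYTHONPATH"].split(os.pathsep)[0])
# → /tmp/synthetic-home
```

Copying `os.environ` rather than mutating it keeps the change scoped to the subprocess, but it still means environment setup logic lives inside the benchmark script, which is the reviewer's concern.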
script/benchmark
Outdated
```python
    tasks = _build_collect_tasks(datasets, model, dry_run=dry_run)

    if parallel and len(tasks) > 1:
```
I would prefer if this didn't fork into two separate paths that run all the same setup code (here and below). Can we have one path with a maximum number of tasks that run in parallel at once, so the same code handles both the base case of one at a time and the parallel case?
script/benchmark
Outdated
```python
    )


def main() -> int:
```
I'm surprised to see this create a new CLI instead of improving the one that exists.
It is already set up with different subcommands and modules, so you can put each subcommand in a separate package rather than all in one really large file.
Have you considered just adding a collect and eval subcommand to the existing tool?
(Fine to keep the openrouter script separate.)
Move benchmark logic from the standalone `script/benchmark` into a `home_assistant_datasets.tool.benchmark` package, following the existing subcommand pattern (`create_arguments` + `run`). The `script/benchmark` file becomes a thin bash wrapper that delegates to `python -m home_assistant_datasets.tool benchmark`.
Instead of modifying PYTHONPATH at runtime to make the synthetic home custom component importable, have `script/bootstrap` create a `.pth` file in the venv's site-packages pointing to the synthetic home checkout. This removes `build_env()`, `find_synthetic_home()`, and the `--synthetic-home-dir` flag from the benchmark CLI.
The `.pth` file linking the synthetic home custom component belongs in `.devcontainer/setup` (environment creation) rather than `script/bootstrap` (dependency installation).
Add `run_tasks()` that handles both parallel and sequential modes, removing the duplicated if/else branches from `collect.py` and `eval.py`.
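One way a `run_tasks()` of this shape can collapse the two branches is a worker pool whose size is the maximum parallelism, with size 1 reproducing sequential execution through the same code path. The name comes from the note above; the body below is an assumed sketch, not the PR's code:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_tasks(tasks: list[Callable[[], int]], max_parallel: int = 1) -> list[int]:
    """Run tasks with at most max_parallel in flight at once.

    max_parallel=1 gives the sequential base case through the exact same
    code path as the parallel case, so there is no if/else fork.
    (Sketch only; the real run_tasks() signature may differ.)
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves task order in the returned results.
        return list(pool.map(lambda task: task(), tasks))


print(run_tasks([lambda: 0, lambda: 1, lambda: 0], max_parallel=2))
# → [0, 1, 0]
```

Since each task here shells out to pytest (I/O-bound waiting on a subprocess), threads are sufficient; no process pool is needed.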
7a33e8b to 7be8ed1
Add missing NoReturn import in script/openrouter_fetch_model.py and align pre-commit codespell excludes with CI config (skip datasets/ and .ipynb files).
allenporter left a comment
This is very nice work, thank you so much for making this huge usability improvement. One more round of nits/style things.
```bash
for candidate in "${SYNTHETIC_HOME_CANDIDATES[@]}"; do
  if [ -d "$candidate/custom_components/synthetic_home" ]; then
    SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_path('purelib'))")
    PTH_FILE="${SITE_PACKAGES}/synthetic-home-component.pth"
```
I have never seen this before. Is this legit?
Our current instructions say to set PYTHONPATH which seems more the norm.
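For what it's worth, `.pth` files are a standard CPython mechanism: at interpreter startup the `site` module scans site-packages, and each line of a `.pth` file that names an existing directory is appended to `sys.path`. The snippet below demonstrates the mechanism with `site.addsitedir()`, which processes `.pth` files the same way startup does (the file and directory names here are made up for the demo):

```python
import pathlib
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    extra = tmp_path / "synthetic-home-checkout"
    extra.mkdir()
    # A .pth file: each line naming an existing directory is added to sys.path.
    (tmp_path / "demo.pth").write_text(f"{extra}\n")

    # addsitedir() processes .pth files exactly as interpreter startup does.
    site.addsitedir(str(tmp_path))
    print(str(extra) in sys.path)
    # → True
```

Compared to exporting PYTHONPATH, a `.pth` file applies to every interpreter launched from that venv without any wrapper script, though it is less visible, which may be why PYTHONPATH feels more like the norm.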
```diff
@@ -0,0 +1,23 @@
+"""Benchmark subcommand.
```
Want to explain here what this is, compared to the other tools?
```python
def get_ha_version() -> str:
    """Get the installed Home Assistant version."""
    try:
        from importlib.metadata import version
```
Imports should be at the top of the file.
```python
def _read_progress(log_file: pathlib.Path) -> str:
    """Read the last pytest progress percentage from a log file."""
    try:
        with open(log_file, "r") as f:
```
Given you have a Path here, this can use log_file.open(...), here and below.
```diff
-matches = re.findall(r"\[\s*(\d+)%\]", tail)
-if matches:
+if matches := re.findall(r"\[\s*(\d+)%\]", tail):
```
```python
    """Check that a dataset directory exists."""
    ds_dir = DATASETS_DIR / dataset_name
    if not ds_dir.is_dir():
        die(f"Dataset directory not found: {ds_dir}")
```
Not a huge deal, but I'd rather leverage exceptions. While it's a CLI, it's still a library.
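The exception-based shape the comment suggests might look like the sketch below: the check raises a library exception, and only the CLI boundary maps it to an exit code. The exception name and function shape are illustrative, not from the PR:

```python
import pathlib


class DatasetNotFoundError(Exception):
    """Raised when a named dataset directory does not exist.

    (Hypothetical exception name, for illustration.)
    """


def check_dataset(datasets_dir: pathlib.Path, dataset_name: str) -> pathlib.Path:
    """Return the dataset directory, raising instead of exiting."""
    ds_dir = datasets_dir / dataset_name
    if not ds_dir.is_dir():
        raise DatasetNotFoundError(f"Dataset directory not found: {ds_dir}")
    return ds_dir


def main() -> int:
    # The CLI entrypoint is the only place that maps errors to exit codes,
    # so library callers can catch DatasetNotFoundError instead of dying.
    try:
        check_dataset(pathlib.Path("/nonexistent"), "assist")
    except DatasetNotFoundError as err:
        print(err)
        return 1
    return 0
```

This keeps `die()`-style behavior for the CLI while letting other code reuse `check_dataset()` without the process exiting underneath it.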
```python
    if accept_rc is None:
        accept_rc = {0}

    if parallel and len(tasks) > 1:
```
I don't think we should have two different ways to start the tests, parallel and non-parallel. I can provide more specific suggestions if needed.
```python
    """
    if accept_rc is None:
        accept_rc = {0}
    if dry_run:
```
I'd say share more code between the two paths, dry run or not. I can give specific suggestions if that would be helpful.
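One common way to share that code: build and report the command identically in both modes, and let `dry_run` gate only the final execution step. A sketch under that assumption (the helper name is made up):

```python
import shlex
import subprocess


def run_task(cmd: list[str], dry_run: bool = False) -> int:
    """Build and print the command the same way in both modes; only the
    final execution step differs. (Sketch; not the PR's actual code.)
    """
    print(f"+ {shlex.join(cmd)}")
    if dry_run:
        return 0  # report what would run, but don't run it
    return subprocess.run(cmd).returncode
```

With this shape the dry-run path exercises all of the argument construction, so a bug in command building shows up in `--dry-run` output instead of only at execution time.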
```python
    time.sleep(1)

    # Final summary (non-TTY or to ensure it's visible)
    if not is_tty:
```
Optional: you could have some status-writer-like thing to abstract away the difference between TTY and non-TTY output, e.g.
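A minimal version of such a status writer might look like this (class and method names are illustrative, not from the PR):

```python
import sys


class StatusWriter:
    """Write progress updates, abstracting over TTY and non-TTY output.

    On a TTY it rewrites a single status line in place; otherwise it
    emits plain lines so piped logs stay readable. (Sketch of the idea.)
    """

    def __init__(self, stream=sys.stdout) -> None:
        self._stream = stream
        self._is_tty = stream.isatty()

    def update(self, message: str) -> None:
        if self._is_tty:
            # \r returns to line start; \033[K clears the rest of the line.
            self._stream.write(f"\r{message}\033[K")
        else:
            self._stream.write(message + "\n")
        self._stream.flush()

    def finish(self, message: str) -> None:
        # Always end with a full line so the final summary stays visible.
        if self._is_tty:
            self._stream.write("\r\033[K")
        self._stream.write(message + "\n")
        self._stream.flush()
```

Callers then just call `update()` in the polling loop and `finish()` once, with no `is_tty` checks at the call sites.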
````python
"""Home Assistant Datasets command line tools.

```
usage: home-assistant-datasets [-h] [--debug] {leaderboard,convert} ...
````
Refresh this too, if you don't mind, by running with --help.
Description
This PR adds a `benchmark` script to act as a single entrypoint for collecting outputs, evaluating them, and building the leaderboards.