Add benchmark script facade by victorigualada · Pull Request #268 · allenporter/home-assistant-datasets

victorigualada · 2026-03-02T11:25:08Z

Description

This PR adds a benchmark script to act as a single entrypoint for collecting outputs, evaluating them and building the leaderboards.

codecov-commenter · 2026-03-02T11:27:14Z

Codecov Report

❌ Patch coverage is 0.36364% with 274 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.35%. Comparing base (7ab8fc2) to head (a88664a).

Files with missing lines	Patch %	Lines
home_assistant_datasets/tool/benchmark/_common.py	0.00%	180 Missing ⚠️
home_assistant_datasets/tool/benchmark/collect.py	0.00%	27 Missing ⚠️
home_assistant_datasets/tool/benchmark/eval.py	0.00%	26 Missing ⚠️
home_assistant_datasets/tool/benchmark/all.py	0.00%	24 Missing ⚠️
...e_assistant_datasets/tool/benchmark/leaderboard.py	0.00%	13 Missing ⚠️
home_assistant_datasets/tool/benchmark/__init__.py	0.00%	3 Missing ⚠️
home_assistant_datasets/tool/__main__.py	0.00%	1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (7ab8fc2) and HEAD (a88664a).

HEAD has 1 upload less than BASE

Flag	BASE (7ab8fc2)	HEAD (a88664a)
	2	1

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   56.88%   50.35%   -6.53%     
==========================================
  Files          51       57       +6     
  Lines        2099     2373     +274     
==========================================
+ Hits         1194     1195       +1     
- Misses        905     1178     +273


allenporter

Very good idea to make these scripts simpler!

allenporter · 2026-03-07T05:17:31Z

script/benchmark

+REPO_ROOT = pathlib.Path(__file__).resolve().parent.parent
+
+# Add repo root to path so we can import project modules
+sys.path.insert(0, str(REPO_ROOT))


Not sure about this. Seems like just running in a virtual environment should be able to solve this. I can include a script/setup if needed.
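A minimal sketch of the alternative raised here, assuming the project is installed into an activated virtual environment: rather than inserting the repo root into sys.path, the script could check that it is running inside a venv and stop with a clear message otherwise. Both helper names (in_virtualenv, require_virtualenv) are hypothetical, not code from the PR.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base interpreter installation.
    """
    return sys.prefix != sys.base_prefix


def require_virtualenv() -> None:
    """Hypothetical guard: exit early instead of patching sys.path."""
    if not in_virtualenv():
        raise SystemExit(
            "Please activate the project's virtual environment "
            "before running this script."
        )
```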

allenporter · 2026-03-07T05:20:52Z

script/benchmark

+_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS)
+for _ds in ASSIST_FAMILY_DATASETS:
+    for _lang in LANGUAGES:
+        _ASSIST_FAMILY_SET.add(f"{_ds}-{_lang}")


Can we create this once instead of updating it in a loop? e.g. something like this:

Suggested change

-_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS)
-for _ds in ASSIST_FAMILY_DATASETS:
-    for _lang in LANGUAGES:
-        _ASSIST_FAMILY_SET.add(f"{_ds}-{_lang}")
+_ASSIST_FAMILY_SET: set[str] = set(ASSIST_FAMILY_DATASETS) | {
+    f"{_ds}-{_lang}"
+    for _ds in ASSIST_FAMILY_DATASETS
+    for _lang in LANGUAGES
+}


allenporter · 2026-03-07T05:21:53Z

script/benchmark

+def get_ha_version() -> str:
+    """Get the installed Home Assistant version."""
+    try:
+        from importlib.metadata import version


Can this be at the top of the file?
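A sketch of the same helper with the import hoisted to module level. The "homeassistant" distribution name is the package published on PyPI; the "unknown" fallback is an assumption, since the PR's except branch is not shown in this snippet.

```python
from importlib.metadata import PackageNotFoundError, version


def get_ha_version() -> str:
    """Get the installed Home Assistant version."""
    try:
        return version("homeassistant")
    except PackageNotFoundError:
        # Assumed fallback when Home Assistant is not installed.
        return "unknown"
```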

allenporter · 2026-03-07T05:23:16Z

script/benchmark

+    raise SystemExit(1)  # unreachable, for type checker
+
+
+def build_env(synthetic_home_dir: pathlib.Path) -> dict[str, str]:


I think I'd rather this setup be a separate step/script that prepares the environment. It feels weird to me to muck with PYTHONPATH in the middle of this script.

allenporter · 2026-03-07T05:27:11Z

script/benchmark

+
+    tasks = _build_collect_tasks(datasets, model, dry_run=dry_run)
+
+    if parallel and len(tasks) > 1:


I would prefer if this didn't fork into two separate paths that run all the same setup code (here and below). Can we have one path with a maximum number of tasks running in parallel at once, so the same code handles both the base case of 1 at a time and the parallel case?
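One way to get a single code path, sketched under the assumption that each task can be wrapped as a zero-argument callable: submit everything to a pool whose size is the parallelism limit, so max_parallel=1 is simply the sequential case. The run_tasks signature here is hypothetical and may differ from what the PR ends up with.

```python
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


def run_tasks(tasks: Sequence[Callable[[], T]], max_parallel: int = 1) -> list[T]:
    """Run every task through one code path.

    max_parallel=1 runs tasks one at a time; larger values let up to that
    many run concurrently. Results are returned in the original task order.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as executor:
        futures = [executor.submit(task) for task in tasks]
        # result() blocks until each task finishes and re-raises its exception.
        return [future.result() for future in futures]
```

Threads fit tasks that mostly wait on subprocesses or network calls; CPU-bound work would be better served by a process pool.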

allenporter · 2026-03-07T05:29:41Z

script/benchmark

+    )
+
+
+def main() -> int:


I'm surprised to see this create a new CLI instead of improving the one that exists?

home-assistant-datasets/home_assistant_datasets/tool/__main__.py

Line 53 in 7ab8fc2

def main() -> int:

It is already set up with separate subcommands and modules, so you can put each subcommand in its own package rather than all in one very large file.

Have you considered just adding a collect and eval subcommand in the existing tool?

(fine to keep openrouter script separate)
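A minimal sketch of the subcommand pattern, where each subcommand module exposes a create_arguments function and a run function and the top-level main dispatches to them. The SUBCOMMANDS table, the --model and --dry-run flags, and the print in run are illustrative assumptions, not the tool's actual code.

```python
from __future__ import annotations

import argparse
from collections.abc import Callable, Sequence


# e.g. in a hypothetical tool/benchmark/__init__.py
def create_arguments(parser: argparse.ArgumentParser) -> None:
    """Register the flags this subcommand accepts (illustrative flags)."""
    parser.add_argument("--model", required=True, help="Model to benchmark")
    parser.add_argument("--dry-run", action="store_true")


def run(args: argparse.Namespace) -> int:
    """Execute the subcommand and return a process exit code."""
    print(f"benchmarking {args.model} (dry_run={args.dry_run})")
    return 0


# e.g. in a hypothetical tool/__main__.py
AddArgs = Callable[[argparse.ArgumentParser], None]
Handler = Callable[[argparse.Namespace], int]
SUBCOMMANDS: dict[str, tuple[AddArgs, Handler]] = {
    "benchmark": (create_arguments, run),
}


def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="home_assistant_datasets.tool")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, (add_args, handler) in SUBCOMMANDS.items():
        subparser = subparsers.add_parser(name)
        add_args(subparser)
        # Remember which handler to call once arguments are parsed.
        subparser.set_defaults(func=handler)
    args = parser.parse_args(argv)
    return args.func(args)
```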

Move benchmark logic from standalone script/benchmark into home_assistant_datasets.tool.benchmark package, following the existing subcommand pattern (create_arguments + run). The script/benchmark file becomes a thin bash wrapper that delegates to: python -m home_assistant_datasets.tool benchmark

Instead of modifying PYTHONPATH at runtime to make the synthetic home custom component importable, have script/bootstrap create a .pth file in the venv's site-packages pointing to the synthetic home checkout. This removes build_env(), find_synthetic_home(), and the --synthetic-home-dir flag from the benchmark CLI.

The .pth file linking the synthetic home custom component belongs in .devcontainer/setup (environment creation) rather than script/bootstrap (dependency installation).

Add run_tasks() that handles both parallel and sequential modes, removing duplicated if/else branches from collect.py and eval.py.

Add missing NoReturn import in script/openrouter_fetch_model.py and align pre-commit codespell excludes with CI config (skip datasets/ and .ipynb files).
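As a general illustration of why the NoReturn import matters (the helper names below are hypothetical): annotating an exit helper with NoReturn tells a type checker that execution never continues past the call, which removes the need for trailing lines like raise SystemExit(1)  # unreachable, for type checker.

```python
import sys
from typing import NoReturn


def fail(message: str, code: int = 1) -> NoReturn:
    """Print an error to stderr and exit the process."""
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(code)


def find_model(models: dict[str, str], name: str) -> str:
    if name in models:
        return models[name]
    # Because fail() is NoReturn, type checkers accept that this function
    # has no return statement after this call.
    fail(f"unknown model: {name}")
```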

allenporter

This is very nice work, thank you so much for making this huge usability improvement. One more round of nits/style things.

allenporter · 2026-03-12T03:04:01Z

.devcontainer/setup

+for candidate in "${SYNTHETIC_HOME_CANDIDATES[@]}"; do
+  if [ -d "$candidate/custom_components/synthetic_home" ]; then
+    SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_path('purelib'))")
+    PTH_FILE="${SITE_PACKAGES}/synthetic-home-component.pth"


I have never seen this before. Is this legit?

Our current instructions say to set PYTHONPATH, which seems more like the norm.
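For context, .pth files are handled by the standard library's site module: each line of a name.pth file in a site directory that names an existing directory is appended to sys.path. At start-up, site processes site-packages this way; the sketch below triggers the same processing explicitly with site.addsitedir on a temporary directory, using a stand-in for the synthetic home checkout, and restores sys.path afterwards.

```python
import pathlib
import site
import sys
import tempfile


def pth_adds_path() -> bool:
    """Show that a .pth file in a site dir adds its listed path to sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        site_dir = pathlib.Path(tmp, "site-packages")
        checkout = pathlib.Path(tmp, "synthetic-home")  # stand-in checkout
        site_dir.mkdir()
        checkout.mkdir()  # non-existent paths are never added
        (site_dir / "synthetic-home-component.pth").write_text(f"{checkout}\n")

        saved = list(sys.path)
        try:
            # Same processing that site applies to site-packages at start-up.
            site.addsitedir(str(site_dir))
            return str(checkout) in sys.path
        finally:
            sys.path[:] = saved  # undo the demo's changes to sys.path
```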

allenporter · 2026-03-12T03:04:32Z

home_assistant_datasets/tool/benchmark/__init__.py

@@ -0,0 +1,23 @@
+"""Benchmark subcommand.


Want to explain here what it is, compared to the other tools?

allenporter · 2026-03-12T03:08:16Z