5b918bf2_emhass_2026-04-23T15-10-30.195Z.log
Describe the bug
When two EMHASS optimization calls overlap in time, all internal post_data() calls (in
_publish_standard_forecasts, _publish_deferrable_loads, and _publish_battery_data)
read and write /share/entities/metadata.json concurrently. This causes the file to be
written by one request while another is reading it, resulting in a corrupted JSON file and
a subsequent crash.
The symptom is a repeating cascade of errors in the EMHASS log:
ERROR in retrieve_hass: Corrupted metadata file found at /share/entities/metadata.json. Creating a new one.
ERROR in app: Exception on request POST /action/naive-mpc-optim
Traceback (most recent call last):
File "/app/src/emhass/retrieve_hass.py", line 1343, in post_data
metadata = orjson.loads(content)
orjson.JSONDecodeError: unexpected content after document: line 44 column 3 (char 1149)
During handling of the above exception, another exception occurred:
File "/app/src/emhass/retrieve_hass.py", line 1351, in post_data
os.rename(metadata_path, corrupt_path)
FileNotFoundError: [Errno 2] No such file or directory:
'/share/entities/metadata.json' -> '/share/entities/metadata_corrupt.json'
The FileNotFoundError on os.rename() indicates that a parallel thread had already renamed the file away in the same error-handling path, confirming this is a race condition between concurrent requests, not a one-off file corruption.
Note: This analysis was performed with the help of an AI (Claude by Anthropic) via
log file analysis. The root cause description below is based on static code analysis
of the EMHASS source and the log patterns observed. It has not been confirmed by
running a debugger.
To Reproduce
- Use EMHASS as a Home Assistant add-on (v0.17.2, HA OS).
- Configure a PyScript that posts to
/action/naive-mpc-optim every hour (:05).
- Add any logic that causes the PyScript to
await task.sleep(30) after the first
successful POST (e.g. waiting for sensor data to be written back before reading it).
- While the PyScript is sleeping inside the first call, the next hourly trigger fires
and issues a second POST to EMHASS.
- EMHASS is now processing two
naive_mpc_optim requests concurrently. Each call
invokes publish_data(), which internally makes multiple sequential post_data()
calls. These interleave with each other and corrupt metadata.json.
The cascade then self-perpetuates: each failed POST in the error handler may trigger
another retry POST, which overlaps again with the next scheduled call.
Root cause (AI-assisted analysis)
naive_mpc_optim in command_line.py calls publish_data(), which internally calls:
_publish_standard_forecasts() → post_data()
_publish_deferrable_loads() → post_data()
_publish_battery_data() → post_data()
Each post_data() in retrieve_hass.py reads, modifies, and re-writes
/share/entities/metadata.json without any file-level lock. If two naive_mpc_optim
requests are handled concurrently (ASGI/Quart allows this), their post_data() calls
interleave and corrupt the file.
PR #716 (merged in v0.17.0) added a try/except around orjson.loads() and attempts
os.rename() on the corrupt file. However, it does not prevent concurrent writes —
it only handles the symptom. When two threads hit the error handler simultaneously,
the first thread deletes the file and the second thread's os.rename() fails with
FileNotFoundError, crashing the entire request.
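The "unexpected content after document" error is consistent with a shorter JSON document ending up in front of stale bytes from a longer one, e.g. an in-place overwrite without truncation; the same parse error can also result from a reader catching a write midway. A deterministic, single-threaded sketch of this failure mode (stdlib json standing in for orjson, entity names invented for illustration):

```python
import json
import os
import tempfile

# Simulate two interleaved writers on one metadata file.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

old = json.dumps({"sensor.p_load_forecast": {"unit": "W"},
                  "sensor.p_pv_forecast": {"unit": "W"}}).encode()
new = json.dumps({"sensor.soc": {"unit": "%"}}).encode()  # shorter document

with open(path, "wb") as f:      # writer A: full, valid document
    f.write(old)

with open(path, "r+b") as f:     # writer B: overwrites in place, no truncate
    f.write(new)                 # the tail of writer A's document survives

with open(path, "rb") as f:
    content = f.read()

try:
    json.loads(content)
    corrupted = False
except ValueError:               # json raises "Extra data", the stdlib analogue
    corrupted = True             # of orjson's "unexpected content after document"

print(corrupted)  # True
os.unlink(path)
```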
Expected behavior
Concurrent post_data() calls should not corrupt metadata.json. Each write should be atomic. Either:
- A per-file asyncio lock (e.g. asyncio.Lock) should serialize all post_data() calls that touch the same file, or
- Writes should use atomic replace semantics (write to a temp file, then os.replace()).
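The first option can be sketched as a per-path lock that turns each read-modify-write cycle into a critical section. This is an illustrative design for an asyncio server like Quart, not the actual EMHASS code; stdlib json stands in for orjson and the function and entity names are invented:

```python
import asyncio
import json
import os
import tempfile

# One lock per metadata file path; every post_data() call for the same file
# must acquire the same lock object.
_metadata_locks = {}

def _lock_for(path):
    return _metadata_locks.setdefault(path, asyncio.Lock())

async def post_data(path, entity_id, attrs):
    async with _lock_for(path):
        # The read-modify-write cycle is now a critical section: even if the
        # real code awaits between the read and the write, no other coroutine
        # can interleave its own read or write on this file.
        try:
            with open(path, "rb") as f:
                metadata = json.loads(f.read())
        except (FileNotFoundError, ValueError):
            metadata = {}
        metadata[entity_id] = attrs
        with open(path, "wb") as f:
            f.write(json.dumps(metadata).encode())

async def main():
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    # Ten "concurrent" requests, as the event loop would schedule them:
    await asyncio.gather(*(
        post_data(path, f"sensor.deferrable_{i}", {"unit": "W"}) for i in range(10)
    ))
    with open(path, "rb") as f:
        result = json.loads(f.read())
    os.unlink(path)
    return result

result = asyncio.run(main())
print(len(result))  # 10: every entity survives and the file stays valid JSON
```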
Screenshots / Log excerpt
Log excerpt (2026-04-28, EMHASS v0.17.2, HA OS):
00:00:48 ERROR retrieve_hass: Corrupted metadata file found at /share/entities/metadata.json. Creating a new one.
00:00:48 ERROR retrieve_hass: Corrupted metadata file found at /share/entities/metadata.json. Creating a new one.
00:00:48 ERROR app: Exception on request POST /action/naive-mpc-optim
orjson.JSONDecodeError: unexpected content after document: line 44 column 3 (char 1149)
...
FileNotFoundError: '/share/entities/metadata.json' -> '/share/entities/metadata_corrupt.json'
01:00:40 ERROR retrieve_hass: Corrupted metadata file found (×5 in 73ms)
01:00:40 ERROR app: Exception on request POST /action/naive-mpc-optim
orjson.JSONDecodeError: unexpected content after document: line 20 column 1 (char 486)
FileNotFoundError: '/share/entities/metadata.json' -> '/share/entities/metadata_corrupt.json'
The error repeats every hour, and multiple "Corrupted metadata file" messages appear
within milliseconds of each other (73 ms apart at 01:00:40), which is consistent with
concurrent threads hitting the same error path simultaneously.
Home Assistant installation type
- Home Assistant OS

Your hardware
- OS: Home Assistant OS
- Architecture: amd64

EMHASS installation type
- Add-on
Additional context
- EMHASS version: 0.17.2
- The bug was present in every hour of the attached log (2026-04-28, 00:00 – 09:05).
- After the EMHASS add-on is restarted (via HA watchdog automation), the next call
succeeds — confirming the issue is in-memory/file state that is reset on restart.
- A possible minimal fix in retrieve_hass.py would be to replace the write pattern with atomic temp-file semantics:

import os

tmp_path = metadata_path + ".tmp"
with open(tmp_path, "wb") as f:
    f.write(orjson.dumps(metadata))
os.replace(tmp_path, metadata_path)  # atomic on POSIX and Windows
This would not fully prevent a read racing with a write, but would at least ensure
no partial write is ever visible to a reader. A proper fix would additionally add an
asyncio.Lock shared across all post_data() calls for the same metadata file path.
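Putting both mitigations together, a hypothetical helper might look like the following. The name update_metadata and its signature are invented for illustration, and stdlib json stands in for orjson:

```python
import asyncio
import json
import os
import tempfile

_metadata_locks = {}

def _lock_for(path):
    # One asyncio.Lock per metadata file path, shared by every caller.
    return _metadata_locks.setdefault(path, asyncio.Lock())

async def update_metadata(path, entity_id, attrs):
    async with _lock_for(path):            # serialize the read-modify-write
        try:
            with open(path, "rb") as f:
                metadata = json.loads(f.read())
        except (FileNotFoundError, ValueError):
            metadata = {}                  # missing or corrupt file: start fresh
        metadata[entity_id] = attrs
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(json.dumps(metadata).encode())
            f.flush()
            os.fsync(f.fileno())           # make the temp file durable first
        os.replace(tmp_path, path)         # atomic: readers see old or new, never partial

# Usage sketch:
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
asyncio.run(update_metadata(path, "sensor.soc_batt_forecast", {"unit": "%"}))
with open(path, "rb") as f:
    final = json.loads(f.read())
os.unlink(path)
print(final)
```

The lock removes the read/write race between concurrent requests, while the temp-file-plus-replace step additionally protects readers outside the lock (or a crash mid-write) from ever observing a partial document.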