Enphase Envoy constant disconnect from Envoy R #126162
Hey there @bdraco, @cgarwood, @joostlek, @catsmanac, mind taking a look at this issue as it has been labeled with an integration (enphase_envoy) you are listed as a code owner for? Thanks! (message by CodeOwnersMention) |
When I enabled DEBUG logs, it seems the integration is wrongly trying to use the /production.json endpoint. If the integration has detected we are running firmware D3.18.10, then it should only make use of the following APIs:

/api/v1/production
/api/v1/production/inverters
|
Hi @luis-castro, You are correct about the 2 APIs it should use. Before I dive into the issue, I've got some questions to assist with the troubleshooting.
Does it ever restore from unavailable without restarting Home-Assistant?
Looking at the information you provided (also see below), it seems the /api/v1/production/inverters endpoint is not reliably providing data. If this occurs during normal collection runs, the data will go to unavailable and come back when it works again. If it happens during a Home Assistant restart, the data will be considered unavailable on the Envoy and not even be collected (some versions don't have this data).

If this happens you don't need to restart Home Assistant completely. If you open Settings / Devices & services / Enphase Envoy you can use the menu to reload the Envoy integration. That will restart the communication and retry building all connections. This may or may not work, but it avoids a full restart of Home Assistant. And no, it's not a solution to the issue, just a simpler way to try to get it going again.

As for the debug log details, it shows the startup sequence during which the integration determines what is available or not. This is not primarily driven by the firmware version, but rather by the responses of the Envoy, so it will include some failures depending on the Envoy model and firmware. It received a successful reply for /api/v1/production:

```
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.envoy] Request reply in 0.0 sec from http://192.168.1.15/api/v1/production status 200: application/json b'{\n "wattHoursToday": 3272,\n "wattHoursSevenDays": 26896,\n "wattHoursLifetime": 15043584,\n "wattsNow": 49\n}\n'
```

and as a result it will use this endpoint for production data. The lines right below it are still part of the inspection and will lead to not trying to use the /production.json endpoint:

```
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.json] Unable to decode response from Envoy endpoint /production.json?details=1: unexpected character: line 1 column 5 (char 4)
2024-09-17 18:06:15.116 DEBUG (MainThread) [pyenphase.updaters.production] Production endpoint not found at /production.json?details=1: unexpected character: line 1 column 5 (char 4)
```

The result is also reflected in the diagnostic file, where both endpoints are shown as used. raw_data is the data that was collected from the Envoy in the last collection before the diagnostic report was created:

```json
"raw_data": {
    "/api/v1/production": {
        "wattHoursToday": 3260,
        "wattHoursSevenDays": 26887,
        "wattHoursLifetime": 15043572,
        "wattsNow": 46
    },
    "/api/v1/production/inverters": [
        {
            "serialNumber": "121726021970",
            "lastReportDate": 1726616739,
            "lastReportWatts": 11,
            "maxReportWatts": 218
        },
        {
            "serialNumber": "121726033042",
```

The next section, trying to get the data from the inverters, shows an issue though: no data is returned by the Envoy.

```
2024-09-17 18:06:15.117 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:16.801 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:19.201 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.839 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.879 DEBUG (MainThread) [pyenphase.updaters.api_v1_production_inverters] Production endpoint not found at /api/v1/production/inverters: Server disconnected without sending a response.
```

What can be seen is 4 connection attempts that fail right away, with varying wait times of 1-3 seconds in between. As a result, the inverter data will probably not be collected after this restart, and it would need another restart/reload of this Envoy in Home Assistant to try again. The reported error is "Server disconnected without sending a response". |
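The probe-then-select behavior described above (endpoints are tried at startup and kept or dropped based on their responses, not on the firmware version) can be sketched as follows. This is a simplified illustration, not pyenphase's actual code; the endpoint names come from the log above, everything else is hypothetical:

```python
# Sketch of the probe-then-select pattern: try candidate endpoints, keep the
# first one whose response decodes as JSON (hypothetical helper, not pyenphase).
import json

def pick_production_source(responses):
    """Return the first probed endpoint that answered with valid JSON."""
    for endpoint in ("/api/v1/production", "/production.json?details=1"):
        body = responses.get(endpoint, "")
        try:
            json.loads(body)
        except json.JSONDecodeError:
            continue  # analogous to the "Unable to decode response" debug line
        return endpoint
    return None

# Mimics the log: /api/v1/production returns JSON, /production.json does not.
probed = {
    "/api/v1/production": '{"wattsNow": 49}',
    "/production.json?details=1": "<!DOCTYPE html>",
}
print(pick_production_source(probed))  # -> /api/v1/production
```

The point is that the selection is driven entirely by what the Envoy answers, which is why the same integration code behaves differently across firmware versions.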
Hi @catsmanac, Thanks for taking a look at it. Answering your question: Yes, it has been working for at least a year. It started failing around 2 months ago. I believe the issue is in the pyenphase library itself. I've been playing around and have the following findings:
I can keep this repeating with no issues.
Here is the modification I applied to
However, when I called envoy.update(), I got this:
It seems to me the library is not hitting the API with Digest Authentication (but I am not 100% certain of that). For reference this is the Python script I am using for this test:
What are your thoughts about this? |
It seems I was wrong about the library not using Digest Authentication; however, something is not working. Here is a complete dump of the output. I still don't understand what the problem is:
BTW: As this is not a Home Assistant issue per se, let me know if I should file an issue with the pyenphase team. |
So your test program shows the issue as well. In the end it's probably a pyenphase library issue; before we move there I want to verify some things. I'm a member of the pyenphase team as well, so we can switch when needed. In your test code, can you change the start of main to:

```python
from pyenphase import register_updater
from pyenphase.envoy import get_updaters
from pyenphase.updaters.base import EnvoyUpdater
from pyenphase.updaters.production import (
    EnvoyProductionJsonUpdater,
    EnvoyProductionUpdater,
    EnvoyProductionJsonFallbackUpdater,
)

logging.basicConfig(level=logging.DEBUG)
_LOGGER = logging.getLogger(__name__)  # added so the _LOGGER.debug calls below work

async def main():
    updaters: list[type[EnvoyUpdater]] = get_updaters()
    if EnvoyProductionJsonUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionJsonUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonUpdater)
    if EnvoyProductionUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionUpdater from Pyenphase")
        updaters.remove(EnvoyProductionUpdater)
    if EnvoyProductionJsonFallbackUpdater in updaters:
        _LOGGER.debug("Removing EnvoyProductionJsonFallbackUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonFallbackUpdater)

    envoy = Envoy("192.168.1.15")
    print('Calling envoy.setup()...')
```

This will remove the production endpoint updaters and prevent reading the production pages at the startup probe, to see if that removes the issue or not. Let it start and run for 2 or 3 cycles and then upload the debug log here. |
Done. BTW: I got an error with Here is the output: |
Can you try with username='envoy' as well? |
Sure, here is the output, but there is no change: Anyway, I think "installer" should work as per my tests with curl. |
Agreed. Ok — well, not ok yet, on to the next step. Let's eliminate the meters and tariff collection. That should only leave info, v1/production and inverters for collection. Add

```python
from pyenphase.updaters.meters import EnvoyMetersUpdater
from pyenphase.updaters.tariff import EnvoyTariffUpdater
```

and

```python
    if EnvoyMetersUpdater in updaters:
        _LOGGER.debug("Removing EnvoyMetersUpdater from Pyenphase")
        updaters.remove(EnvoyMetersUpdater)
    if EnvoyTariffUpdater in updaters:
        _LOGGER.debug("Removing EnvoyTariffUpdater from Pyenphase")
        updaters.remove(EnvoyTariffUpdater)
```

right after the already added sections. |
Applied the modifications and ran the new test. See attached log. For reference, this is how the test code looks now:

```python
import asyncio
import logging

from pyenphase import Envoy, EnvoyData
from pyenphase import register_updater
from pyenphase.envoy import get_updaters
from pyenphase.updaters.base import EnvoyUpdater
from pyenphase.updaters.production import (
    EnvoyProductionJsonUpdater,
    EnvoyProductionUpdater,
    EnvoyProductionJsonFallbackUpdater,
)
from pyenphase.updaters.meters import EnvoyMetersUpdater
from pyenphase.updaters.tariff import EnvoyTariffUpdater

logging.basicConfig(level=logging.DEBUG)

async def main():
    updaters: list[type[EnvoyUpdater]] = get_updaters()
    if EnvoyProductionJsonUpdater in updaters:
        logging.debug("Removing EnvoyProductionJsonUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonUpdater)
    if EnvoyProductionUpdater in updaters:
        logging.debug("Removing EnvoyProductionUpdater from Pyenphase")
        updaters.remove(EnvoyProductionUpdater)
    if EnvoyProductionJsonFallbackUpdater in updaters:
        logging.debug("Removing EnvoyProductionJsonFallbackUpdater from Pyenphase")
        updaters.remove(EnvoyProductionJsonFallbackUpdater)
    if EnvoyMetersUpdater in updaters:
        logging.debug("Removing EnvoyMetersUpdater from Pyenphase")
        updaters.remove(EnvoyMetersUpdater)
    if EnvoyTariffUpdater in updaters:
        logging.debug("Removing EnvoyTariffUpdater from Pyenphase")
        updaters.remove(EnvoyTariffUpdater)

    envoy = Envoy("192.168.1.15")
    logging.debug('Calling envoy.setup()...')
    await envoy.setup()
    print(f'Envoy: {envoy.host}\nFirmware: {envoy.firmware}\nSerial: {envoy.serial_number}\n')
    logging.debug('Calling envoy.authenticate()...')
    await envoy.authenticate(username="installer")
    while True:
        logging.debug('Calling envoy.update()...')
        data: EnvoyData = await envoy.update()
        print(f'TodaysEnergy: {data.system_production.watt_hours_today}')
        for inverter in data.inverters:
            print(f'{inverter} SN: {data.inverters[inverter].serial_number}')
            print(f'{inverter} W: {data.inverters[inverter].last_report_watts}')
        print('Waiting 10 seconds...')
        await asyncio.sleep(10)

if __name__ == "__main__":
    logging.debug('Calling main...')
    asyncio.run(main())
```
|
I thought the Envoy was blacklisting my IP, but I ran curl from the same machine I'm running Python on, and it works:
|
So the behavior is pretty consistent with access denied on the inverters endpoint.
Any idea what version upgrade that was? I.e., if you made any backup of Home Assistant before the upgrades, you can see the version of the backup in your backup list. I'll do some searching in what we changed in pyenphase around that time, since that is like 3 months ago. That will be tomorrow, as the hours in the day are running out quickly here now. |
Unfortunately the earliest backup I keep is from August 24. Back then I was running Home Assistant 2024.8.1 (same version I'm currently running), but I think I applied updates to some add-ons. I appreciate all your efforts! Let me know if more information or tests are needed. If you need access to this Envoy, I can arrange that too. |
So on September 6, when the issue started, you were running 2024.8.1, which you installed before August 24. That version became available on August 10. Yesterday, Sep 18, you upgraded to 2024.9.2. The last change to the Envoy communication itself was released in 2024.8.0, which was an update in pyenphase to report some more data, but that data was already available. For the moment, no direct leads to anything suspicious. Looking at all the information again, I noted you used a password:

```
$ curl -u installer:REDACTED --digest
```

but no password with the authentication in the test program. If that is an actual password, then that is another test to do:

```python
await envoy.authenticate(username="installer", password="REDACTED")
```
|
And a simple test to just read the inverters using pyenphase:

```python
import asyncio
import logging

import httpx
from pyenphase import Envoy, EnvoyData

logging.basicConfig(level=logging.DEBUG)

async def main():
    envoy = Envoy("192.168.1.15")
    logging.debug('Calling envoy.setup()...')
    await envoy.setup()
    print(f'Envoy: {envoy.host}\nFirmware: {envoy.firmware}\nSerial: {envoy.serial_number}\n')
    logging.debug('Calling envoy.authenticate()...')
    await envoy.authenticate(username="installer")
    myresponse: httpx.Response = await envoy.request('api/v1/production/inverters')
    status_code = myresponse.status_code
    print(status_code)
    print(myresponse)

if __name__ == "__main__":
    logging.debug('Calling main...')
    asyncio.run(main())
```
|
Hello!!
I got that password by calling the function. If I understand the code right, pyenphase uses the same method when the user is "installer". Here is the output of the test (I had to change the request parameter, because it was lacking the initial "/"). I even tried using the password:

```python
await envoy.authenticate(username="installer", password="REDACTED")
```

But with the same results. |
Something very strange happened: I added this to line 251 of the file:

```python
_LOGGER.debug("Authenticating with: %s:%s", self.auth.local_username, self.auth.local_password)
```

After that, I am getting results from the inverters after a few retries. Attaching the new output. The results are not 100% reproducible: sometimes it takes just 1 retry to get the results; other times 4 retries are attempted and it finally crashes. |
That statement also introduces some delay, I guess. How about adding `import time` and a sleep before the request in the test program:

```python
    time.sleep(5)
    myresponse: httpx.Response = await envoy.request('api/v1/production/inverters')
    status_code = myresponse.status_code
```
|
Same results: sometimes it works, other times it doesn't. However, when using curl it always works, with no delay at all. I suspect the httpx object is somehow involved and some additional configuration is required. I'll try to run a packet capture and compare both curl and the test script to look for any differences. |
I've checked, and httpx has been on version 0.27.0 for over 6 months; no recent change. That doesn't prove it's not an httpx issue, but it's not a lead either. You probably did this already, but just checking anything that may help: did you power cycle the Envoy already? |
Hi @catsmanac, here is a promising finding: When using httpx.get in interactive mode, it fails the first time:
However, if I repeat it, now it works, no matter how many times I send the request:
I haven't tried to reboot the Envoy, but will try that later. Does this help to implement a change in pyenphase? |
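The pattern you describe — the first request refused, the repeats succeeding — is exactly the shape a retry loop is meant to absorb. A stdlib-only sketch of that shape (FlakyServer is a stand-in for the Envoy, not real pyenphase code):

```python
# Stdlib-only sketch: the server drops the first request, a blind retry
# succeeds. FlakyServer stands in for the Envoy's observed behavior.
class ServerDisconnected(Exception):
    pass

class FlakyServer:
    def __init__(self):
        self.calls = 0

    def get(self, path):
        self.calls += 1
        if self.calls == 1:  # refuse the first request, like the Envoy here
            raise ServerDisconnected("Server disconnected without sending a response.")
        return 200

def get_with_retry(server, path, attempts=4):
    last_exc = None
    for _ in range(attempts):
        try:
            return server.get(path)
        except ServerDisconnected as exc:
            last_exc = exc  # a real client would also open a fresh connection here
    raise last_exc  # every attempt failed

print(get_with_retry(FlakyServer(), "/api/v1/production/inverters"))  # -> 200
```

With this server behavior, any retrying client recovers on the second attempt — which makes it puzzling that the integration's retries (four attempts, per the log earlier in the thread) still come up empty.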
pyenphase is already trying multiple times. The request method is decorated with:

```python
@retry(
    retry=retry_if_exception_type(
        (
            httpx.NetworkError,
            httpx.TimeoutException,
            httpx.RemoteProtocolError,
            orjson.JSONDecodeError,
        )
    ),
    wait=wait_random_exponential(multiplier=2, max=5),
    stop=stop_after_delay(MAX_REQUEST_DELAY)
    | stop_after_attempt(MAX_REQUEST_ATTEMPTS),
    reraise=True,
)
async def request(
```

which will perform the request again on any of the 4 errors listed in the decorator, until the MAX_REQUEST_DELAY (50 sec) time has elapsed, with no more than MAX_REQUEST_ATTEMPTS (4) attempts. Between each attempt there's a random wait time. In your first debug log these repeats actually show:

```
2024-09-17 18:06:15.117 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:16.801 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:19.201 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
2024-09-17 18:06:20.839 DEBUG (MainThread) [pyenphase.envoy] Requesting http://192.168.1.15/api/v1/production/inverters with timeout Timeout(connect=10.0, read=45.0, write=10.0, pool=10.0)
```

So it's not that straightforward. Not sure why this is not working and still throwing these errors for the inverters. Nor is there a real understanding of what changed that caused this to fail after Sep 7. What was the error you got on the first attempt in your last try? Again the RemoteProtocolError? |
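For reference, the wait_random_exponential used in that decorator draws a uniform wait from an exponentially growing, capped window. Roughly, in stdlib terms (an approximation of tenacity's behavior, not its exact implementation):

```python
import random

def backoff_seconds(attempt, multiplier=2, max_wait=5):
    # Uniform draw from [0, min(multiplier * 2**attempt, max_wait)], which is
    # roughly what tenacity's wait_random_exponential(multiplier=2, max=5) does.
    upper = min(multiplier * (2 ** attempt), max_wait)
    return random.uniform(0, upper)

# The window is capped at 5s, consistent with the 1-3s gaps seen in the log.
waits = [round(backoff_seconds(n), 2) for n in range(4)]
print(waits)
```

So the 1-3 second gaps between the four "Requesting ..." log lines are what this wait strategy is expected to produce.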
Yes, the same error: I ran a packet capture running curl, httpx and the test with pyenphase. With
The exact same behavior is observed with When using the script with pyenphase, this is the sequence:
(see attached: capture-pyenphase.pcapng) I conclude the Envoy expects the call |
The problem is that httpx is reusing the same connection for the follow-up authentication requests after getting a 401, and the Envoy does not like that. So I modified the initialization code of the Envoy class:

```python
self._client = client or httpx.AsyncClient(
    verify=NO_VERIFY_SSL_CONTEXT,
    limits=httpx.Limits(max_keepalive_connections=0),
)  # nosec
```

Now the testing code in #126162 (comment) works flawlessly. However, when I applied this to the homeassistant container, I'm still getting the same problem. I'm sure you'll have a better idea on how to implement this change (disabling keepalives) only when reading the inverters endpoint. What do you think? |
Thanks for all that research @luis-castro, excellent data set. Your analysis of what's happening is to the point. Dropping keep-alive altogether may not be without potential impact. Although it seems a solution, it may only be a band-aid for an underlying (httpx) issue; not sure yet. Needs some thought on how best to proceed. What is still puzzling is the sudden change after September 6.

The RemoteProtocolError probably indicates a closed connection on the Envoy side, triggering httpx to open a new one, but it's not using the already received auth information, ending in the same cycle of events. I'd still be interested to see if an Envoy reboot changes anything. How this is going, probably not...

To use a modified lib in the container you need to replace the current version:

```
uv pip freeze | grep pyenphase
pyenphase==1.22.0
```

with your local version:

```
pip freeze | grep pyenphase
-e
```

For testing I use a Home Assistant dev container.

As for the client: when using the pyenphase library one can provide one's own client, overriding the default client used. That would allow your (test) program to avoid keep-alive without the need to change the library:

```python
my_client = httpx.AsyncClient(
    verify=NO_VERIFY_SSL_CONTEXT,
    limits=httpx.Limits(max_keepalive_connections=0),
)

envoy = Envoy("192.168.1.15", client=my_client)
print('Calling envoy.setup()...')
await envoy.setup()
print(f'Envoy: {envoy.host}\nFirmware: {envoy.firmware}\nSerial: {envoy.serial_number}\n')
print('Calling envoy.authenticate()...')
await envoy.authenticate(username="installer")
print('Calling envoy.update()...')
data: EnvoyData = await envoy.update()
print(f'TodaysEnergy: {data.system_production.watt_hours_today}')
for inverter in data.inverters:
    print(f'{inverter} SN: {data.inverters[inverter].serial_number}')
    print(f'{inverter} W: {data.inverters[inverter].last_report_watts}')
```
|
Hi @catsmanac! Thank you very much for all your input; I'm glad you found my analysis useful. To be honest, these last 3 days I've learned a lot.

Regarding rebooting the Envoy: I did that last night, but there was no change. Also, I applied the modifications in the container by directly editing the file. Since I modified the file, the inverters information has been more stable (with some dropoffs still), but I'm not sure this is the best approach. If you agree, let's keep this issue open for a while, in case other users report having the same problem.

Additionally, in the Enlighten dashboards I see an "Ensemble Update" event on September 7. Not sure if it's related or not, but I suspect something changed in the firmware recently. Anyway, I'll try to figure out a more elegant solution and will share any insights here. Thanks again. |
Not closing this one yet indeed; too much is still unclear. The Envoy getting new firmware could be an explanation, but if so it's troubling that the behavior changed that much. Wouldn't be the first time though. The fact that there are still some drop-outs isn't satisfying either. No hurry, but I'd be interested in a fresh debug log with some failures in it. |
Hi @catsmanac! So I modified the enphase_envoy initialization code to use a custom client, and now the integration has been running for over an hour without a single glitch. I'm not sure what the implications are of using my own client instead of calling hass's get_async_client. Here are the changes I applied, in case you'd like to give it a try and/or enhance it:

```diff
--- core/homeassistant/components/enphase_envoy/__init__.py.bak 2024-09-20 16:20:03.976546179 -0600
+++ core/homeassistant/components/enphase_envoy/__init__.py 2024-09-20 16:21:12.176602804 -0600
@@ -13,12 +13,13 @@
 from .const import DOMAIN, PLATFORMS
 from .coordinator import EnphaseConfigEntry, EnphaseUpdateCoordinator
+import httpx

 async def async_setup_entry(hass: HomeAssistant, entry: EnphaseConfigEntry) -> bool:
     """Set up Enphase Envoy from a config entry."""
     host = entry.data[CONF_HOST]
-    envoy = Envoy(host, get_async_client(hass, verify_ssl=False))
+    envoy = Envoy(host, client=httpx.AsyncClient(verify=False, limits=httpx.Limits(max_keepalive_connections=0)))
     coordinator = EnphaseUpdateCoordinator(hass, envoy, entry)
     await coordinator.async_config_entry_first_refresh()
```
|
Is that last change using the original pyenphase code again, or still a modified pyenphase? Your last change is part of one of the options on my list for how to implement this once we have confirmed the results. This change will also change the behavior of your second Envoy running 7.3.617 when implemented as-is on all communication. Therefore a fix will probably need to be an option that can be enabled as needed, or automatically. I'll get you an example of how that can be done later.
To see if there was an Envoy firmware update you can open a backup file. Buried in there will be the .storage folder with the core.device_registry file. In that file there's an entry for the Envoy which shows the firmware in the sw_version attribute. |
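A quick way to dig that out of a backup's core.device_registry file, sketched against a minimal fake registry (the data/devices/sw_version layout reflects the registry format as I understand it; treat the key names as assumptions):

```python
import json

# Minimal stand-in for .storage/core.device_registry from a backup. A real
# file holds many devices; "sw_version" carrying the firmware is an
# assumption based on the device registry format.
registry_text = json.dumps({
    "data": {
        "devices": [
            {"manufacturer": "Enphase", "model": "Envoy", "sw_version": "3.18.10"},
            {"manufacturer": "Other", "model": "Hub", "sw_version": "1.0"},
        ]
    }
})

def envoy_firmware(registry_json):
    """Return the sw_version of the first Enphase device in the registry dump."""
    registry = json.loads(registry_json)
    for device in registry["data"]["devices"]:
        if device.get("manufacturer") == "Enphase":
            return device.get("sw_version")
    return None

print(envoy_firmware(registry_text))  # -> 3.18.10
```

Comparing that value across backups from before and after September 6 would show whether the Envoy firmware changed.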
Overview of current findings: Envoy firmware 3.18.10 observations show that obtaining data from the endpoint /api/v1/production/inverters is prone to failures with connection pooling enabled. Until now this is the only firmware reporting this.

Observations:

Side-observations:

Testing shows that setting the client to use max_keepalive_connections=0 avoids the failures.

As for the solutions available:

For feasibility and complexity:

For now I'd propose to discuss the issue with the HTTPX team and prepare an HA PR to implement 1. as a bug fix, until an HTTPX solution becomes available, if ever. The change can be reverted then. |
@luis-castro, I've started a discussion with the httpx team as well. Can you provide another packet capture showing some packets where all is working fine in your new setup? |
Some code to try, with a configure option. I haven't been able to fully test it yet. Reload the integration after changing the bottom option. |
It is using the original pyenphase code.
Sure! Here is a packet capture taken from my dev environment with the modified
Thank you!!! I will try this out later today and let you know the results. |
Any luck testing this? |
Hello! I'm testing it right now. I enabled the "disable_keep_alive" option and so far it is working with no issues. I wasn't sure how to upload the files, so this is how I installed them:
Let me know if there is a more appropriate procedure to do it. |
Thanks @luis-castro, the delay is no problem at all; we all have day things to do. The process you used is fine: you got the "disable_keep_alive" option, and as that is working fine it is activated. Keep in mind that when you upgrade to the just-released 2024.10 HA version it will overwrite your changes! I'll prepare a code update for a future release. Thanks for all your testing!!! |
The problem
My Envoy Enphase integration is configured with 2 gateways: 1 is running firmware D3.18.10 (Envoy R), and the other is running firmware 7.3.617 (Envoy IQ).
The integration is constantly losing data from the inverters connected to the Envoy R, showing them as Unavailable.
This is sometimes solved by restarting Home Assistant; other times I have to delete the integration and add it again.
What version of Home Assistant Core has the issue?
core-2024.8.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
enphase_envoy
Link to integration documentation on our website
https://www.home-assistant.io/integrations/enphase_envoy
Diagnostics information
config_entry-enphase_envoy-01J815RY1A02ECBF94NKK8NKNV (1).json
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
No response