Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure all mqtt messages report state to core #140001

Draft
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

elupus
Copy link
Contributor

@elupus elupus commented Mar 6, 2025

Proposed change

Core will filter the state of entities and report state changed events in case of changed data and state reported event if no data was changed.

This allow helpers like derivate and integrate to work correctly with constant entities as well as things like last_reported.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

Core will filter the state of entities and report state changed
events in case of changed data and state reported event if
no data was changed.

This allow helpers like derivate and integrate to work correctly
with constant entities as well as things like last_reported.
@home-assistant
Copy link

home-assistant bot commented Mar 6, 2025

Hey there @emontnemery, @jbouwh, @bdraco, mind taking a look at this pull request as it has been labeled with an integration (mqtt) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of mqtt can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign mqtt Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@elupus
Copy link
Contributor Author

elupus commented Mar 6, 2025

This was originally introduced in #100438 but has since been reworked a bit. Was any perf measurements done during that pull?

@frenck
Copy link
Member

frenck commented Mar 6, 2025

@elupus Please update the PR title, we do not use conventional commits.

Thanks 👍

../Frenck

@jbouwh
Copy link
Contributor

jbouwh commented Mar 6, 2025

I'd like to have a closer look at this change before it is merged.

@elupus elupus changed the title fix(mqtt): ensure all messages report state to core Ensure all mqtt messages report state to core Mar 6, 2025
@home-assistant home-assistant bot marked this pull request as draft March 6, 2025 21:41
@home-assistant
Copy link

home-assistant bot commented Mar 6, 2025

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@jbouwh
Copy link
Contributor

jbouwh commented Mar 6, 2025

When a payload arrives an is processed by the entity platform the attribute filter is used to detect if any attribute was changed. If not, the method returns. Adding a state write would cause additional overhead. To add it back where it was in the past would require a major refactoring. It was never intended to always do a state write an all entities when a message comes in.

@elupus
Copy link
Contributor Author

elupus commented Mar 6, 2025

There is an open issue with a test still. Seems the event entity relied on this behaviour in test_handling_retained_event_payloads which i can't fully figure out. Sort of seem it ignored messages that did not change the event data. But sort of did it in the background.

I do understand that this is a performance tradeoff. I kinda do like last_reported info and being able to rely on reported events. I suppose it could be opt-in, but those type of config choices are hard for anybody to get right.

@elupus elupus marked this pull request as ready for review March 6, 2025 22:33
@home-assistant home-assistant bot requested a review from jbouwh March 6, 2025 22:33
Copy link
Member

@bdraco bdraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While getting last_reported is nice, there are so many power sensors (mostly from Z2M) on the market that flood the state machine with data we would effectively be forcing those users to buy better hardware or move to a different solution. I think this should be opt-in.

@home-assistant home-assistant bot marked this pull request as draft March 6, 2025 23:08
@jbouwh jbouwh removed the small-pr PRs with less than 30 lines. label Mar 7, 2025
@erkr
Copy link

erkr commented Mar 7, 2025

BDraco's concern is a valid one. Maybe wise to rate limit 'last_reported' updates to something like max 1 per second per device

@jbouwh
Copy link
Contributor

jbouwh commented Mar 7, 2025

Maybe wise to rate limit 'last_reported' updates to something like max 1 per second per device

Not sure that will be enough, as the MQTT protocol has some overhead in continuesly reporting states and we do not want to write states when they are not changes. For sensors that need consequent writing we have force_update, but even using that I would would not recomment to use.

@erkr
Copy link

erkr commented Mar 7, 2025

Maybe wise to rate limit 'last_reported' updates to something like max 1 per second per device
... force_update, but even using that I would would not recomment to use.

Basically, I'm fine to use something like force_update for specific sensors only (in case of performance impact ).
But currently force_update does more to my understanding. It also updated last_updated and last_changed, even when both the state and attributes didn't alter. That has the side effect it unnecessarily triggers state event used in automations and templates.

From performance perspective I seriously doubt that not updating last_reported solves the performance impact due power devices and quality sensors spamming MQTT:

  • The main hit is in MQTT broker itself and the MQTT integration has to register/receive all topic updates anyway.
  • Second hit is the MQTT integration that compares the content to decide to update the entity state (last_updated and/or last_changed)
  • Only leaving out the last hit;, updating last_reported in case state was not changed, is probably not significant
  • If it is still noticeable on the total penalty (a given), a rate limit is what I proposed

@jbouwh
Copy link
Contributor

jbouwh commented Mar 7, 2025

Another note to make is that a lot of MQTT platforms use value templates. In many cases, there is no active update, when the template returns empty. This PR is not addressing this. E.g.:

if not preset_mode:
_LOGGER.debug("Ignoring empty preset_mode from '%s'", msg.topic)
return

This pattern is seen at many places. Payloads contain details that are to be ignored as they are not in the context of the entity.
The attribute filter tackled that.

@jbouwh
Copy link
Contributor

jbouwh commented Mar 7, 2025

Openened: home-assistant/home-assistant.io#37856

I think we if we want to enable state writes it should be an opt in instead.

@bdraco
Copy link
Member

bdraco commented Mar 7, 2025

From performance perspective I seriously doubt that not updating last_reported solves the performance impact due power devices and quality sensors spamming MQTT

Think about all the objects, syscalls (time and others), dict merges that need to be created to fire EVENT_STATE_REPORTED.
Next think about all the downstream consumers of EVENT_STATE_REPORTED.
Than add in the cost of putting it in the recorder queue.
Than add in the cost of the database update.

And that's far from everything that has to happen to fire that event.

While we have done a great job optimizing all that, very little of it has a native code implementation besides generating the ulid (ulid-transform) and hashing (fnv-hash-fast) in the recorder. The rest is implemented mainly in Python code, which is hard to optimize without rewriting most core objects and event paths into a better-performing language. While that is possible, it would make the core more challenging to maintain. Even if we move all the expensive object creation to something like Cython, the objects still have to be passed back to Python, limiting optimizing for this path.

@erkr
Copy link

erkr commented Mar 7, 2025

@bdraco
You are definitely more knowledgeable. So your option counts.
My remark was a relative one: Frequent topic updates will put a lot of extra stress on the MQTT broker (add-on on the same system in my case) and bog down handling of all the related updates by HA MQTT integration. That part is doing expensive JSON or string parsing.
I was assuming the state_reported will remain in the shadow of that.

@elupus
Copy link
Contributor Author

elupus commented Mar 8, 2025

If we want to make it optional it should be opt out of state reported not opt in. If we have a feature that by default large integrations ignore, nothing can rely on it.

We could maybe default it to on, but on upgrade default it to off. Ie new users get it on by default. Old users stay on off untill they enable.

We could also remember all unchanged entities and trigger write on those with a delay, but that get complicated fast.

@bdraco
Copy link
Member

bdraco commented Mar 8, 2025

If we want to make it optional it should be opt out of state reported not opt in. If we have a feature that by default large integrations ignore, nothing can rely on it.

That's already the case, as what constitutes a reported state will vary widely between each integration. A typical pattern for integrations and libs is suppressing duplicate data when nothing changes. Suppose I think about the integrations I use that do this. In that case, it includes: nearly all passive Bluetooth devices, anything that relies on Bluetooth RSSI updates, Bluetooth advertisements that don't change (some of that happens at the BlueZ level), HomeKit accessory protocol updates, and ESPHome does it on device for various sensor reads as well as the integration. That's far from an exhaustive list, and only the most common ones I can think of off the top of my head.

@elupus
Copy link
Contributor Author

elupus commented Mar 8, 2025

Im leaning towards abandoning this change. jbouwh might have something better when it comes to opting in in the works.

However on a tangent. Should there be a capabilities flag or something to indicate if an entity can report last reported? We could indicate in gui based on that and helpers could take it into account. What do you guys think?

@emontnemery
Copy link
Contributor

emontnemery commented Mar 10, 2025

Not suppressing state changes is today mainly of interest to sensor helpers to ensure correct calculations since we don't show the last_reported timestamp in frontend.
I think we should thus do this only for numerical mqtt sensors, not for all mqtt platforms. A flag to opt-in as suggested before by @bdraco and @jbouwh could make sense, but it seems like almost too much detail.

We could also make this an entity option, configurable by the user via the more details dialog.

there are so many power sensors (mostly from Z2M) on the market that flood the state machine with data we would effectively be forcing those users to buy better hardware or move to a different solution

What does flood mean in this case?

@emontnemery
Copy link
Contributor

emontnemery commented Mar 11, 2025

@bdraco You mention this work needs to be done:

Think about all the objects, syscalls (time and others), dict merges that need to be created to fire EVENT_STATE_REPORTED.

I agree with this concern, although we try to do this as lazily as possible, for example only creating some objects if there are subscribers to EVENT_STATE_REPORTED etc.

Next think about all the downstream consumers of EVENT_STATE_REPORTED.

We have implemented a lot of restrictions on subscribing to EVENT_STATE_REPORTED:

  • EVENT_STATE_REPORTED is not included when subscribing to all events
  • When subscribing to EVENT_STATE_REPORTED a filter function needs to be supplied (typically to subscribe to EVENT_STATE_REPORTED only for a single entity
  • It's not allowed to use EVENT_STATE_REPORTED to trigger automations or trigger based template entities

Add in the cost of putting it in the recorder queue.
Add in the cost of the database update.

True, but as you also mention, this is very highly optimized. For starters, STATE_REPORTED events are not even recorded. The recorder instead bumps the last_reported timestamp of the last known state on state changes:

states_manager.update_pending_last_reported(
old_state_id, old_state.last_reported_timestamp
)

This means there is at worst a doubling of database accesses (if every state change needs the previous state's last_reported timestamp to be updated), no matter how many redundant state writes an entity causes.

I don't mean the concerns are not valid, I just want to point out that we have spent a lot of effort on minimizing the impact of spammy entities.

@bdraco
Copy link
Member

bdraco commented Mar 11, 2025

Think about all the objects, syscalls (time and others), dict merges that need to be created to fire EVENT_STATE_REPORTED.

I agree with this concern, although we try to do this as lazily as possible, for example only creating some objects if there are subscribers to EVENT_STATE_REPORTED etc.

I'm more concerned about this part of the update cycle than the other parts that we spent all the time optimizing.

Specifically __async_calculate_state, _async_write_ha_state and async_set_internal are almost always the top hitters in the profiles I get from MQTT users.

I don't have a good way to optimize those without invasive changes (ie Cython, Rust or something similar).

The database update is still mildly concerning as well because updates are so much more expensive than inserts since they have to lookup old data as well.

@bdraco
Copy link
Member

bdraco commented Mar 11, 2025

For some perspective I took a non-scientific look at a what would happen if the duplicate state writes suppression was removed from the most common push integrations I use:

Passive Bluetooth entities - additional ~700 state writes per minute
HomeKit Controller - additional ~1200 state writes per minute
Unifi Protect - additional ~7200 state writes per minute (I have multiple cameras on a busy street that have many sensors which fire ~1600 events per minute)

Additionally I have unifi (network) integration disabled because it generates far too many state writes already and produces a noticeable lag on my Blue when it's turned on with a network size of ~300 devices.

@bdraco
Copy link
Member

bdraco commented Mar 11, 2025

There is a great discussion in #140318 (comment) that is relevant to this

The problem with these types of changes is that we really don’t know what the stability and performance impact on users will be, as we have no telemetry data. As a result, we have to rely on past experiences and data that users have manually provided when they encountered issues which is inherently a small sample size. This makes it difficult to make an informed decision, as there will always be users who end up being pushed over the edge as we try to gather more data. Without telemetry, it’s hard to know how many people will be negatively impacted, making it tough to make a well-informed choice. Ultimately, we have to rely on the assumption that we didn’t overwhelm too many users with each change, and our users find out the hard way when we’re wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants