Currently, when one subsystem fails to process the blockbeat in time, we exit and don't thread the block through the remaining subsystems if their block processing depends on each other. We log an error, but we do not act on it and just hope that it resolves itself. However, I think we need to be more rigorous here and create a retry system which, after x attempts without being able to process the block, shuts the daemon down. This prevents hidden bugs where we are not able to process the beat in time.
Detailed analysis and design proposal will come when I have a bit more time.