BT lockless queue #867
|
An important note: Again, this doesn't seem to be a real issue as most people won't set TimeBtwRead to 1 and OMG instantly reconnects. |
|
Thanks for the PR, I have not had time to get my hands on it yet but will do so soon! |
|
I'm wondering if we should not add also into the queue the message processed by It could be the cause of the wifi disconnection below: The gateway is configured here with BT, RF and IR. |
|
@Legion2 would you be interested in reviewing this PR? |
Legion2
left a comment
I cannot comment on the functional aspects of this PR, but apart from that the code looks good to me.
No problem, I'll take care of it.
Thanks |
... You are right!
# ifdef subjectHomePresence
if (advertisedDevice->haveRSSI() && !publishOnlySensors) {
haRoomPresence(BLEdata); // this device has an rssi and we don't want only sensors so in consequence we can use it for home assistant room presence component
}
# endif
and tries to use the buffer element for direct wifi sending: I can imagine that haRoomPresence in the BLEAdvertisedDeviceCallbacks code should just add all the necessary tags (distance etc.). After that, PublishDeviceData (-> pubBT) -> pubBTMainCore would detect that a pub_custom_topic call to "home_presence" should also be made. |
|
Hmmm. Could you please try out the following changes:
void pubBTMainCore(JsonObject& data) {
String mac_address = data["id"].as<const char*>();
mac_address.replace(":", "");
String mactopic = subjectBTtoMQTT + String("/") + mac_address;
pub((char*)mactopic.c_str(), data);
if (data.containsKey("distance")) {
data.remove("servicedatauuid");
data.remove("servicedata");
String topic = String(Base_Topic) + "home_presence/" + String(gateway_name);
pub_custom_topic((char*)topic.c_str(), data, false);
}
}
# ifdef subjectHomePresence
void haRoomPresence(JsonObject& HomePresence) {
int BLErssi = HomePresence["rssi"];
Log.trace(F("BLErssi %d" CR), BLErssi);
int txPower = HomePresence["txpower"] | 0;
if (txPower >= 0)
txPower = -59; //if tx power is not found we set a default calibration value
Log.trace(F("TxPower: %d" CR), txPower);
double ratio = BLErssi * 1.0 / txPower;
double distance;
if (ratio < 1.0) {
distance = pow(ratio, 10);
} else {
distance = (0.89976) * pow(ratio, 7.7095) + 0.111;
}
HomePresence["distance"] = distance;
}
# endif
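To make the distance arithmetic above easier to check in isolation, here is a minimal standalone sketch of the same formula. The function name `estimateDistance` is hypothetical (it is not part of the gateway code), and the JSON handling is left out on purpose; only the ratio and the two `pow` branches are taken from the snippet.

```cpp
#include <cmath>

// Hypothetical helper (not in the PR): same arithmetic as haRoomPresence,
// without the JsonObject plumbing.
// rssi:    measured signal strength in dBm (negative in practice)
// txPower: calibration value from the advertisement; 0 or positive means "not found"
double estimateDistance(int rssi, int txPower) {
  if (txPower >= 0) {
    txPower = -59;  // default calibration value when tx power is not found
  }
  double ratio = rssi * 1.0 / txPower;
  if (ratio < 1.0) {
    // signal stronger than the calibration value: very short distance
    return std::pow(ratio, 10);
  }
  // empirical curve used in the snippet above
  return 0.89976 * std::pow(ratio, 7.7095) + 0.111;
}
```

With the default calibration of -59, an rssi of -85 gives roughly 15.13 and an rssi of -70 roughly 3.47, which lines up with the "distance" values in the logs quoted later in this thread.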
I already polluted my code somewhat with an extra delay(5):
// should run from the main core
void emptyBTQueue() {
for (bool first = true;;) {
int next = atomic_load_explicit(&jsonBTBufferQueueNext, ::memory_order_seq_cst); // use namespace std -> ambiguous error...
int last = atomic_load_explicit(&jsonBTBufferQueueLast, ::memory_order_seq_cst); // use namespace std -> ambiguous error...
if (last == next) break;
if (first) {
int diff = (2 * BTQueueSize + last - next) % (2 * BTQueueSize);
btQueueLengthSum += diff;
btQueueLengthCount++;
first = false;
delay(5); // Lets not use the shared antenna yet and wait 5ms for more BT messages
}
pubBTMainCore(*jsonBTBufferQueue[next % BTQueueSize].object);
atomic_store_explicit(&jsonBTBufferQueueNext, (next + 1) % (2 * BTQueueSize), ::memory_order_seq_cst); // use namespace std -> ambiguous error...
}
}
Still got a reconnect, but after 9 mins, instead of the usual 2-5 mins. |
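For readers following the index arithmetic in emptyBTQueue: the indices run modulo 2 * BTQueueSize while the slot is picked modulo BTQueueSize, which lets "empty" (indices equal) be told apart from "full" (indices exactly one queue size apart). Below is a minimal single-producer/single-consumer sketch of that idea. It is an illustration only: `SpscQueue`, `kQueueSize`, `push` and `pop` are hypothetical names, the slots hold plain ints instead of JSON buffers, and the producer side is an assumption since it is not shown in this thread.

```cpp
#include <atomic>

constexpr int kQueueSize = 4;  // hypothetical capacity, stands in for BTQueueSize

// Sketch of a lockless ring buffer for exactly one producer and one consumer.
struct SpscQueue {
  int slots[kQueueSize] = {};
  std::atomic<int> next{0};  // consumer index, in [0, 2 * kQueueSize)
  std::atomic<int> last{0};  // producer index, in [0, 2 * kQueueSize)

  // Number of queued items, same expression as "diff" in emptyBTQueue.
  int size() const {
    int n = next.load(std::memory_order_seq_cst);
    int l = last.load(std::memory_order_seq_cst);
    return (2 * kQueueSize + l - n) % (2 * kQueueSize);
  }

  // Producer side only. Returns false when the queue is full.
  bool push(int value) {
    int l = last.load(std::memory_order_seq_cst);
    int n = next.load(std::memory_order_seq_cst);
    if ((2 * kQueueSize + l - n) % (2 * kQueueSize) == kQueueSize) return false;
    slots[l % kQueueSize] = value;  // fill the slot before publishing it
    last.store((l + 1) % (2 * kQueueSize), std::memory_order_seq_cst);
    return true;
  }

  // Consumer side only. Returns false when the queue is empty.
  bool pop(int& out) {
    int n = next.load(std::memory_order_seq_cst);
    int l = last.load(std::memory_order_seq_cst);
    if (n == l) return false;
    out = slots[n % kQueueSize];  // read the slot before releasing it
    next.store((n + 1) % (2 * kQueueSize), std::memory_order_seq_cst);
    return true;
  }
};
```

Each index is written by only one side, so no lock is needed as long as there really is a single producer (here that would be the BT task) and a single consumer (the main core).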
It is working better indeed!
Thanks for pointing this out, I think we should add a note in the docs about it (minRssi impacts presence detection...). Another point, should we remove the |
|
We can keep the minRssi-agnostic behavior if you need it: we can make PublishDeviceData call pubBT if "distance" is present, and also make pubBTMainCore skip the simple pub call if the rssi restriction is not met. Regarding start and stopProcessing: yes, we shouldn't need those anymore for standard cases. Should I add the changes to this branch? |
Yes, that could be nice, especially for the people who don't read the docs ;-)
Yes please |
| @@ -676,14 +683,14 @@ void coreTask(void* pvParameters) { | |||
| Log.trace(F("BT Task running on core: %d" CR), xPortGetCoreID()); | |||
| if (!ProcessLock) { | |||
Isn't this if redundant with the condition added to the while loop and to the else if?
If there was a connection phase before that, we may already have waited several seconds and the ProcessLock value we checked may be long out of date.
Also, we don't want to take too much time in anything that has delay() in it.
I even added a delay(Scan_duration < 2000 ? Scan_duration : 2000); to stopProcessing: this gives more time to leave the BT loop gracefully. Unfortunately, waiting too long will result in a timeout in platformio.
I needed these because I am more aggressive with the BT parameters and I am able to kill the OTA easily (of course with the lockless queue)...
|
As you can see, I have updated the branch, but I have a big issue and I don't know why it is happening. (We have a bug here with the value logged with %D.) With the new lockless code path only passive mode works (as on the development branch); active BLE tracking results in logs like: I see the data there, I need to check what happens. |
|
Actually, it's like one char less? |
|
Got it: |
|
I will need one more day, it's 00:51 (24h) here... :) |
|
No problem, have a good night |
|
This was way harder than expected. [02:31:07]T: Buffersize for 0 : 432 -> 501 -> 543
...
Received json : {"id":"58:2D:34:3A:78:0C","name":"MJ_HT_V1","rssi":-85,"distance":15.12852,"model":"LYWSDCGQ","tem":22.4,"tempc":22.4,"tempf":72.32,"hum":44.2,"servicedata":"5020aa01710c783a342d580d1004e000ba01","servicedatauuid":"0xfe95"}
|
|
We are running an old version of ArduinoJson. It is maybe time to upgrade.
Thanks for pointing this out. We may add it to the comment in config_BT.h |
|
It seems like ArduinoJson 6+ would need even bigger buffers than 5.13, at least from this: |
|
If you are ok, I propose to keep it like that and add a comment into the config_BT.h. Do you think we are ready to merge? |
|
After some hours here is what I get: {"uptime":43087,"version":"version_tag","freemem":176472,"rssi":-49,"SSID":"","ip":"192.168.1.2","mac":"C4","wifiprt":0,"lowpowermode":0,"btqblck":4,"btqsum":2369,"btqsnd":2366,"btqavg":1.001268,"interval":55555}
Is it consistent with your results too? |
|
Yes, how do you have so much free mem? |
|
It varies depending on the measure moment I think: {"uptime":43687,"version":"version_tag","freemem":144312,"rssi":-49,"SSID":"","ip":"192.168.1.25","mac":"C4"wifiprt":0,"lowpowermode":0,"btqblck":4,"btqsum":2404,"btqsnd":2401,"btqavg":1.001249,"interval":55555} |
|
This is new (so far hidden?): [13:10:36]T: Pub json :{"id":"A4:C1:38:77:7C:52","name":"ATC_777C52","rssi":-70,"distance":3.472448,"model":"LYWSD03MMC_ATC","tempc":23.6,"tempf":74.48,"hum":42,"batt":77,"volt":2.894} into custom topic: home/home_presence/OpenMQTTGateway_ESP32_BLE
[13:10:57]W: MQTT connection...
[13:11:15][E][WiFiClient.cpp:258] connect(): socket error on fd 56, errno: 113, "Software caused connection abort"
[13:11:15]W: failure_number_mqtt: 1
[13:11:15]W: failed, rc=-2
[13:11:20]W: disconnection_handling, failed 1 times
[13:11:20]W: Attempt to reinit wifi: 0
And now stuck mqtt disconnected. [EDIT] |
Yes, try removing this line, I suspect it causes more problems than it solves: Line 899 in 4431af4 |
|
Without the WifiManualSetup, it works perfectly (something to fix after this... ;) ). |
|
let's go! |
* Lockless json queue for ESP32 BT to avoid SYStoMQTT concurrency issue
* BT haRoomPresence handle with lockless queue
Lockless queue for ESP32 BT as discussed.
Note: in theory dumpDevices could also crash at MQTTtoBT; I haven't touched that (yet).