Description
Connecting to AWS IoT using iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth works fine on the initial connection. After 24h the broker closes the socket (this is normal and expected), so the client tries to reconnect, but it fails forever with reasonString CONNACK:ClientId is invalid:... and reasonCode 133 (Client Identifier Not Valid).
Some investigation:
- Logging the initial connection event's CONNACK (see the sketch after this list), the assigned client id looks something like assignedClientIdentifier: '$GEN/8b42676c-3300-4b4d-8c4f-9b75a17f7999'.
- The failed attempts' event CONNACK states reasonString: 'CONNACK:ClientId is invalid:c91911b9-d401-8fe2-261c-e4374f470aff'.
- I'm not sure where the $GEN/ prefix is coming from, but maybe it's a clue.
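For reference, here is roughly how I pull those fields out of the events. This is a minimal sketch: it assumes a client built as in the reproduction steps below, and the field names come from the mqtt5 ConnackPacket typings.

```typescript
import { mqtt5 } from 'aws-iot-device-sdk-v2';

// 'client' is an mqtt5.Mqtt5Client built as in the reproduction steps below.
function logConnackDetails(client: mqtt5.Mqtt5Client) {
  client.on('connectionSuccess', (eventData: mqtt5.ConnectionSuccessEvent) => {
    // On the initial connection this prints the '$GEN/...' identifier.
    console.log('assignedClientIdentifier:', eventData.connack?.assignedClientIdentifier);
  });

  client.on('connectionFailure', (eventData: mqtt5.ConnectionFailureEvent) => {
    // On the failing reconnect attempts this prints reasonCode 133 and the
    // "CONNACK:ClientId is invalid:..." reasonString.
    console.log('reasonCode:', eventData.connack?.reasonCode);
    console.log('reasonString:', eventData.connack?.reasonString);
  });
}
```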
Expected Behavior
The re-connection succeeds, just like the initial connection.
Current Behavior
The Mqtt5Client fires a connectionFailure event on re-connection and retries forever. Restarting the process/client helps; the new initial connection succeeds immediately.
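To clarify what I mean by restarting the client, below is roughly what that looks like in code. It is only a sketch under my own assumptions, not an SDK-documented recovery path; buildClient() is a hypothetical helper that wraps the builder code from the reproduction steps below.

```typescript
import { iot, mqtt5 } from 'aws-iot-device-sdk-v2';

// Hypothetical helper that builds the client exactly as in the reproduction steps below.
function buildClient(): mqtt5.Mqtt5Client {
  const builder = iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth(
    process.env.BROKER_ENDPOINT as string,
    { region: 'eu-central-1' },
  );
  builder.withConnectProperties({ keepAliveIntervalSeconds: 120 });
  return new mqtt5.Mqtt5Client(builder.build());
}

let client = buildClient();

function attachHandlers(c: mqtt5.Mqtt5Client) {
  c.on('connectionFailure', (eventData: mqtt5.ConnectionFailureEvent) => {
    // 133 = "Client Identifier not valid", the failure described above.
    if (eventData.connack?.reasonCode === 133) {
      // Tear down the stuck client and build a fresh one, mimicking a
      // process restart without actually restarting the process.
      c.once('stopped', () => {
        c.close();
        client = buildClient();
        attachHandlers(client);
        client.start();
      });
      c.stop();
    }
  });
}

attachHandlers(client);
client.start();
```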
Reproduction Steps
- Make sure AWS credentials (like AWS_SECRET_ACCESS_KEY in the env) are present and have the AWSIoTDataAccess policy granted.
- Put your AWS IoT Core endpoint URL into the env as BROKER_ENDPOINT.
- Run the script below:
```typescript
import { iot, mqtt5 } from 'aws-iot-device-sdk-v2';

async function init() {
  // Websocket connection to AWS IoT Core, signed with SigV4 using the AWS
  // credentials found in the environment.
  const builder = iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth(
    process.env.BROKER_ENDPOINT as string,
    { region: 'eu-central-1' },
  );
  builder.withConnectProperties({ keepAliveIntervalSeconds: 120 });

  const client = new mqtt5.Mqtt5Client(builder.build());

  client.on('error', (err) => { console.log('MQTT error', err); });
  client.on('attemptingConnect', () => { console.log('Attempting Connect event'); });
  client.on('connectionSuccess', (eventData: mqtt5.ConnectionSuccessEvent) => { console.log('Connection Success event', eventData); });
  client.on('connectionFailure', (eventData: mqtt5.ConnectionFailureEvent) => { console.log('Connection failure event', eventData); });

  client.start();
}

init().catch(console.log);
```
Notice the initial connection succeeding via the console log of connectionSuccess.
Now wait 24h for the broker to close the websocket 😄
...or force-close the TCP socket as follows (under Linux). Maybe there is an easier way, like disconnecting the network interface, but I haven't tried that.
1. Run netstat -tnp to find the PID of the node process connected to AWS IoT (the foreign address should use port 443, as we are using MQTT over WebSocket).
2. Run lsof -np $PID, where $PID is the PID found in the previous step. Look for the file descriptor ("FD") column and note the FD of the TCP connection to the broker (the one connected to remote port 443).
3. Run gdb -p $PID, where $PID is again the PID from step 1.
4. In the gdb console, run call close($FD), where $FD is the file descriptor obtained via lsof.
5. Type quit to exit gdb.
6. Wait up to 2 minutes for the client to heartbeat, realize the socket is closed, and attempt to reconnect.
Here is my log output, attached as log.txt:
log.txt
Possible Solution
I read in the docs that the client id is regenerated on re-connection, which is precisely what we want, but maybe it is malformed. Also, the $GEN/ part in the initial connection's id might be a clue.
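One possible workaround I'm considering (an assumption on my part, not something from the docs): supply an explicit clientId in the CONNECT properties so the broker never has to auto-assign one. ConnectPacket accepts a clientId field, so something like the following sketch should avoid the auto-generated id entirely (untested; the id format and prefix are placeholders, and the id must be allowed by the IoT policy's Connect resource).

```typescript
import { randomUUID } from 'crypto';
import { iot, mqtt5 } from 'aws-iot-device-sdk-v2';

// Sketch of a possible workaround (untested): supply our own client id up
// front instead of relying on the broker-assigned '$GEN/...' identifier.
const builder = iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth(
  process.env.BROKER_ENDPOINT as string,
  { region: 'eu-central-1' },
);
builder.withConnectProperties({
  keepAliveIntervalSeconds: 120,
  clientId: `my-device-${randomUUID()}`, // placeholder id; must match the policy's Connect resource
});
const client = new mqtt5.Mqtt5Client(builder.build());
client.start();
```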
Additional Information/Context
As for authentication, I tried both Fargate task roles and plain access keys in a local test environment, with the same behavior. The assigned policy was AWSIoTDataAccess, which is pretty permissive, so I think I've ruled out an authorization issue.
SDK version used
1.17.0
Environment details (OS name and version, etc.)
Docker container running node:20.9 image, Linux host OS.