"ClientId is invalid" error after automatic reconnection to broker #446

Open
@tobyfoo

Description

Describe the bug

Connecting to AWS IoT using iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth works fine on the initial connection.

After 24h the broker closes the socket (this is normal and expected), so the client tries to reconnect. Every reconnection attempt then fails, indefinitely, with the error: reasonString CONNACK:ClientId is invalid:..., reasonCode 133.

Some investigation:

  • When I log the connack of the initial connection event, the client ID looks something like assignedClientIdentifier: '$GEN/8b42676c-3300-4b4d-8c4f-9b75a17f7999', while the connack of the failed attempts states reasonString: 'CONNACK:ClientId is invalid:c91911b9-d401-8fe2-261c-e4374f470aff'. I'm not sure where the $GEN/ prefix is coming from, but maybe it's a clue.
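To make the comparison between the two logged IDs concrete, here is a minimal sketch that checks each one for the $GEN/ prefix, the usual 8-4-4-4-12 hex layout, and the RFC 4122 version 4 UUID shape. The helper name describeClientId is hypothetical and not part of the SDK. Whether the broker cares about the UUID version at all is an assumption I have not verified; this only describes the strings.

```typescript
// Generic 8-4-4-4-12 hex layout of a UUID-like string.
const UUID_LAYOUT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// RFC 4122 version 4: third group starts with 4, fourth group starts with 8, 9, a or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

interface ClientIdShape {
  generatedPrefix: boolean; // starts with "$GEN/"
  uuidLayout: boolean;      // remainder has the 8-4-4-4-12 hex layout
  uuidV4: boolean;          // remainder also carries v4 version/variant nibbles
}

// Hypothetical helper: describes the shape of a client ID string, nothing more.
function describeClientId(clientId: string): ClientIdShape {
  const prefix = '$GEN/';
  const generatedPrefix = clientId.startsWith(prefix);
  const bare = generatedPrefix ? clientId.slice(prefix.length) : clientId;
  return {
    generatedPrefix,
    uuidLayout: UUID_LAYOUT.test(bare),
    uuidV4: UUID_V4.test(bare),
  };
}

// The two IDs from the logs quoted above.
const assigned = describeClientId('$GEN/8b42676c-3300-4b4d-8c4f-9b75a17f7999');
const rejected = describeClientId('c91911b9-d401-8fe2-261c-e4374f470aff');
```

Both IDs have the UUID layout, but only the broker-assigned one has the prefix and the version 4 nibbles; the rejected one has neither. That may or may not be related to the rejection.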

Expected Behavior

The re-connection succeeds, just like the initial connection.

Current Behavior

The Mqtt5Client fires a connectionFailure event on re-connection and tries to reconnect forever. Restarting the process/client helps and the new initial connection succeeds immediately.
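For anyone who wants to detect this state from inside a connectionFailure handler, here is a minimal sketch. In MQTT 5, CONNACK reason code 133 (0x85) means "Client Identifier not valid". The interfaces below are hypothetical structural stand-ins for the event payload rather than the SDK's own types, and the reason string parsing assumes the exact format seen in my logs.

```typescript
// Hypothetical minimal shapes; only the fields read below are modelled.
interface ConnackLike {
  reasonCode: number;
  reasonString?: string;
}
interface ConnectionFailureLike {
  connack?: ConnackLike;
}

// MQTT 5 CONNACK reason code 0x85: "Client Identifier not valid".
const CLIENT_IDENTIFIER_NOT_VALID = 133;

// True when the broker answered the CONNECT and rejected the client ID.
function isInvalidClientIdFailure(event: ConnectionFailureLike): boolean {
  return event.connack?.reasonCode === CLIENT_IDENTIFIER_NOT_VALID;
}

// Assumes the "CONNACK:ClientId is invalid:<id>" format from the logs;
// returns undefined for any other reason string.
function rejectedClientId(event: ConnectionFailureLike): string | undefined {
  const marker = 'ClientId is invalid:';
  const reason = event.connack?.reasonString;
  if (reason === undefined) return undefined;
  const index = reason.indexOf(marker);
  return index === -1 ? undefined : reason.slice(index + marker.length);
}

// Example payload mirroring the failed attempts in the logs.
const sampleFailure: ConnectionFailureLike = {
  connack: {
    reasonCode: 133,
    reasonString: 'CONNACK:ClientId is invalid:c91911b9-d401-8fe2-261c-e4374f470aff',
  },
};
```

A handler could use isInvalidClientIdFailure to log the rejected ID and, since restarting currently helps, tear the client down and create a fresh one instead of retrying forever. That is only a workaround idea, not a fix.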

Reproduction Steps

  • Make sure AWS credentials (like AWS_SECRET_ACCESS_KEY in the env) are present and have the policy AWSIoTDataAccess granted.
  • Put your AWS IoT Core endpoint URL into the env as BROKER_ENDPOINT.
import { iot, mqtt5 } from 'aws-iot-device-sdk-v2';

async function init() {
  const builder = iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth(
    process.env.BROKER_ENDPOINT as string,
    { region: 'eu-central-1' },
  );
  builder.withConnectProperties({ keepAliveIntervalSeconds: 120 });
  const client = new mqtt5.Mqtt5Client(builder.build());

  client.on('error', (err) => { console.log('MQTT error', err); });
  client.on('attemptingConnect', () => { console.log('Attempting Connect event'); });
  client.on('connectionSuccess', (eventData: mqtt5.ConnectionSuccessEvent) => { console.log('Connection Success event', eventData); });
  client.on('connectionFailure', (eventData: mqtt5.ConnectionFailureEvent) => { console.log('Connection failure event', eventData); });

  client.start();
}

init().catch(console.log);

Notice the initial connection succeeding via the console log of connectionSuccess.

Now wait 24h for the broker to close the websocket 😄

...or force-close the TCP socket as follows (under Linux). There may be an easier way, such as disconnecting the network interface, but I haven't tried that.

  • Run netstat -tnp to find the PID of the node process connected to AWS IoT (the foreign address should be on port 443, as we are using MQTT over WebSocket).
  • Run lsof -np $PID, where $PID is the PID found in the previous step. Look at the file descriptor column ("FD") and note the FD of the TCP connection to the broker (the one connected to remote port 443).
  • Run gdb -p $PID, with the same $PID as above.
  • In the gdb console, run call close($FD), where $FD is the file descriptor obtained from lsof.
  • Type quit to exit gdb.
  • Wait up to 2 minutes for the client to send its heartbeat, notice that the socket is closed, and attempt to reconnect.
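Picking the right FD out of the lsof output by eye is easy to get wrong, so here is a rough sketch of that one step as a parser. It assumes the default lsof column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the function name findBrokerFds and the sample output (PID, addresses, FD numbers) are made up for illustration. Without the -P flag lsof may print the port as https instead of 443, so both are accepted.

```typescript
// Returns the numeric FDs of TCP sockets whose remote end is port 443 (or "https").
// Assumes the default lsof column layout; hypothetical helper, not a tested tool.
function findBrokerFds(lsofOutput: string): number[] {
  const fds: number[] = [];
  for (const line of lsofOutput.split('\n')) {
    const fields = line.trim().split(/\s+/);
    const fdField = fields[3] ?? ''; // e.g. "21u", "cwd", "FD" (header)
    const node = fields[7] ?? '';    // "TCP" for TCP sockets
    const name = fields[8] ?? '';    // "local:port->remote:port"
    if (node !== 'TCP') continue;
    const remote = name.split('->')[1];
    if (remote === undefined || !/:(443|https)$/.test(remote)) continue;
    const fd = parseInt(fdField, 10); // "21u" -> 21, "cwd" -> NaN
    if (!Number.isNaN(fd)) fds.push(fd);
  }
  return fds;
}

// Made-up sample output for illustration only.
const sampleLsof = [
  'COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME',
  'node    4242 root  cwd    DIR   0,52     4096  123 /app',
  'node    4242 root   19u  IPv4  88001      0t0  TCP 172.17.0.2:40112->192.0.2.20:5432 (ESTABLISHED)',
  'node    4242 root   21u  IPv4  88002      0t0  TCP 172.17.0.2:48312->192.0.2.10:443 (ESTABLISHED)',
].join('\n');

const brokerFds = findBrokerFds(sampleLsof);
```

The resulting number is what goes into call close($FD) in gdb. If more than one FD is returned, the process has several connections to port 443 and you need to pick the broker's by its remote address.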

My log output is attached:
log.txt

Possible Solution

I read in the docs that the client id is regenerated on re-connection, which is precisely what we want, but maybe it is malformed. Also, the $GEN/ part in the initial connection's id might be a clue.

Additional Information/Context

For authentication I tried both Fargate task roles and plain access keys in a local test environment, with the same behavior. The assigned policy was AWSIoTDataAccess, which is pretty permissive, so I think I've ruled out an authorization issue.

SDK version used

1.17.0

Environment details (OS name and version, etc.)

Docker container running node:20.9 image, Linux host OS.

Labels: bug (This issue is a bug.), p2 (This is a standard priority issue)
