Open
Description
Overview of the Issue
Upgrading from 1.12.9 to any higher version leaves the Consul server unable to start: restoring the existing Raft snapshot fails with the error "object missing primary index".
We tried upgrading to 1.13.9, 1.14.11, and 1.15.10, making all the necessary configuration adjustments before each attempt.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
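To compare the failing snapshot across upgrade attempts, the key=value fields at the end of these Raft log lines can be pulled out programmatically. The following is a minimal sketch, not part of any Consul tooling; the function name parse_log_fields is hypothetical, and it assumes the line format shown above (whitespace-separated tokens, with double quotes around values that contain spaces).

```python
import shlex


def parse_log_fields(line: str) -> dict:
    """Collect the key=value tokens of one log line into a dict.

    Hypothetical helper: assumes the format shown in the excerpt above,
    where values containing spaces are wrapped in double quotes.
    """
    fields = {}
    # shlex.split keeps a quoted value such as error="..." as one token
    # and strips the surrounding quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        'Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: '
        'failed to restore snapshot: id=2-16482438-1758289018472 '
        'last-index=16482438 last-term=2 size-in-bytes=291125 '
        'error="object missing primary index"'
    )
    fields = parse_log_fields(sample)
    print(fields["id"], fields["error"])
```

Tokens without an equals sign (timestamp, host name, message text) are skipped, so the same helper can be run over every line of the journal output.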
Reproduction Steps
Steps to reproduce this issue:
 - Start from a single-node Consul 1.12.9 server (we run a single node because this is not our production environment)
 - Adjust the config files under the /etc/consul directory to match the version you want to upgrade to
 - Run apt install consul=<version>
 - See the error in the Log Fragments section below
 
Consul info for both Client and Server
Client info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
{
  "log_level": "TRACE",
  "enable_syslog": true,
  "log_file": "/var/log/consul/consul.log",
  "syslog_facility": "LOCAL0",
  "log_rotate_duration": "24h",
  "log_rotate_max_files": 3,
  "enable_script_checks": false,
  "server_name": "integration-dt-ci",
  "datacenter": "integration",
  "primary_datacenter": "integration",
  "bind_addr": "172.30.8.179",
  "client_addr": "127.0.0.1",
  "data_dir":"/var/lib/consul",
  "tls": {
    "defaults": {
      "key_file": "/etc/consul/ssl/integration-dt-ci.key",
      "cert_file": "/etc/consul/ssl/integration-dt-ci.pem",
      "ca_file": "/etc/consul/ssl/ca.pem",
      "verify_incoming": true,
      "verify_outgoing": true
    },
    "internal_rpc": {
      "verify_server_hostname": true
    },
    "grpc": {
      "verify_incoming": false,
      "use_auto_cert": true
    }
  },
  "enable_central_service_config": true,
  "enable_local_script_checks": true,
  "ui_config": {
    "enabled": true,
    "metrics_provider": "prometheus",
    "metrics_proxy": {
       "base_url": "http://prometheus.service.consul:9090"
    }
  },
  "connect": {
      "enabled": true
  },
  "addresses": {
      "http": "{{ GetAllInterfaces | include \"flags\" \"loopback\" | join \"address\" \" \" }} {{ GetInterfaceIP \"nomad\" }}"
  },
  "ports": {
    "grpc": 8502,
    "grpc_tls": 8503
  },
  "acl": {
    "enabled": false,
    "default_policy": "deny",
    "down_policy": "extend-cache",
    "enable_token_persistence": true,
    "enable_token_replication": true,
    "tokens": {
      "agent": ""
    }
  },
  "limits": {
    "http_max_conns_per_client": 2000
  }
}
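Since the failure happens while restoring a snapshot from data_dir, it may help to list which snapshots are actually on disk and compare their IDs with the id reported in the log. The sketch below is read-only and rests on assumptions not stated in this report: that snapshots live under <data_dir>/raft/snapshots/<id>/ and that each directory holds a meta.json with ID, Index, Term, and Size fields (the layout used by the file snapshot store in hashicorp/raft, as far as I know). The function name list_raft_snapshots is hypothetical.

```python
import json
from pathlib import Path


def list_raft_snapshots(data_dir: str) -> list:
    """Return metadata for each Raft snapshot found under data_dir.

    Assumed layout (not confirmed in this report):
    <data_dir>/raft/snapshots/<snapshot-id>/meta.json
    """
    snap_root = Path(data_dir) / "raft" / "snapshots"
    snapshots = []
    for meta_path in sorted(snap_root.glob("*/meta.json")):
        meta = json.loads(meta_path.read_text())
        snapshots.append({
            "dir": meta_path.parent.name,
            # Field names below are assumptions about meta.json.
            "id": meta.get("ID"),
            "index": meta.get("Index"),
            "term": meta.get("Term"),
            "size": meta.get("Size"),
        })
    return snapshots


if __name__ == "__main__":
    # data_dir value taken from the config above.
    for snap in list_raft_snapshots("/var/lib/consul"):
        print(snap)
```

The script only reads meta.json files and does not modify anything under the data directory.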
Server info
consul info
Error querying agent: Get "http://127.0.0.1:8500/v1/agent/self": dial tcp 127.0.0.1:8500: connect: connection refused
Server agent HCL config
Operating system and Environment details
VM on GCP running Debian 10, managed via Terraform and Puppet
Log Fragments
Sep 19 14:31:39 integration-dt-ci systemd[1]: Starting consul agent...
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: connect: Connection refused
Sep 19 14:31:40 integration-dt-ci bash[5167]: /bin/bash: /dev/tcp/localhost/8502: Connection refused
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Starting Consul agent...
Sep 19 14:31:40 integration-dt-ci consul[5166]:               Version: '1.15.10'
Sep 19 14:31:40 integration-dt-ci consul[5166]:            Build Date: '2024-02-13 18:30:20 +0000 UTC'
Sep 19 14:31:40 integration-dt-ci consul[5166]:               Node ID: '93a0a9fe-bf84-eb66-4165-c1453a578c54'
Sep 19 14:31:40 integration-dt-ci consul[5166]:             Node name: 'integration-dt-ci'
Sep 19 14:31:40 integration-dt-ci consul[5166]:            Datacenter: 'integration' (Segment: '<all>')
Sep 19 14:31:40 integration-dt-ci consul[5166]:                Server: true (Bootstrap: true)
Sep 19 14:31:40 integration-dt-ci consul[5166]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, gRPC-TLS: 8503, DNS: 8600)
Sep 19 14:31:40 integration-dt-ci consul[5166]:          Cluster Addr: 172.30.8.179 (LAN: 9301, WAN: 8302)
Sep 19 14:31:40 integration-dt-ci consul[5166]:     Gossip Encryption: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Auto-Encrypt-TLS: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Reporting Enabled: false
Sep 19 14:31:40 integration-dt-ci consul[5166]:             HTTPS TLS: Verify Incoming: true, Verify Outgoing: true, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]:      Internal RPC TLS: Verify Incoming: true, Verify Outgoing: true (Verify Hostname: true), Min Version: TLSv1_2
Sep 19 14:31:40 integration-dt-ci consul[5166]: ==> Log data will now stream in as it occurs:
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.517Z [WARN]  agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.524Z [TRACE] agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Resolver state updated: {
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "Addresses": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "ServiceConfig": null,
Sep 19 14:31:40 integration-dt-ci consul[5166]:   "Attributes": null
Sep 19 14:31:40 integration-dt-ci consul[5166]: } ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [DEBUG] agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.525Z [TRACE] agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.611Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.619Z [TRACE] agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.620Z [TRACE] agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.622Z [TRACE] agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.635Z [INFO]  agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO]  agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [ERROR] agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.641Z [INFO]  agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [ERROR] agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: 2025-09-19T14:31:40.643Z [INFO]  agent: Exit code: code=1
Sep 19 14:31:40 integration-dt-ci systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingALPNRPCWrapper: version=1
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] original dial target is: "consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] parsed dial target is: {Scheme:consul Authority:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 URL:{Scheme:consul Opaque: User: Host:integration.93a0a9fe-bf84-eb66-4165-c1453a578c54 Path:/server.integration RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel authority set to "server.integration"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Resolver state updated: {
                                                  "Addresses": null,
                                                  "ServiceConfig": null,
                                                  "Attributes": null
                                                } ()
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel switches to new LB policy "consul-internal"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: creating balancer: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.grpc.balancer: switching server: target=consul://integration.93a0a9fe-bf84-eb66-4165-c1453a578c54/server.integration from=<none> to=<none>
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.auto_config: bootstrap = true: do not enable unless necessary
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: Update: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: IncomingGRPConfig: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: [core][Server #2] Server created
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.tlsutil: OutgoingRPCWrapper: version=2
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: starting restore from snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: snapshot restore progress: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 read-bytes=53 percent-complete="0.02%"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server.raft: failed to restore snapshot: id=2-16482438-1758289018472 last-index=16482438 last-term=2 size-in-bytes=291125 error="object missing primary index"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent.server: shutting down server
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Error starting agent: error="Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots"
Sep 19 14:31:40 integration-dt-ci consul[5166]: agent: Exit code: code=1