Description
There was a Pegasus cluster with 3 meta servers and 5 replica servers, with authentication enabled. A script was written to drop a large number of tables.
While the script was being executed, the meta server crashed with nothing in its log but "got signal id: 11" (SIGSEGV), and the following appeared in dmesg:
[Tue Nov 12 15:32:39 2024] meta.meta_stat[681978]: segfault at 40 ip 00007faa351ea839 sp 00007faa0d48abc0 error 4 in libdsn_utils.so[7faa35124000+115000]
[Tue Nov 12 15:32:39 2024] Code: 23 f9 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 48 8d 45 cf 53 4c 8d 67 08 48 83 ec 28 <4c> 8b 7f 18 48 89 45 b8 48 8d 45 ce 4d 39 e7 48 89 45 b0 74 6c 48
Many errors were also found in the logs of the failed meta server (namely the primary meta server):
E2024-11-12 15:32:45.711 (1731396765711870108 a67f5) meta.meta_server4.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:32:45.713 (1731396765713890084 a67f5) meta.meta_server4.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:32:52.225 (1731396772225348205 a67f6) meta.meta_server5.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:32:52.226 (1731396772226529887 a67f6) meta.meta_server5.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:32:58.919 (1731396778919427343 a67f3) meta.meta_server2.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:32:58.921 (1731396778921276545 a67f3) meta.meta_server2.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:33:06.374 (1731396786374687523 a67f6) meta.meta_server5.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:33:06.376 (1731396786376019669 a67f6) meta.meta_server5.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:33:14.775 (1731396794775332362 a67f2) meta.meta_server1.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:33:14.777 (1731396794777299007 a67f2) meta.meta_server1.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:33:30.679 (1731396810679840313 a67f3) meta.meta_server2.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:33:30.681 (1731396810681638580 a67f3) meta.meta_server2.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:33:37.501 (1731396817501816052 a67f7) meta.meta_server6.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:33:37.503 (1731396817503027320 a67f7) meta.meta_server6.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
E2024-11-12 15:33:44.338 (1731396824338693868 a67f4) meta.meta_server3.01010000000009fa: ranger_resource_policy_manager.cpp:641:sync_policies_to_app_envs(): ERR_INVALID_PARAMETERS: set_app_envs failed.
E2024-11-12 15:33:44.339 (1731396824339976731 a67f4) meta.meta_server3.01010000000009fa: ranger_resource_policy_manager.cpp:304:update_policies_from_ranger_service(): ERR_INVALID_PARAMETERS: Sync policies to app envs failed.
After that, the other standby meta servers also failed while trying to take over. See the following logs:
E2024-11-12 15:34:33.624 (1731396873624300621 19c265) meta.meta_server0.010200030000042e: server_state.cpp:689:operator()(): assertion expression: false
F2024-11-12 15:34:33.624 (1731396873624310529 19c265) meta.meta_server0.010200030000042e: server_state.cpp:689:operator()(): invalid status(app_status::AS_DROPPING) for app(abc(1)) in remote storage
AS_DROPPING was found persisted on the remote meta storage (namely ZooKeeper) as the status of the table:
{"status":"app_status::AS_DROPPING","app_type":"pegasus","app_name":"abc","app_id":1,"partition_count":8, ...}
However, AS_DROPPING is just an intermediate state and should never be found on ZooKeeper.
From then on, none of the meta servers could be started normally: each exited immediately after it was started.
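To make the failure mode concrete, here is a minimal C++ sketch (not Pegasus code) of the invariant described above: only stable statuses may be persisted, and a loader that finds an intermediate status refuses to start. The app_status enum is a simplified, hypothetical subset, and persisted_app, is_persistable and validate_remote_apps are hypothetical names; the error text only mirrors the log line above. Since every meta server loads the same table metadata from ZooKeeper, one table left in AS_DROPPING makes each of them fail the same check.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical subset of table statuses; the real enum has more values.
enum class app_status { AS_CREATING, AS_AVAILABLE, AS_DROPPING, AS_DROPPED };

// Hypothetical view of one table's metadata as read back from remote storage.
struct persisted_app {
    std::string app_name;
    int app_id;
    app_status status;
};

const char *status_name(app_status s)
{
    switch (s) {
    case app_status::AS_CREATING: return "app_status::AS_CREATING";
    case app_status::AS_AVAILABLE: return "app_status::AS_AVAILABLE";
    case app_status::AS_DROPPING: return "app_status::AS_DROPPING";
    case app_status::AS_DROPPED: return "app_status::AS_DROPPED";
    }
    return "app_status::UNKNOWN";
}

// Assumption: only stable statuses belong in remote storage; intermediate
// ones (creating/dropping) should exist only in memory while an operation runs.
bool is_persistable(app_status s)
{
    return s == app_status::AS_AVAILABLE || s == app_status::AS_DROPPED;
}

// Returns an error for the first table with an intermediate status, or
// std::nullopt if every table can be loaded. A single bad table is enough
// to abort startup, which is why every restart fails the same way.
std::optional<std::string> validate_remote_apps(const std::vector<persisted_app> &apps)
{
    for (const auto &app : apps) {
        if (!is_persistable(app.status)) {
            return "invalid status(" + std::string(status_name(app.status)) + ") for app(" +
                   app.app_name + "(" + std::to_string(app.app_id) + ")) in remote storage";
        }
    }
    return std::nullopt;
}
```

In this sketch the check is deliberately fatal rather than self-healing: the loader cannot tell whether a persisted AS_DROPPING table should be finished or rolled back, so it stops instead of guessing.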