Description
Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.6.rc5-0.20220203.5694ec189 with build-id f5d85bf5abe6d2f9fd3487e2469ce1c34304cc14
Cluster size: 4 nodes (i3en.3xlarge)
Scylla running with shards number (live nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-1 (16.170.220.3 | 10.0.3.180): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-2 (13.48.106.98 | 10.0.1.75): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-4 (13.51.193.35 | 10.0.3.6): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-5 (16.171.64.136 | 10.0.0.210): 12 shards
Scylla running with shards number (terminated nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-3 (16.170.157.129 | 10.0.3.67): 12 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-099a011bd5f16a168 (aws: eu-north-1)
Test: longevity-large-partition-4days-test
Test name: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):
- longevity-large-partition-4days.yaml
Issue description
====================================
Two loader nodes running scylla-bench v0.1.8 produced 3 core dumps:
2022-02-04 19:12:38.443: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=808c9044-c851-4fe5-884a-c7217aa8d4c7 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-2 [16.170.143.136 | 10.0.3.155] (seed: False)
2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
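It looks like SCT failed to upload the coredumps to S3. For both dumps on loader-node-1 it could not parse the coredump timestamp (the trailing 'UTC' does not match the '%z' format directive) and then reported the corefile as inaccessible: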
< t:2022-02-04 19:30:02,331 f:file_logger.py l:89 c:sdcm.sct_events.file_logger p:INFO > 2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
< t:2022-02-04 19:30:33,724 f:coredump.py l:389 c:sdcm.cluster_aws p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 19:13:57 UTC (16min ago)' (Fri 2022-02-04 19:13:57 UTC), due to error: time data 'Fri 2022-02-04 19:13:57 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 19:30:33,725 f:coredump.py l:220 c:sdcm.cluster_aws p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[859] has inaccessible corefile, can't upload it
< t:2022-02-04 20:50:58,050 f:file_logger.py l:89 c:sdcm.sct_events.file_logger p:INFO > 2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
< t:2022-02-04 20:51:58,811 f:coredump.py l:389 c:sdcm.cluster_aws p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 20:35:03 UTC (16min ago)' (Fri 2022-02-04 20:35:03 UTC), due to error: time data 'Fri 2022-02-04 20:35:03 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 20:51:58,811 f:coredump.py l:220 c:sdcm.cluster_aws p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[6632] has inaccessible corefile, can't upload it
====================================
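The timestamp error in the log excerpt above is consistent with how Python's `strptime` treats `%z`: that directive expects a numeric offset such as `+0000` (or `Z`), while the coredump metadata carries the zone name `UTC`. Below is a minimal sketch of a more tolerant parser. `parse_coredump_timestamp` is a hypothetical helper written for illustration, not the actual code in SCT's `coredump.py`, and it assumes an English/C locale so that `%a` matches `Fri`.

```python
from datetime import datetime, timezone

# Format taken from the error message in the log excerpt.
OFFSET_FORMAT = "%a %Y-%m-%d %H:%M:%S %z"
# Same format without the zone field (used once the zone name is split off).
NAIVE_FORMAT = "%a %Y-%m-%d %H:%M:%S"


def parse_coredump_timestamp(value: str) -> datetime:
    """Parse a coredump timestamp with either a numeric offset or a UTC/GMT name.

    Hypothetical helper: the name and fallback logic are assumptions,
    not the SCT implementation.
    """
    try:
        # Works for values like 'Fri 2022-02-04 19:13:57 +0000'.
        return datetime.strptime(value, OFFSET_FORMAT)
    except ValueError:
        pass

    # Fall back to handling a trailing zone name explicitly.
    stamp, _, zone = value.rpartition(" ")
    if zone not in ("UTC", "GMT"):
        raise ValueError(f"unsupported timezone name: {zone!r}")
    return datetime.strptime(stamp, NAIVE_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_coredump_timestamp("Fri 2022-02-04 19:13:57 UTC").isoformat())
```

The zone name is handled by hand rather than with `%Z` because `datetime.strptime` returns a naive datetime when only `%Z` is used, and a timezone-aware value is easier to compare against other UTC timestamps.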
Restore Monitor Stack command: $ hydra investigate show-monitor e2adc2e9-28de-4aab-8dd3-5420deabc259
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs e2adc2e9-28de-4aab-8dd3-5420deabc259
Test id: e2adc2e9-28de-4aab-8dd3-5420deabc259
Logs:
grafana - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png)
grafana - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png)
critical - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz)
db-cluster - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz)
debug - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz)
email_data - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz)
error - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz)
event - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz)
left_processes - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz)
loader-set - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz)
monitor-set - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz)
normal - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz)
output - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz)
event - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz)
sct - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz)
summary - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz)
warning - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz)
Jenkins job URL