Skip to content

Loader node with scylla-bench v0.1.8 got a core dump #90

Open
@yarongilor

Description

@yarongilor

Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.6.rc5-0.20220203.5694ec189 with build-id f5d85bf5abe6d2f9fd3487e2469ce1c34304cc14
Cluster size: 4 nodes (i3en.3xlarge)
Scylla running with shards number (live nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-1 (16.170.220.3 | 10.0.3.180): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-2 (13.48.106.98 | 10.0.1.75): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-4 (13.51.193.35 | 10.0.3.6): 12 shards
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-5 (16.171.64.136 | 10.0.0.210): 12 shards
Scylla running with shards number (terminated nodes):
longevity-large-partitions-4d-4-6-db-node-e2adc2e9-3 (16.170.157.129 | 10.0.3.67): 12 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-099a011bd5f16a168 (aws: eu-north-1)

Test: longevity-large-partition-4days-test
Test name: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):

  • longevity-large-partition-4days.yaml

Issue description

====================================

Two loader nodes running scylla-bench v0.1.8 got 3 core dumps:

2022-02-04 19:12:38.443: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=808c9044-c851-4fe5-884a-c7217aa8d4c7 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-2 [16.170.143.136 | 10.0.3.155] (seed: False)
2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)
2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False)

It looks like SCT encountered a problem uploading the coredump to s3:

< t:2022-02-04 19:30:02,331 f:file_logger.py  l:89   c:sdcm.sct_events.file_logger p:INFO  > 2022-02-04 19:30:02.330: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=39c8e89c-222a-4e89-bfef-a0ad19fe9903 node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125
] (seed: False)
< t:2022-02-04 19:30:33,724 f:coredump.py     l:389  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 19:13:57 UTC (16min ago)' (Fri 2022-02-
04 19:13:57 UTC), due to error: time data 'Fri 2022-02-04 19:13:57 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 19:30:33,725 f:coredump.py     l:220  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[859] has inaccessible corefile, can't upload it
< t:2022-02-04 20:50:58,050 f:file_logger.py  l:89   c:sdcm.sct_events.file_logger p:INFO  > 2022-02-04 20:50:58.048: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=c3e3ce43-fa0c-4381-aa90-2bfd04b8eb7c node=Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125
] (seed: False)
< t:2022-02-04 20:51:58,811 f:coredump.py     l:389  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: Failed to convert date 'Timestamp: Fri 2022-02-04 20:35:03 UTC (16min ago)' (Fri 2022-02-
04 20:35:03 UTC), due to error: time data 'Fri 2022-02-04 20:35:03 UTC' does not match format '%a %Y-%m-%d %H:%M:%S %z'
< t:2022-02-04 20:51:58,811 f:coredump.py     l:220  c:sdcm.cluster_aws     p:ERROR > Node longevity-large-partitions-4d-4-6-loader-node-e2adc2e9-1 [13.48.13.196 | 10.0.3.125] (seed: False): CoredumpExportSystemdThread: CoreDump[6632] has inaccessible corefile, can't upload it

====================================

Restore Monitor Stack command: $ hydra investigate show-monitor e2adc2e9-28de-4aab-8dd3-5420deabc259
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs e2adc2e9-28de-4aab-8dd3-5420deabc259

Test id: e2adc2e9-28de-4aab-8dd3-5420deabc259

Logs:
grafana - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220204_211840-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png)&source=gmail-html&ust=1644400806565000&usg=AOvVaw3FfBA-mhjEIAtL-7F3JgxY)
grafana - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_211619/grafana-screenshot-overview-20220204_211619-longevity-large-partitions-4d-4-6-monitor-node-e2adc2e9-1.png)&source=gmail-html&ust=1644400806565000&usg=AOvVaw2ZecaaF9ftF-uj5bd3z65d)
critical - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/critical-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806565000&usg=AOvVaw2MVPU_TmCiDmQRnWM5Tp-M)
db-cluster - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/db-cluster-e2adc2e9.tar.gz)&source=gmail-html&ust=1644400806565000&usg=AOvVaw3oalZn4yCxPZrxAc2h4CwH)
debug - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/debug-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806565000&usg=AOvVaw1E-ViLa0LBqslYlLheyX8u)
email_data - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/email_data-e2adc2e9.json.tar.gz)&source=gmail-html&ust=1644400806565000&usg=AOvVaw2TPSFU8f0o1GrJ-r1k-uyB)
error - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/error-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw0P2XY0WTQlEhEEeGTqoQZ1)
event - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/events-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw3LQ3ZZVbO9RjibrU8vd2B-)
left_processes - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/left_processes-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw35nrd7VqCxRMYLrdwMt7cH)
loader-set - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/loader-set-e2adc2e9.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw04odK_tFaz86XDrmqv8RXC)
monitor-set - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/monitor-set-e2adc2e9.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw3NsARY3BuTdlUrZFShnOyq)
normal - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/normal-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw2YLTrPn9946TN4w2NuOiFv)
output - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/output-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw0GwpuKgyScv2xgXFOE-5q0)
event - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/raw_events-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw2uG-2E9ad99zdPsbBdv-w8)
sct - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/sct-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw2LM6z_sWept67xziDCKTYY)
summary - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/summary-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw3BRMZOe4Lz_KWgA9irnCnC)
warning - [https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz](https://www.google.com/url?q=https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz%255D(https://cloudius-jenkins-test.s3.amazonaws.com/e2adc2e9-28de-4aab-8dd3-5420deabc259/20220204_213006/warning-e2adc2e9.log.tar.gz)&source=gmail-html&ust=1644400806566000&usg=AOvVaw29-SvCMANTsOUzGrZgpt57)

Jenkins job URL

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions