Description
This issue occurred on a Scylla Cloud cluster.
Installation details
Kernel Version: 5.13.0-1031-aws
Scylla version (or git commit hash): 2021.1.12-20220620.e23889f17
with build-id 00bd36a6a55a00508aad03c6e9a11932a6c3b1a6
Cluster size: 6 nodes (i3.large)
Scylla Nodes used in this run:
No resources left at the end of the run
OS / Image: ami-0763124bdca4e1fe3 (aws: eu-west-1)
Test: scylla-cloud-operations-24h-aws
Test id: d0cef262-8b05-437f-88e4-d5ba31a2fd90
Test name: siren-tests/longevity-tests/scylla-cloud-operations-24h-aws
Test config file(s):
Issue description
At 2022-08-07 05:40:08,996, we attempted to suspend the cluster through Scylla Manager, an operation that usually takes only a few seconds. This time, however, it took over two minutes (the PUT .../suspended request logged below ran for 129063 ms), apparently because a running backup task took a long time to stop:
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.817Z","N":"scheduler","M":"Suspending cluster","cluster_id":"1f1d57ed-a987-484e-bf54-73963fead353","_trace_id":"4KUfAD02S5CQ8vpcsLBYWg"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.14.51","id":1659845952,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.14.170","id":1659846005,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.12.164","id":1659845958,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.12.224","id":1659845973,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.13.10","id":1659845994,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:13 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:13.818Z","N":"backup.upload","M":"Stop job","host":"172.21.13.238","id":1659845983,"_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:40:15 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:15.225Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.13.10","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/691aeb2d-267b-4aec-b0d0-c5574d91ed09/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:15 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:15.226Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.13.10","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/691aeb2d-267b-4aec-b0d0-c5574d91ed09/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:20 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:20.241Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.13.238","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/67fd035a-4ea8-471d-97ce-ad0abe3fa3de/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:20 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:20.241Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.13.238","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/67fd035a-4ea8-471d-97ce-ad0abe3fa3de/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:23 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:23.819Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"1.070815568s","error":"net/http: TLS handshake timeout"}
Aug 07 05:40:30 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:30.756Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.12.164","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/8dc22bd3-93ae-4bd2-ab71-043520cc83e6/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:30 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:30.756Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.12.164","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/8dc22bd3-93ae-4bd2-ab71-043520cc83e6/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:43 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:43.074Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.14.51","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/98a8eea5-d7cd-410a-aecd-ee266ffa3fb9/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:43 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:43.074Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.14.51","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/98a8eea5-d7cd-410a-aecd-ee266ffa3fb9/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:46 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:46.755Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.14.170","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/15f8131b-841a-470b-900a-8803dbb67044/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:46 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:40:46.755Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.14.170","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/15f8131b-841a-470b-900a-8803dbb67044/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:40:46 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:46.988Z","N":"cluster.client","M":"HTTP retry backoff","operation":"CoreStatsDelete","wait":"899.446088ms","error":"net/http: TLS handshake timeout"}
Aug 07 05:40:58 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:40:58.004Z","N":"cluster.client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"after 30s: context deadline exceeded","_trace_id":"b5mOUFSDRGi_Erwo6dzeZQ"}
Aug 07 05:41:17 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:41:17.889Z","N":"cluster.client","M":"HTTP retry backoff","operation":"CoreStatsDelete","wait":"1.608674078s","error":"after 30s: context deadline exceeded"}
Aug 07 05:41:28 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:41:28.004Z","N":"cluster.client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"after 30s: context deadline exceeded","_trace_id":"b5mOUFSDRGi_Erwo6dzeZQ"}
Aug 07 05:41:29 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:41:29.499Z","N":"cluster.client","M":"HTTP retry backoff","operation":"CoreStatsDelete","wait":"3.380851189s","error":"net/http: TLS handshake timeout"}
Aug 07 05:41:33 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:41:33.680Z","N":"cluster.client","M":"HTTP retry now","operation":"GossiperEndpointLiveGet","error":"after 30s: context deadline exceeded","_trace_id":"7ISjiJ1lSsKmLUmtX9jgyA"}
Aug 07 05:41:54 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:41:54.244Z","N":"cluster.client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"net/http: TLS handshake timeout","_trace_id":"b5mOUFSDRGi_Erwo6dzeZQ"}
Aug 07 05:42:18 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:42:18.767Z","N":"backup.upload","M":"Upload dir failed","host":"172.21.12.224","from":"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC","to":"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/fb3b4f81-20a3-4d35-b9ad-fe45e8e26608/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001","error":"fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).waitJob\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:231\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:189\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:166\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:88\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:42:18 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:42:18.768Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"172.21.12.224","error":"keyspace1.standard1: upload snapshot: copy \"data:keyspace1/standard1-f4de777015a911edaf9a000000000001/snapshots/sm_20220807052140UTC\" to \"s3:scylla-cloud-backup-21217-21320-279qkl/backup/sst/cluster/1f1d57ed-a987-484e-bf54-73963fead353/dc/AWS_EU_WEST_1/node/fb3b4f81-20a3-4d35-b9ad-fe45e8e26608/keyspace/keyspace1/table/standard1/f4de777015a911edaf9a000000000001\": fetch job info: context canceled","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1.2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:71\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).uploadHost.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:89\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69\nruntime.goexit\n\truntime/asm_amd64.s:1371\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func2\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:29\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.hostsInParallel.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/parallel.go:80\ngithub.com/scylladb/scylla-manager/v3/pkg/util/parallel.Run.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/util/parallel/parallel.go:69"}
Aug 07 05:42:18 ip-172-21-15-182 scylla-manager[8724]: {"L":"ERROR","T":"2022-08-07T05:42:18.768Z","N":"backup.upload","M":"Uploading snapshot files failed see exact errors above","duration":"20m5.66554442s","_trace_id":"b6b0tOiESSiciDJ5oPJDlA","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:20\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*worker).Upload\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/worker_upload.go:26\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*Service).Backup.func8\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/service.go:764\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*Service).Backup.func11\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/service.go:808\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.(*Service).Backup\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/service.go:812\ngithub.com/scylladb/scylla-manager/v3/pkg/service/backup.Runner.Run\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/backup/runner.go:26\ngithub.com/scylladb/scylla-manager/v3/pkg/service/scheduler.PolicyRunner.Run\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/scheduler/policy.go:32\ngithub.com/scylladb/scylla-manager/v3/pkg/service/scheduler.(*Service).run\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/scheduler/service.go:432\ngithub.com/scylladb/scylla-manager/v3/pkg/scheduler.(*Scheduler).asyncRun.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scheduler/scheduler.go:412"}
Aug 07 05:42:22 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:42:22.876Z","N":"scheduler.1f1d57ed","M":"Run ended","task":"backup/1bf10e85-7c1c-47c7-9a58-325ab03e7087","status":"STOPPED","duration":"20m47.244272306s","_trace_id":"b6b0tOiESSiciDJ5oPJDlA"}
Aug 07 05:42:22 ip-172-21-15-182 scylla-manager[8724]: {"L":"INFO","T":"2022-08-07T05:42:22.876Z","N":"http","M":"PUT /api/v1/cluster/%2321217/suspended?start_tasks=false","from":"3.216.19.95:14476","status":0,"bytes":0,"duration":"129063ms","_trace_id":"4KUfAD02S5CQ8vpcsLBYWg"}
This was the fifth of nine suspend attempts during the run. The other eight all succeeded, and none of them took more than 6 seconds. Is this expected behaviour?
Timestamps of the other, successful suspensions:
- 2022-08-06 22:41:51,892
- 2022-08-07 00:22:39,373
- 2022-08-07 01:25:39,421
- 2022-08-07 03:08:39,368
- 2022-08-07 06:13:36,005
- 2022-08-07 11:43:57,822
- 2022-08-07 14:31:59,444
- 2022-08-07 15:36:44,543
- Restore Monitor Stack command:
$ hydra investigate show-monitor d0cef262-8b05-437f-88e4-d5ba31a2fd90
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
$ hydra investigate show-logs d0cef262-8b05-437f-88e4-d5ba31a2fd90
Logs:
https://docs.google.com/document/d/1QpbJYYNkvSoU7JHD4E6R5Q3yloVmEywqKpxXYlHvg8Y/edit