The result channel doesn't get closed after load ends blocking the results file generation #385

Open
@vponomaryov

Issue description

Running the following gemini command:

gemini -d --duration 3h --warmup 30m -c 50 -m mixed -f --non-interactive --cql-features normal \
    --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 \
    --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" \
    --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" \
    --test-cluster=10.12.1.102,10.12.2.40,10.12.2.200 --outfile /gemini/gemini_result_dd524c59-4d74-41f7-a8bd-ada4d99c9e23.log \
    --seed 70 --request-timeout 180s --connect-timeout 120s --oracle-cluster=10.12.3.145

It resulted in the following log output:

{"L":"INFO","T":"2023-07-10T16:29:32.918Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-07-10T19:59:32.884Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-07-10T19:59:32.919Z","M":"Test run completed. Exiting."}

In a normal run, however, the log looks like the following:

{"L":"INFO","T":"2023-06-26T09:29:11.605Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-06-26T12:59:11.579Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-06-26T12:59:11.608Z","M":"result channel closed"}
{"L":"INFO","T":"2023-06-26T12:59:11.609Z","M":"Test run completed. Exiting."}

So the "result channel closed" message is absent from the log of the failed test run.
Because the result channel was never closed, the results file was not generated.
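
The difference between the two logs can be checked mechanically. The following is a minimal sketch, not part of gemini: it assumes the log is newline-delimited JSON with the `L`, `T`, `N`, and `M` keys seen in the lines quoted above, and the names `logEntry` and `resultChannelClosed` are hypothetical helpers introduced for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry mirrors the JSON keys visible in the quoted log lines.
// The struct itself is a hypothetical helper, not a gemini type.
type logEntry struct {
	Level   string `json:"L"`
	Time    string `json:"T"`
	Name    string `json:"N"`
	Message string `json:"M"`
}

// resultChannelClosed reports whether any line of the log carries
// the "result channel closed" message.
func resultChannelClosed(log string) bool {
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		var e logEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			continue // skip lines that are not valid JSON
		}
		if e.Message == "result channel closed" {
			return true
		}
	}
	return false
}

func main() {
	// Log lines copied from the failed run in this report.
	failed := `{"L":"INFO","T":"2023-07-10T19:59:32.884Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-07-10T19:59:32.919Z","M":"Test run completed. Exiting."}`

	// Log lines copied from the normal run in this report.
	normal := `{"L":"INFO","T":"2023-06-26T12:59:11.579Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-06-26T12:59:11.608Z","M":"result channel closed"}
{"L":"INFO","T":"2023-06-26T12:59:11.609Z","M":"Test run completed. Exiting."}`

	fmt.Println("failed run closed channel:", resultChannelClosed(failed))
	fmt.Println("normal run closed channel:", resultChannelClosed(normal))
}
```

A check like this could be run against the gemini log after a test to flag runs where the message is missing.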

Impact

The results file from gemini cannot be retrieved.

How frequently does it reproduce?

Observed for the first time.

Installation details

Kernel Version: 5.15.0-1039-aws
Scylla version (or git commit hash): 5.2.4-20230708.9f79c9f41d6e with build-id edaa90c2e7660d794d2d308e93c1ba956e829d7d
Gemini version: 1.7.8

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

  • gemini-with-nemesis-3h-normal-5-2-oracle-db-node-e576d8f7-1 (44.203.218.69 | 10.12.3.145) (shards: 2)
  • gemini-with-nemesis-3h-normal-5-2-db-node-e576d8f7-3 (44.204.201.39 | 10.12.2.200) (shards: 2)
  • gemini-with-nemesis-3h-normal-5-2-db-node-e576d8f7-2 (3.239.28.146 | 10.12.2.40) (shards: 2)
  • gemini-with-nemesis-3h-normal-5-2-db-node-e576d8f7-1 (18.213.151.8 | 10.12.1.102) (shards: 2)

OS / Image: ami-0a69901a7e05f1029 (aws: us-east-1)

Test: gemini-3h-with-nemesis-test
Test id: e576d8f7-262b-4864-b411-4e5c65631b55
Test name: scylla-5.2/gemini-/gemini-3h-with-nemesis-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor e576d8f7-262b-4864-b411-4e5c65631b55
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs e576d8f7-262b-4864-b411-4e5c65631b55

Logs:

Jenkins job URL
Argus
