Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

self.watch_rest_for_alive(other_node) fails en-mass during bootstrap in debug mode #477

Open
bhalevy opened this issue Jul 24, 2023 · 19 comments · Fixed by #480
Open

self.watch_rest_for_alive(other_node) fails en-mass during bootstrap in debug mode #477

bhalevy opened this issue Jul 24, 2023 · 19 comments · Fixed by #480
Assignees

Comments

@bhalevy
Copy link
Member

bhalevy commented Jul 24, 2023

See for example https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split094___test_rebuild_many_keyspaces/

self = <rebuild_test.TestRebuild object at 0x7f1b7ed0bf50>

    def test_rebuild_many_keyspaces(self):
        """
        Test rebuilding many keyspaces in same dc works as expected.
        """
    
        num_keys = 100
        num_keyspaces = 50
        num_tables = 2
    
        cluster = self.cluster
        cluster.set_configuration_options(values={'endpoint_snitch': 'GossipingPropertyFileSnitch'})
        node1 = self.add_node(1)
    
        # start node in dc1
        node1.start(wait_for_binary_proto=True)
    
        # populate data in dc1
        session = self.patient_exclusive_cql_connection(node1)
        dc = 'dc1'
        logger.debug(f"Creating {num_keyspaces} keyspaces")
        keyspaces = ['ks_{:04d}'.format(i) for i in range(0, num_keyspaces)]
        for ks in keyspaces:
            create_ks(session, ks, {dc: 1})
    
        logger.debug(f"Creating {num_tables} table(s) in each ks")
        tables = ['cf_{:04d}'.format(i) for i in range(0, num_tables)]
        for ks in keyspaces:
            for cf in tables:
                create_c1c2_table(session, cf=f'{ks}.{cf}', debug_query=False)
                insert_c1c2(session, n=num_keys, ks=ks, cf=cf, consistency=ConsistencyLevel.ALL)
    
        keys = [i for i in range(0, num_keys)]
        self._check_data(session, keyspaces, tables, keys)
        session.shutdown()
    
        logger.debug("Bootstrapping node2 with {auto_bootstrap: false}")
        node2 = self.add_node(2)
>       node2.start(wait_other_notice=True, wait_for_binary_proto=True)

rebuild_test.py:315: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:669: in start
    scylla_process = self._start_scylla(args, marks, update_pid,
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:313: in _start_scylla
    node.watch_rest_for_alive(self, timeout=t)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ccmlib.scylla_node.ScyllaNode object at 0x7f1b7d2c3410>
nodes = <ccmlib.scylla_node.ScyllaNode object at 0x7f1b7d3b7b10>, timeout = 360

    def watch_rest_for_alive(self, nodes, timeout=120):
        """
        Use the REST API to wait until this node detects that the nodes listed
        in "nodes" become fully operational and knows of its tokens.
        This is similar to watch_log_for_alive but uses ScyllaDB's REST API
        instead of the log file and waits for the node to be really useable,
        not just "UP" (see issue #461)
        """
        tofind = nodes if isinstance(nodes, list) else [nodes]
        tofind = set([node.address() for node in tofind])
        url_live = f"http://{self.address()}:10000/gossiper/endpoint/live"
        url_joining = f"http://{self.address()}:10000/storage_service/nodes/joining"
        url_tokens = f"http://{self.address()}:10000/storage_service/tokens/"
        endtime = time.time() + timeout
        while time.time() < endtime:
            live = set()
            response = requests.get(url=url_live)
            if response.text:
                live = set(response.json())
            response = requests.get(url=url_joining)
            if response.text:
                live = live - set(response.json())
            # Verify that node knows not only about the existance of the
            # other node, but also its tokens:
            if tofind.issubset(live):
                # This node thinks that all given nodes are alive and not
                # "joining", we're almost done, but still need to verify
                # that the node knows the others' tokens.
                check = tofind
                tofind = set()
                for n in check:
                    response = requests.get(url=url_tokens+n)
                    if response.text == '[]':
                        tofind.add(n)
            if not tofind:
                return
            time.sleep(0.1)
>       raise TimeoutError(f"watch_rest_for_alive() timeout after {timeout} seconds")
E       ccmlib.node.TimeoutError: watch_rest_for_alive() timeout after 360 seconds

../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1379: TimeoutError

It appears like it simply takes more than 360 seconds for the node to bootstrap in debug mode so watch_rest_for_alive fails.

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node1.log

INFO  2023-07-23 08:17:27,230 [shard 0] gossip - InetAddress 127.0.10.2 is now UP, status = UNKNOWN
INFO  2023-07-23 08:17:27,372 [shard 0] compaction - [Compact system.scylla_local 5c4bec80-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_1gaps20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0mxm_2ela820fttl8pke33g-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:27,435 [shard 0] compaction - [Compact system.scylla_local 5c4bec80-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_28sy820fttl8pke33g-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 47ms = 254kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:41,278 [shard 0] compaction - [Compact system.scylla_local 6495cfa0-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_1cnjk20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n13_28sy820fttl8pke33g-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:41,337 [shard 0] compaction - [Compact system.scylla_local 6495cfa0-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_1ovcw20fttl8pke33g-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 43ms = 278kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:18:02,203 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Added node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:18:02,445 [shard 0] cdc - Starting to use generation (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673)
INFO  2023-07-23 08:18:02,469 [shard 0] compaction - [Compact system.cdc_local 71332e60-2931-11ee-8451-d2e563b95f2c] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0n22_22swg20fttl8pke33g-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0mxn_1zdg020fttl8pke33g-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:18:02,631 [shard 0] cdc - CDC description table successfully updated with generation (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673).
INFO  2023-07-23 08:18:02,912 [shard 0] compaction - [Compact system.cdc_local 71332e60-2931-11ee-8451-d2e563b95f2c] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node1/data/system/cdc_local-0bcaffd40c833eadad13dc1d5015b77c/me-3g7w_0n22_3dyfk20fttl8pke33g-big-Data.db:level=0]. 68kB to 63kB (~92% of original) in 419ms = 163kB/s. ~256 total partitions merged to 2.
INFO  2023-07-23 08:23:22,888 [shard 0] compaction_manager - Asked to stop

https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node2.log

INFO  2023-07-23 08:17:40,585 [shard 0] storage_service - entering JOINING mode
INFO  2023-07-23 08:17:40,639 [shard 1] compaction - [Compact system.scylla_local 642affe0-2931-11ee-8618-2a2c7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1g_3gqqo21et193d2r75p-big-Data.db:level=0]. 10kB to 5kB (~53% of original) in 45ms = 234kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:40,728 [shard 0] storage_service - waiting for ring information
INFO  2023-07-23 08:17:40,728 [shard 0] storage_service - waiting for schema information to complete
INFO  2023-07-23 08:17:41,437 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:41,438 [shard 0] migration_manager - Schema merge with 127.0.10.1:0 completed
INFO  2023-07-23 08:17:41,460 [shard 0] compaction - [Compact system_schema.columns 62df4600-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-6-big-Data.db:level=0]. 53kB to 32kB (~59% of original) in 2931ms = 18kB/s. ~256 total partitions merged to 56.
INFO  2023-07-23 08:17:41,483 [shard 0] compaction - [Compact system.local 64b390d0-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1g_3io6o2g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/mc-6-big-Data.db:level=0:origin=compaction]
INFO  2023-07-23 08:17:41,853 [shard 0] compaction - [Compact system.local 64b390d0-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1h_34ycw2g7jx5i3pll31-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 248ms = 56kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:17:41,884 [shard 0] compaction - [Compact system.scylla_local 64ef8860-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1g_44j802g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/md-3g7w_0n1g_39ge82g7jx5i3pll31-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/mc-2-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:42,178 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:42,283 [shard 0] compaction - [Compact system.scylla_local 64ef8860-2931-11ee-a120-2a2d7da362fd] Compacted 3 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g7w_0n1h_5jbxc2g7jx5i3pll31-big-Data.db:level=0]. 17kB to 6kB (~36% of original) in 272ms = 64kB/s. ~384 total partitions merged to 1.
INFO  2023-07-23 08:17:42,296 [shard 0] compaction - [Compact system_schema.tables 65314c50-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/mc-4-big-Data.db:level=0:origin=memtable,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/mc-2-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:42,942 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:43,432 [shard 0] compaction - [Compact system_schema.tables 65314c50-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/me-3g7w_0n1i_20g0w2g7jx5i3pll31-big-Data.db:level=0]. 51kB to 29kB (~57% of original) in 1013ms = 51kB/s. ~256 total partitions merged to 56.
INFO  2023-07-23 08:17:43,652 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:44,300 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:44,948 [shard 0] schema_tables - Schema version changed to 2ac6cf52-bddc-35b0-a88e-67919ba676ba
INFO  2023-07-23 08:17:45,019 [shard 0] storage_service - Checking bootstrapping/leaving nodes: ok
INFO  2023-07-23 08:17:45,019 [shard 0] storage_service - getting bootstrap token
INFO  2023-07-23 08:17:45,020 [shard 0] boot_strapper - Get random bootstrap_tokens={-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}
INFO  2023-07-23 08:17:45,020 [shard 0] system_distributed_keyspace - system_distributed keyspace is already present. Not creating
INFO  2023-07-23 08:17:45,020 [shard 0] system_distributed_keyspace - system_distributed_everywhere keyspace is already present. Not creating
INFO  2023-07-23 08:17:45,021 [shard 0] system_distributed_keyspace - All tables are present on start
INFO  2023-07-23 08:17:45,021 [shard 0] system_distributed_keyspace - All schemas are uptodate on start
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - enable_repair_based_node_ops=true, allowed_repair_based_node_ops={decommission ,bootstrap ,rebuild ,removenode ,replace}
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - entering BOOTSTRAP mode
INFO  2023-07-23 08:17:45,104 [shard 0] storage_service - Wait until local node knows tokens of peer nodes
INFO  2023-07-23 08:17:45,104 [shard 0] gossip - Waiting for pending range setup...
INFO  2023-07-23 08:17:45,104 [shard 0] compaction - [Compact system.local 66ddc3d0-2931-11ee-a120-2a2d7da362fd] Compacting [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1h_34ycw2g7jx5i3pll31-big-Data.db:level=0:origin=compaction,/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1l_0601s2g7jx5i3pll31-big-Data.db:level=0:origin=memtable]
INFO  2023-07-23 08:17:45,164 [shard 0] compaction - [Compact system.local 66ddc3d0-2931-11ee-a120-2a2d7da362fd] Compacted 2 sstables to [/jenkins/workspace/scylla-master/dtest-daily-debug/scylla/.dtest/dtest-ofv8ztjs/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g7w_0n1l_0nkrk2g7jx5i3pll31-big-Data.db:level=0]. 18kB to 12kB (~66% of original) in 43ms = 440kB/s. ~256 total partitions merged to 1.
INFO  2023-07-23 08:18:02,105 [shard 0] gossip - No gossip backlog; proceeding
INFO  2023-07-23 08:18:02,107 [shard 0] cdc - Inserting new generation data at UUID cfbbe7d7-7806-4e2f-b6fc-8755390d4673
INFO  2023-07-23 08:18:02,177 [shard 0] cdc - New CDC generation: (2023/07/23 08:18:27, cfbbe7d7-7806-4e2f-b6fc-8755390d4673)
INFO  2023-07-23 08:18:02,179 [shard 0] storage_service - Starting to bootstrap...
INFO  2023-07-23 08:18:02,179 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap operation: node=b0643dbe-37b6-4e64-86c4-bded508a5c69/127.0.10.2
INFO  2023-07-23 08:18:02,180 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: sync_nodes=bootstrap, ignore_nodes=b0643dbe-37b6-4e64-86c4-bded508a5c69
INFO  2023-07-23 08:18:02,193 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started heartbeat_updater (interval=10s)
INFO  2023-07-23 08:18:02,194 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap_prepare[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:18:02,204 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Added node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:18:02,740 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Finished bootstrap_prepare[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:18:06,006 [shard 0] repair - bootstrap_with_repair: started with keyspaces={ks_0040, ks_0039, ks_0044, ks_0006, ks_0036, ks_0021, ks_0031, ks_0011, ks_0038, ks_0047, ks_0025, ks_0014, ks_0048, ks_0045, ks_0016, ks_0015, ks_0000, ks_0046, ks_0026, ks_0024, ks_0037, system_distributed_everywhere, ks_0043, ks_0035, ks_0019, ks_0042, ks_0023, system_distributed, ks_0017, ks_0032, ks_0034, ks_0003, ks_0004, ks_0041, ks_0030, system_auth, ks_0029, ks_0033, ks_0022, ks_0028, ks_0007, ks_0013, ks_0002, ks_0027, ks_0010, ks_0049, ks_0001, system_traces, ks_0005, ks_0020, ks_0012, ks_0018, ks_0009, ks_0008}, nr_ranges_total=31498
INFO  2023-07-23 08:18:06,143 [shard 0] repair - bootstrap_with_repair: started with keyspace=ks_0040, nr_ranges=512
...
INFO  2023-07-23 08:23:12,310 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: sync data for keyspace=system_distributed, status=started
INFO  2023-07-23 08:23:12,313 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 1 out of 4 tables in keyspace=system_distributed, table=cdc_generation_timestamps, table_id=fdf455c4-cfec-3e00-9719-d7a45436c89d, repair_reason=bootstrap
INFO  2023-07-23 08:23:12,316 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 1 out of 4 tables in keyspace=system_distributed, table=cdc_generation_timestamps, table_id=fdf455c4-cfec-3e00-9719-d7a45436c89d, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,099 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 2 out of 4 tables in keyspace=system_distributed, table=service_levels, table_id=b8c556bd-212d-37ad-9484-690c73a5994b, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,707 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Started to repair 2 out of 4 tables in keyspace=system_distributed, table=service_levels, table_id=b8c556bd-212d-37ad-9484-690c73a5994b, repair_reason=bootstrap
INFO  2023-07-23 08:23:19,708 [shard 1] compaction_manager - Starting off-strategy compaction for system_distributed.cdc_generation_timestamps compaction_group=0/1, 1 candidates were found
INFO  2023-07-23 08:23:19,711 [shard 1] compaction_manager - Done with off-strategy compaction for system_distributed.cdc_generation_timestamps compaction_group=0/1
INFO  2023-07-23 08:23:22,888 [shard 0] compaction_manager - Asked to stop
WARN  2023-07-23 08:23:22,889 [shard 0] gossip - failure_detector_loop: Got error in the loop, live_nodes={127.0.10.1}: seastar::sleep_aborted (Sleep is aborted)
INFO  2023-07-23 08:23:22,890 [shard 0] gossip - failure_detector_loop: Finished main loop
INFO  2023-07-23 08:23:22,890 [shard 1] compaction_manager - Asked to stop
INFO  2023-07-23 08:23:22,890 [shard 0] compaction_manager - Stopped
INFO  2023-07-23 08:23:22,892 [shard 1] compaction_manager - Stopped
WARN  2023-07-23 08:23:22,892 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Repair job aborted by user, job=4e640e08-0a35-4b0e-b128-fef3a47563ac, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}
WARN  2023-07-23 08:23:22,895 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: Repair job aborted by user, job=4e640e08-0a35-4b0e-b128-fef3a47563ac, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}
INFO  2023-07-23 08:23:22,921 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: stats: repair_reason=bootstrap, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, ranges_nr=513, round_nr=802, round_nr_fast_path_already_synced=802, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=4010, tx_hashes_nr=0, rx_hashes_nr=0, duration=10.609721 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.10.1, 0}, {127.0.10.2, 0}}, row_from_disk_nr={{127.0.10.1, 0}, {127.0.10.2, 0}}, row_from_disk_bytes_per_sec={{127.0.10.1, 0}, {127.0.10.2, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.10.1, 0}, {127.0.10.2, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
WARN  2023-07-23 08:23:22,921 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true
INFO  2023-07-23 08:23:22,937 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: stats: repair_reason=bootstrap, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, ranges_nr=513, round_nr=766, round_nr_fast_path_already_synced=765, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=3828, tx_hashes_nr=0, rx_hashes_nr=1, duration=10.624115 seconds, tx_row_nr=0, rx_row_nr=2, tx_row_bytes=0, rx_row_bytes=216, row_from_disk_bytes={{127.0.10.1, 216}, {127.0.10.2, 0}}, row_from_disk_nr={{127.0.10.1, 2}, {127.0.10.2, 0}}, row_from_disk_bytes_per_sec={{127.0.10.1, 1.93893e-05}, {127.0.10.2, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.10.1, 0.188251}, {127.0.10.2, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={{127.0.10.1, 2}}
WARN  2023-07-23 08:23:22,937 [shard 1] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true
WARN  2023-07-23 08:23:22,939 [shard 0] repair - repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: sync data for keyspace=system_distributed, status=failed: std::runtime_error ({shard 0: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true), shard 1: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true)})
ERROR 2023-07-23 08:23:22,940 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Operation failed, sync_nodes={127.0.10.1, 127.0.10.2}: std::runtime_error ({shard 0: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true), shard 1: std::runtime_error (repair[4e640e08-0a35-4b0e-b128-fef3a47563ac]: 1 out of 2052 ranges failed, keyspace=system_distributed, tables={cdc_generation_timestamps, service_levels, cdc_streams_descriptions_v2, view_build_status}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=true)})
INFO  2023-07-23 08:23:22,941 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Stopped heartbeat_updater
INFO  2023-07-23 08:23:22,941 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Started bootstrap_abort[4b61683f-0431-4d19-b651-9f87bdc4e86e]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{127.0.10.2 -> {-8395587159826219290, -5464666828924826936, -7222252513962729396, 135100649381944257, 6576852586953413498, 325464549287023086, 4904036563849780271, -6546981841352807480, 2865406470152825579, -394784590446060672, 8051423441471499559, 6844991086976765892, -5614594334687664535, 1086175589025211548, 2408945801985234454, 8122767265582843160, -2956082443385573348, -960961978041855340, -3643595389008655452, 6857527436554476253, 5707640721068606399, 8897833340820015336, 2703363383166069333, -6588602489652425955, 5440287864758530396, -3094282848608922741, -1833526704171643022, -5105986572299303535, 7343359728267118239, -1834910154028555994, 1359791063466712191, -6507608401952339437, 566681375887275487, -306335652492972568, -4759202512546421207, -3129379551594474653, -5245088751617943006, -1404467098100775533, 3621138246620388043, 1485318039863435327, -4634401387007861583, -3038957000594949223, -6067046826404507156, 259137294824191805, 1926200563648969364, -1129145445286767819, -4639143725143526709, -1270097303253184727, -4966536635347290250, -1836953348237170973, -5794517138929612802, -3357996511996118312, -3791611262883130211, -2712206311139191209, -6791804046333807375, -6857569926998170414, -1214712356026727843, 5648296861051792831, -3655125933126647862, -4437868813526158245, 5370436154027038373, 7792027546790392396, -4561037836572836580, 3279434197974400561, 15665818235702521, 1524968315694800053, -176568868099139895, -6701925414249034591, 5178105708499513041, 5804070608230706547, 5230568807411941866, 4853077499092200821, -7369163820979119575, -6952830276823514532, -103846967177016728, -992945990790605636, 8386798552877056049, 1133063589928776842, -5956267497982120165, -1738707564756902725, -2311239167820960648, -3617434983021206653, -8806310878743135339, 8682303417832312958, 6542373449263183026, 8453323091764003081, 6401264866713689372, -3510202054027109158, 717199908134802740, -909605829712997119, -5535683829237212567, 3940927772554800994, 7989711667997573788, 8150595733957275539, -2056771582667231152, -6866192218627519514, -7574389104777509529, 5949929264212497033, 6666375004548028554, -2043394039982026649, -6812742753429456488, -3465313958018044075, -7402991056989302396, -519923436200876788, -4296903967476984787, 3858349911860527952, 8003019013187970627, -3210942499279608571, 8134376040994854713, -5268721429312034183, -3255006945112380069, -1237903303615122379, 3508003286295929138, 5602756197637522861, -82889097934825496, 765186793610237170, -7218303157549388562, -8371313459689358554, -6077713351632690487, -1338841163598433145, -1461213882739586397, -8003371252654520955, -5723333243472021823, -3598399274862099320, -7975816588440379235, 3182154134710411372, 5982218720245695231, -8022333615083156526, -8013322529234294041, 6582953801098307917, -1746134476029010781, -7962438755842596771, 8440675877475389789, -5517615739684767939, -6728175856340492674, -7709261831346207023, -8218332298477205921, 5863034785872395197, 8549729515336153751, 8444318272321445644, 3406730916609539115, 5280359148735403361, 5662208716882257323, -5591151511066121729, -8883677973470414918, 3941620723706007740, -8502421991628807392, 2575039736256740314, -7803521813182963380, -7876943857235327196, 56269010885645520, 7417591635542850574, -6907680069794125375, 6531632281214254321, 7014445363384813781, -8681664890784800888, 7219964038695109401, 6756764466860574160, 1789954678045529674, -1335653650150848103, -928650376983969000, -4319562382198011677, 3627982001996933078, 6709446486973791041, 50365639515001438, -4723088506383709176, -989799393669168409, -6884393927805650255, 2137838288095421568, 2035601076520437763, 2854041178342208778, -4230353222642009796, 9161035368666734598, 5723595328722592552, -5240450739054301426, -3811011906531734905, -6652655064390295222, -1071492933004332904, -23090648579602723, -7852280585483772283, -3218302238885030434, -5558203672196491920, 5363625513107461792, 2857931834978150504, 5816817509490364905, 8705412457605912815, -9001237871363535131, 5372544962964494314, -3732504860581423471, -1689725134578914778, 252695792052499106, 383585012655668966, 4564452022988505575, -6187619051100758030, -4596776904188924370, 97223002029603781, 5724668741791588655, -6844936433089951339, 936695171381717631, -9015202489132629380, 2890170352300620348, 6185073674648946551, 77467571978625798, 6473387548436861078, 4870832448718573995, 3761903199032378549, -7144321973099061236, 8280025665725034732, 772638564347589154, -4372427892630359972, 5052074121036312165, 8797499046576556097, -3074773784104930630, 703862074880474660, -3604497667191381691, -4175924615349836461, -4675986421159984312, 3659305196643771673, 2507582415515462337, 5686556585242263320, -7525535642739226497, 1426877500142237997, 8260608696777969526, -3553756636007311684, 7986534795414733361, -743757303203258287, -9077028374338143018, -1006342375644801567, 6866260556387203777, -6751365809636160665, 1523413503891722687, -5297781222906273114, 858106848981761506, 5644228301183323679, -4542823553779553524, -4488719974975959451, -2811703002895913722, 3331721693627287123, 2704103012387122487, 5034226912642977658, -865565180332728508, -7949088462790623579, -4208460939445143339, 5475656673231497838, 1094762380324555299, 2833661454877006912, 5315999854388242392, 2172654140971713334, -635124055736189505, 5835272454069430665, -1125547323310034206, 4497840473825704342, 5372819866645000103, -6340846590744899253, 8308933521514706654, 2751666202465664570}}}, repair_tables={}
INFO  2023-07-23 08:23:22,945 [shard 0] storage_service - aborting node operation ops_uuid=4b61683f-0431-4d19-b651-9f87bdc4e86e
INFO  2023-07-23 08:23:22,953 [shard 0] storage_service - bootstrap[4b61683f-0431-4d19-b651-9f87bdc4e86e]: Removed node=127.0.10.2 as bootstrap, coordinator=127.0.10.2
INFO  2023-07-23 08:23:23,380 [shard 0] gossip - Got shutdown message from 127.0.10.1, received_generation=1690100109, local_generation=1690100109
INFO  2023-07-23 08:23:23,382 [shard 0] gossip - InetAddress 127.0.10.1 is now DOWN, status = shutdown
@bhalevy
Copy link
Member Author

bhalevy commented Jul 24, 2023

Also, the 0.1 seconds sleep in

time.sleep(0.1)

floods the log with messages.
See https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/dtest-gw0.log
The following messages are repeated every 0.1 seconds for 360 seconds:

08:17:22,692 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
08:17:22,694 788     urllib3.connectionpool         DEBUG    connectionpool.py   :456  | test_rebuild_many_keyspaces: http://127.0.10.1:10000 "GET /gossiper/endpoint/live HTTP/1.1" 200 14
08:17:22,696 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
08:17:22,698 788     urllib3.connectionpool         DEBUG    connectionpool.py   :456  | test_rebuild_many_keyspaces: http://127.0.10.1:10000 "GET /storage_service/nodes/joining HTTP/1.1" 200 2
08:17:22,799 788     urllib3.connectionpool         DEBUG    connectionpool.py   :228  | test_rebuild_many_keyspaces: Starting new HTTP connection (1): 127.0.10.1:10000
$ wc dtest-gw0.log
  13920  193064 2487237 dtest-gw0.log

@bhalevy
Copy link
Member Author

bhalevy commented Jul 24, 2023

All that, while on https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/245/artifact/logs-full.debug.094/1690100633894_rebuild_test.py%3A%3ATestRebuild%3A%3Atest_rebuild_many_keyspaces/node2.log:

INFO  2023-07-23 08:17:28,186 [shard 0] storage_service - Set host_id=ddb99668-493f-4171-ab7f-a1208ba4186f to be owned by node=127.0.10.1
INFO  2023-07-23 08:17:28,187 [shard 0] gossip - InetAddress 127.0.10.1 is now UP, status = NORMAL
INFO  2023-07-23 08:17:28,196 [shard 0] gossip - Live nodes seen in gossip: {127.0.10.1, 127.0.10.2}
INFO  2023-07-23 08:17:28,196 [shard 0] storage_service - Started waiting for normal state handlers to finish
INFO  2023-07-23 08:17:28,196 [shard 0] storage_service - Normal state handlers not yet finished for nodes (127.0.10.1, status=NORMAL)
INFO  2023-07-23 08:17:28,228 [shard 0] migration_manager - Requesting schema pull from 127.0.10.1:0
INFO  2023-07-23 08:17:28,228 [shard 0] migration_manager - Pulling schema from 127.0.10.1:0
INFO  2023-07-23 08:17:28,283 [shard 0] migration_manager - Requesting schema pull from 127.0.10.1:0
INFO  2023-07-23 08:17:28,296 [shard 0] storage_service - Finished waiting for normal state handlers; endpoints observed in gossip: (127.0.10.1, status=NORMAL), (127.0.10.2, status=UNKNOWN)
INFO  2023-07-23 08:17:28,296 [shard 0] storage_service - Waiting for nodes {127.0.10.2, 127.0.10.1} to be alive
INFO  2023-07-23 08:17:28,297 [shard 0] storage_service - Nodes {127.0.10.2, 127.0.10.1} are alive

@fruch
Copy link
Contributor

fruch commented Jul 24, 2023

Logging I'll take care of, I'll disable this for those calls

The default timeout should be bigger for debug mode ? 10min as it used to be ?

@bhalevy
Copy link
Member Author

bhalevy commented Jul 24, 2023

If @nyh thinks we still need it than 10 minutes should be enough, at least for small cluster.
But we need to see how long it takes on large clusters where bootstrap repair needs to communucate with more nodes.

fruch added a commit to fruch/scylla-ccm that referenced this issue Jul 24, 2023
disable `urllib3.connectionpool` logging for this function,
it doing lots of retries and we don't need to see each
request that is being sent out in the log

Ref: scylladb#477
fruch added a commit to fruch/scylla-ccm that referenced this issue Jul 24, 2023
…_alive`

seems like the 360s default for debug for `watch_rest_for_alive`
isn't enough as it was for `watch_log_for_alive`,
making it 600s (10min)

Fix: scylladb#477
fruch added a commit to fruch/scylla-ccm that referenced this issue Jul 24, 2023
…r_alive`

seems like the 360s default for debug for `watch_rest_for_alive`
isn't enough as it was for `watch_log_for_alive`,
making it 600s (10min)

Fix: scylladb#477
fruch added a commit to fruch/scylla-ccm that referenced this issue Jul 25, 2023
…r_alive`

seems like the 360s default for debug for `watch_rest_for_alive`
isn't enough as it was for `watch_log_for_alive`,
making it 600s (10min)

Fix: scylladb#477
fruch added a commit that referenced this issue Jul 25, 2023
disable `urllib3.connectionpool` logging for this function,
it doing lots of retries and we don't need to see each
request that is being sent out in the log

Ref: #477
fruch added a commit that referenced this issue Jul 25, 2023
…r_alive`

seems like the 360s default for debug for `watch_rest_for_alive`
isn't enough as it was for `watch_log_for_alive`,
making it 600s (10min)

Fix: #477
@nyh
Copy link
Contributor

nyh commented Jul 25, 2023

If @nyh thinks we still need it than 10 minutes should be enough, at least for small cluster. But we need to see how long it takes on large clusters where bootstrap repair needs to communucate with more nodes.

Maybe there shouldn't be any timeout at all, whatsoever? If the waiting code is not buggy, it will eventually stop waiting when the node becomes available. If the waiting code is buggy, the caller (dtest, Jenkins, etc.) will eventually stop on an overall timeout, so we're still safe.

Alternatively, maybe the timeout should be made configurable. It can default to something (e.g., 10 minutes), but if the user really wants to wait 60 minutes for a node on a huge cluster to come up, they can configure the timeout to 60 minutes.

@bhalevy
Copy link
Member Author

bhalevy commented Jul 25, 2023

I think we can also make the timeout be a linear function of the number of nodes in the cluster (at least for the first node in the loop, the rest can follow shortly after).
But I'm not sure it's worth it

@bhalevy
Copy link
Member Author

bhalevy commented Jul 26, 2023

Sigh, apparently 120 seconds in non-debug modes isn't enough and there's flakiness, see
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/314/testReport/update_cluster_layout_tests/TestLargeScaleCluster/Run_Dtest_Parallel_Cloud_Machines___LongDtest___long_split000___test_add_many_nodes_under_load/

ccmlib.node.TimeoutError: watch_rest_for_alive() timeout after 120 seconds

@bhalevy bhalevy reopened this Jul 28, 2023
@bhalevy
Copy link
Member Author

bhalevy commented Jul 30, 2023

And it turns out that 600 seconds isn't emough either in debug mode if there are enough keyspaces and/or tables.
See https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split090___test_rebuild_many_keyspaces/
and https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/246/testReport/rebuild_test/TestRebuild/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split086___test_rebuild_many_tables/

We can extend the timeout indefinitely while there is bootstrap progress like we do for the cql listen message, maybe this is the way to go for test robustness

@fruch
Copy link
Contributor

fruch commented Jul 30, 2023

@bhalevy I'm not sure what the suggestion is here exactly, this function is waiting for bootstrap to end already...

@bhalevy
Copy link
Member Author

bhalevy commented Jul 30, 2023

Yes, it's polling the rest api and the event it is waiting for doesn't happen until bootstrap is over.
I suggested to watch the log for progress as we do in wait_for_starting.

But coming to think about it, I think we could just change the order of operation as follows to run
wait_for_starting before self.watch_rest_for_alive(others):

        wait_timeout = timeout * 4 if timeout is not None else 420 if self.cluster.scylla_mode != 'debug' else 900

        if wait_other_notice:
            for node, _ in marks:
                node.watch_rest_for_alive(self, timeout=wait_timeout)

        if wait_for_binary_proto:
            from_mark = self.mark
            try:
                self.wait_for_binary_interface(from_mark=from_mark, process=self._process_scylla, timeout=wait_timeout)
            except TimeoutError as e:
                self.wait_for_starting(from_mark=self.mark, timeout=wait_timeout)
                self.wait_for_binary_interface(from_mark=from_mark, process=self._process_scylla, timeout=0)
        elif wait_other_notice:
            self.wait_for_starting(from_mark=self.mark, timeout=wait_timeout)

        if wait_other_notice:
            for node, _ in marks:
                self.watch_rest_for_alive(node, timeout=wait_timeout)

@fruch
Copy link
Contributor

fruch commented Jul 30, 2023

Why not todo all the wait_other_notice logic after wait_for_binary_proto ? as it in scylla_cluster.py ?

why split it into two parts like that ?

@bhalevy
Copy link
Member Author

bhalevy commented Jul 30, 2023

it's possible, although the other node notice this node as up before it starts listening for cql, but that shouldn't matter.

However, if only wait_other_notics is set, we'd still need wait_for_starting to prevent flakiness.

@fruch
Copy link
Contributor

fruch commented Jul 30, 2023

it's possible, although the other node notice this node as up before it starts listening for cql, but that shouldn't matter.

However, if only wait_other_notics is set, we'd still need wait_for_starting to prevent flakiness.

Even though I don't know of any test that would do wait_other_notice without wait_for_binary_protocol ...

@bhalevy
Copy link
Member Author

bhalevy commented Jul 30, 2023

It should be ok to imply wait_for_binary_protocol when wait_other_notice is set.

avelanarius pushed a commit to avelanarius/scylla-ccm that referenced this issue Jan 12, 2024
disable `urllib3.connectionpool` logging for this function,
it doing lots of retries and we don't need to see each
request that is being sent out in the log

Ref: scylladb#477
avelanarius pushed a commit to avelanarius/scylla-ccm that referenced this issue Jan 12, 2024
…r_alive`

seems like the 360s default for debug for `watch_rest_for_alive`
isn't enough as it was for `watch_log_for_alive`,
making it 600s (10min)

Fix: scylladb#477
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants