@saintstack
Contributor

Based on #12515

- Add new test configuration file BulkDumpingS3WithChaos.toml with four
  test variants: Stable, LightChaos, MediumChaos, and HeavyChaos
- Clear bulkLoadJobHistoryKeys in clearDatabase() to prevent assertion
  failures from leftover job history between test iterations
- Add MockS3 storage cleanup at test start to prevent memory accumulation
  across multiple test iterations via clearMockS3Storage()
- Fix deadlock in tryGetRangeForBulkLoad() when bulkLoadFileSetsToLoad
  is empty - now properly terminates stream with empty result instead of
  hanging indefinitely
- Reduce waitUntilDumpJobComplete() polling interval from 30s to 5s to
  work around a Flow simulation hang that consistently occurs around
  242-243 seconds of simulation time
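
A minimal sketch of the "terminate the stream with an empty result" idea from the deadlock fix above, in plain C++ rather than Flow. `ReplyStream` and `streamFileSets` are hypothetical names used only for illustration and are not the PR's code; the point is that the producer always closes the stream, even when there is nothing to load.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reply stream: queued batches plus an explicit
// end-of-stream marker. A consumer keeps waiting until `closed` is set.
struct ReplyStream {
    std::deque<std::vector<std::string>> batches;
    bool closed = false;

    void send(std::vector<std::string> batch) {
        assert(!closed); // nothing may follow end-of-stream
        batches.push_back(std::move(batch));
    }
    void sendEndOfStream() { closed = true; }
};

// Hypothetical producer. With no file sets to load it still sends one empty
// batch and closes the stream, so the consumer is released instead of hanging.
void streamFileSets(const std::vector<std::vector<std::string>>& fileSetsToLoad, ReplyStream& out) {
    if (fileSetsToLoad.empty()) {
        out.send({});
    }
    for (const auto& fileSet : fileSetsToLoad) {
        out.send(fileSet);
    }
    out.sendEndOfStream();
}
```

The early-return-without-closing variant is the shape that would leave a consumer waiting forever, which is the failure mode the description refers to.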

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 101e6d1
  • Duration 0:12:05
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 101e6d1
  • Duration 0:23:21
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 101e6d1
  • Duration 0:42:42
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 101e6d1
  • Duration 0:45:20
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 101e6d1
  • Duration 0:48:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack saintstack requested a review from kakaiu October 28, 2025 06:01
@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 101e6d1
  • Duration 0:57:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 23e94ac
  • Duration 0:12:56
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 101e6d1
  • Duration 1:04:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 23e94ac
  • Duration 0:28:06
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 23e94ac
  • Duration 0:45:13
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 23e94ac
  • Duration 0:46:00
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 23e94ac
  • Duration 0:45:20
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 23e94ac
  • Duration 0:46:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 134d46b
  • Duration 0:11:59
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 134d46b
  • Duration 0:24:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 23e94ac
  • Duration 1:11:47
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 2bb7374
  • Duration 0:26:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 134d46b
  • Duration 0:38:41
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 134d46b
  • Duration 0:45:55
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 134d46b
  • Duration 0:45:58
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 2bb7374
  • Duration 0:43:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 2bb7374
  • Duration 0:43:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 134d46b
  • Duration 1:10:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 2bb7374
  • Duration 1:04:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 2bb7374
  • Duration 1:06:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 2bb7374
  • Duration 1:07:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 06ee8f2
  • Duration 0:04:54
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 141
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 5290ea1
  • Duration 1:07:58
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 1f26317
  • Duration 0:23:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 1f26317
  • Duration 0:36:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 1f26317
  • Duration 0:45:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 1f26317
  • Duration 0:56:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 1f26317
  • Duration 1:02:42
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 1f26317
  • Duration 1:07:55
  • Result: ❌ FAILED
  • Error: Error while executing command: TEST_USERNAME=fdb-pr-${CODEBUILD_BUILD_NUMBER} make -kj -C tests foundationdb-pr-tests. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@saintstack
Contributor Author

I ran the new BulkLoadingS3WithChaos test 100k times on joshua

20251031-205439-stack_bulk-b54afec446cf3051 compressed=True data_size=55535146 duration=7248422 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:25:53 sanity=False started=100000 stopped=20251031-222032 submitted=20251031-205439 timeout=5400 username=stack_bulk

And here is the general 100k joshua run

20251031-223307-stack_all-b574ae2aba0fa7c8 compressed=True data_size=55510665 duration=5561255 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:19 sanity=False started=100000 stopped=20251031-233226 submitted=20251031-223307 timeout=5400 username=stack_all

@saintstack
Contributor Author

Looking at the k8s fail, it seems like all the tests claim to have passed


benchmark_ha_rocksdb_sharded_uniform.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkHaRocksdbShardedUniform (74.02s)
benchmark_ha_rocksdb_uniform.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkHaRocksdbUniform (108.02s)
benchmark_rocksdb_sharded_uniform.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksdbUniform (73.07s)
benchmark_rocksdb_sharded_zipfian.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksDBZipfian (193.01s)
benchmark_rocksdb_uniform_offline_backup_restore.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksdbUniformOfflineBackupRestore (94.09s)
benchmark_rocksdb_uniform.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksdbUniform (245.00s)
benchmark_rocksdb_zipfian_ck.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksDBZipfianCK (164.00s)
benchmark_rocksdb_zipfian.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkRocksDBZipfian (163.00s)
benchmark_sqlite_uniform.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkSqliteUniform (138.00s)
benchmark_sqlite_zipfian.2025-10-31T23:08+00:00.log:--- PASS: TestBenchmarkSqliteZipfian (77.03s)
benchmark_ycsb_jdbc.2025-10-31T23:08+00:00.log:--- PASS: TestYcsbJdbc (282.00s)
perf_constant_load.2025-10-31T23:08+00:00.log:--- PASS: TestPerfConstantLoad (83.05s)
perf_multi_tester.2025-10-31T23:08+00:00.log:--- PASS: TestYcsb (109.05s)
perf_ycsb_storage.2025-10-31T23:08+00:00.log:--- PASS: TestYcsb (84.05s)
test_bulkload_s3.2025-10-31T23:09+00:00.log:--- PASS: TestBulkLoadS3 (262.00s)
test_circus.2025-10-31T23:08+00:00.log:--- PASS: TestCircus (61.05s)
test_compatibility.2025-10-31T23:08+00:00.log:--- PASS: TestCompatibility (229.02s)
test_connection_string_watcher.2025-10-31T23:08+00:00.log:--- PASS: TestConnectionStringWatcher (117.10s)
test_exclude_feature.2025-10-31T23:09+00:00.log:--- PASS: TestHarness (273.00s)
test_frm.2025-10-31T23:09+00:00.log:--- PASS: TestConsistency (65.07s)
test_gray_failures.2025-10-31T23:08+00:00.log:--- PASS: TestGrayFailure (238.00s)
test_ha_partition.2025-10-31T23:08+00:00.log:--- PASS: TestHAPartition (19.00s)
test_ha_rf4.2025-10-31T23:08+00:00.log:--- PASS: TestHaRf4 (107.08s)
test_ha_three_zone.2025-10-31T23:08+00:00.log:--- PASS: TestHaThreeZone (84.08s)
test_ha_throttled_satellite.2025-10-31T23:09+00:00.log:--- PASS: TestHaThrottledSatellite (26.00s)
test_ha_ycsb_both_primary.2025-10-31T23:08+00:00.log:--- PASS: TestHaYcsbBothPrimary (19.00s)
test_ha_ycsb_primary_and_remote.2025-10-31T23:09+00:00.log:--- PASS: TestHaYcsbPrimaryAndRemote (265.00s)
test_ha_ycsb.2025-10-31T23:09+00:00.log:--- PASS: TestHaYcsb (46.05s)
test_ha.2025-10-31T23:09+00:00.log:--- PASS: TestHa (237.02s)
test_ha2satellite.2025-10-31T23:08+00:00.log:--- PASS: TestHa2satellite (64.06s)
test_harness.2025-10-31T23:09+00:00.log:--- PASS: TestHarness (246.00s)
test_hostname.2025-10-31T23:08+00:00.log:--- PASS: TestHostname (167.00s)
test_invalidate_old_peers.2025-10-31T23:08+00:00.log:--- PASS: TestHostname (99.09s)
test_io_faults.2025-10-31T23:08+00:00.log:--- PASS: TestIoFaults (1.00s)
test_linearizability.2025-10-31T23:08+00:00.log:--- PASS: TestLinearizability (226.05s)
test_multithreadclient.2025-10-31T23:08+00:00.log:--- PASS: TestMultithreadclient (71.07s)
test_network_faults.2025-10-31T23:08+00:00.log:--- PASS: TestNetworkFaults (97.01s)
test_network_storage.2025-10-31T23:08+00:00.log:--- PASS: TestKubernetes (93.09s)
test_pod_faults.2025-10-31T23:08+00:00.log:--- PASS: TestPodFaults (10.01s)
test_record_layer.2025-10-31T23:08+00:00.log:--- PASS: TestRecordLayer (280.01s)
test_recovery.2025-10-31T23:08+00:00.log:--- PASS: TestRecovery (88.10s)
test_remote_tlog_clog.2025-10-31T23:08+00:00.log:--- PASS: TestNetworkFaults (82.01s)
test_rocksdb_storage_migration.2025-10-31T23:08+00:00.log:--- PASS: TestSsFailure (99.09s)
test_sample.2025-10-31T23:08+00:00.log:--- PASS: TestSample (109.03s)
test_shardedrocksdb_migration.2025-10-31T23:08+00:00.log:--- PASS: TestSsFailure (105.05s)
test_slow_network_performance.2025-10-31T23:08+00:00.log:--- PASS: TestSlowNetworkPerformance (236.02s)
test_splunk.2025-10-31T23:08+00:00.log:--- PASS: TestSplunk (171.00s)
test_ss_failure_on_busy_grv.2025-10-31T23:08+00:00.log:--- PASS: TestSsFailureOnBusyGrv (102.10s)
test_ss_failure.2025-10-31T23:08+00:00.log:--- PASS: TestSsFailure (216.00s)
test_ss_io_timeout.2025-10-31T23:08+00:00.log:--- PASS: TestPerfBaseline (272.00s)
test_ycsb_ck.2025-10-31T23:08+00:00.log:--- PASS: TestYcsbCk (16.01s)
test_ycsb.2025-10-31T23:08+00:00.log:--- PASS: TestYcsb (234.01s)

@kakaiu
Member

kakaiu commented Oct 31, 2025

In general, it looks good to me! Would you add more details about the deadlock issue in the comments? Thanks!

@saintstack
Contributor Author

@kakaiu The hang was like this (not sure where to put it in the code...)

FetchKeys operations got stuck during bulk load with DataDistributionQueueSize at 1. ConsistencyCheck repeatedly timed out with DataDistributionQueueSize="1". This happened during medium and heavy chaos. bulkLoadDownloadTaskFileSets() skipped ranges without data files entirely, so no local fileSet entry was created.

Code before change...


for (auto iter = fromRemoteFileSets->begin(); iter != fromRemoteFileSets->end(); ++iter) {
    BulkLoadFileSet localFileSet = wait(bulkLoadDownloadTaskFileSet(...));
    localFileSets->push_back(std::make_pair(keys, localFileSet));
}

We just moved on to the next item leaving no accounting for the empty range.
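
To make the missing accounting concrete, here is a small illustration in plain C++, not the FoundationDB code. The types and the `downloadSkippingEmpty` and `unaccountedRanges` helpers are hypothetical simplifications; the sketch only shows how skipping a range without a data file leaves that range with no local entry for later stages to find.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types; the real BulkLoadFileSet carries more state.
struct RemoteFileSet {
    std::string range;
    std::optional<std::string> dataFile; // std::nullopt: the range has no data file
};
using LocalFileSets = std::vector<std::pair<std::string, std::string>>; // (range, local path)

// Mirrors the behaviour described above: a range without a data file is
// skipped, so it never gets a local entry.
LocalFileSets downloadSkippingEmpty(const std::vector<RemoteFileSet>& remote) {
    LocalFileSets local;
    for (const auto& r : remote) {
        if (!r.dataFile) {
            continue;
        }
        local.emplace_back(r.range, "local/" + *r.dataFile); // placeholder for the download
    }
    assert(local.size() <= remote.size());
    return local;
}

// Returns the requested ranges that ended up with no local entry.
std::vector<std::string> unaccountedRanges(const std::vector<RemoteFileSet>& remote,
                                           const LocalFileSets& local) {
    std::vector<std::string> missing;
    for (const auto& r : remote) {
        bool found = std::any_of(local.begin(), local.end(),
                                 [&](const auto& entry) { return entry.first == r.range; });
        if (!found) {
            missing.push_back(r.range);
        }
    }
    return missing;
}
```

Any consumer that waits for one result per requested range would wait forever on the ranges returned by `unaccountedRanges`.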

if (!range->value().isValid()) {
    continue;
}
// Skip empty ranges (no data file to load)
Member

@kakaiu kakaiu Nov 1, 2025

Can you help check whether DD skips noDataFile ranges when creating the bulkload tasks from manifests? Thanks! I think ideally DD should not schedule noDataFile ranges as bulkload tasks. That is my impression of how it works, but the DD code may have changed over time and broken this invariant. Maybe supporting customized range loading caused the problem. Can you help confirm? Thanks!

Nevertheless, this protection is great! Thanks!

Contributor Author

I took a look at where this code was introduced

commit 0e736c6
Author: Zhe Wang
Date: Mon Mar 17 11:45:15 2025 -0700

Allow One BulkloadTask Do Multiple Manifests  (#12036)

As best as I can tell, adding an emptyRange was possible before the change, so I guess even before customized range loading. What do you think should be done here? Should we add it as an invariant -- no entry unless there is data? Will that break us elsewhere? Could we do it as a follow-on?

Member

@kakaiu kakaiu Nov 3, 2025

Let's make it an invariant. Emit a Sev30 event on the storage server side if the invariant is broken, and add checks to enforce the invariant on the DD side. We can do this in the next PR. Thanks!
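
One possible shape for that follow-up, sketched in plain C++ under stated assumptions: `ManifestEntry`, `selectRangesToSchedule`, `checkTaskInvariant`, and the event name in the log line are all hypothetical, and `fprintf` merely stands in for a severity-30 trace event. It is an illustration of the invariant, not a proposal for the actual DD or storage server code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified manifest entry.
struct ManifestEntry {
    std::string range;
    bool hasDataFile;
};

// DD side (assumed shape): only ranges that have a data file become tasks.
std::vector<ManifestEntry> selectRangesToSchedule(const std::vector<ManifestEntry>& manifest) {
    std::vector<ManifestEntry> tasks;
    for (const auto& entry : manifest) {
        if (entry.hasDataFile) {
            tasks.push_back(entry);
        }
    }
    // Postcondition: nothing without a data file is ever scheduled.
    assert(std::all_of(tasks.begin(), tasks.end(),
                       [](const ManifestEntry& t) { return t.hasDataFile; }));
    return tasks;
}

// Storage server side (assumed shape): report a broken invariant loudly
// instead of hanging. The event name below is made up for this sketch.
bool checkTaskInvariant(const ManifestEntry& task) {
    if (!task.hasDataFile) {
        std::fprintf(stderr, "Sev30 BulkLoadTaskHasNoDataFile range=%s\n", task.range.c_str());
        return false;
    }
    return true;
}
```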

@kakaiu kakaiu self-requested a review November 1, 2025 18:06
int64_t maxVersionOffset = 1e6) {
state QuietDatabaseChecker checker(isGeneralBuggifyEnabled() ? 4000.0 : 1500.0);
int64_t maxVersionOffset = 1e6,
double quiescentWaitTimeout = 0) {
Member

Why do we want to add this quiescentWaitTimeout? Thanks!

Contributor Author

It was hardcoded, and I wanted to make it configurable per test. We were timing out in the consistency check. The s3 tests with chaos -- especially medium and heavy -- were timing out because they take longer. The s3 behavior was subsequently tamed somewhat with fewer retries and less time between them. I might be able to make it work within the hardcoded values? (Follow-on?)

Member

@kakaiu kakaiu Nov 3, 2025

I see. According to the definition of QuietDatabaseChecker, its input is maxDDRunTime. If DD times out, there is a ddGotStuck assertion failure. So it is probably better to rename quiescentWaitTimeout to maxDDRunTime.

maxDDRunTime is different from the quiescentWaitTimeout option of the consistency check workload.
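
A rough sketch of how a per-test maxDDRunTime could coexist with the previously hardcoded values, in plain C++. The 4000.0 and 1500.0 defaults come from the diff context above; treating 0 as "not set" and the helper names `effectiveMaxDDRunTime` and `ddRanTooLong` are assumptions made for illustration only.

```cpp
#include <cassert>

// Defaults taken from the hardcoded QuietDatabaseChecker argument in the diff.
constexpr double kDefaultMaxDDRunTimeBuggify = 4000.0;
constexpr double kDefaultMaxDDRunTime = 1500.0;

// Assumption: a value of 0 means "not set by the test", so fall back to the
// previous hardcoded defaults.
double effectiveMaxDDRunTime(double maxDDRunTime, bool buggifyEnabled) {
    if (maxDDRunTime > 0) {
        return maxDDRunTime;
    }
    return buggifyEnabled ? kDefaultMaxDDRunTimeBuggify : kDefaultMaxDDRunTime;
}

// Hypothetical check: DD counts as stuck once it has run longer than the limit.
bool ddRanTooLong(double ddStartTime, double now, double maxDDRunTime) {
    assert(maxDDRunTime > 0);
    return now - ddStartTime > maxDDRunTime;
}
```

With this shape, existing tests keep their old limits, and only the chaos variants that need more time would pass a larger value.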

Contributor Author

Hmm... You are right. Let me change the parameter name.

// This allows adding->start() to be called inline with CSK.
try {
    if (conductBulkLoad) {
        TraceEvent(SevDebug, "BulkLoadFetchKeysBeforeCoreStarted", data->thisServerID)
Member

Consider adding "SS" to the prefix, which helps you quickly find which role triggered the event.

Contributor Author

Yes. Thank you (I had it on a few but not all)

The parameter name 'quiescentWaitTimeout' was misleading. It actually
controls the maximum time Data Distributor can run before being
considered stuck (triggers ddGotStuck assertion), not a general
quiescence timeout.

Renamed it to 'maxDDRunTime' throughout the codebase.

This makes it clear the parameter is specifically for DD timeout checking.

Add 'SS' prefix to log tags in storageserver.actor.cpp trace additions.
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 62412c5
  • Duration 0:04:17
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 62412c5
  • Duration 0:04:24
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 62412c5
  • Duration 0:04:24
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 62412c5
  • Duration 0:04:27
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 62412c5
  • Duration 0:04:34
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 62412c5
  • Duration 0:35:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 62412c5
  • Duration 1:03:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: d50c4f3
  • Duration 0:24:53
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: d50c4f3
  • Duration 0:44:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: d50c4f3
  • Duration 0:51:01
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: d50c4f3
  • Duration 0:57:58
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: d50c4f3
  • Duration 1:03:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: d50c4f3
  • Duration 1:11:27
  • Result: ❌ FAILED
  • Error: Error while executing command: TEST_USERNAME=fdb-pr-${CODEBUILD_BUILD_NUMBER} make -kj -C tests foundationdb-pr-tests. Reason: exit status 2
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)
