fix(web-analytics): use lightweight replica sync before partition swap#62537
Closed
andyzzhao wants to merge 1 commit into
The pre-aggregation job bulk inserts into a staging table, syncs every data replica, then swaps partitions into the live table with REPLACE PARTITION. The swap clones staging parts as they are, merged or not, so the sync only needs replicated part fetches to finish. A plain SYSTEM SYNC REPLICA also waits for the background merges the bulk insert just scheduled, which blocks each run on the slowest replica for no benefit. SYNC REPLICA LIGHTWEIGHT waits exactly for the data-movement entries.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
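As a rough illustration of the sync step described in this commit message, here is a minimal Python sketch. It is not the PR's implementation: `lightweight_sync_sql`, `sync_staging_on_replicas`, and the `run_query(host, sql)` callable are hypothetical names, and only the `SYSTEM SYNC REPLICA <table> LIGHTWEIGHT` statement is taken from the description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def lightweight_sync_sql(table: str) -> str:
    # LIGHTWEIGHT waits for part-movement queue entries only,
    # not for background merges or mutations.
    return f"SYSTEM SYNC REPLICA {table} LIGHTWEIGHT"


def sync_staging_on_replicas(
    hosts: Sequence[str],
    run_query: Callable[[str, str], object],  # hypothetical: runs SQL on one host
    table: str,
) -> str:
    """Issue the lightweight sync on every host and wait for all of them."""
    sql = lightweight_sync_sql(table)
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {host: pool.submit(run_query, host, sql) for host in hosts}
        for future in futures.values():
            # Still blocks on the slowest replica, but that replica now waits
            # only for staging parts to arrive, not for merges to finish.
            future.result()
    return sql
```

The barrier itself is unchanged: the job still waits for every replica before swapping partitions. Only the condition each replica waits on is narrower.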
Problem
The web analytics pre-aggregation assets follow the pattern: recreate staging table, bulk insert, SYSTEM SYNC REPLICA on every data node, then ALTER TABLE ... REPLACE PARTITION FROM staging. The sync exists so that whichever host later executes the swap has all staging parts locally.

A plain SYSTEM SYNC REPLICA waits for the replica's entire replication queue, which right after a bulk insert includes the freshly scheduled background merges. The job blocks on .result() for the slowest replica, so every run stalls for tens of seconds (sometimes minutes) of pure waiting in system.query_log, with zero bytes read and zero CPU. This pattern showed up as one of the largest "long wall time, no work" query groups on the cluster, hundreds of executions per day across the hourly assets.

The merges are irrelevant to the swap: REPLACE PARTITION FROM clones the staging part set as it exists, merged or not, and readers of AggregatingMergeTree tables must aggregate across parts anyway.

Changes
sync_partitions_on_replicas now issues SYSTEM SYNC REPLICA <table> LIGHTWEIGHT. Lightweight mode waits only for data-movement queue entries (GET_PART, ATTACH_PART, DROP_RANGE, REPLACE_RANGE, DROP_PART), i.e. exactly "all inserted parts are present on this replica", and skips waiting on merges and mutations. ClickHouse itself uses lightweight mode internally for the same parts-must-be-present barrier, and its source notes that plain sync differs precisely in also waiting for merges.

The STRICT syncs in the deletes DAG and overrides manager are deliberate (one asserts an empty queue afterwards) and are not touched.

How did you test this code?
I am an agent; automated tests only:
test_sync_partitions_on_replicas_uses_lightweight_sync in products/web_analytics/dags/tests/test_web_preaggregated_utils.py, following the file's mock-cluster pattern (asserts the exact SQL executed per host and that the job waits on the futures).
test_web_preaggregated_utils.py passes (7 tests).

Automatic notifications
Docs update
🤖 Agent context
Autonomy: Human-driven (agent-assisted)
query_log_archive for queries with long wall time but near-zero CPU and no cluster-load signature; the staging-table sync was the single most frequent such pattern. Verified in the ClickHouse 26.3 source that lightweight sync waits exactly for part fetches and that the subsequent REPLACE PARTITION does not depend on merges.
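For readers without access to the test file, the check described under "How did you test this code?" might look roughly like the sketch below. Everything here is an assumption for illustration: `sync_on_hosts` stands in for the real helper, the `host.submit(sql)` method returning a future is a hypothetical host-client API, and the repository's actual mock-cluster fixture is not reproduced.

```python
from unittest.mock import MagicMock


def sync_on_hosts(hosts, table):
    """Hypothetical stand-in for the helper under test."""
    sql = f"SYSTEM SYNC REPLICA {table} LIGHTWEIGHT"
    # Submit to every host first, then wait, so replicas sync in parallel.
    futures = [host.submit(sql) for host in hosts]
    for future in futures:
        future.result()


def test_sync_uses_lightweight_mode():
    # Each MagicMock plays one data node; submit() returns a mock future.
    hosts = [MagicMock(name="host-1"), MagicMock(name="host-2")]

    sync_on_hosts(hosts, "staging_table")

    expected_sql = "SYSTEM SYNC REPLICA staging_table LIGHTWEIGHT"
    for host in hosts:
        # Exact SQL per host: a plain or STRICT sync would fail this assertion.
        host.submit.assert_called_once_with(expected_sql)
        # The job must block on every future before the partition swap.
        host.submit.return_value.result.assert_called_once_with()
```

Asserting the exact statement, rather than just that some sync ran, guards against a silent regression to a plain sync.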