Skip to content

perf: use explicit column lists in system.local/peers queries#724

Draft
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:smaller_local_peers_queries
Draft

perf: use explicit column lists in system.local/peers queries#724
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:smaller_local_peers_queries

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Mar 5, 2026

Replace SELECT * with explicit column lists in qrySystemLocal, qrySystemPeers, and qrySystemPeersV2 to reduce the amount of data transferred from the server during topology discovery and control connection setup. Only columns actually consumed by hostInfoFromMap are now fetched.

This mirrors the same optimization done in the Python driver (scylladb/python-driver#528).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes topology discovery/control connection setup by replacing SELECT * system table reads with explicit column lists, reducing transferred data while still providing the fields consumed by hostInfoFromMap.

Changes:

  • Replaced qrySystemLocal, qrySystemPeers, and qrySystemPeersV2 with explicit column SELECT statements.
  • Updated ring_describer_test.go mocks/metadata to match the new result shapes and column ordering.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
conn.go Switches system table queries from SELECT * to explicit column lists for peers/local discovery.
ring_describer_test.go Updates mocked result metadata and row data to match the new explicit column projections.

Comment thread conn.go Outdated
Comment thread conn.go
@mykaul mykaul force-pushed the smaller_local_peers_queries branch from a038746 to 9ad7e14 Compare March 7, 2026 20:07
@mykaul mykaul marked this pull request as draft March 8, 2026 20:36
@mykaul mykaul force-pushed the smaller_local_peers_queries branch from 9ad7e14 to 314b9b3 Compare March 16, 2026 16:49
@mykaul mykaul requested a review from Copilot March 16, 2026 16:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread host_source.go
@mykaul mykaul force-pushed the smaller_local_peers_queries branch from 3e4fd8c to f02f892 Compare April 6, 2026 17:27
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 6, 2026

Force-pushed with the following changes:

  • Rebased on top of current origin/master (picks up d15f511)
  • Squashed the two commits into a single commit
  • Restored Graph() method with a deprecation comment, matching the treatment of WorkLoad() and DSEVersion(). This preserves backward API compatibility — callers won't get a compile error on upgrade. The method always returns false for driver-discovered hosts since the graph column is no longer queried.
  • The case "graph": branch in hostInfoFromMap remains removed (correct — column is not fetched)
  • All unit tests pass (go test -tags unit -count=1 ./...)

Replace SELECT * with explicit column lists in qrySystemLocal,
qrySystemPeers, and qrySystemPeersV2 to reduce the amount of data
transferred from the server during topology discovery and control
connection setup. Only columns actually consumed by hostInfoFromMap
are now fetched.

Add isScyllaConn() to ConnInterface so ring_describer can choose the
correct peers query: Scylla omits preferred_ip (column does not exist),
while Cassandra includes it.

Remove dead DSE-specific case branches (workload, graph, dse_version)
from hostInfoFromMap, since these columns are no longer queried.
Deprecate Graph(), WorkLoad(), and DSEVersion() methods on HostInfo
with doc comments, preserving backward API compatibility.

Fix a stale comment in awaitSchemaAgreement that listed the scan column
order incorrectly.

This mirrors the same optimization done in the Python driver
(scylladb/python-driver#528).
@mykaul mykaul force-pushed the smaller_local_peers_queries branch from f02f892 to 826a9df Compare April 6, 2026 18:27
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 6, 2026

Follow-up force-push:

  • Removed graph field from HostInfo struct entirely — Graph() now returns false directly (no field needed since it was never populated via builder or queries)
  • Dropped mutex locks from Graph(), WorkLoad(), and DSEVersion() — these fields are only written during construction (via HostInfoBuilder.Build()) and never mutated afterward, so the locks were unnecessary overhead. workload and dseVersion fields are kept since HostInfoBuilder still exposes them as exported fields.

All unit tests pass.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 6, 2026

CI Failure Analysis

The CI failure on Integration Tests On Scylla (LTS-PRIOR) is not caused by this PR's changes. Both failing tests are transient/flaky failures:

1. TestNewConnectWithLowTimeout/100ns/MetadataSchemaRequestTimeout/Query_from_control_connection

This test sets a 100ns timeout and expects queries to fail due to the timeout. It failed because the query succeeded faster than the timeout. This is a timing-dependent flaky test — a previous run of this same PR had LTS-PRIOR passing but Cassandra 4-LATEST failing with the same test.

2. TestSliceMapMapScanCollectionTypes

Failed during TRUNCATE with a Scylla-internal error: "Another global topology request is ongoing, please retry." This is a transient server-side contention issue unrelated to which columns are selected from system tables.

Evidence

  • A previous run of this exact PR (run 24042406890) had LTS-PRIOR passing but a different target failing with the same timeout test.
  • The latest master run passed all Scylla integration tests including LTS-PRIOR.

Recommendation

Re-run the failed CI job. No code changes are needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants