Upstream: b187f3de8a2fd9ac18d0e1161e901c9237f9b4d9#753

Merged
kgyrtkirk merged 10 commits into master from
up-b187f3de8a2fd9ac18d0e1161e901c9237f9b4d9
Apr 1, 2025
Conversation

@kgyrtkirk
Owner

No description provided.

ektravel and others added 10 commits March 25, 2025 11:53
* Update StringUtils.replace() after fix in JDK9

* Upgrade optimized string replace algorithm

* Update methods by re-using declared StringUtils#replace method

* Replace hard-coded UTF-8 encodings with StandardCharsets
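The last bullet above, replacing hard-coded `"UTF-8"` charset names with `StandardCharsets`, can be sketched as follows (a minimal illustration of the pattern, not the actual Druid code):

```java
import java.nio.charset.StandardCharsets;

public class CharsetExample
{
  // Before: a string literal, which forces callers to handle
  // UnsupportedEncodingException and risks typos in the charset name:
  //   byte[] bytes = s.getBytes("UTF-8");
  // After: the constant from StandardCharsets, with no checked exception.
  public static byte[] toUtf8(String s)
  {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static String fromUtf8(byte[] bytes)
  {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```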
…tionals; cleanup framework init (apache#17829)

* cleans up `SqlTestFramework` initialization to leave the `OverrideModule` empty, so that tests can more easily override parts of it
* removes the `QueryComponentSupplier#createEngine` factory method; the supplier now exposes a `Class<SqlEngine>` and the `injector` is used to instantiate it
* enables the usage of `!disabled <supplier> <message>` to mark cases which are not yet supported under a specific configuration for some reason
* fixes that `datasets` was not respecting the `rollup` specification of the ingest
* enables the use of `MultiComponentSupplier`-backed tests; these turn into matrix tests over multiple component suppliers, enabling the same test case to run in different scenarios
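The second bullet, replacing the `createEngine` factory method with a `Class<SqlEngine>` that the injector instantiates, can be sketched like this (the reflective instantiation below only illustrates the idea; Druid's actual code goes through its Guice injector, and the class names other than `SqlEngine` are hypothetical):

```java
public class EngineBindingSketch
{
  interface SqlEngine
  {
    String name();
  }

  static class NativeEngine implements SqlEngine
  {
    @Override
    public String name()
    {
      return "native";
    }
  }

  // Before: QueryComponentSupplier#createEngine built the engine directly.
  // After: the supplier only reports the engine class, and the framework's
  // injector creates it, wiring in any dependencies the engine declares.
  static SqlEngine createEngine(Class<? extends SqlEngine> engineClass)
  {
    try {
      return engineClass.getDeclaredConstructor().newInstance();
    }
    catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```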
Description
-----------
apache#17653 introduces a cache for segment metadata on the Overlord.
This patch is a follow-up that makes the cache more robust, performant, and debug-friendly.

Changes
---------
- Do not cache unused segments.
This significantly reduces sync time on clusters with a lot of unused segments.
Unused segments are needed only during segment allocation, to ensure that a duplicate ID is not allocated.
That is a rare DB query, well supported by existing indexes, and thus need not be cached at the moment.
- Update cache directly when segments are marked as unused to avoid race conditions with DB sync.
- Fix NPE when using segment metadata cache with concurrent locks.
- Atomically update segment IDs and pending segments in a `HeapMemoryDatasourceSegmentCache`
using methods `syncSegmentIds()` and `syncPendingSegments()` rather than updating one by one.
This ensures that the locks are held for a shorter period and that each update to the cache is atomic.
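The batched update described above can be sketched as follows (a simplified model, not the real `HeapMemoryDatasourceSegmentCache`; the field and helper names here are illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasourceCacheSketch
{
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Long> segmentIdToUpdatedTime = new HashMap<>();

  /**
   * Atomically replaces the cached segment IDs with the persisted ones and
   * returns the IDs whose payloads must be (re)fetched from the metadata
   * store. One write lock is taken for the whole batch, instead of one
   * lock acquisition per segment.
   */
  public Set<String> syncSegmentIds(Map<String, Long> persistedIds)
  {
    lock.writeLock().lock();
    try {
      final Set<String> toRefresh = new HashSet<>();
      // Drop IDs that are no longer present in the metadata store
      segmentIdToUpdatedTime.keySet().retainAll(persistedIds.keySet());
      for (Map.Entry<String, Long> entry : persistedIds.entrySet()) {
        final Long cachedTime = segmentIdToUpdatedTime.get(entry.getKey());
        if (cachedTime == null || cachedTime < entry.getValue()) {
          // New or stale entry: remember it and mark it for payload refresh
          toRefresh.add(entry.getKey());
          segmentIdToUpdatedTime.put(entry.getKey(), entry.getValue());
        }
      }
      return toRefresh;
    }
    finally {
      lock.writeLock().unlock();
    }
  }
}
```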

Main updated classes
----------------------
- `IndexerMetadataStorageCoordinator`
- `OverlordDataSourcesResource`
- `HeapMemorySegmentMetadataCache`
- `HeapMemoryDatasourceSegmentCache`

Cleaner cache sync
--------------------
In every sync, the following steps are performed for each datasource:

- Retrieve ALL used segment IDs from the metadata store
- Atomically update segment IDs in the cache and determine the list of segment IDs that need to be refreshed
- Fetch payloads of the segments that need to be refreshed
- Atomically update the fetched payloads in the cache
- Fetch ALL pending segments
- Atomically update the pending segments in the cache
- Clean up empty intervals from the datasource caches
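The steps above amount to the following shape per sync pass (pseudocode; all helper names are hypothetical stand-ins, not the actual `HeapMemorySegmentMetadataCache` methods):

```
for each dataSource:
    persistedIds = metadataStore.fetchUsedSegmentIds(dataSource)      // step 1
    staleIds     = cache.syncSegmentIds(persistedIds)                 // step 2, atomic
    payloads     = metadataStore.fetchPayloads(staleIds)              // step 3
    cache.updateSegments(payloads)                                    // step 4, atomic
    pending      = metadataStore.fetchPendingSegments(dataSource)     // step 5
    cache.syncPendingSegments(pending)                                // step 6, atomic
cache.removeEmptyIntervals()                                          // step 7
```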
Prior to this patch, an offset specified on a groupBy that itself has an
inner groupBy would lead to an error like "Cannot push down offsets". This
happened because of a violated assumption: the processing logic assumes that
offsets have been pushed into limits (so limit pushdown optimizations can
safely be used).

This patch adjusts processing to incorporate offsets into limits during
processing of subqueries. Later on, in post-processing, offsets are applied
as written.
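The fix folds the inner query's offset into its limit so that the limit-pushdown assumption holds, then applies the offset in post-processing. The arithmetic is simple (a minimal sketch; the real logic lives in Druid's groupBy processing, and the method name here is illustrative):

```java
public class OffsetSketch
{
  /**
   * For an inner groupBy with both an offset and a limit, the subquery must
   * produce offset + limit rows so that post-processing can drop the first
   * 'offset' rows and still return 'limit' rows.
   */
  public static long combinedLimit(long offset, long limit)
  {
    // Guard against overflow when the limit is effectively unbounded
    long combined = offset + limit;
    return combined < 0 ? Long.MAX_VALUE : combined;
  }
}
```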
…ncing (apache#17824)

* Do not use segment metadata cache until leader has synced

* Read from cache only when synced, but write even if sync is pending

* Fix compilation

* Fix checkstyle, test

* Revert some extra changes

* Add 3 modes of cache usage

* Move enum to SegmentMetadataCache

* Run tests in all 3 cache modes

* Fix docs and IT configs

* Fix config binding

* Remove forbidden api

* Fix typos, docs and enum casing

* Fix doc
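The three cache usage modes introduced by these commits can be sketched as follows (the enum constants and method below are hypothetical illustrations, not the exact identifiers added to `SegmentMetadataCache`):

```java
public class CacheModeSketch
{
  // Hypothetical names for the three modes described in the commits above.
  enum UsageMode
  {
    NEVER,      // cache disabled; always query the metadata store
    ALWAYS,     // reads always go through the cache
    IF_SYNCED   // read from the cache only once the leader has synced;
                // writes update the cache even while sync is pending
  }

  static boolean canReadFromCache(UsageMode mode, boolean leaderHasSynced)
  {
    switch (mode) {
      case ALWAYS:
        return true;
      case IF_SYNCED:
        return leaderHasSynced;
      default:
        return false;
    }
  }
}
```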
default Set<String> getTableNames()
default Set<String> getTableNames(BrokerSegmentMetadataCache segmentMetadataCache)

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'segmentMetadataCache' is never used.

Copilot Autofix

AI about 1 year ago

To fix the problem, we should remove the unused parameter segmentMetadataCache from the getTableNames method. This will simplify the method signature and remove any confusion about the parameter's purpose.

  • Remove the segmentMetadataCache parameter from the getTableNames method in the DruidSchemaManager interface.
  • Update the method signature to reflect this change.
Suggested changeset 1
sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java b/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
--- a/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
+++ b/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
@@ -62,3 +62,3 @@
 
-  default Set<String> getTableNames(BrokerSegmentMetadataCache segmentMetadataCache)
+  default Set<String> getTableNames()
   {
EOF

@SuppressWarnings("unused")
protected void validateFrameworkConfig(SqlTestFrameworkConfig cfg)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'cfg' is never used.

Copilot Autofix

AI about 1 year ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

@kgyrtkirk kgyrtkirk merged commit b187f3d into master Apr 1, 2025
76 checks passed


8 participants