
Upstream: d99d6a34af5a21755ea210ba4ed9dada9399a1d1#754

Merged
kgyrtkirk merged 11 commits into master from up-d99d6a34af5a21755ea210ba4ed9dada9399a1d1
Apr 1, 2025

Conversation

@kgyrtkirk
Owner

No description provided.

ektravel and others added 11 commits March 25, 2025 11:53
* Update StringUtils.replace() after fix in JDK9

* Upgrade optimized string replace algorithm

* Update methods by re-using declared StringUtils#replace method

* Replace hard-coded UTF-8 encodings with StandardCharsets
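The kind of optimization these commits describe can be sketched as follows. This is an illustrative indexOf/StringBuilder replace loop plus a `StandardCharsets` usage, under the assumption that it mirrors the commits' intent; the class name is hypothetical and this is not the actual `StringUtils` code:

```java
import java.nio.charset.StandardCharsets;

public class ReplaceSketch
{
  // Hypothetical sketch of an indexOf/StringBuilder-based replace: scan for
  // the target, copy unchanged runs in bulk, and avoid any allocation when
  // the target does not occur at all.
  public static String replace(String s, String target, String replacement)
  {
    int pos = s.indexOf(target);
    if (pos < 0) {
      return s; // nothing to replace, return the original without copying
    }
    final StringBuilder sb = new StringBuilder(s.length());
    int start = 0;
    while (pos >= 0) {
      sb.append(s, start, pos).append(replacement);
      start = pos + target.length();
      pos = s.indexOf(target, start);
    }
    return sb.append(s, start, s.length()).toString();
  }

  public static void main(String[] args)
  {
    System.out.println(replace("a-b-c", "-", "_")); // prints a_b_c

    // StandardCharsets constant instead of the hard-coded name "UTF-8":
    byte[] utf8 = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length); // prints 3
  }
}
```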
…tionals; cleanup framework init (apache#17829)

* cleans up `SqlTestFramework` initialization to leave the `OverrideModule` empty, so that tests can more easily take over parts
* removes the `QueryComponentSupplier#createEngine` factory method; instead uses a `Class<SqlEngine>` and the `injector` to initialize it
* enables the usage of `!disabled <supplier> <message>` to mark cases which are not yet supported with a specific configuration for some reason
* fixes that `datasets` was not respecting the `rollup` specification of the ingest
* enables tests backed by a `MultiComponentSupplier`; these turn into matrix tests over multiple component suppliers, enabling the same test case to run in different scenarios
Description
-----------
apache#17653 introduces a cache for segment metadata on the Overlord.
This patch follows up on that to make the cache more robust, performant, and debug-friendly.

Changes
---------
- Do not cache unused segments.
This significantly reduces sync time in cases where the cluster has a lot of unused segments.
Unused segments are needed only during segment allocation, to ensure that a duplicate ID is not allocated.
That lookup is a rare DB query backed by sufficient indexes, so it need not be cached at the moment.
- Update cache directly when segments are marked as unused to avoid race conditions with DB sync.
- Fix NPE when using segment metadata cache with concurrent locks.
- Atomically update segment IDs and pending segments in a `HeapMemoryDatasourceSegmentCache`
using methods `syncSegmentIds()` and `syncPendingSegments()` rather than updating one by one.
This ensures that the locks are held for a shorter period and the update made to the cache is atomic.
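The "hold the lock briefly, apply the batch atomically" idea behind `syncSegmentIds()` can be sketched like this. All names besides the method concept are invented for illustration; this is not the actual `HeapMemoryDatasourceSegmentCache` code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: the real cache has a much richer API.
class DatasourceSegmentCacheSketch
{
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> idToPayload = new HashMap<>();

  // Apply the full set of persisted used segment IDs in one critical
  // section instead of locking once per segment. Returns the IDs whose
  // payloads still need to be fetched from the metadata store.
  Set<String> syncSegmentIds(Set<String> persistedUsedIds)
  {
    lock.writeLock().lock();
    try {
      idToPayload.keySet().retainAll(persistedUsedIds); // drop removed IDs
      Set<String> toRefresh = new HashSet<>(persistedUsedIds);
      toRefresh.removeAll(idToPayload.keySet());        // new, uncached IDs
      return toRefresh;
    } finally {
      lock.writeLock().unlock();
    }
  }

  void addSegmentPayload(String id, String payload)
  {
    lock.writeLock().lock();
    try {
      idToPayload.put(id, payload);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```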

Main updated classes
----------------------
- `IndexerMetadataStorageCoordinator`
- `OverlordDataSourcesResource`
- `HeapMemorySegmentMetadataCache`
- `HeapMemoryDatasourceSegmentCache`

Cleaner cache sync
--------------------
In every sync, the following steps are performed for each datasource:

- Retrieve ALL used segment IDs from the metadata store
- Atomically update segment IDs in the cache and determine the list of segment IDs which need to be refreshed
- Fetch payloads of segments that need to be refreshed
- Atomically update fetched payloads into the cache
- Fetch ALL pending segments
- Atomically update pending segments into the cache
- Clean up empty intervals from datasource caches
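The steps above can be sketched as a single in-memory pass. The Maps here merely stand in for the metadata store and the cache, under the assumption that the sync logic reduces to "fetch, diff, then apply atomically"; none of these names are actual Druid classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of one sync pass per datasource.
class SyncPassSketch
{
  final Map<String, String> metadataStore = new HashMap<>(); // ID -> payload
  final Map<String, String> cache = new HashMap<>();         // ID -> payload

  void syncOnce()
  {
    // 1. Retrieve ALL used segment IDs from the metadata store.
    Set<String> usedIds = new HashSet<>(metadataStore.keySet());

    // 2. Atomically update IDs in the cache and determine which IDs
    //    need a refresh (here simply: IDs not cached yet).
    cache.keySet().retainAll(usedIds);
    Set<String> toRefresh = new HashSet<>(usedIds);
    toRefresh.removeAll(cache.keySet());

    // 3-4. Fetch payloads of stale segments, then apply them in one go.
    Map<String, String> fetched = new HashMap<>();
    for (String id : toRefresh) {
      fetched.put(id, metadataStore.get(id));
    }
    cache.putAll(fetched);

    // 5-7. Pending segments and empty-interval cleanup follow the same
    // fetch-then-atomically-apply pattern and are omitted here.
  }
}
```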
Prior to this patch, an offset specified on a groupBy that itself has an
inner groupBy would lead to an error like "Cannot push down offsets". This
happened because of a violated assumption: the processing logic assumes that
offsets have been pushed into limits (so limit pushdown optimizations can
safely be used).

This patch adjusts processing to incorporate offsets into limits during
processing of subqueries. Later on, in post-processing, offsets are applied
as written.
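The arithmetic of the fix can be illustrated with a small sketch: only a limit can be pushed down, so the subquery keeps `offset + limit` rows and the offset is applied afterwards in post-processing. The names are illustrative, not the actual Druid processing code:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OffsetPushdownSketch
{
  // Pushed-down stage: combined limit of offset + limit rows, so that the
  // limit-pushdown optimizations remain safe to use.
  static List<Integer> pushedDown(List<Integer> rows, int limit, int offset)
  {
    return rows.stream().limit((long) limit + offset).collect(Collectors.toList());
  }

  // Post-processing stage: apply the offset as written.
  static List<Integer> postProcess(List<Integer> rows, int offset)
  {
    return rows.stream().skip(offset).collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<Integer> rows = IntStream.range(0, 10).boxed().collect(Collectors.toList());
    // LIMIT 3 OFFSET 2: push down limit 5, then skip the first 2 rows.
    System.out.println(postProcess(pushedDown(rows, 3, 2), 2)); // prints [2, 3, 4]
  }
}
```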
…ncing (apache#17824)

* Do not use segment metadata cache until leader has synced

* Read from cache only when synced, but write even if sync is pending

* Fix compilation

* Fix checkstyle, test

* Revert some extra changes

* Add 3 modes of cache usage

* Move enum to SegmentMetadataCache

* Run tests in all 3 cache modes

* Fix docs and IT configs

* Fix config binding

* Remove forbidden api

* Fix typos, docs and enum casing

* Fix doc
This PR adds the sql-native portion of the json, array, and aggregation function tests to quidem.  It adds a total of 9965 queries, with 6752 positive tests and 3213 negative tests.
}

default Set<String> getTableNames()
default Set<String> getTableNames(BrokerSegmentMetadataCache segmentMetadataCache)

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'segmentMetadataCache' is never used.

Copilot Autofix


To fix the problem, we should remove the unused segmentMetadataCache parameter from the getTableNames method. This will simplify the method's interface without changing its existing functionality. The changes should be made in the DruidSchemaManager interface in the file sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java.

Suggested changeset 1
sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java b/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
--- a/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
+++ b/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchemaManager.java
@@ -62,3 +62,3 @@
 
-  default Set<String> getTableNames(BrokerSegmentMetadataCache segmentMetadataCache)
+  default Set<String> getTableNames()
   {
EOF
}

@SuppressWarnings("unused")
protected void validateFrameworkConfig(SqlTestFrameworkConfig cfg)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'cfg' is never used.

Copilot Autofix


To fix the problem, we should remove the unused parameter cfg from the validateFrameworkConfig method. This will simplify the method signature and eliminate the unnecessary parameter. The method will remain a no-op, but without the unused parameter.

  • Remove the parameter cfg from the validateFrameworkConfig method.
  • Update the method signature to reflect the removal of the parameter.
Suggested changeset 1
sql/src/test/java/org/apache/druid/quidem/DruidQuidemTestBase.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/sql/src/test/java/org/apache/druid/quidem/DruidQuidemTestBase.java b/sql/src/test/java/org/apache/druid/quidem/DruidQuidemTestBase.java
--- a/sql/src/test/java/org/apache/druid/quidem/DruidQuidemTestBase.java
+++ b/sql/src/test/java/org/apache/druid/quidem/DruidQuidemTestBase.java
@@ -208,3 +208,3 @@
   @SuppressWarnings("unused")
-  protected void validateFrameworkConfig(SqlTestFrameworkConfig cfg)
+  protected void validateFrameworkConfig()
   {
EOF
@kgyrtkirk kgyrtkirk merged commit d99d6a3 into master Apr 1, 2025
76 checks passed


9 participants