Skip to content

apache/master->master: 9cdedcddc52f6de1c70a8c52568b4f051743d557#1308

Open
kgyrtkirk wants to merge 54 commits into
masterfrom
sync/apache-9cdedcddc52f6de1c70a8c52568b4f051743d557
Open

apache/master->master: 9cdedcddc52f6de1c70a8c52568b4f051743d557#1308
kgyrtkirk wants to merge 54 commits into
masterfrom
sync/apache-9cdedcddc52f6de1c70a8c52568b4f051743d557

Conversation

@kgyrtkirk

Copy link
Copy Markdown
Owner

No description provided.

capistrant and others added 30 commits May 22, 2026 10:13
* fix unit tests, bump actions-timeline

CI is failing to startup to run unit tests, complaining about actions-timeline version not being allowed, switched to latest per https://github.com/apache/infrastructure-actions/blob/main/actions.yml

* fix S3InputSourceTest
changes:
* add `PartialSegmentMetadataCacheEntry` a `CacheEntry` that range-reads the V10 header on mount, constructs `PartialSegmentFileMapperV10`, and shrinks its reservation to actual on-disk size
* add `PartialSegmentBundleCacheEntry` and `PartialSegmentBundleCacheEntryIdentifier` are `CacheEntry` associated with each file bundle of a v10 segment that sparse-allocates and evicts its containers as a unit; places holds metadata and transitive parent bundle entries holds via the `StorageLocation` methods (weak reference holds on the parent cache entries) and reference-counted usage references
* add `PartialSegmentCacheBootstrap` a helper that restores partial-format entries from on-disk layout on historical startup (not wired up yet); cleans orphaned bundles
* add `ResizableCacheEntry` interface and `StorageLocation.adjustReservation` (shrink-only) so the metadata entry can tighten its reservation post-mount
* rename `SegmentFileBuilder.startFileGroup` → `startFileBundle`; introduce `ROOT_BUNDLE_NAME` as the default bundle for containers written without an explicit declaration                                                              * rename json field `SegmentFileContainerMetadata.fileGroup` → `bundle`; now non-null via getter, normalizes to `ROOT_BUNDLE_NAME` in the constructor, default value omitted from JSON using a custom `JsonInclude` filter
* Extract shared `DirectoryBackedRangeReader` and `CountingRangeReader` test helpers; consolidate duplicates across processing + server tests
…#19491) (apache#19497)

OrcInputFormat.initialize() — which swaps Thread.currentThread().setContextClassLoader() and calls FileSystem.get(conf) — was invoked on every createReader() call. When a ParallelIndexTask runs multiple ORC subtasks concurrently in the same JVM (as in embedded tests)
* Add default value for thread enabling

* Peon disable thread renaming

* Add benchmark query types

* Add groupby benchmark

* Specify query type

* Docs for thread
* Bump jackson to 2.21.3

Jackson 2.21 (issue apache#1381) changed the default resolution of
@JacksonInject when combined with @JsonProperty on the same parameter:
the injected value now wins over the JSON value, where 2.20 treated
the inject as a fallback used only when JSON did not supply one.

DruidNode's serviceName, port, and tlsPort parameters carry both
annotations, with JSON expected to win when supplied — this is how
DruidNode JSON config files have always worked. Add the explicit
useInput = OptBoolean.TRUE to restore that contract.

A repo-wide audit confirmed DruidNode's three parameters are the only
sites in Druid where @JacksonInject and @JsonProperty annotate the
same parameter; everywhere else the annotations are on distinct
parameters and are unaffected.

Also adds the previously-missing license entry for org.jspecify:jspecify
1.0.0 in extensions-core/kubernetes-extensions, which the
check-licenses dependency report flagged.

* Preserve @JacksonInject metadata in GuiceAnnotationIntrospector

findInjectableValue was returning JacksonInject.Value.forId(id), which
strips useInput and optional from the original annotation. Production
deserialization happens to remain correct under jackson 2.21 because
AnnotationIntrospectorPair.findInjectableValue falls back to the
secondary (default Jackson) introspector and merges the recovered
useInput onto the primary's Value via withUseInput.

That fallback is undocumented as part of the introspector contract and
would silently regress if the pair semantics change, or if this
introspector were ever installed standalone for a special-purpose
mapper. Construct the Value via JacksonInject.Value.from(annotation)
.withId(id) so the introspector returns a complete Value on its own
and no longer relies on the pair to fix it up.

The annotation lookup is hoisted to the top of findInjectableValue so
the non-null contract between it and findGuiceInjectId is explicit —
findGuiceInjectId now documents the precondition and trusts the caller
to verify, eliminating the duplicate getAnnotation call.

Defensive cleanup motivated by FasterXML/jackson-databind#1381; no
observable behavior change.
…#19477)

* resetOffsetsAndBackfill using bounded stream supervisor

* Reject non-positive backfillTaskCount

* Reset supervisor after backfill Supervisor has already been started

* Add helper method specHasConcurrentLocks

* Fix doc reference

* Move validations into helper function

* Add embedded-test for resetSupervisorAndBackfill

* Remove flaky waitUntilPublishedRecordsAreIngested

* Update KafkaBoundedSupervisorTest.java

* Wait for supervisor to be RUNNING

* Use checkpointed offset if > requested reset offset to prevent duplicate ingestion

* Update KafkaBoundedSupervisorTest.java

* Revert "Use checkpointed offset if > requested reset offset to prevent duplicate ingestion"

resetOffsetsForwardOnly does not fully close the race it targets (the write is
still unconditional) and the duplicate scenario it addresses is narrower than
the overlap case, which cannot be solved without suspending the main supervisor.
Accepting the limitation and documenting it is preferable to the added complexity.

This reverts commit 89b5fec.

* Doc update - duplication notice and Kinesis callout

* Rename endpoint from resetOffsetsAndBackfill to resetToLatestAndBackfill

* Update test name to reflect new endpoint

* Address clean up from review comments

* Log out start/end offsets

* Add abstract createBackfillSpec

* Unit test createBackfillSpec

* Fix deprecation notices

* Rename functions to align with new endpoint name

* Add null check and rename for consistency
Caffeine 3 raised the Java baseline to 11, tightened the AsyncCache
surface, and replaced size-LRU eviction with W-TinyLFU with explicit
admission control. The Caffeine APIs Druid uses (Cache, Caffeine
builder, Weigher, CacheStats) are stable across the transition.

Errorprone 2.49.0 is required because caffeine 3.2.4 pulls
error_prone_annotations 2.49.0 transitively, which violates the
requireUpperBoundDeps enforcer rule without the bump.

CaffeineCacheTest.testSizeEviction is rewritten for W-TinyLFU: the old
test pre-read key1 multiple times before putting key2, biasing the
admission policy to keep key1 and reject val2, so the assertion that
key1 was evicted no longer holds. The rewrite avoids the pre-reads
and asserts only that eviction happened and the cache stayed under
bound, mirroring caffeine's own EvictionTest patterns.

Also adds the previously-missing license entry for org.jspecify:jspecify
1.0.0 in extensions-core/kubernetes-extensions, which the
check-licenses dependency report flags. This was missing pre-bump and
is unrelated to caffeine/errorprone, but the CI license check fails
without it, so it is included here to keep the PR green.
* Web console support for resetToLatestAndBackfill

* Make pretty

* Update supervisor-reset-to-latest-dialog.tsx
…with disk utilization (apache#19422)

The existing linear penalization factor is still ineffective in large skew scenarios where the CostBalancerStrategy's cost forces a move/load (even with the utilization-based penalty). This switches the penalty to scale exponentially with the disk utilization, ensuring that near-full historicals are penalized. This is also particularly helpful when the size of segments on the cluster vary wildly.

This also marks the diskNormalized strategy as ready for production use.
…ocessors (apache#19536)

changes:
* `AWSClientConfig` now defaults `maxConnections` to scale with available processors `(max(50, 4 * cores))` to be in sync with virtual storage mode historical download thread pool size
* tests with artificial `RuntimeInfo` to cover the config scaling
Fixes a typo in the error message "python interpreter not found" when running bin/start-druid with no installed python interpreter. The error message previously read "python interepreter not found".
This patch updates KafkaConsumerMonitor to accept the task's
metric builder, which includes supervisorId as well as other dimensions
from IndexTaskUtils.setTaskDimensions.
This patch adds the setting "backgroundFetchExternalFiles". When set,
cloud storage files referenced by ExternalSegment (EXTERN) are fetched
asynchronously into the task's storage locations. The setting defaults
to true.

To support this, new infrastructure is added:

1) VirtualStorageManager, a layer on top of StorageLocation that
   provides a simple "fetch and cache a file" API.

2) StorageLoadingThreadPool, an extraction of the thread pool from
   SegmentLocalCacheManager so it can be shared with
   VirtualStorageManager.

3) AsyncResource, a Future-like utility that provides better tools for
   managing Closeable resources. It is used by VirtualStorageManager to
   provide the asynchronously-fetched file handles.
…#19553)

In some cases, cancellation is triggered by an exception (rather than a
non-exceptional reason, like timeout or user request). This patch
retains the exception and includes it in the query report.
…pache#19555)

Most metrics are tracked by the StorageLocation itself, but it needs
help from the higher level layer to track load completion.
…asks (apache#19540)

Streaming ingestion tasks were incorrectly reporting thrown-away reason as null for filtered rows.
S3 segment pushes that use the AWS SDK v2 transfer manager can resolve credentials on the async upload path. If a file-session credential refresh, container credential lookup, or IMDS lookup is temporarily unavailable, the SDK reports an SdkClientException such as 'Unable to load credentials from any of the providers in the chain'.

Druid's S3 push path already wraps uploads in retryS3Operation, but these credential-provider failures were not classified as recoverable after the SDK v2 migration. That made an intermittent credential miss fail the task immediately instead of using the existing retry budget.
… level in addition to context (apache#19559)

* Make segmentLoadAheadCount able to be configured at worker task level in addition to context

* fixups based on review
changes:                                                                                                                                                                                                                                 * adds new `S3SegmentRangeReader` that wraps `ServerSideEncryptingAmazonS3` + bucket + key prefix and issues closed-range `GetObjectRequests` against `keyPrefix + filename`. Returned stream is wrapped in a `RetryingInputStream` with the `S3Utils.S3RETRY` predicate (the same retry policy `S3DataSegmentPuller` uses for full-segment downloads) so a transient mid-stream error reopens at the byte offset where it failed and resumes with a fresh range request for the remaining bytes, rather than restarting the whole read.
* New `rangeable` boolean on `S3LoadSpec` stamped by the pusher at write time. `S3LoadSpec.openRangeReader()` returns a reader iff the flag is true and the key isn't .zip
* `S3DataSegmentPusher.pushNoZip` stamps rangeable=true when binaryVersion is `V10_VERSION`, false otherwise. `pushZip` omits the field
Fixes apache#19563.

Description
This PR hardens the Consul-backed embedded tests against startup races where the Consul container has started but the host-mapped Consul API is not yet reliably accepting requests.
…pache#19565)

* fix: empty loads for asymmetric cluster-group partial-load matchers

* fix test

* ensure that rule is compatible with clustering before doing empty loads

* broken javadoc link
clintropolis and others added 14 commits June 18, 2026 13:42
…pache#19535)

changes:
* adds `acquirePartialSegment` / `acquireCachedPartialSegment` to `SegmentCacheManager` and `SegmentManager` to allow callers to opt-in to async partial segments; MSQ `RegularLoadableSegment` uses the new partial path
* `PartialSegmentMetadataCacheEntry`, `PartialSegmentBundleCacheEntry` are now wired into `SegmentLocalCacheManager`, along with added `PartialBundleAcquirer` helper to pass things like download thread pool and the ability to acquire reference holds on cache entrie
* adds `PartialQueryableIndexSegment`, `PartialQueryableIndexCursorFactory`, `V10TimeBoundaryInspector` for references acquired from the now wired up metadata cache entry, implementing the async cursor holder contract
* moved `DirectoryBackedRangeReader` out of tests to use as the implementation for `LocalLoadSpec` range reader.
* add `FilteredCursorFactory`/`RestrictedCursorFactory` async cursor implementations
* adds config `druid.segmentCache.virtualStoragePartialDownloadsEnabled to enable feature, of by default
…or (apache#19592)

* fix(delta): drain all batches per scan file in DeltaInputSourceIterator

Fixes apache#18606 — only 1024 rows ingested per Parquet file when using the
Delta Lake input source.

Root cause: filteredBatchIterator was a local variable inside hasNext().
When the method returned true after the first non-empty batch of a file,
the iterator went out of scope. The next hasNext() call advanced to the
next file, skipping all remaining batches of the current file.

With Delta kernel's default batch size of 1024 rows, this caused exactly
1024 rows × N files to be ingested regardless of actual file size.

Fix: promote filteredBatchIterator to a field (currentFileIterator) so
it survives across hasNext() calls and all batches of a file are drained
before advancing to the next file.

Also fixed close() to properly close currentFileIterator and drain all
remaining file iterators.

* test(delta): add regression test for apacheGH-18606 batch drain fix

Adds a Delta table with 2 Parquet files × 2000 rows (total 4000) where
each file exceeds the Delta kernel's default batch size of 1024 rows.

Without the fix: DeltaInputSourceIterator returns 1024 × 2 = 2048 rows.
With the fix:    all 4000 rows are returned correctly.

Test: DeltaInputSourceBatchDrainTest.testAllRowsReturnedWhenFilesExceedOneBatch

* test(delta): add BatchDrainRegressionTests to DeltaInputSourceTest for apacheGH-18606

Adds LargeRowGroupDeltaTable (2 files × 2000 rows = 4000 total) and a
BatchDrainRegressionTests inner class inside DeltaInputSourceTest following
the same pattern as existing test classes.

The regression test fails with the bug (returns 1024 × 2 = 2048 rows)
and passes with the fix (returns all 4000 rows).

* style: add missing newline at end of LargeRowGroupDeltaTable.java

* fix(delta): close drained file iterator before advancing

Each per-file iterator from Scan.transformPhysicalData() owns an
underlying Parquet reader/file handle. hasNext() overwrote
currentFileIterator with the next file without closing the exhausted
one, leaking a handle per completed file on multi-file tables (close()
only closed the last and the never-started iterators). Now close the
drained iterator before advancing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The description for `druid.broker.balancer.type` was truncated mid-sentence
("...the fewest number of active connections to"). Complete the sentence and
fix the "random choose" -> "random chooses" grammar.

Co-authored-by: Cursor <cursoragent@cursor.com>
changes:
* `CompactionTask` now can specify `baseTable` spec to create clustered segments
* `DataSourceMSQDestination` can now specify a `baseTable` so MSQ can generate clustered segments (or any other future baseTable spec)
* adds `baseTable` to 'inline' and reindexing template compaction configs to feed to compaction task for auto-compaction
* adds `baseTable`, `segmentGranularitySpec` to `CompactionState`, `CompactionStatus` is `baseTable` aware for checks
* guards to prevent `baseTable` from working with 'native' compaction and direct towards MSQ compaction
Changes:
- Add `LatchableEmitter.waitForEventAggregate` which waits for a specific timeout
- Add `StreamIndexTestBase.waitUntilPublishedRecordsAreIngested` which waits for a specific timeout
…e#19596)

This change adds maxValuesPerDimension, an optional safety cap on distinct values recorded per dimension per segment in the dim_value_set shard spec.
…pache#19566)

* Update Netty to 4.2.14.Final to address multiple CVEs

This update addresses 17 critical and high severity CVEs in Netty:

- CVE-2026-42583: Lz4FrameDecoder resource exhaustion (HIGH)
- CVE-2026-42579: HTTP response desynchronization (HIGH)
- CVE-2026-42585: MQTT resource exhaustion (MODERATE)
- CVE-2026-33870: HTTP request smuggling via quoted strings (HIGH)
- CVE-2025-67735: DNS codec validation bypass (HIGH)
- CVE-2026-42587: HTTP/3 QPACK unbounded allocation (HIGH)
- CVE-2026-41417: Epoll transport DoS via RST (HIGH)
- CVE-2026-42584: HTTP request smuggling via Transfer-Encoding (MODERATE)
- CVE-2026-42581: HTTP request smuggling via chunk size parsing (MODERATE)
- CVE-2026-42580: Redis codec CRLF injection (MODERATE)
- CVE-2026-33871: HTTP header injection via HttpProxyHandler (LOW)
- CVE-2026-42582: Additional HTTP codec vulnerabilities
- CVE-2026-44248: MQTT 5 decoder resource exhaustion (HIGH)
- CVE-2026-42586: Additional resource consumption issues
- CVE-2025-59419: Security improvements
- CVE-2026-42578: Additional security fixes
- CVE-2026-42577: Additional security fixes

Updated netty4.version from 4.2.12.Final to 4.2.14.Final.
All CVEs are fixed in version 4.2.13.Final and later.

* Update licenses.yaml for Netty 4.2.14.Final

* Add missing netty-codec-classes-quic to licenses.yaml

The Netty 4.2.14.Final upgrade introduced a new transitive dependency
io.netty:netty-codec-classes-quic which was missing from the licenses.yaml
file, causing license validation failures in CI.

This module provides QUIC protocol codec support and is licensed under
Apache License version 2.0, consistent with all other Netty modules.

* Upgrade Netty to 4.2.15.Final to address additional CVEs

Updates Netty from 4.2.14.Final to 4.2.15.Final to address 26 additional
critical security vulnerabilities discovered after the 4.2.14 release.

* Properly document Javassist MPL 1.1 triple-licensing

Javassist is triple-licensed under MPL 1.1, LGPL 2.1, and Apache License 2.0.
Apache Druid uses it under Apache 2.0 terms.

Changes:
1. Add MPL 1.1 as a recognized license in check-licenses.py
2. Update Javassist entry in licenses.yaml to declare MPL 1.1 as its
   primary license with a notice explaining the triple-licensing and that
   we use it under Apache 2.0 terms

This addresses review feedback to properly canonicalize MPL 1.1 to its
own license name rather than hiding it by mapping to Apache 2.0.

Addresses: Review comment from FrankChen021 on PR apache#19566

* Update netty-tcnative to 2.0.77.Final for azure-extensions

Netty 4.2.15.Final pulls in netty-tcnative 2.0.77.Final as a transitive
dependency in druid-azure-extensions. Update licenses.yaml to register
the new version.

---------

Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
changes:
* allow `AggregateProjectionSpec` on a clustered base table; remove the build and merge time guards that rejected it
* persist projections in `IndexMergerV10.makeClusteredIndexFiles` via the shared `makeProjections(...)`
* fix a bug on a non-clustering dictionary column over the clustered base table which conflated values (per-group-local dictionary IDs reused across the `ConcatenatingCursor` causing values to inappropriately group together). Force value-based grouping by reporting non-clustering columns as non-dictionary-encoded, on capabilities and on the selector (cardinality/name-lookup)
* tests: build and query tests for projections for both incremental and persisted segments, and also added first E2E coverage for clustered segments (`ClusteredSegmentProjectionQueryTest`, native ingestion + projection-vs-noProjections queries)
* remove unused vector selectors
…pache#19615)

When a concurrent REPLACE upgrades a still-appending streaming task, the
  upgraded (new-version) copy of each append segment previously adopted the
  pending segment's plain NumberedShardSpec, dropping the
  DimensionValueSetShardSpec stamped at publish time. This made upgraded
  segments unprunable by the broker.

  The upgraded copy now takes its partition number and core-partition count
  from the pending segment while carrying forward the original segment's
  partitionDimensionValues when it is a DimensionValueSetShardSpec. The
  value-set guarantee holds because the upgraded copy serves the same rows
  as the original append segment.
Retry the task submission while it fails with these transient auth
errors so the assertions only run once the Broker's auth cache reflects
the test setup. Other failures are not retried, so real errors still
fail fast.
…tch")

S3Utils.S3RETRY recursed an SSLException into its non-IOException cause
(AEADBadTagException from a corrupted TLS record failing GCM auth),
classifying a transient transport failure as non-retriable. Under MSQ this
aborts the whole query. Treat SSLException as retriable, ahead of the generic
IOException branch. Mirrors apache#11941.
…9547)

* Add supervisor-to-SQL dialog

* Added tests

* feat: address review feedback on supervisor-to-SQL conversion

- Preserve native column types in the EXTERN signature so numeric metric
  and typed dimension fields are not declared as strings
- Apply the supervisor's segment granularity to PARTITIONED BY and query
  granularity via TIME_FLOOR (also used in GROUP BY)
- Preserve the supervisor's inputFormat settings, overriding only the type
- Escape custom timestamp formats with the query-toolkit literal helper
- Clear stale specs in the paste-mode dialog so Generate SQL can't submit
  a hidden supervisor
- Open the converted query in a new workbench tab instead of overwriting
  the active tab
- Add tests for the new behaviors and update snapshots

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor: address review feedback on supervisor-to-SQL conversion

Reuse the existing ingestion-spec converter rather than reimplementing it:
- convertSupervisorToSql now rewrites the supervisor into an index_parallel
  spec (file inputSource/inputFormat, default segment granularity, leading
  dimension clustering) and delegates to convertSpecToSql, so column types,
  granularity, timestamp parsing, and metric aggregation are all shared
- Reuse the IngestionSpec interface instead of a bespoke SupervisorSpec; drop
  the duplicated MetricSpec interface and metric-to-SQL helpers (~400 lines)
- Convert the conversion tests to inline snapshots
- Dialog: use IngestionSpec, fix Blueprint 5 scss namespace (.#{$bp-ns}),
  replace the native select with a Button + Menu dropdown matching the
  console style, and clarify the SQL is a one-time batch (not streaming)
- Don't auto-select the first supervisor; the button shows "Select
  supervisor" until one is chosen
- Remove the single-export index.ts barrel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* style: run prettier on supervisor-to-SQL files

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Kyle Hoondert <kyle.hoondert@imply.io>
@kgyrtkirk kgyrtkirk enabled auto-merge June 24, 2026 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.