apache/master->master: a4d221e21cadda23b209e3d0247e9c8fe70c5c17#1304
Open
kgyrtkirk wants to merge 36 commits into
Open
apache/master->master: a4d221e21cadda23b209e3d0247e9c8fe70c5c17#1304kgyrtkirk wants to merge 36 commits into
kgyrtkirk wants to merge 36 commits into
Conversation
* fix unit tests, bump actions-timeline CI is failing to startup to run unit tests, complaining about actions-timeline version not being allowed, switched to latest per https://github.com/apache/infrastructure-actions/blob/main/actions.yml * fix S3InputSourceTest
changes: * add `PartialSegmentMetadataCacheEntry` a `CacheEntry` that range-reads the V10 header on mount, constructs `PartialSegmentFileMapperV10`, and shrinks its reservation to actual on-disk size * add `PartialSegmentBundleCacheEntry` and `PartialSegmentBundleCacheEntryIdentifier` are `CacheEntry` associated with each file bundle of a v10 segment that sparse-allocates and evicts its containers as a unit; places holds metadata and transitive parent bundle entries holds via the `StorageLocation` methods (weak reference holds on the parent cache entries) and reference-counted usage references * add `PartialSegmentCacheBootstrap` a helper that restores partial-format entries from on-disk layout on historical startup (not wired up yet); cleans orphaned bundles * add `ResizableCacheEntry` interface and `StorageLocation.adjustReservation` (shrink-only) so the metadata entry can tighten its reservation post-mount * rename `SegmentFileBuilder.startFileGroup` → `startFileBundle`; introduce `ROOT_BUNDLE_NAME` as the default bundle for containers written without an explicit declaration * rename json field `SegmentFileContainerMetadata.fileGroup` → `bundle`; now non-null via getter, normalizes to `ROOT_BUNDLE_NAME` in the constructor, default value omitted from JSON using a custom `JsonInclude` filter * Extract shared `DirectoryBackedRangeReader` and `CountingRangeReader` test helpers; consolidate duplicates across processing + server tests
…#19491) (apache#19497) OrcInputFormat.initialize() — which swaps Thread.currentThread().setContextClassLoader() and calls FileSystem.get(conf) — was invoked on every createReader() call. When a ParallelIndexTask runs multiple ORC subtasks concurrently in the same JVM (as in embedded tests)
* Add default value for thread enabling * Peon disable thread renaming * Add benchmark query types * Add groupby benchmark * Specify query type * Docs for thread
* Bump jackson to 2.21.3 Jackson 2.21 (issue apache#1381) changed the default resolution of @JacksonInject when combined with @JsonProperty on the same parameter: the injected value now wins over the JSON value, where 2.20 treated the inject as a fallback used only when JSON did not supply one. DruidNode's serviceName, port, and tlsPort parameters carry both annotations, with JSON expected to win when supplied — this is how DruidNode JSON config files have always worked. Add the explicit useInput = OptBoolean.TRUE to restore that contract. A repo-wide audit confirmed DruidNode's three parameters are the only sites in Druid where @JacksonInject and @JsonProperty annotate the same parameter; everywhere else the annotations are on distinct parameters and are unaffected. Also adds the previously-missing license entry for org.jspecify:jspecify 1.0.0 in extensions-core/kubernetes-extensions, which the check-licenses dependency report flagged. * Preserve @JacksonInject metadata in GuiceAnnotationIntrospector findInjectableValue was returning JacksonInject.Value.forId(id), which strips useInput and optional from the original annotation. Production deserialization happens to remain correct under jackson 2.21 because AnnotationIntrospectorPair.findInjectableValue falls back to the secondary (default Jackson) introspector and merges the recovered useInput onto the primary's Value via withUseInput. That fallback is undocumented as part of the introspector contract and would silently regress if the pair semantics change, or if this introspector were ever installed standalone for a special-purpose mapper. Construct the Value via JacksonInject.Value.from(annotation) .withId(id) so the introspector returns a complete Value on its own and no longer relies on the pair to fix it up. The annotation lookup is hoisted to the top of findInjectableValue so the non-null contract between it and findGuiceInjectId is explicit — findGuiceInjectId now documents the precondition and trusts the caller to verify, eliminating the duplicate getAnnotation call. Defensive cleanup motivated by FasterXML/jackson-databind#1381; no observable behavior change.
…#19477) * resetOffsetsAndBackfill using bounded stream supervisor * Reject non-positive backfillTaskCount * Reset supervisor after backfill Supervisor has already been started * Add helper method specHasConcurrentLocks * Fix doc reference * Move validations into helper function * Add embedded-test for resetSupervisorAndBackfill * Remove flaky waitUntilPublishedRecordsAreIngested * Update KafkaBoundedSupervisorTest.java * Wait for supervisor to be RUNNING * Use checkpointed offset if > requested reset offset to prevent duplicate ingestion * Update KafkaBoundedSupervisorTest.java * Revert "Use checkpointed offset if > requested reset offset to prevent duplicate ingestion" resetOffsetsForwardOnly does not fully close the race it targets (the write is still unconditional) and the duplicate scenario it addresses is narrower than the overlap case, which cannot be solved without suspending the main supervisor. Accepting the limitation and documenting it is preferable to the added complexity. This reverts commit 89b5fec. * Doc update - duplication notice and Kinesis callout * Rename endpoint from resetOffsetsAndBackfill to resetToLatestAndBackfill * Update test name to reflect new endpoint * Address clean up from review comments * Log out start/end offsets * Add abstract createBackfillSpec * Unit test createBackfillSpec * Fix deprecation notices * Rename functions to align with new endpoint name * Add null check and rename for consistency
Caffeine 3 raised the Java baseline to 11, tightened the AsyncCache surface, and replaced size-LRU eviction with W-TinyLFU with explicit admission control. The Caffeine APIs Druid uses (Cache, Caffeine builder, Weigher, CacheStats) are stable across the transition. Errorprone 2.49.0 is required because caffeine 3.2.4 pulls error_prone_annotations 2.49.0 transitively, which violates the requireUpperBoundDeps enforcer rule without the bump. CaffeineCacheTest.testSizeEviction is rewritten for W-TinyLFU: the old test pre-read key1 multiple times before putting key2, biasing the admission policy to keep key1 and reject val2, so the assertion that key1 was evicted no longer holds. The rewrite avoids the pre-reads and asserts only that eviction happened and the cache stayed under bound, mirroring caffeine's own EvictionTest patterns. Also adds the previously-missing license entry for org.jspecify:jspecify 1.0.0 in extensions-core/kubernetes-extensions, which the check-licenses dependency report flags. This was missing pre-bump and is unrelated to caffeine/errorprone, but the CI license check fails without it, so it is included here to keep the PR green.
* Web console support for resetToLatestAndBackfill * Make pretty * Update supervisor-reset-to-latest-dialog.tsx
…with disk utilization (apache#19422) The existing linear penalization factor is still ineffective in large skew scenarios where the CostBalancerStrategy's cost forces a move/load (even with the utilization-based penalty). This switches the penalty to scale exponentially with the disk utilization, ensuring that near-full historicals are penalized. This is also particularly helpful when the size of segments on the cluster vary wildly. This also marks the diskNormalized strategy as ready for production use.
…ocessors (apache#19536) changes: * `AWSClientConfig` now defaults `maxConnections` to scale with available processors `(max(50, 4 * cores))` to be in sync with virtual storage mode historical download thread pool size * tests with artificial `RuntimeInfo` to cover the config scaling
Fixes a typo in the error message "python interpreter not found" when running bin/start-druid with no installed python interpreter. The error message previously read "python interepreter not found".
This patch updates KafkaConsumerMonitor to accept the task's metric builder, which includes supervisorId as well as other dimensions from IndexTaskUtils.setTaskDimensions.
This patch adds the setting "backgroundFetchExternalFiles". When set, cloud storage files referenced by ExternalSegment (EXTERN) are fetched asynchronously into the task's storage locations. The setting defaults to true. To support this, new infrastructure is added: 1) VirtualStorageManager, a layer on top of StorageLocation that provides a simple "fetch and cache a file" API. 2) StorageLoadingThreadPool, an extraction of the thread pool from SegmentLocalCacheManager so it can be shared with VirtualStorageManager. 3) AsyncResource, a Future-like utility that provides better tools for managing Closeable resources. It is used by VirtualStorageManager to provide the asynchronously-fetched file handles.
…#19553) In some cases, cancellation is triggered by an exception (rather than a non-exceptional reason, like timeout or user request). This patch retains the exception and includes it in the query report.
…pache#19555) Most metrics are tracked by the StorageLocation itself, but it needs help from the higher level layer to track load completion.
…asks (apache#19540) Streaming ingestion tasks were incorrectly reporting thrown-away reason as null for filtered rows.
S3 segment pushes that use the AWS SDK v2 transfer manager can resolve credentials on the async upload path. If a file-session credential refresh, container credential lookup, or IMDS lookup is temporarily unavailable, the SDK reports an SdkClientException such as 'Unable to load credentials from any of the providers in the chain'. Druid's S3 push path already wraps uploads in retryS3Operation, but these credential-provider failures were not classified as recoverable after the SDK v2 migration. That made an intermittent credential miss fail the task immediately instead of using the existing retry budget.
… level in addition to context (apache#19559) * Make segmentLoadAheadCount able to be configured at worker task level in addition to context * fixups based on review
changes: * adds new `S3SegmentRangeReader` that wraps `ServerSideEncryptingAmazonS3` + bucket + key prefix and issues closed-range `GetObjectRequests` against `keyPrefix + filename`. Returned stream is wrapped in a `RetryingInputStream` with the `S3Utils.S3RETRY` predicate (the same retry policy `S3DataSegmentPuller` uses for full-segment downloads) so a transient mid-stream error reopens at the byte offset where it failed and resumes with a fresh range request for the remaining bytes, rather than restarting the whole read. * New `rangeable` boolean on `S3LoadSpec` stamped by the pusher at write time. `S3LoadSpec.openRangeReader()` returns a reader iff the flag is true and the key isn't .zip * `S3DataSegmentPusher.pushNoZip` stamps rangeable=true when binaryVersion is `V10_VERSION`, false otherwise. `pushZip` omits the field
Fixes apache#19563. Description This PR hardens the Consul-backed embedded tests against startup races where the Consul container has started but the host-mapped Consul API is not yet reliably accepting requests.
…pache#19565) * fix: empty loads for asymmetric cluster-group partial-load matchers * fix test * ensure that rule is compatible with clustering before doing empty loads * broken javadoc link
…9571) Kafka ingestion can now publish segments that the broker prunes at query time, without waiting for compaction. Set tuningConfig.streamingPartitionsSpec.partitionDimensions to a list of low-to-medium cardinality dimensions; each task records the distinct values it observes per dimension and stamps them onto a new dim_value_set shard spec. Queries that filter on a declared dimension then skip segments whose values can't match. The feature is opt-in, Kafka-only, and disabled by default; when unset, behavior is unchanged. Compatibility: dim_value_set is a new core shard spec type with no fallback, so it is not forward-compatible. Upgrade all services before enabling streamingPartitionsSpec. Once dim_value_set segments are published, downgrade is unsupported until they are compacted away or streamingPartitionsSpec is removed. Highlights: - StreamRangeShardSpec extends NumberedShardSpec; possibleInDomain prunes by per-value range intersection. Null is declared as a first-class value (encoded as Range.lessThan("")) so IS NULL queries are never wrongly pruned, and is kept distinct from the empty string. - Opt-in via partitionFilterDimensions on the Kafka supervisor/IOConfig (null by default; segments otherwise get a plain NumberedShardSpec). Kafka only for now; backward-compatible config (old specs/constructors unchanged). - Per-segment value accumulation at ingest time; each segment is stamped with only its own observed values at publish. - Correctness guards: restart-spanning segments fall back to NumberedShardSpec (pre-restart rows are not re-read, so their values can't be fully observed); dimensions that observed a null/missing value declare null so IS NULL is not pruned. - BaseAppenderatorDriver reconciles the returned SegmentsAndCommitMetadata to the published shard specs so handoff/publish logs report the real spec. Tests: - StreamRangeShardSpecTest: possibleInDomain matrix incl. null vs "" and serde. - SeekableStreamIndexTaskRunnerTest: annotator unit tests (restart fallback, null handling). - EmbeddedStreamRangeShardSpecTest: end-to-end pruning verified via the query/segment/time scan metric across a predicate matrix (=, !=, IN, NOT IN, IS NULL, IS NOT NULL, multi-value, untracked dimension, non-existent value), plus a no-partitioning control twin and in-memory/graceful-widening cases. - StreamAppenderatorDriverTest: returned metadata carries the published spec. * Comments * Move to tuningConfig. * Stamp with empty partitions spec when segments cannot be recovered. * Sort partitionDimension values. * Use SegmentId for strongly typed identifier & tests * Reanme stream_range to dim_value_set to better capture the intent. Renames related classes as well * Document numeric type is not eligible for pruning & additional test coverage. * Assert row values too in addition to counts * Cleanup
… from PhysicalSegmentInspector (apache#19577)
test_runKafkaSupervisor produced 10 records, waited only for the broker to discover the datasource (the first matching metric event), then immediately asserted SELECT COUNT(*) == 10. Under a loaded CI runner the query raced ingestion and saw fewer than 10 rows (expected:<10> but was:<8/7/9>). Wait for ingest/events/processed to reach the expected count before querying, matching the sibling test_runSupervisor_withEmptyDimension, and derive the expected count from expectedSegments instead of a hardcoded literal.
…o reduce flakiness (apache#19580) KafkaIndexFaultToleranceTest inherited the 60s LatchableEmitter default from StreamIndexTestBase and intermittently timed out ("Timed out waiting for event after [60,000]ms") when the shared CI runner was under load. KinesisFaultToleranceTest already worked around this with a 120s override; move that override into the shared StreamIndexFaultToleranceTest base so both stream fault-tolerance suites inherit it, and drop the now-redundant per-subclass overrides.
* Raise start-druid middleManager memory floor to 256m The auto memory split in start-druid gives the middleManager the lowest weight, so on a moderately sized machine it can be awarded a very small heap: at `-m 16g` it gets only ~68m. The middleManager does not just fork peons, it also runs an embedded HTTP server that proxies task reports and logs back to callers (for example the web console polling a running query). Under that proxying load a ~68m heap is exhausted, the middleManager exits with an OutOfMemoryError, and the task it owns is reported as having "disappeared on the worker". Raise the middleManager minimum from 64m to 256m. 256m matches the heap already shipped for the medium/large/xlarge single-server example configs. The floor only affects deployments small enough that the proportional split would fall below it (roughly under 60g total); larger deployments are unchanged. * Drop the explanatory comment on the middleManager floor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.