feat: add partial load rule types#19374
Conversation
changes: * adds `PartialLoadRule` abstract class to capture load rules that should partially load a segment on some tier * adds `IntervalPartialLoadRule, `ForeverPartialLoadRule`, and `PeriodPartialLoadRule` implementations to mirror non-partial load rules * adds `PartialLoadMatcher` interface to match and select what to partial load * adds `ExactProjectionPartialLoadMatcher` and `WildcardProjectionPartialLoadMatcher` to do partial loading of projections * adds `CannotMatchBehavior` enum to describe behavior of `PartialLoadRule` when `PartialLoadMatcher` is unable to match a segment * since partial loading is not available yet, partial rules function as regular load rules until follow-up work * tests
| this.matcher = matcher; | ||
| this.onCannotMatch = Configs.valueOrDefault(onCannotMatch, CannotMatchBehavior.FULL_LOAD); | ||
| } | ||
|
|
There was a problem hiding this comment.
P1 Partial rule fall-through is invisible to handoff checks
appliesTo returns false for a partial rule whose matcher cannot match and whose onCannotMatch is FALL_THROUGH. Handoff/readiness code that only asks whether the rule applies to the segment interval can therefore miss that this rule intentionally falls through to a later full-load rule, and can conclude that no load rule covers the segment. A realtime task for a segment without matching projections can then wait forever even though the cascade would load it through the following rule. The rule-selection path used by handoff needs to evaluate the same matcher/fall-through semantics as the coordinator cascade, not just interval applicability.
| * version that introduces a new behavior falls back to the constructor's default ({@link #FULL_LOAD}) rather than | ||
| * failing to parse the rule. | ||
| */ | ||
| public enum CannotMatchBehavior |
There was a problem hiding this comment.
IMO it'd be nicer to define the enums like FALL_THROUGH("fallThrough") and have the "fallThrough" be how they serialize in JSON. SImilar to JoinAlgorithm and CloneQueryMode. Most Druid stuff uses thatStyle rather than THIS_ONE.
| * configuration that drives the decision and the wire format of their corresponding {@link LoadSpec} wrapper, so the | ||
| * rule layer stays scheme-agnostic. | ||
| */ | ||
| @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") |
There was a problem hiding this comment.
If we add a new one of these then the PartialLoadRule will fail to deserialize. Just like CannotMatchBehavior, maybe this should be lenient too?
The way to make things lenient is to add a dummy implementation as a defaultImpl, such as QueryCounterSnapshot for example. Servers will then at least know that they're dealing with an unrecognized PartialLoadMatcher.
| return patterns; | ||
| } | ||
|
|
||
| @JsonProperty |
There was a problem hiding this comment.
@JsonInclude(NON_EMPTY) would be nice.
| "type", WIRE_TYPE, | ||
| "delegate", baseLoadSpec, | ||
| "projections", resolved, | ||
| "ruleFingerprint", fingerprint |
There was a problem hiding this comment.
It isn't really a rule fingerprint, it's more of a projection names fingerprint. Maybe just call it fingerprint since projectionNamesFingerprint is a mouthful.
| * Base for {@link PartialLoadMatcher} implementations that decide which of a segment's V10 projections to load. | ||
| * Subclasses supply the resolution policy via {@link #resolveProjectionNames(DataSegment)}; this base handles | ||
| * fingerprint computation and wraps the result into the {@code partialProjection} load-spec wire form consumed | ||
| * by the historical-side {@code PartialProjectionLoadSpec} (which does not exist yet, supplied in future work). |
There was a problem hiding this comment.
IMO best not to mention things that don't exist yet. Instead, the next PR should update this text. It will need to update it anyway to remove the "supplied in future work" comment.
| */ | ||
| public abstract class ProjectionPartialLoadMatcher implements PartialLoadMatcher | ||
| { | ||
| static final String WIRE_TYPE = "partialProjection"; |
There was a problem hiding this comment.
LOAD_SPEC_TYPE? Sounds clearer.
| { | ||
| final Hasher hasher = Hashing.sha256().newHasher(); | ||
| for (String name : sortedDedupedNames) { | ||
| hasher.putString(name, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
The javadoc for putString says that putUnencodedChars is preferred.
| private static boolean matchesAny(String name, List<String> patterns) | ||
| { | ||
| for (String pattern : patterns) { | ||
| if (FilenameUtils.wildcardMatch(name, pattern)) { |
There was a problem hiding this comment.
How does this treat \*? I wonder if it escapes the * or if it looks for a literal \ followed by anything.
It's probaly not likely that a projection name will include * or ?, but it would be nice to support escaping anyway, in case they do.
There was a problem hiding this comment.
ah, it did not handle escapes... had claude help generate a version that supports escaping which seems to not add too much complexity
| @Override | ||
| public boolean appliesTo(Interval interval, DateTime referenceTimestamp) | ||
| { | ||
| return Rules.eligibleForLoad(period, interval, referenceTimestamp, includeFuture); |
There was a problem hiding this comment.
I wonder if anything weird will happen because appliesTo(segment.getInterval(), refTs) doesn't always match appliesTo(segment, refTs) like it does with the other rules. This overload is used in two places:
DataSourcesResource#isHandOffComplete, where it's used to short-circuit handoff checking for segments that no load rule applies to. I suppose the risk here is that we're in a situation where really no rule matches (because the segment doesn't have any desired projections for a partial-rule, and there's no other rule to fall back to), but the handoff checker thinks the partial-rule would match, and so handoff times out. This seems unlikely to happen— probably in a real cluster there would be another rule to fall back on— but is worth a comment somewhere acknowledging it's a risk. Possibly right here.TieredBrokerHostSelector, where it's used to find the rule that will drive which Broker handles a request whendruid.router.tierToBrokerMapis set. This is probably fine: in some configurations of load rules it might change which broker gets a query, but probably the behavior is defensible given that the routing isn't projection-aware.
There was a problem hiding this comment.
i switched isHandOffComplete to use the metadata snapshot to use the version of appliesTo that takes a segment so that it will work properly.
I didn't make any changes for TieredBrokerHostSelector, since i'm not really sure how to improve that, i guess we can document it when we document the rest of this stuff.
| SegmentId.of(dataSourceName, theInterval, version, partitionNumber) | ||
| ); | ||
| // Segment isn't published in metadata; it will never be handed off. | ||
| if (segment == null) { |
There was a problem hiding this comment.
[P1] Do not treat a missing metadata snapshot segment as handed off
getRecentDataSourcesSnapshot() may return a snapshot up to the segment poll delay old, so a just-published realtime segment can be absent even though it still needs handoff. Returning true here tells the task the segment will never be handed off and can let it stop waiting before any historical has loaded it. This should return false or force a fresh metadata poll before concluding the segment is not published.
Description
Adds a new family of retention rules,
loadPartialByPeriod,loadPartialByInterval,loadPartialForever, laying the groundwork for partial loading of V10 segment projections on historicals. This PR includes the rule classes, matcher abstraction, and cascade semantics only; the coordinator plumbing, wire format, and historical-side partial load path are deferred to follow-up work.changes:
PartialLoadRuleabstract class to capture load rules that should partially load a segment on some tierIntervalPartialLoadRule,ForeverPartialLoadRule, andPeriodPartialLoadRule` implementations to mirror non-partial load rulesPartialLoadMatcherinterface to match and select what to partial loadExactProjectionPartialLoadMatcherandWildcardProjectionPartialLoadMatcherto do partial loading of projectionsCannotMatchBehaviorenum to describe behavior ofPartialLoadRulewhenPartialLoadMatcheris unable to match a segment