Implement a fingerprinting mechanism to track compaction states in a more efficient manner #18844
Status: Open. capistrant wants to merge 18 commits into apache:master from capistrant:compaction-fingerprinting.

Commits (18):
- d8490b0 meatadata store bits part 1 (capistrant)
- 3d2d423 annotate segments with compaction fingerprint before persist (capistrant)
- 48854f4 Add ability to generate compaction state fingerprint (capistrant)
- c6a3367 add fingerprint to task context and make legacy last compaction state… (capistrant)
- f3b706e update embedded tests for compaction supervisors to flex fingerprints (capistrant)
- 46fb807 checkpoint with persisting compaction states (capistrant)
- 0fef358 add duty to clean up unused compaction states (capistrant)
- edeaf30 take fingerprints into account in CompactionStatus (capistrant)
- 97daf3f Add and improve tests (capistrant)
- dbcdfcf get rid of some todo comments (capistrant)
- 38f6d15 fix checkstyle (capistrant)
- 4cf1197 cleanup some more TODO (capistrant)
- ba269bd Add some docs (capistrant)
- f168bc9 update web console (capistrant)
- 2292b15 make cache size configurable and fix some spelling (capistrant)
- 74c8ebc fixup use of deprecated builder (capistrant)
- adac5ec fix checktyle (capistrant)
- 4fb3a9c fix coordinator compactsegments duty and respond to self review comments (capistrant)
```diff
@@ -41,6 +41,7 @@
 import org.apache.druid.server.coordinator.InlineSchemaDataSourceCompactionConfig;
 import org.apache.druid.server.coordinator.UserCompactionTaskGranularityConfig;
 import org.apache.druid.server.coordinator.UserCompactionTaskQueryTuningConfig;
+import org.apache.druid.server.coordinator.duty.CompactSegments;
 import org.apache.druid.testing.embedded.EmbeddedBroker;
 import org.apache.druid.testing.embedded.EmbeddedCoordinator;
 import org.apache.druid.testing.embedded.EmbeddedDruidCluster;
@@ -50,6 +51,7 @@
 import org.apache.druid.testing.embedded.EmbeddedRouter;
 import org.apache.druid.testing.embedded.indexing.MoreResources;
 import org.apache.druid.testing.embedded.junit5.EmbeddedClusterTestBase;
+import org.apache.druid.timeline.CompactionState;
 import org.hamcrest.Matcher;
 import org.hamcrest.Matchers;
 import org.joda.time.Period;
```
```diff
@@ -111,14 +113,14 @@ public EmbeddedDruidCluster createCluster()
   private void configureCompaction(CompactionEngine compactionEngine)
   {
     final UpdateResponse updateResponse = cluster.callApi().onLeaderOverlord(
-        o -> o.updateClusterCompactionConfig(new ClusterCompactionConfig(1.0, 100, null, true, compactionEngine))
+        o -> o.updateClusterCompactionConfig(new ClusterCompactionConfig(1.0, 100, null, true, compactionEngine, true))
     );
     Assertions.assertTrue(updateResponse.isSuccess());
   }

   @MethodSource("getEngine")
   @ParameterizedTest(name = "compactionEngine={0}")
-  public void test_ingestDayGranularity_andCompactToMonthGranularity_withInlineConfig(CompactionEngine compactionEngine)
+  public void test_ingestDayGranularity_andCompactToMonthGranularity_andCompactToYearGranularity_withInlineConfig(CompactionEngine compactionEngine)
   {
     configureCompaction(compactionEngine);
```
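The hunk above appends a trailing boolean argument to the `ClusterCompactionConfig` constructor; judging from the tests later in this diff, it toggles whether the full last-compaction state is persisted alongside the new fingerprint. As a hedged illustration only (the field names, the shortened parameter list, and the delegating overload below are assumptions, not Druid's actual class), a trailing flag like this can be added while keeping old call sites compiling:

```java
// Illustrative sketch, NOT Druid's actual ClusterCompactionConfig.
// Shows how a trailing boolean (here persistLastCompactionState) can be
// added with a delegating overload so existing callers keep working.
public class ClusterCompactionConfigSketch {
    private final double compactionTaskSlotRatio;
    private final int maxCompactionTaskSlots;
    private final boolean useSupervisors;
    private final boolean persistLastCompactionState;

    // New canonical constructor carrying the extra flag.
    public ClusterCompactionConfigSketch(
        double compactionTaskSlotRatio,
        int maxCompactionTaskSlots,
        boolean useSupervisors,
        boolean persistLastCompactionState
    ) {
        this.compactionTaskSlotRatio = compactionTaskSlotRatio;
        this.maxCompactionTaskSlots = maxCompactionTaskSlots;
        this.useSupervisors = useSupervisors;
        this.persistLastCompactionState = persistLastCompactionState;
    }

    // Old signature delegates with the backward-compatible default (true).
    public ClusterCompactionConfigSketch(double ratio, int slots, boolean useSupervisors) {
        this(ratio, slots, useSupervisors, true);
    }

    public boolean isPersistLastCompactionState() {
        return persistLastCompactionState;
    }

    public static void main(String[] args) {
        // Legacy call site: flag defaults to true.
        ClusterCompactionConfigSketch legacy = new ClusterCompactionConfigSketch(1.0, 100, true);
        System.out.println(legacy.isPersistLastCompactionState()); // true
    }
}
```

Defaulting the new flag to `true` matches the `..., compactionEngine, true)` replacement in the diff, so pre-existing behavior is unchanged unless a caller opts out.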
```diff
@@ -132,7 +134,7 @@ public void test_ingestDayGranularity_andCompactToMonthGranularity_withInlineCon
     Assertions.assertEquals(3, getNumSegmentsWith(Granularities.DAY));

     // Create a compaction config with MONTH granularity
-    InlineSchemaDataSourceCompactionConfig compactionConfig =
+    InlineSchemaDataSourceCompactionConfig monthGranConfig =
         InlineSchemaDataSourceCompactionConfig
             .builder()
             .forDataSource(dataSource)
```
```diff
@@ -165,11 +167,170 @@ public void test_ingestDayGranularity_andCompactToMonthGranularity_withInlineCon
         )
         .build();

-    runCompactionWithSpec(compactionConfig);
+    runCompactionWithSpec(monthGranConfig);
     waitForAllCompactionTasksToFinish();

     Assertions.assertEquals(0, getNumSegmentsWith(Granularities.DAY));
     Assertions.assertEquals(1, getNumSegmentsWith(Granularities.MONTH));
+
+    verifyCompactedSegmentsHaveFingerprints(monthGranConfig);
+
+    InlineSchemaDataSourceCompactionConfig yearGranConfig =
+        InlineSchemaDataSourceCompactionConfig
+            .builder()
+            .forDataSource(dataSource)
+            .withSkipOffsetFromLatest(Period.seconds(0))
+            .withGranularitySpec(
+                new UserCompactionTaskGranularityConfig(Granularities.YEAR, null, null)
+            )
+            .withTuningConfig(
+                new UserCompactionTaskQueryTuningConfig(
+                    null, null, null, null, null,
+                    new DimensionRangePartitionsSpec(null, 5000, List.of("item"), false),
+                    null, null, null, null, null, null, null,
+                    null, null, null, null, null, null
+                )
+            )
+            .build();
+
+    overlord.latchableEmitter().flush(); // flush events so wait for works correctly on the next round of compaction
+    runCompactionWithSpec(yearGranConfig);
+    waitForAllCompactionTasksToFinish();
+
+    Assertions.assertEquals(0, getNumSegmentsWith(Granularities.DAY));
+    Assertions.assertEquals(0, getNumSegmentsWith(Granularities.MONTH));
+    Assertions.assertEquals(1, getNumSegmentsWith(Granularities.YEAR));
+
+    verifyCompactedSegmentsHaveFingerprints(yearGranConfig);
   }

+  @MethodSource("getEngine")
+  @ParameterizedTest(name = "compactionEngine={0}")
+  public void test_compaction_withPersistLastCompactionStateFalse_storesOnlyFingerprint(CompactionEngine compactionEngine)
+      throws InterruptedException
+  {
+    // Configure cluster with persistLastCompactionState=false
+    final UpdateResponse updateResponse = cluster.callApi().onLeaderOverlord(
+        o -> o.updateClusterCompactionConfig(
+            new ClusterCompactionConfig(1.0, 100, null, true, compactionEngine, false)
+        )
+    );
+    Assertions.assertTrue(updateResponse.isSuccess());
+
+    // Ingest data at DAY granularity
+    runIngestionAtGranularity(
+        "DAY",
+        "2025-06-01T00:00:00.000Z,shirt,105\n"
+        + "2025-06-02T00:00:00.000Z,trousers,210"
+    );
+    Assertions.assertEquals(2, getNumSegmentsWith(Granularities.DAY));
+
+    // Create compaction config to compact to MONTH granularity
+    InlineSchemaDataSourceCompactionConfig monthConfig =
+        InlineSchemaDataSourceCompactionConfig
+            .builder()
+            .forDataSource(dataSource)
+            .withSkipOffsetFromLatest(Period.seconds(0))
+            .withGranularitySpec(
+                new UserCompactionTaskGranularityConfig(Granularities.MONTH, null, null)
+            )
+            .withTuningConfig(
+                new UserCompactionTaskQueryTuningConfig(
+                    null, null, null, null, null,
+                    new DimensionRangePartitionsSpec(1000, null, List.of("item"), false),
+                    null, null, null, null, null, null, null,
+                    null, null, null, null, null, null
+                )
+            )
+            .build();
+
+    runCompactionWithSpec(monthConfig);
+    waitForAllCompactionTasksToFinish();
+
+    verifySegmentsHaveNullLastCompactionStateAndNonNullFingerprint();
+  }
+
+  private void verifySegmentsHaveNullLastCompactionStateAndNonNullFingerprint()
+  {
+    overlord
+        .bindings()
+        .segmentsMetadataStorage()
+        .retrieveAllUsedSegments(dataSource, Segments.ONLY_VISIBLE)
+        .forEach(segment -> {
+          Assertions.assertNull(
+              segment.getLastCompactionState(),
+              "Segment " + segment.getId() + " should have null lastCompactionState"
+          );
+          Assertions.assertNotNull(
+              segment.getCompactionStateFingerprint(),
+              "Segment " + segment.getId() + " should have non-null compactionStateFingerprint"
+          );
+        });
+  }
+
+  private void verifyCompactedSegmentsHaveFingerprints(DataSourceCompactionConfig compactionConfig)
+  {
+    String expectedFingerprint = CompactionState.generateCompactionStateFingerprint(
+        CompactSegments.createCompactionStateFromConfig(compactionConfig),
+        dataSource
+    );
+
+    overlord
+        .bindings()
+        .segmentsMetadataStorage()
+        .retrieveAllUsedSegments(dataSource, Segments.ONLY_VISIBLE)
+        .forEach(segment -> {
+          String fingerprint = segment.getCompactionStateFingerprint();
+          Assertions.assertNotNull(
+              fingerprint,
+              "Segment " + segment.getId() + " should have a compaction state fingerprint"
+          );
+          Assertions.assertFalse(
+              fingerprint.isEmpty(),
+              "Segment " + segment.getId() + " fingerprint should not be empty"
+          );
+          // SHA-256 fingerprints should be 64 hex characters
+          Assertions.assertEquals(
+              64,
+              fingerprint.length(),
+              "Segment " + segment.getId() + " fingerprint should be 64 characters (SHA-256)"
+          );
+          Assertions.assertEquals(
+              expectedFingerprint,
+              fingerprint,
+              "Segment " + segment.getId() + " fingerprint should match expected fingerprint"
+          );
+        });
+  }
+
   private void runCompactionWithSpec(DataSourceCompactionConfig config)
```

Contributor comment on lines +314 to +327 (the notNull, notEmpty, and length assertions in `verifyCompactedSegmentsHaveFingerprints`): These 3 assertions can be omitted; just the last one should suffice.
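The `verifyCompactedSegmentsHaveFingerprints` helper above expects fingerprints to be exactly 64 characters because a SHA-256 digest is 32 bytes, which hex-encodes to 64 characters. A minimal sketch of that arithmetic; the input construction (state JSON joined with the datasource name) is an assumption for illustration, not Druid's actual `generateCompactionStateFingerprint` logic:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: any input hashed with SHA-256 yields a 32-byte digest,
// so the hex encoding is always 64 characters, matching the test.
public class FingerprintLengthSketch {
    static String sha256Hex(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
            .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // two hex chars per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input shape: serialized compaction state plus datasource name.
        String fingerprint = sha256Hex("{\"granularity\":\"MONTH\"}" + ":" + "wikipedia");
        System.out.println(fingerprint.length()); // 64
        System.out.println(fingerprint.matches("[0-9a-f]{64}")); // true
    }
}
```

Because the digest length is fixed, the reviewer's point holds: the final equality assertion against the expected fingerprint already implies non-null, non-empty, and 64-character.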
Both Coordinator and Overlord (with segment metadata caching enabled) already keep all used segments in memory, including the respective (interned) CompactionState objects. I don't think the number of distinct CompactionState objects we keep in memory will increase after this patch. Do we still need to worry about the cache size of these objects? Does a cache miss trigger a fetch from the metadata store?
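The interning the comment refers to can be sketched generically as follows; the map-based interner and the String stand-in for CompactionState are assumptions for illustration, not Druid's actual code. Each distinct value is canonicalized through a cache, so segments sharing an equal compaction state also share one in-memory instance:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic interning sketch (not Druid's implementation). A String stands in
// for CompactionState: distinct values are stored once, so many segments
// referencing the same state add no extra per-segment memory.
public class CompactionStateInterner {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Return the canonical instance for an equal value, inserting on first sight.
    static String intern(String state) {
        return CACHE.computeIfAbsent(state, s -> s);
    }

    public static void main(String[] args) {
        String a = intern(new String("compaction-state-A"));
        String b = intern(new String("compaction-state-A"));
        // Two equal but distinct inputs resolve to the same canonical object.
        System.out.println(a == b); // true
        System.out.println(CACHE.size()); // 1
    }
}
```

Under this scheme, memory grows with the number of distinct compaction states rather than the number of segments, which is the basis of the reviewer's question about whether a bounded cache is needed at all.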