Commit da65dc6

dlg99 authored and hanmz committed
[improve][pip] PIP-381: Handle large PositionInfo state (apache#23328)
1 parent b47ca39 commit da65dc6

1 file changed

+153
-0
lines changed

pip/pip-381-large-positioninfo.md

@@ -0,0 +1,153 @@
# PIP-381: Handle large PositionInfo state

# Background knowledge

With a KEY_SHARED subscription and out-of-order acknowledgments,
the PositionInfo state can be persisted so that it is not lost on a crash,
with a configurable maximum number of ranges to persist:

```
# Max number of "acknowledgment holes" that are going to be persistently stored.
# When acknowledging out of order, a consumer will leave holes that are supposed
# to be quickly filled by acking all the messages. The information of which
# messages are acknowledged is persisted by compressing in "ranges" of messages
# that were acknowledged. After the max number of ranges is reached, the information
# will only be tracked in memory and messages will be redelivered in case of
# crashes.
managedLedgerMaxUnackedRangesToPersist=10000
```
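
To make the notion of "ranges" in the comment above concrete, here is a minimal sketch of how individually acknowledged entries could be collapsed into ranges. It is an illustration only: `AckRangesSketch` and `toRanges` are hypothetical names, entry ids are plain `long` values (real positions also carry a ledger id), and the closed `[first, last]` range shape is an assumption rather than the broker's actual representation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration; not code from Pulsar.
final class AckRangesSketch {

    // Collapses acknowledged entry ids into closed [first, last] ranges.
    // Every gap between two consecutive ranges is an "acknowledgment hole".
    static List<long[]> toRanges(TreeSet<Long> ackedEntryIds) {
        List<long[]> ranges = new ArrayList<>();
        long[] current = null;
        // A TreeSet iterates in ascending order, so adjacent ids are visited back to back.
        for (long id : ackedEntryIds) {
            if (current != null && id == current[1] + 1) {
                current[1] = id;                 // extend the current range
            } else {
                current = new long[] {id, id};   // a hole was skipped: start a new range
                ranges.add(current);
            }
        }
        return ranges;
    }
}
```

For example, acknowledging entries 5, 6, 7, 10, 12 and 13 yields three ranges (5 to 7, 10 to 10, 12 to 13), so scattered acknowledgments cost one range per contiguous run.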

The PositionInfo state is stored in BookKeeper as a single entry, and it can grow large if the number of ranges is large.
Currently, this means that BookKeeper can fail to persist a PositionInfo state that is too large, e.g. over 1MB
by default, and the ManagedCursor recovery on topic reload might not succeed.

PIP-81 addressed a similar problem but was abandoned; this PIP takes over.

# Motivation

While keeping the number of ranges low to prevent such problems is a common-sense solution, there are cases
where a higher number of ranges is required. For example, with the JMS protocol handler,
JMS consumers with filters may end up processing data out of order and/or at different speeds,
and the number of ranges can grow large.

# Goals

Store the PositionInfo state in a BookKeeper ledger as multiple entries if the state grows too large to be stored as a single entry.

## In Scope

Transparent backwards compatibility if the PositionInfo state is small enough to fit in a single entry.

## Out of Scope

Backwards compatibility when the PositionInfo state is too large to be stored as a single entry.

# High Level Design

Cursor state writes and reads happen in the same cases as they do currently, without changes.

Write path:

1. Serialize the PositionInfo state to a byte array.
2. If the byte array is smaller than the threshold, store it as a single entry, as now. Done.
3. If the byte array is larger than the threshold, split it into smaller chunks and store the chunks in a BookKeeper ledger.
4. Write the "footer" into the metadata store as the last entry.

See `persistPositionToLedger()` in `ManagedCursorImpl` for the implementation.
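
The write path above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `ManagedCursorImpl`: `ChunkedWriteSketch` and `toEntries` are hypothetical names, the footer is assumed to be appended to the same ledger directly after the chunks (which is what the read path's index arithmetic implies), the JSON field names are assumed to match the Java field names, and a state exactly equal to the threshold is treated as a single entry because the text does not specify that case.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real logic lives in ManagedCursorImpl#persistPositionToLedger().
final class ChunkedWriteSketch {

    // Returns the entries to append, in order, for one serialized PositionInfo state.
    static List<byte[]> toEntries(byte[] state, int maxEntrySize) {
        if (maxEntrySize <= 0) {
            throw new IllegalArgumentException("maxEntrySize must be positive");
        }
        List<byte[]> entries = new ArrayList<>();
        // Step 2: a small state is stored as a single entry, as before.
        // Using "<=" is an assumption; the text only covers "smaller" and "larger".
        if (state.length <= maxEntrySize) {
            entries.add(state);
            return entries;
        }
        // Step 3: split the byte array into chunks of at most maxEntrySize bytes.
        for (int offset = 0; offset < state.length; offset += maxEntrySize) {
            int end = Math.min(offset + maxEntrySize, state.length);
            entries.add(Arrays.copyOfRange(state, offset, end));
        }
        // Step 4: the footer records how many chunks precede it and the total length.
        String footer = "{\"numParts\":" + entries.size() + ",\"length\":" + state.length + "}";
        entries.add(footer.getBytes(StandardCharsets.UTF_8));
        return entries;
    }
}
```

With a 40-byte state and a 16-byte threshold, this produces three chunks (16, 16 and 8 bytes) followed by one footer entry.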
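
The footer is a JSON representation of

```java
public static final class ChunkSequenceFooter {
    // Number of chunk entries written before the footer.
    private int numParts;
    // Length of the serialized PositionInfo state (presumably the total across all chunks).
    private int length;
}
```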

Read path:

1. Read the last entry from the metadata store.
2. If the entry does not appear to be JSON, treat it as the serialized PositionInfo state and use it as is. Done.
3. If the entry is JSON, parse the number of chunks and the length from it.
4. Read the chunks from the BookKeeper ledger (entries from `startPos = footerPosition - chunkSequenceFooter.numParts` to `footerPosition - 1`) and merge them.
5. Parse the merged byte array as a PositionInfo state.

See `recoverFromLedgerByEntryId()` in `ManagedCursorImpl` for the implementation.
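
The read path above can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the actual implementation: `ChunkedReadSketch` and `recover` are hypothetical names, a `List<byte[]>` stands in for the ledger (list index equals entry id), the "looks like JSON" check is reduced to testing for a leading `{`, and the footer fields are extracted with regular expressions to keep the example dependency-free.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the real logic lives in ManagedCursorImpl#recoverFromLedgerByEntryId().
final class ChunkedReadSketch {
    private static final Pattern NUM_PARTS = Pattern.compile("\"numParts\"\\s*:\\s*(\\d+)");
    private static final Pattern LENGTH = Pattern.compile("\"length\"\\s*:\\s*(\\d+)");

    // Returns the serialized PositionInfo bytes; step 5 (parsing them) is left to the caller.
    static byte[] recover(List<byte[]> ledger) {
        if (ledger.isEmpty()) {
            throw new IllegalArgumentException("ledger has no entries");
        }
        int footerPosition = ledger.size() - 1;
        byte[] last = ledger.get(footerPosition);                 // step 1
        // Step 2: crude JSON detection (an assumption): a footer starts with '{'.
        if (last.length == 0 || last[0] != '{') {
            return last;
        }
        String json = new String(last, StandardCharsets.UTF_8);  // step 3
        int numParts = intField(NUM_PARTS, json);
        int length = intField(LENGTH, json);
        int startPos = footerPosition - numParts;                 // step 4
        if (startPos < 0) {
            throw new IllegalStateException("footer claims more chunks than the ledger holds");
        }
        ByteArrayOutputStream merged = new ByteArrayOutputStream(length);
        for (int i = startPos; i <= footerPosition - 1; i++) {
            byte[] chunk = ledger.get(i);
            merged.write(chunk, 0, chunk.length);
        }
        // The recorded length doubles as a cheap integrity check on the merged bytes.
        if (merged.size() != length) {
            throw new IllegalStateException("expected " + length + " bytes, got " + merged.size());
        }
        return merged.toByteArray();
    }

    private static int intField(Pattern pattern, String json) {
        Matcher m = pattern.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("missing field in footer: " + json);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Entries that precede `startPos` (for example, an older persisted state in the same ledger) are never touched, because only the `numParts` entries immediately before the footer are read.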

## Design & Implementation Details

Proposed implementation: https://github.com/apache/pulsar/pull/22799

## Public-facing Changes

Nothing

### Public API

None

### Binary protocol

No public-facing changes

### Configuration

* **managedLedgerMaxUnackedRangesToPersist**: int, default 10000 (existing parameter). Controls the maximum number of unacked ranges to persist.
* **persistentUnackedRangesWithMultipleEntriesEnabled**: boolean, default false. If true, the PositionInfo state is stored as multiple entries in BookKeeper if it grows too large.
* **persistentUnackedRangesMaxEntrySize**: int, default 1MB. Maximum size of a single entry in BookKeeper, in bytes.
* **cursorInfoCompressionType**: string, default "NONE". Compression type to use for the PositionInfo state.
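
As an illustration, a configuration that opts in to multi-entry storage might look like the fragment below. It assumes these settings are written as `key=value` pairs in the same file as `managedLedgerMaxUnackedRangesToPersist`, and it assumes that the "1MB" default means 1048576 bytes; neither is confirmed by the text above.

```
# Existing setting: maximum number of unacked ranges to persist.
managedLedgerMaxUnackedRangesToPersist=10000
# Opt in to splitting a large PositionInfo state across multiple entries (default: false).
persistentUnackedRangesWithMultipleEntriesEnabled=true
# Maximum size of a single entry, in bytes (assumption: "1MB" = 1048576).
persistentUnackedRangesMaxEntrySize=1048576
# Compression for the PositionInfo state (default shown).
cursorInfoCompressionType=NONE
```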

### CLI

None

### Metrics

<!--
For each metric provide:
* Full name
* Description
* Attributes (labels)
* Unit
-->


# Monitoring

Existing monitoring should be sufficient.

# Security Considerations

N/A

# Backward & Forward Compatibility

## Upgrade

Not affected, just upgrade.

## Downgrade / Rollback

Not affected, just downgrade **as long as `managedLedgerMaxUnackedRangesToPersist` was low enough for the PositionInfo state to fit into a single BookKeeper entry**.

## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

Not affected, as far as we know.

# Alternatives

1. Do nothing and keep the number of ranges low. This does not fit some use cases.
2. Come up with an extremely efficient storage format for the unacked ranges so that they always fit into a single entry, e.g. for 10 million ranges. This breaks backwards compatibility, and the feasibility is unclear.

# General Notes

# Links

* Proposed implementation: https://github.com/apache/pulsar/pull/22799
* PIP-81: https://github.com/apache/pulsar/wiki/PIP-81:-Split-the-individual-acknowledgments-into-multiple-entries
* PR that implements better storage format for the unacked ranges (alternative 2): https://github.com/apache/pulsar/pull/9292

ML discussion and voting threads:

* Mailing List discussion thread: https://lists.apache.org/thread/8sm0h804v5914zowghrqxr92fp7c255d
* Mailing List voting thread: https://lists.apache.org/thread/q31fx0rox9tdt34xsmo1ol1l76q8vk99
