[improve] [pip] PIP-394 Add two interfaces CursorMetadataSerializerProvider and CursorMetadataDeSerializerProvider to support newer or customized cursor metadata serializations #23608

105 changes: 105 additions & 0 deletions pip/pip-394.md
# PIP-394: Add two interfaces `CursorMetadataSerializerProvider` and `CursorMetadataDeSerializerProvider` to support newer or customized cursor metadata serializations

# Background knowledge

**1. What cursor metadata contains**

- Cursor properties.
- The entry ID indicating where the latest cursor metadata was persisted.
- Information about individually acknowledged messages, called `individualDeletedMessages`.
- Information about individually acknowledged messages within batches, called `batchedEntryDeletionIndexInfo`.

**2. Improvements already made to cursor metadata persistence**
- https://github.com/apache/pulsar/pull/758: skip persisting the information that exceeds the configured maximum number of ranges to persist.
- https://github.com/apache/pulsar/issues/14529: compress the information when persisting it.
- https://github.com/apache/pulsar/pull/9292: add a new compression strategy that changes Range objects to `long[]`.
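
To illustrate why replacing per-range objects with a primitive `long[]` reduces the serialized footprint, here is a minimal sketch. It is not the encoding used by the pull request above: the `AckRange` record and the layout of four longs per range are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePacking {
    /** Hypothetical stand-in for one acknowledged range of (ledgerId, entryId) positions. */
    public record AckRange(long lowerLedger, long lowerEntry, long upperLedger, long upperEntry) {}

    /** Flattens each range into four consecutive longs (assumed layout). */
    public static long[] pack(List<AckRange> ranges) {
        long[] out = new long[ranges.size() * 4];
        int i = 0;
        for (AckRange r : ranges) {
            out[i++] = r.lowerLedger();
            out[i++] = r.lowerEntry();
            out[i++] = r.upperLedger();
            out[i++] = r.upperEntry();
        }
        return out;
    }

    /** Rebuilds the range list from the flat array produced by pack(). */
    public static List<AckRange> unpack(long[] packed) {
        if (packed.length % 4 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 4");
        }
        List<AckRange> ranges = new ArrayList<>(packed.length / 4);
        for (int i = 0; i < packed.length; i += 4) {
            ranges.add(new AckRange(packed[i], packed[i + 1], packed[i + 2], packed[i + 3]));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<AckRange> ranges = List.of(new AckRange(1, 0, 1, 5), new AckRange(2, 3, 2, 9));
        long[] packed = pack(ranges);
        System.out.println(packed.length + " longs, round trip ok: " + unpack(packed).equals(ranges));
    }
}
```

A flat array avoids one object allocation per range, which is the general reason such a representation is cheaper to build and persist.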

# Motivation

**Issue-1: Compatibility of the improvements**

- The third improvement was released with `release:4.0`, which is a new LTS version.
- It changed the default serialization implementation to the one introduced by https://github.com/apache/pulsar/pull/9292.
- Users cannot roll back to `3.0.x` once they have upgraded to `4.0.x`, because `release:3.0.x` does not contain the deserialization introduced by https://github.com/apache/pulsar/pull/9292.

**Issue-2: Frequent Young GC related to cursor metadata persistence when a broker has too many active subscriptions, even after the improvements above**

`individualDeletedMessages` and `batchedEntryDeletionIndexInfo` are often the largest attributes of the metadata. They are serialized into protobuf objects when persisted, but these objects cannot be recycled because they are immutable.

![375661781-51d5bd6d-f5a1-48d7-921a-975875fe8bed](https://github.com/user-attachments/assets/dd1eb135-7dee-4dd1-84ba-994618a8198e)


# Goals

- Guarantee compatibility for rollback from `4.0.x` to `3.0.x`.
- This PIP will be cherry-picked into `branch-3.0` and `branch-3.3`.
- Support customized cursor metadata serializers to address the issues users have encountered, such as **Issue-1** in the Motivation.

# High Level Design

### Design

- We call the serialization implemented before `4.0.0` `V1`, and the serialization introduced by https://github.com/apache/pulsar/pull/9292 `V2`.
- Add all versions of the serialization to `branch-3.0`.
- Set the default for `3.0.x` to `V1`, which matches the current behavior.
- Set the default for `4.0.x` to `V1`, which matches the current behavior.

> Review comment (Member): 4.0.0 default serialization is V2?

- Add two interfaces `CursorMetadataSerializerProvider` and `CursorMetadataDeSerializerProvider` to support newer or customized cursor metadata serializations.

### Public API

**CursorMetadataSerializerProvider.java**
```java
CursorMetadataSerializer newProvider(Name, PulsarService);
```

**CursorMetadataDeSerializerProvider.java**
```java
CursorMetadataDeserializer newProvider(Name, PulsarService);
```

**CursorMetadataSerializer.java**
```java
ManagedCursorInfo serialize(Position markDeletePosition,
                            Map<String, Long> properties,
                            LongPairRangeSet<Position> individualDeletedMessages,
                            Map<Position, BitSetRecyclable> batchDeletedIndexes);
```

**CursorMetadataDeserializer.java**
```java
ManagedCursorInfo deserialize(ByteBuf data);
```
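
As a rough illustration of how the provider and serializer interfaces above could fit together, the following sketch resolves a serializer and a deserializer by name and round-trips cursor properties. Everything here is a simplified assumption: the interfaces are stand-ins that work on `byte[]` and `Map<String, Long>` instead of `ManagedCursorInfo` and `ByteBuf`, the `PulsarService` argument is omitted, and the `toy-text` format exists only to make the example runnable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class ProviderSketch {
    // Simplified stand-ins for the interfaces proposed in this PIP.
    public interface Serializer { byte[] serialize(Map<String, Long> properties); }
    public interface Deserializer { Map<String, Long> deserialize(byte[] data); }
    public interface SerializerProvider { Serializer newProvider(String name); }
    public interface DeserializerProvider { Deserializer newProvider(String name); }

    // Toy "key=value;" text format (hypothetical); keys must not contain '=' or ';'.
    static byte[] toText(Map<String, Long> props) {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(props).forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static Map<String, Long> fromText(byte[] data) {
        Map<String, Long> out = new TreeMap<>();
        for (String pair : new String(data, StandardCharsets.UTF_8).split(";")) {
            if (pair.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int eq = pair.indexOf('=');
            out.put(pair.substring(0, eq), Long.parseLong(pair.substring(eq + 1)));
        }
        return out;
    }

    // Providers look up an implementation by name and reject unknown names.
    public static final SerializerProvider SERIALIZERS = name -> {
        if ("toy-text".equals(name)) {
            return ProviderSketch::toText;
        }
        throw new IllegalArgumentException("Unknown serializer: " + name);
    };

    public static final DeserializerProvider DESERIALIZERS = name -> {
        if ("toy-text".equals(name)) {
            return ProviderSketch::fromText;
        }
        throw new IllegalArgumentException("Unknown deserializer: " + name);
    };

    public static void main(String[] args) {
        Map<String, Long> props = Map.of("a", 1L, "b", 2L);
        byte[] data = SERIALIZERS.newProvider("toy-text").serialize(props);
        System.out.println(DESERIALIZERS.newProvider("toy-text").deserialize(data).equals(props));
    }
}
```

Keeping serialization and deserialization behind separate providers lets a broker write one format while still being able to read several.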

### Public-facing Changes & Binary protocol
- If you use a customized `CursorMetadataSerializer`, it may break tools that read the cursor ZK node, such as `pulsar-managed-ledger-admin`.

### Configuration

**broker.conf**
```properties
cursorMetadataSerializerProvider=V2
cursorMetadataDeserializerProvider=V1,V2
```
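
One way these two settings could be read is sketched below: a single serializer name and an ordered, comma-separated list of deserializer names. The `CursorSerdeConfig` class is hypothetical, and the check that the serializer name also appears in the deserializer list is an assumption of this sketch, not something the PIP specifies.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class CursorSerdeConfig {
    public final String serializer;
    public final List<String> deserializers;

    private CursorSerdeConfig(String serializer, List<String> deserializers) {
        this.serializer = serializer;
        this.deserializers = List.copyOf(deserializers);
    }

    /** Parses broker.conf-style text into the two cursor metadata settings. */
    public static CursorSerdeConfig parse(String conf) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String ser = p.getProperty("cursorMetadataSerializerProvider", "").trim();
        List<String> de = new ArrayList<>();
        for (String name : p.getProperty("cursorMetadataDeserializerProvider", "").split(",")) {
            if (!name.trim().isEmpty()) {
                de.add(name.trim());
            }
        }
        // Assumed sanity check: the broker should be able to read what it writes.
        if (ser.isEmpty() || !de.contains(ser)) {
            throw new IllegalArgumentException(
                    "Serializer '" + ser + "' must be listed among deserializers " + de);
        }
        return new CursorSerdeConfig(ser, de);
    }

    public static void main(String[] args) {
        CursorSerdeConfig c = parse(
                "cursorMetadataSerializerProvider=V2\ncursorMetadataDeserializerProvider=V1,V2\n");
        System.out.println(c.serializer + " " + c.deserializers);
    }
}
```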

### In Scope and Out of Scope

This PIP only adds the interfaces `CursorMetadataSerializerProvider` and `CursorMetadataDeSerializerProvider`. Implementations other than `V1` and `V2` will not be provided.

# Backward & Forward Compatibility

## Upgrade

Nothing to do.

## Downgrade / Rollback

- I will cherry-pick this PIP into `branch-3.0` and `branch-3.3`.
- Because https://github.com/apache/pulsar/pull/9292 changed the cursor metadata serialization, once you have upgraded to `4.0.x` from a lower version, you can only downgrade to a version that contains this PIP.

# Links

<!--
Updated afterwards
-->
* Mailing List discussion thread: https://lists.apache.org/thread/xy1prwcv4wdoobphcgloj7s5gxy05qq3
* Mailing List voting thread: https://lists.apache.org/thread/x8bf9hvk1pvo0dl0q3mcjh08wg90s89k