Description
Search before asking
- I searched in the issues and found nothing similar.
Fluss version
0.6.0 (latest release)
Please describe the bug 🐞
Drop table t1 and then create a new table with the same name t1. It may then happen in zk that the t1
node has table id 12, but its child partition node still has table id 11. The client then fails with the following exception:
Caused by: java.lang.IllegalArgumentException: table path not found for tableId 11 in cluster
at com.alibaba.fluss.cluster.Cluster.lambda$getTablePathOrElseThrow$1(Cluster.java:177)
at java.util.Optional.orElseThrow(Optional.java:290)
at com.alibaba.fluss.cluster.Cluster.getTablePathOrElseThrow(Cluster.java:175)
at com.alibaba.fluss.client.utils.MetadataUtils.lambda$getTableMetadataToUpdate$3(MetadataUtils.java:221)
at java.lang.Iterable.forEach(Iterable.java:75)
at com.alibaba.fluss.client.utils.MetadataUtils.getTableMetadataToUpdate(MetadataUtils.java:217)
at com.alibaba.fluss.client.utils.MetadataUtils.lambda$sendMetadataRequestAndRebuildCluster$1(MetadataUtils.java:122)
It happens with the following steps:
- create table t1 with id 11
- drop table t1
- the coordinator observes that t1 was created, does auto partition, and creates the partition in the zk path tables/t1/partitions/p1 with table id 11
- the coordinator observes that t1 was dropped, and only removes t1 (id = 11) from AutoPartitionManager
- create table t1 with id 12, which upserts the zk path tables/t1 with table id 12
So the inconsistency arises: the t1 node has table id 12, while its partition node p1 still has table id 11.
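The interleaving above can be replayed with a toy in-memory model. This is only a sketch: the class `StaleListenerRace`, its methods, and the map standing in for zk are all hypothetical and are not Fluss code. It assumes the coordinator handles the "created" and "dropped" notifications late, after the table has already been dropped.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy in-memory model of the zk metadata tree; every name here is hypothetical, not Fluss code. */
class StaleListenerRace {
    /** zk path -> table id stored in that node. */
    final Map<String, Long> zk = new TreeMap<>();

    void createTable(String name, long tableId) {
        zk.put("tables/" + name, tableId); // upsert the table node
    }

    void dropTable(String name) {
        // Remove the table node and any children that exist at this moment.
        String node = "tables/" + name;
        zk.keySet().removeIf(path -> path.equals(node) || path.startsWith(node + "/"));
    }

    /** Delayed coordinator reaction to "table created": auto partition with the id it saw. */
    void onTableCreated(String name, long tableId, String partition) {
        zk.put("tables/" + name + "/partitions/" + partition, tableId);
    }

    /** Delayed coordinator reaction to "table dropped": forgets the table, deletes nothing in zk. */
    void onTableDropped(String name, long tableId) {
        // Intentionally empty: models removing the table from AutoPartitionManager only.
    }

    /** Resolve a table id to its path using table nodes only, as a client-side lookup might. */
    Optional<String> tablePath(long tableId) {
        return zk.entrySet().stream()
                .filter(e -> !e.getKey().contains("/partitions/") && e.getValue() == tableId)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    static StaleListenerRace replay() {
        StaleListenerRace s = new StaleListenerRace();
        s.createTable("t1", 11L);          // step 1
        s.dropTable("t1");                 // step 2
        s.onTableCreated("t1", 11L, "p1"); // step 3 (handled late)
        s.onTableDropped("t1", 11L);       // step 4 (handled late)
        s.createTable("t1", 12L);          // step 5
        return s;
    }

    public static void main(String[] args) {
        StaleListenerRace s = replay();
        System.out.println(s.zk); // {tables/t1=12, tables/t1/partitions/p1=11}
        long partitionTableId = s.zk.get("tables/t1/partitions/p1");
        System.out.println(s.tablePath(partitionTableId).isPresent()); // false
    }
}
```

In this model the partition node points at table id 11, which no table node carries any more, so a lookup by that id comes back empty. That matches the shape of the "table path not found for tableId 11" failure in the stack trace.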
Solution
It seems we should drop the partitions in step 4, once the coordinator observes that table t1 has been dropped.
If we don't drop the partitions, it may also cause a partition leak:
- create table t1 with id 11
- drop table t1
- the coordinator observes that t1 was created and does auto partition
- the coordinator observes that t1 was dropped
Note that the partitions created in step 3 are never dropped in this case.
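A minimal sketch of the proposed cleanup, using a toy in-memory map in place of zk. All names (`PartitionCleanupSketch`, `onTableDropped`, the `cleanupOnDrop` flag) are hypothetical and do not reflect the actual coordinator code. Filtering by table id in the drop handler is an extra assumption of this sketch, so that a late drop event for id 11 could not delete partitions that belong to a recreated table.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the proposed fix; every name here is hypothetical, not Fluss code. */
class PartitionCleanupSketch {
    /** zk path -> table id stored in that node. */
    final Map<String, Long> zk = new TreeMap<>();
    final boolean cleanupOnDrop;

    PartitionCleanupSketch(boolean cleanupOnDrop) {
        this.cleanupOnDrop = cleanupOnDrop;
    }

    /** Coordinator reaction to "table created": auto partition with the id it saw. */
    void onTableCreated(String name, long tableId, String partition) {
        zk.put("tables/" + name + "/partitions/" + partition, tableId);
    }

    /** Coordinator reaction to "table dropped": optionally delete partitions of that table id. */
    void onTableDropped(String name, long tableId) {
        if (!cleanupOnDrop) {
            return; // current behaviour in this model: nothing in zk is touched
        }
        String prefix = "tables/" + name + "/partitions/";
        // Assumption: only partitions stamped with the dropped table id are removed.
        zk.entrySet().removeIf(e -> e.getKey().startsWith(prefix) && e.getValue() == tableId);
    }

    /** Replays the four steps above and counts the partition nodes left behind. */
    static long leakedPartitions(boolean cleanupOnDrop) {
        PartitionCleanupSketch s = new PartitionCleanupSketch(cleanupOnDrop);
        s.zk.put("tables/t1", 11L);        // step 1: create table t1 with id 11
        s.zk.remove("tables/t1");          // step 2: drop table t1
        s.onTableCreated("t1", 11L, "p1"); // step 3: late auto partition
        s.onTableDropped("t1", 11L);       // step 4: late drop handling
        return s.zk.keySet().stream().filter(p -> p.contains("/partitions/")).count();
    }

    public static void main(String[] args) {
        System.out.println(leakedPartitions(false)); // 1
        System.out.println(leakedPartitions(true));  // 0
    }
}
```

Without the cleanup, one partition node survives even though no table exists. With the cleanup in the drop handler, nothing is left behind.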
Are you willing to submit a PR?
- I'm willing to submit a PR!