Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an exception when create and drop a table with the same name multiple times #719

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zcoo
Copy link
Contributor

@zcoo zcoo commented Apr 7, 2025

Purpose

Linked issue: close #712

Brief change log

Drop table's all partitions when coordinator listens a partitioned table dropped.

Tests

API and Format

Documentation

@wuchong
Copy link
Member

wuchong commented Apr 12, 2025

I think there are issues with the current fix.

When deleting tables/t1, its child partitions, such as tables/t1/partitions/p1, should also be deleted automatically. There should be no need to explicitly and additionally delete the partition directories.

In my opinion, the root cause here is that after ZooKeeper deletes tables/t1, the AutoPartitionManager is triggered and recreates the partitions (with the old table ID). This results in an exception when attempting to recreate the table.

The solution is straightforward: before dropping the table, the AutoPartitionManager should unregister the table id first.

With the current fix, which deletes partitions in the processDropTableEvent, may lead to the deletion of partitions belonging to the new table, potentially causing data loss.

cc @luoyuxia

@zcoo
Copy link
Contributor Author

zcoo commented Apr 14, 2025

@wuchong I think your solution may be inconsistent with current code implement.

Here is the original code logic:

Step 1. Create Table 1 (only created on zk node without adding to AutoPartitionManager)
Step 2. Delete Table 1 (only dropped on zk node)
Step 3. zk table creation is being monitored and the table creation request is added to the CoordinatorEventProcessor request queue
Step 4. The zk deletion table is being monitored and the deletion request is added to the CoordinatorEventProcessor request queue
Step 5. CoordinatorEventProcessor queue executes table creation request, AutoPartitionManager adds the table zk, creates table partition nodes
Step 6. CoordinatEventProcessor queue executes droping a table, but only unregister for table 1 from AutoPartitionManager without deleting the table partition node, resulting in the retention of the zk table partition node

"The solution is straightforward: before dropping the table, the AutoPartitionManager should unregister the table id first."
So this solution doesn't work because before Step 2, the Table 1 haven't been register from AutoPartitionManager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The table id in table zknode is not same with the partition id in the children node of the table in zk
2 participants