Conversation

@junlong-gao (Owner)

Great project BTW!

@junlong-gao force-pushed the solutions branch 10 times, most recently from 9fdc72a to b0aa17e on July 16, 2020 01:50
@junlong-gao force-pushed the solutions branch 12 times, most recently from 40ee827 to 77f0471 on July 20, 2020 04:17
@junlong-gao force-pushed the solutions branch 6 times, most recently from 57997ca to 097fab7 on July 26, 2020 16:22
@junlong-gao force-pushed the solutions branch 2 times, most recently from d0fdf3b to ca3912f on July 27, 2020 02:43
@junlong-gao force-pushed the solutions branch 5 times, most recently from 6221358 to 99cd750 on July 28, 2020 15:19
@junlong-gao (Owner, Author)

I feel that a single section on a simple scheduler implementation cannot do justice to the marvelous topics of cluster monitoring, load balancing, and the directory/discovery service as implemented in the current placement driver. I hacked the proto files a bit so that, up to project 3b, the entire setup can be bootstrapped by a build of the official placement driver (no further testing was done beyond writing and reading a few keys through the raw KV API).
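
For reference, below is a minimal sketch of the kind of raw KV smoke test described above. It assumes the generated gRPC clients in `tinykvpb`/`kvrpcpb` expose `RawPut`/`RawGet` as in the upstream proto and that a store listens on port 20160; region routing through the placement driver is omitted, so the import paths, field names, and address here are assumptions rather than the exact demo code:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/grpc"

	// Assumed import paths for the generated protos in the tinykv repo.
	"github.com/pingcap-incubator/tinykv/proto/pkg/kvrpcpb"
	"github.com/pingcap-incubator/tinykv/proto/pkg/tinykvpb"
)

func main() {
	// Talk to a single store directly; a real client would first ask the
	// placement driver which region/peer owns the key.
	conn, err := grpc.Dial("127.0.0.1:20160", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := tinykvpb.NewTinyKvClient(conn)

	// Write one key through the raw KV API (no transactions, no MVCC).
	if _, err := client.RawPut(context.Background(), &kvrpcpb.RawPutRequest{
		Cf:    "default",
		Key:   []byte("hello"),
		Value: []byte("world"),
	}); err != nil {
		log.Fatal(err)
	}

	// Read it back.
	resp, err := client.RawGet(context.Background(), &kvrpcpb.RawGetRequest{
		Cf:  "default",
		Key: []byte("hello"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("value = %s\n", resp.Value)
}
```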

@junlong-gao (Owner, Author) commented on Jul 31, 2020

In the end, project 4 is also a wonderful topic, covering many important transaction-manager design ideas: the leader of the region holding the primary key acts as the coordinator/lock table, concurrency control relies on a centralized clock (timestamp) service, and the two-phase commit that guarantees all-or-nothing behavior across regions is borrowed from the Percolator paper.
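
To make that flow concrete, here is a rough sketch of the Percolator-style commit, with a hypothetical `txnClient` interface and a `tso` timestamp source standing in for the real per-region clients and the scheduler's clock service; it illustrates the idea rather than the project's actual code:

```go
package txn

// txnClient is a hypothetical interface standing in for the per-region
// clients; it is not the project's actual API.
type txnClient interface {
	Prewrite(key, value, primary []byte, startTS uint64) error
	Commit(key []byte, startTS, commitTS uint64) error
}

// twoPhaseCommit sketches the Percolator flow: prewrite every key (locking
// each one against the primary), then commit the primary with a timestamp
// from the centralized clock service, then commit the secondaries. The
// transaction is decided the moment the primary's commit record is written.
func twoPhaseCommit(c txnClient, tso func() uint64, mutations map[string][]byte, primary string) error {
	startTS := tso()

	// Phase 1: prewrite all keys; a write conflict or an existing lock aborts.
	for k, v := range mutations {
		if err := c.Prewrite([]byte(k), v, []byte(primary), startTS); err != nil {
			return err // caller rolls back any prewritten locks
		}
	}

	// Phase 2: commit the primary first; this is the atomic commit point.
	commitTS := tso()
	if err := c.Commit([]byte(primary), startTS, commitTS); err != nil {
		return err
	}

	// Secondaries can be committed lazily: a reader that finds one of their
	// locks resolves it by checking the primary's status.
	for k := range mutations {
		if k == primary {
			continue
		}
		_ = c.Commit([]byte(k), startTS, commitTS) // best effort
	}
	return nil
}
```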

I was cheating a bit: the test cases do not exercise region errors, nor does the storage module interface surface them (unfortunately), so I did not implement that error handling. In general, these should surface as region errors that tell the client to retry.
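
A sketch of what that retry handling could look like on the client side, with hypothetical response types standing in for the real proto messages (each of which carries a region-error field):

```go
package client

import (
	"errors"
	"time"
)

// Hypothetical response shapes for illustration; in the real proto every KV
// response carries a region-error field (NotLeader, EpochNotMatch, ...).
type regionError struct{ NotLeader bool }

type getResponse struct {
	RegionError *regionError
	Value       []byte
}

// getWithRetry shows the pattern described above: a region error is not
// fatal, the client refreshes its region/leader information from the
// scheduler and retries the request.
func getWithRetry(do func() (*getResponse, error), refresh func()) ([]byte, error) {
	for attempt := 0; attempt < 5; attempt++ {
		resp, err := do()
		if err != nil {
			return nil, err // transport-level failure, give up here
		}
		if resp.RegionError != nil {
			refresh()                         // re-fetch region and leader info
			time.Sleep(50 * time.Millisecond) // simple backoff
			continue
		}
		return resp.Value, nil
	}
	return nil, errors.New("region error: retries exhausted")
}
```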

Overall, it is a great project and I really learned a lot. Kudos to the PingCAP team for putting this together.

imaffe and others added 2 commits on August 1, 2020 01:40
Also added a basic raw kv interface demo and removed the scheduler part.
@junlong-gao (Owner, Author)

My implementation of dynamic membership change is subject to this bug:

https://groups.google.com/g/raft-dev/c/t4xj6dJTP6E

Curiously, it is also mentioned in https://github.com/eBay/NuRaft/blob/6d371ddcc6bc0a9b8aaad4e57e9859adbc111d85/src/raft_server.cxx#L138,
but that solution does not seem to cover the case of removing one node from a two-node cluster. When the leader notices the removal is committed, it goes on and kills itself without propagating the commit to the other node. This leaves the other node permanently stuck in the candidate state: it still believes the cluster has two voters and can never collect a majority. At first glance, the argument there is about safety rather than liveness.
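
For illustration, here is a sketch of one possible mitigation (not what my implementation or NuRaft currently does): the leader being removed keeps broadcasting appends until the surviving peer has replicated the conf-change entry, and only then stops. All names below are hypothetical:

```go
package raftutil

import "time"

// peerProgress is a hypothetical view of the leader's replication state.
type peerProgress struct {
	MatchIndex uint64 // highest log index known to be replicated on this peer
}

// waitForRemovalPropagation sketches the mitigation described above: after
// the ConfChange removing the leader itself is committed, the leader keeps
// broadcasting appends (which carry the new commit index) until every
// remaining peer has replicated the conf-change entry, and only then shuts
// down. Without this, the surviving node in a two-node cluster still counts
// two voters, can never collect a majority, and stays candidate forever.
func waitForRemovalPropagation(confChangeIndex uint64, peers map[uint64]*peerProgress,
	broadcastAppend func(), deadline time.Duration) bool {

	timeout := time.After(deadline)
	tick := time.NewTicker(50 * time.Millisecond)
	defer tick.Stop()

	for {
		select {
		case <-timeout:
			return false // give up; operator intervention may be needed
		case <-tick.C:
			// Push the latest commit index; once a peer has matched the
			// conf-change entry, the next append also delivers its commitment.
			broadcastAppend()
			caughtUp := true
			for _, p := range peers {
				if p.MatchIndex < confChangeIndex {
					caughtUp = false
					break
				}
			}
			if caughtUp {
				return true // now it is reasonable for the removed leader to stop
			}
		}
	}
}
```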
