This project aims to ensure that a distributed system can reach consensus even under unreliable network conditions, such as message delays, message loss, or partial node failures. To achieve this, I implemented the three fundamental roles of the Paxos algorithm: Proposers, Acceptors, and Learners. These roles cooperate to maintain data consistency across the system despite node failures or network disruptions. The system is tested in an environment where nodes randomly fail and recover. Additionally, Docker Compose is used to deploy and test the system across multiple isolated containers.
During development, I encountered challenges such as handling asynchronous message delivery and ensuring convergence when nodes go offline. To address these, I implemented retry mechanisms and timeout strategies, and tested the system under various failure scenarios. These enhancements improved the system's fault tolerance and ensured stable performance under high load conditions.
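The retry and timeout strategy mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name RetryingCaller, the method callWithRetry, and its parameters are hypothetical, and it assumes each Paxos message can be wrapped in a Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: names and parameters are assumptions for illustration only.
class RetryingCaller {
    // Runs the task up to maxAttempts times, waiting at most timeoutMillis per attempt.
    // Returns the first successful result, or rethrows the last failure.
    static <T> T callWithRetry(Callable<T> task, int maxAttempts, long timeoutMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // A cached pool lets a new attempt start even if a timed-out one is still running.
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(task);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    // The attempt was too slow or threw: cancel it and try again.
                    future.cancel(true);
                    last = e;
                }
            }
            throw last;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A proposer could, for example, wrap each prepare or accept call to an acceptor in callWithRetry so that a delayed or lost message is treated as a failed attempt and is retried rather than blocking the round.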
The server runs on port 1099.
Run the following command from the "src" directory:
docker-compose up --build
By default, the client defined in the Docker Compose file connects to Server1.
To connect to a different server, modify line 50 of docker-compose.yml:
command: ["java", "client.Client", "<modify at here>"]
There are 5 Server options: Server1, Server2, Server3, Server4, Server5.
Replace <modify at here> with one of these options. Any other input results in an error.
PS: The Client sends a message to the selected server. In my implementation, the message may fail if more than half of the servers have stopped.
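The majority condition behind this note can be sketched as a small helper. The class and method names here (Quorum, majority, canReachConsensus) are hypothetical and do not come from the project's code.

```java
// Hypothetical helper illustrating the Paxos majority rule; not the project's code.
class Quorum {
    // Smallest number of servers that forms a strict majority of the cluster.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Consensus is only possible while at least a majority of servers is running.
    static boolean canReachConsensus(int runningServers, int clusterSize) {
        return runningServers >= majority(clusterSize);
    }
}
```

With the five servers used here, a majority is three, so a request can still succeed with two servers stopped but not with three.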
After implementing the code, the second problem was how to make a server fail and restart. I used a variable, isRunning, to indicate whether the server is currently running. If isRunning is false, the server rejects any Paxos request.
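A minimal sketch of this idea is shown below. Only the isRunning flag comes from the description above. The class name FailableAcceptor, the fail and restart methods, and the boolean return value of prepare are assumptions made for illustration, and the real implementation may signal rejection differently (for example, by throwing an exception).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical acceptor; everything except the isRunning flag is an assumption.
class FailableAcceptor {
    private final AtomicBoolean isRunning = new AtomicBoolean(true);
    private long promisedId = -1; // highest proposal id promised so far

    // Simulate a crash: the acceptor stops answering Paxos requests.
    void fail() {
        isRunning.set(false);
    }

    // Simulate recovery: the acceptor answers requests again.
    void restart() {
        isRunning.set(true);
    }

    // Phase 1 of Paxos: returns true if the promise is granted.
    synchronized boolean prepare(long proposalId) {
        if (!isRunning.get()) {
            return false; // a stopped server rejects every request
        }
        if (proposalId > promisedId) {
            promisedId = proposalId;
            return true;
        }
        return false; // a higher or equal proposal id was already promised
    }
}
```

In this sketch a test harness could call fail() and restart() from a background thread at random intervals to imitate nodes going offline and coming back.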