Description
When Riak is shut down improperly or its process is killed, the cleanup that releases lock files never runs. If a different OS process is later assigned Riak's old process ID and Riak is then started again, Riak's check finds that the OS PID recorded in the write lock file is still active and does not release the lock, even though that PID no longer refers to Riak.
This can be reproduced by:
- Starting Riak.
- Shutting down Riak improperly by issuing a kill on the Riak process.
- Starting a process (that is not Riak) that is assigned the same PID as the killed Riak process.
- Starting Riak.
- Attempting a PUT.
Could the operation that checks for the OS PID's existence also confirm that the PID in fact belongs to a beam.smp process, to reduce the likelihood of a stuck lock file?
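
To illustrate the kind of check being asked for, here is a rough Python sketch. It is not Riak's actual code. The function names (`pid_is_alive`, `process_name`, `lock_is_stale`) are made up for this example, and it assumes a POSIX system, with the name lookup relying on Linux's `/proc/<pid>/comm`. The idea is that a lock is treated as stale if the recorded PID is gone, or if it is alive but is not named beam.smp.

```python
import os


def pid_is_alive(pid):
    """Return True if some process currently holds this PID (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def process_name(pid):
    """Return the command name from /proc/<pid>/comm (Linux), or None."""
    try:
        with open("/proc/%d/comm" % pid) as f:
            return f.read().strip()
    except OSError:
        return None


def lock_is_stale(pid, expected_name="beam.smp", name_of=process_name):
    """Decide whether a lock recorded for `pid` can be released.

    `name_of` is injectable so the name lookup can be swapped out
    on platforms without /proc (hypothetical hook for this sketch).
    """
    if not pid_is_alive(pid):
        return True  # nobody holds the PID any more
    name = name_of(pid)
    if name is None:
        return False  # cannot confirm the owner, so keep the lock
    return name != expected_name  # alive, but not the expected program
```

This only narrows the window: another beam.smp process that is not Riak could still be assigned the old PID, so the check lessens the likelihood of a stuck lock rather than eliminating it.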
For additional context, see the following:
https://basho.zendesk.com/agent/#/tickets/3873
https://basho.zendesk.com/agent/#/tickets/5336