Avoid I/O after deleting client flows #4179
Conversation
This change also removes a value-shadowing error that caused empty index files to be written.
houseKeeping only needs to do work when flows have been deleted from a client. Passing the flow IDs along with the client IDs allows us to filter the deleted flows out of the index using removeFlowsFromIndex instead of rebuilding the index from scratch.
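For illustration, a minimal sketch of that filtering step, assuming the index is a flat list of flow IDs (the actual index format and function signature in the PR may differ):

// Sketch: drop deleted flow IDs from an index instead of rebuilding it
// from the datastore.
func removeFlowsFromIndex(index []string, deleted []string) []string {
	gone := make(map[string]bool, len(deleted))
	for _, flow_id := range deleted {
		gone[flow_id] = true
	}
	// In-place filter: keep only flow IDs that were not deleted.
	filtered := index[:0]
	for _, flow_id := range index {
		if !gone[flow_id] {
			filtered = append(filtered, flow_id)
		}
	}
	return filtered
}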
self.mu.Unlock()

for _, client_id := range pending {
	err := self.buildFlowIndexFromDatastore(
I am worried about coherency here: if a new flow is created after a flow is deleted, then the safest option is to rebuild the index from scratch.
I could add another field, addedFlows map[string][]string, that is used to tell houseKeeping about flows that need to be present in the index. If they are not, only those flows would need to be re-read. In the worst case, any missing flow would be added in the next loop iteration.
What do you think?
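A rough sketch of that bookkeeping, with hypothetical fields and helper names just to make the idea concrete (not the PR's actual FlowStorageManager definition):

import "sync"

// Hypothetical fields: flows that must end up in the index, keyed by
// client, so houseKeeping can re-read only those flows.
type FlowStorageManager struct {
	mu         sync.Mutex
	addedFlows map[string][]string // client_id -> flow IDs to ensure
}

// noteAddedFlow records a flow that the next houseKeeping run must
// verify is present in the client's index.
func (self *FlowStorageManager) noteAddedFlow(client_id, flow_id string) {
	self.mu.Lock()
	defer self.mu.Unlock()
	self.addedFlows[client_id] = append(self.addedFlows[client_id], flow_id)
}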
The method buildFlowIndexFromDatastore() returns the ground truth about the index of flows. The index is just a quick way to access the list of flows within the directory, but the ground truth of which flows are present is still kept within the individual flow objects.
With this PR we never really rebuild the ground truth because we only look at changes. If some changes are lost (e.g. due to a server restart between housekeeping runs) we can get the index into an inconsistent state with no way to restore it (other than removing the index completely, which will force a rebuild).
I think a more robust solution involves writing a journal of changes to the index and then rebuilding the index based on this (i.e. write a journal with the contents of addedFlows and removedFlows immediately, then in the housekeeping thread rebuild the index based on that journal).
The current code tries to amortize the cost of rebuilding by delaying the rebuild to once per minute (previously we rebuilt for each change). Do you still see a large I/O demand due to this? Can we either increase the rebuild interval or sleep a bit during the rebuild so as not to spike it?
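For illustration, a minimal sketch of the journal idea described above; the entry type, names, and in-memory index shape are assumptions, not the code that was eventually committed:

// Assumed journal record: one entry per change, replayed in order.
type journalEntry struct {
	ClientId string
	Added    []string
	Removed  []string
}

// replayJournal applies the journaled changes in order, so the index
// converges to the state recorded in the journal rather than to
// whatever incremental state survived a restart.
func replayJournal(entries []journalEntry, index map[string]map[string]bool) {
	for _, entry := range entries {
		flows := index[entry.ClientId]
		if flows == nil {
			flows = make(map[string]bool)
			index[entry.ClientId] = flows
		}
		for _, flow_id := range entry.Added {
			flows[flow_id] = true
		}
		for _, flow_id := range entry.Removed {
			delete(flows, flow_id)
		}
	}
}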
Yes, we still saw large I/O demands before applying the patches that are in this PR so far. When we deleted hunts and all their associated flows, we would cause flow indexes for a lot of clients to be rebuilt from within houseKeeping – I can't see much of an amortization effect here.
I assume that flow indexes are only written by the master, so we should be able to avoid race conditions using mutexes. I'll add a patch to this PR, hopefully later today.
Consistency/durability across server restarts or crashes is a more involved issue – we seem to be wandering into "proper" database territory. ;-)
I added a journal-based implementation that I feel is more robust. Please take a look.
Looks good to me. Loving the very clear prose comment at the beginning of journal.go!
If I'm not mistaken, there's still an opportunity for the client flow index to be corrupted by concurrent write operations – from removeClientFlowsFromIndex, buildFlowIndexFromDatastore, and WriteFlowIndex. Protecting client flow index files using a mutex seems to be the sensible thing to do. (This problem seems to have existed before this PR.)
I won't be able to properly test the journal implementation before Tuesday.
Code draft for client-specific flow index locks:

func (self *FlowStorageManager) lockFlowIndex(client_id string) {
	for {
		self.mu.Lock()
		if _, locked := self.flowIndexLocks[client_id]; locked {
			// Another writer holds this client's index; back off and retry.
			self.mu.Unlock()
			time.Sleep(100 * time.Millisecond)
		} else {
			// Mark this client's index as held, then release the map lock.
			self.flowIndexLocks[client_id] = struct{}{}
			self.mu.Unlock()
			return
		}
	}
}

func (self *FlowStorageManager) unlockFlowIndex(client_id string) {
	self.mu.Lock()
	delete(self.flowIndexLocks, client_id)
	self.mu.Unlock()
}
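A hypothetical caller would bracket every index write with the pair, for example:

// Hypothetical usage: every index writer takes the per-client lock first.
func (self *FlowStorageManager) rewriteIndexForClient(client_id string) {
	self.lockFlowIndex(client_id)
	defer self.unlockFlowIndex(client_id)
	// ... rewrite the client's flow index file while no other writer runs ...
}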
That's a good point: I moved all the functions regarding building the index into the (per-client) flowIndexBuilder, which means they all share the same mutex and cannot run at the same time.
Move removeClientFlowsFromIndex to the flowIndexBuilder to mediate contention with buildFlowIndexFromDatastore
This shares the lock with the other index-building functions.
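For illustration, a rough shape of that refactor; the actual flowIndexBuilder in the PR may be organized differently:

import "sync"

// Per-client builder: all index mutations go through methods that hold
// the same mutex, so a removal can never interleave with a full rebuild
// for the same client.
type flowIndexBuilder struct {
	mu        sync.Mutex
	client_id string
}

func (self *flowIndexBuilder) removeClientFlowsFromIndex(flow_ids []string) {
	self.mu.Lock()
	defer self.mu.Unlock()
	// ... filter flow_ids out of the on-disk index ...
}

func (self *flowIndexBuilder) buildFlowIndexFromDatastore() {
	self.mu.Lock()
	defer self.mu.Unlock()
	// ... rebuild the index from the flow objects (the ground truth) ...
}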
I have successfully tested your patches in our EFS-backed master/minion setup. Seems to perform well and we haven't noticed any irregularities.
Ok thanks for testing it! I will merge this now.
Commit 786e656 introduced a background job for updating client flow indices after deleting flows.
This background job always constructed indices from scratch, leading to slow I/O in a master/minion setup backed by Amazon EFS.
This PR allows individual flow IDs to be passed to the background job so it can simply filter them out of the index files, similar to the path taken when delete_flow(…, sync=true) is called.