Refresh aspired servables/versions following config update #1518
Currently, when the configured model list is updated via a call to handleReloadConfigRequest, the request thread blocks until any newly added models become available. Their availability, however, depends on the filesystem polling thread rescanning the filesystem at its periodic interval, so there is an arbitrary delay (up to the polling interval) before the requested changes actually take effect and the RPC returns. This problem may not be very noticeable with the default polling interval of 1 second, but it seems undesirable for longer intervals, and in particular it makes API-based dynamic reconfiguration incompatible with the --file_system_poll_wait_seconds=0 setting (in this case all handleReloadConfigRequest calls time out and do not take effect).
I have opened this against 1.15 since that's the version we are using, but can rebase on a different branch if needed. Also, apologies in advance for the code; I am not very familiar with C++.
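To make the idea concrete, here is a minimal sketch of the behavior described above: re-evaluating the aspired set as part of the config update itself, instead of leaving it to the next periodic poll. It is not the TensorFlow Serving code. `SketchSource`, `Lister`, `Callback`, `Poll`, and `PollLocked` are all hypothetical names made up for illustration, and the "filesystem" is just a function that returns a set of model names.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a storage-path source; NOT the real
// TensorFlow Serving class. All names here are assumptions.
class SketchSource {
 public:
  // Lists the model names currently present on "disk".
  using Lister = std::function<std::set<std::string>()>;
  // Receives the set of models that are both configured and present.
  using Callback = std::function<void(const std::set<std::string>&)>;

  SketchSource(Lister lister, Callback on_aspired)
      : lister_(std::move(lister)), on_aspired_(std::move(on_aspired)) {
    assert(lister_ && on_aspired_);
  }

  // What a periodic polling thread (if one is running) would call.
  void Poll() {
    std::lock_guard<std::mutex> lock(mu_);
    PollLocked();
  }

  // What the request thread would call on a config reload.
  void UpdateConfig(std::set<std::string> configured) {
    std::lock_guard<std::mutex> lock(mu_);
    configured_ = std::move(configured);
    // Refresh right away, so the caller does not depend on the next
    // periodic poll (which may be far off, or may never happen).
    PollLocked();
  }

 private:
  // Requires mu_ to be held. Note the callback runs under the lock.
  void PollLocked() {
    std::set<std::string> aspired;
    for (const auto& name : lister_()) {
      if (configured_.count(name) > 0) aspired.insert(name);
    }
    on_aspired_(aspired);
  }

  std::mutex mu_;
  std::set<std::string> configured_;
  Lister lister_;
  Callback on_aspired_;
};
```

In this sketch the callback fires before `UpdateConfig` returns, even if `Poll` is never called from another thread, which corresponds to the no-periodic-polling case mentioned above.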
tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc
Can we add unit test coverage?
Thanks @christisg, I've pushed a commit to address your logging comment. I will aim to add unit test coverage when I get a chance; it will take me a bit longer due to unfamiliarity with C++ and the codebase/test framework.
@njhill thanks for reporting this bug. It was painful and took me more than 3 hrs to figure out the REAL behavior when setting
@njhill do you want to wrap up this PR by adding the unit test requested by the reviewer? Thanks!
@netfs apologies for letting this lag. I am not sure when I will realistically have a chance to do this, since I'm especially busy right now and not very familiar with C++ or the testing setup, so it would take me a decent chunk of time. Any help with that part would be appreciated!
no worries @njhill. @astleychen do you want to help here and add tests? |
Fixes #1519