-
Notifications
You must be signed in to change notification settings - Fork 23
WIP: RSDK-10435: Dial direct #430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
server_->shutdown(); | ||
|
||
if (parent_) { | ||
try { | ||
parent_->close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was causing a double call to stop_all
, the second one happening after disconnecting from the robot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does conan
now give us a vehicle for building the entire dependency tree under ASAN and UBSAN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes in theory but it's probably pretty tricky, see here for discussion (old conan docs) https://docs.conan.io/1/howtos/sanitizers.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! A couple small things but generally looks good to me.
src/viam/sdk/rpc/dial.cpp
Outdated
@@ -1,3 +1,4 @@ | |||
#include <grpc/grpc.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) should move this to a separate include block
src/viam/sdk/rpc/dial.cpp
Outdated
#include <viam/api/proto/rpc/v1/auth.grpc.pb.h> | ||
#include <viam/api/proto/rpc/v1/auth.pb.h> | ||
#include <viam/api/robot/v1/robot.grpc.pb.h> | ||
#include <viam/api/robot/v1/robot.pb.h> | ||
#include <viam/sdk/common/client_helper.hpp> | ||
#include <viam/sdk/common/exception.hpp> | ||
#include <viam/sdk/common/private/instance.hpp> | ||
#include <viam/sdk/rpc/private/viam_grpc_channel.hpp> | ||
#include <viam/sdk/rpc/private/viam_rust_utils.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Hm we did this wrong here (my bad!) but in most places in the SDK we've got the api
includes in a separate block before the sdk
includes. Could we do the same here?
|
||
return grpc::CreateChannel(address, tls_creds); | ||
} | ||
|
||
std::shared_ptr<grpc::Channel> create_viam_channel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not worth fixing here, but it's unfortunate and a little confusing that create_viam_channel
creates a grpc::Channel
, not a ViamChannel
. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it :)
using namespace proto::rpc::v1; | ||
|
||
auto auth_stub = AuthService::NewStub(channel_for_auth); | ||
ClientContext ctx; | ||
AuthenticateRequest req; | ||
|
||
*req.mutable_entity() = opts.auth_entity.get(); | ||
*req.mutable_credentials()->mutable_payload() = opts.credentials->payload(); | ||
*req.mutable_credentials()->mutable_type() = opts.credentials->type(); | ||
|
||
AuthenticateResponse resp; | ||
|
||
auto status = auth_stub->Authenticate(ctx, req, &resp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we factor this out into a helper function? I think it's a bit cleaner, and also will be relevant if/when we want to have app client support in C++.
src/viam/sdk/rpc/dial.hpp
Outdated
/// @brief Bypass WebRTC and connect directly to the robot. | ||
/// This dials directly through grpc bypassing rust utils. | ||
bool disable_webrtc = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(opt) Maybe not a big deal (I don't think we do this in other SDKs), but it might be good to add a note here about the specialized uri
form that is required (<part>.<location>.local.viam.cloud:8080
). I could definitely imagine that being a gotcha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah I wanted to ask you about that, should we try to validate the URI under this code path and error out of it's not of that form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good question. I think probably no. My thinking is:
- that format might change in the future and that could lead to some issues that might be annoying to track down, and
- that format is currently required assuming the user isn't doing anything weird or bespoke. If, e.g., they were to use a different app instance than
app.viam.com
then maybe the format would be different? I have no idea but I don't want to just break things for someone if such a case were to arise
On the other hand, it is a little annoying if we have users saying "why doesn't this work??" and we say "oh it says in the comments" and they say "that wasn't helpful, if you'd just given me an error message I would've known much faster". So while I'm inclined not to error out, I'm open to being convinced.
Maybe we add a warning
or info
log message? Probably excessive 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look out of interest. Please don't let these comments get in the way of moving this forward, they are just observations and notes on things that have bugged me about the code that is under modification here, so this seemed like a time to raise them. I'm happy for any or none of these issues to be deferred into future work.
// TODO (RSDK-917): We currently don't provide a flag for disabling webRTC, instead relying on a | ||
// `local` uri. We should update dial logic to consider such a flag. | ||
|
||
struct DialOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what was the motivation for making this a struct over a class? Keeping it as a class would make it easier in the future to hide the implementation details behind a pimpl in order to make it not part of the ABI.
I also liked the fluent style setters, so you could write:
ViamChannel::dial(where, DialOptions().set_entity(...).set_credentials()...));
Admittedly, the structure of some of the existing setters is a little off. For instance, it isn't clear to me why set_credentials
takes an optional
. When would you call set_credentials
with a disengaged one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so personally I was applying the C++ core guideline against trivial getters and setters in a class which hint that something probably should have just been a struct
. With newer C++ (or older C++ with gnu extensions) I'm much more in favor of using designated initializers, eg
ViamChannel::dial(where, {
.entity = ...,
.credentials = ...,
});
this PR has exposed a bunch of weird stuff around these classes and optional
, I was going to try to clean up the example code because there's a lot of unnecessary instances of constructing an options
and then a optional<options>
which is set to the previous options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Designated initializers are definitely great (though C++20 IIRC), but it does create a very tightly bound interface: not only can you not change the sizes / orders of the members of the struct, but you can't even interpose on the act of setting the values to add behavior or checks. I have some doubts that they are the right thing to use for a public interface in an SDK.
On the other hand, we aren't there yet with worrying about ABI, so I'm not going to push too hard either way on it.
|
||
/// @brief Timeout of connection attempts when initially dialing a robot | ||
/// Defaults to 20sec to match the default timeout duration | ||
std::chrono::duration<float> initial_connection_attempt_timeout_{20}; | ||
std::chrono::duration<float> initial_connection_attempt_timeout{20}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⬇️ - I've always wondered: what is the point of having Options
separate from DialOptions
? And if they are specifically part of RobotClient
, maybe they belong as part of that class (i.e. RobotClient::Options
) or at least in that header? Having a SDK namespace level type named Options
when it is only associated with RobotClient
has always bothered me.
I'm also curious if having the default should be to have a refresh interval (what would it be?), and provide an API to opt out:
class [RobotClient::]Options {
Options();
Options(DialOptions dial_options);
Options& disable_refresh();
Options& set_refresh_interval(std::chrono...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect a lot of this may be cruft that was not thought over a ton and hasn't been updated. I agree no refresh seems like an odd default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing an overhaul of this API (commit forthcoming) and made the troubling discovery that a lot of our example code says that passing a refresh interval of zero seconds does auto refresh. I'm not sure what auto refresh means or if this was ever the case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: documentation has been updated to reflect that zero seconds means 'only on config update'
@@ -11,7 +11,8 @@ | |||
namespace viam { | |||
namespace sdk { | |||
|
|||
class DialOptions; | |||
struct DialOptions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these really ViamChannelOptions
? Or ViamChannel::Options
? In which case, this header could be viam_channel.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they probably are indeed ViamChannel::Options
} | ||
|
||
const std::chrono::duration<float> float_timeout = opts.timeout; | ||
void* ptr = init_rust_runtime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was captured by unique_ptr
with free_rust_runtime
as the deleter type, some things could be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good idea i'll try to do it that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to do it, you could probably just create a nice shim class in sdk/rpc/private/viam_rust_utils.h
that called init_rust_runtime
in the ctor, and presented dial
as a member.
|
||
const std::string& token = | ||
Instance::current(Instance::Creation::open_existing).impl_->direct_dial_token; | ||
if (!token.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised to see this as a global, but maybe I'm misunderstanding. But if you had one channel that was doing direct dial and another that wasn't, wouldn't this by trying to add this metadata to the contexts for both channels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, although I'm not sure how common/feasible such a situation would be in practice, or if we should error out if we detect that kind of thing? cc @stuqdog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is a good question. I'm not sure how common such a situation is in practice currently, but I do think it's certainly feasible and I could imagine a case where, e.g., someone is connecting to two machines at once and would prefer to use direct gRPC but is required to use webRTC for some of its added functionality on one of the machines only.
For that matter, even if both channels were dialing direct, if they were dialing to different machines then the token would only be valid for one of them! So yeah, probably tokens should either not be global or (if that's not feasible) should be keyed to a particular channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm ok so this is probably something we should handle better. i'll try to look into the architecture and see how to make this work probably on a per channel basis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@acmorrow after much wailing and gnashing of teeth we figured out a way to do this without a global, thereby ballooning the surface area of this PR considerably.
server_->shutdown(); | ||
|
||
if (parent_) { | ||
try { | ||
parent_->close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does conan
now give us a vehicle for building the entire dependency tree under ASAN and UBSAN?
Enables dialing directly to gRPC, potentially circumventing rust-utils entirely.
This includes some QoL/drive-by changes, such as
struct
ifyingDialOptions
and removing theshared_ptr
fromRobotClient