gRPC calls may hang indefinitely in the event of a server fault #672

Problem to Solve

Suppose there is some (any) kind of issue affecting the server, or the connection to it. In a recent incident, a TypeDB Cloud cluster node stopped responding to the user_token gRPC request. This meant that TypeDB.cloudDriver (in our case using either the Java driver or the Rust driver) would hang indefinitely rather than throwing an error.

Proposed Solution

The most obvious solution would be to add a timeout to gRPC calls in the Rust driver. This would need to be done with care, since long-running queries are legitimate and must not be cut off by a blanket deadline.
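
For illustration only (the helper name and error handling below are ours, not the driver's API), a per-call deadline could be applied just to short unary calls such as user_token, while query streams bypass it entirely. A minimal sketch using tokio::time::timeout:

use std::future::Future;
use std::time::Duration;

use tokio::time::timeout;

// Hypothetical helper: bound a single unary gRPC call with a deadline.
// Long-running query streams would deliberately not go through this wrapper.
async fn unary_with_deadline<T, E, F>(call: F, limit: Duration) -> Result<T, String>
where
    F: Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    match timeout(limit, call).await {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(err)) => Err(format!("gRPC call failed: {err:?}")),
        // No response arrived within the deadline.
        Err(_elapsed) => Err(format!("gRPC call timed out after {limit:?}")),
    }
}

The token request in renew_token could then look like unary_with_deadline(self.grpc.user_token(req), Duration::from_secs(30)).await (the 30-second figure is arbitrary), surfacing an error instead of hanging. tonic also exposes Endpoint::timeout, which applies a deadline per request on the channel, but whether that behaves acceptably for streaming query calls would need checking.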

Additional Information

We ran the following test...

let connection = Connection::new_cloud_with_translation(
    [
        ("address1", "localhost:1729"),
        ("address2", "localhost:1730"),
        ("address3", "localhost:1731"),
    ]
    .into(),
    Credential::without_tls("username", "password"),
)
.unwrap();

... with the following modifications to our source code (added println! statements):

/* connection/network/transmitter/rpc.rs */

    pub(in crate::connection) fn start_cloud(
        address: Address,
        credential: Credential,
        runtime: &BackgroundRuntime,
    ) -> Result<Self> {
        println!("{}", address.clone().to_string());
        let (request_sink, request_source) = unbounded_async();
        let (shutdown_sink, shutdown_source) = unbounded_async();
        runtime.run_blocking(async move {
            println!("a");
            let (channel, call_credentials) = open_callcred_channel(address, credential)?;
            println!("b");
            let rpc = RPCStub::new(channel, Some(call_credentials)).await;
            println!("c");
            tokio::spawn(Self::dispatcher_loop(rpc, request_source, shutdown_source));
            Ok::<(), Error>(())
        })?;
        Ok(Self { request_sink, shutdown_sink })
    }

/* connection/network/stub.rs */
    pub(super) async fn new(channel: Channel, call_credentials: Option<Arc<CallCredentials>>) -> Self {
        println!("d");
        let mut this = Self { grpc: GRPC::new(channel), call_credentials };
        println!("e");
        if let Err(err) = this.renew_token().await {
            warn!("{err:?}");
        }
        println!("f");
        this
    }

    async fn renew_token(&mut self) -> Result {
        if let Some(call_credentials) = &self.call_credentials {
            trace!("renewing token...");
            println!("g");
            call_credentials.reset_token();
            let req = user::token::Req { username: call_credentials.username().to_owned() };
            trace!("sending token request...");
            println!("h");
            let token = self.grpc.user_token(req).await?.into_inner().token;
            println!("i");
            call_credentials.set_token(token);
            trace!("renewed token");
            println!("j");
        }
        Ok(())
    }

This produced the following output ...

running 1 test
localhost:1730
a
b
d
e
g
h
test integration::network::address_translation has been running for over 60 seconds

The output stops after "h" and never reaches "i", which indicates that the test hung at self.grpc.user_token(req).await in renew_token.

Naturally, you'd need a broken server to actually reproduce the issue; we hypothesise that it stops responding when there are too many concurrent connections to it.

Pausing the server at a breakpoint might also work.
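
Another option for reproducing the hang locally (a sketch on our assumptions, not something taken from the driver's test suite) is a "tarpit" that accepts TCP connections on one of the configured addresses but never sends a byte back. Depending on where the handshake stalls, the driver may block at channel open rather than exactly at user_token, but the indefinite wait is the same. Run this in place of the node at localhost:1730 and point the test above at it:

use std::net::TcpListener;
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // Accept connections on the address the driver will pick, but never read
    // from or write to them, so any call without a deadline waits forever.
    let listener = TcpListener::bind("127.0.0.1:1730")?;
    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(move || {
            let _keep_alive = stream; // hold the socket open without responding
            loop {
                thread::sleep(Duration::from_secs(60));
            }
        });
    }
    Ok(())
}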
