restate up command #3904
This change reworks how we define and manage network ports. It achieves a bunch of goals in one go, at the cost of being quite big, of course.

## Unix-socket all the things

Restate server now supports listening on unix-sockets on all services (fabric, admin, ingress, and tokio-console). But even better, we listen on *both* the inet socket and the unix-socket by default. Unix sockets get automatically created under `restate-data/*.sock` and get cleaned up on shutdown (if they are not, they are cleaned up on the next start).

The unix-socket support includes the restatectl and restate CLIs. `restatectl -s unix:restate-data/admin.sock status` works, and setting env variables like `RESTATE_ADMIN_URL=unix:restate-data/admin.sock` will work for `restate svc status`. You can also use `curl` to call ingress like `curl --unix-socket restate-data/ingress.sock http://local/Counter/123/add --silent --json 1` (note that the hostname in the URL is ignored when connecting with unix-sockets).

Listening on unix-sockets can be disabled on all ports or on certain services with a new option, `listen-mode`, that can be supplied via env variable, config file, or `restate-server --listen-mode=tcp`. Supported listen modes are `tcp`, `unix`, and `all` (default). When using `unix`, we’ll only listen on unix-sockets and all advertised addresses will automatically be derived to show the unix-socket address.

As a result of this change, all unit tests now use unix-sockets: no more port conflicts with your locally running services, and potentially less flaky tests on CI. I’ve updated (and simplified) the local-cluster-runner utility to make use of it. There is still room for more improvements there.

## Let’s talk about ports

**Random Ports**

Restate can now select random ports on startup with `restate-server --use-random-ports=true` or `RESTATE_USE_RANDOM_PORTS=true`.
Those ports are conflict-free (OS-selected), and because unix-sockets are also created, users (in the future) can use restate/restatectl by pointing them to the unix-socket until they figure out the ports. We print the advertised addresses for all services on startup, and with the new `restate-server --no-logo` the advertised addresses will be the first thing printed on stdout.

**Socket Activation**

Another cool feature is the support for LISTEN_FD/systemd-compatible file-descriptor passing for listener sockets from the parent process to restate-server. A parent process can open the tcp listeners and even the unix-socket listeners (except for the fabric port) and pass the file descriptors to restate-server (e.g. via systemd socket activation, or a utility like `systemfd`). For instance, `systemfd --no-pid -s http::9000 -- restate-server`, where `9000` becomes the ingress port. You can pass multiple ports, and restate assigns them in a certain order.

*What does this bring to the table?*

1. Restarting restate-server without losing the socket listeners (ingress is the biggest winner), so clients will not observe connection errors during restarts or upgrades.
2. Test harnesses, wrappers, or even our own tools can pre-allocate the tcp ports and listen on them before starting restate. Those external wrappers don’t need to wait any more for restate-server to start before they try to connect to it (no connection retries needed). **This unlocks embedding restate in tests or shipping a restate-lite version that only listens on anonymous unix sockets**. In fact, we are a couple of steps away from making it possible to fully embed restate for use-cases that don’t need a server. The only small thing that’s missing is the invoker using a pre-supplied file-descriptor and/or supporting unix-sockets to connect to deployments.
3. Restate server will now attempt to bind on all required ports and unix-sockets very early in its startup, before starting any roles or opening the database.
This reduces the downtime window, allows us to centralize port assignment (for random ports), and gives us a nice place to print all addresses.

## Advertised Addresses

The PR unifies how we manage and configure advertised addresses for all services, and it deprecates some of the old, inconsistently named configuration keys (admin and ingress advertised addresses). But most importantly, restate-server will now attempt to detect a reasonable value for the advertised address. If the listen mode is `unix` (only), it’ll now print `unix:/` advertised addresses, and if it’s tcp, it’ll try to detect the public routable IP address of the node instead of using `127.0.0.1`. This makes docker deployments much nicer while maintaining the ability to override all of them as needed. There is also a new option to override only the hostname part of this address without interfering with random ports: `RESTATE_ADVERTISED_HOST=my_host.com` (or via cli, config) for a global override, and it can be applied per service (`RESTATE_ADMIN__ADVERTISED_HOST=`). In fact, all new options can be overridden per service.

Additionally, all addresses and ports are now managed by a new component, `AddressBook`, that’s available via task-center. The address book is what powers handing off listeners down to services, and it provides an interface to query all bound addresses so that we can return them in future `GetIdent` responses (not implemented yet).

A related improvement is how we configure `metadata-client`’s `addresses`. Nodes no longer need to supply their own advertised address in `addresses`. They only need to know about one or more of their peers, and we'll now automatically include our own node if it's running a metadata server. Thanks to early port binding and the `AddressBook`, this makes the `addresses` field in config completely optional, and for single-node setups it's now empty by default.
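The advertised-address rules described in this section can be sketched roughly as follows. This is only an illustration: `ListenMode`, `Bound`, and `advertised_address` are hypothetical names and not the actual `AddressBook` API, and the `http://` scheme and the host being a hostname or IPv4 address are assumptions.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Hypothetical mirror of the `listen-mode` option (`tcp`, `unix`, `all`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ListenMode {
    Tcp,
    Unix,
    All,
}

/// Hypothetical view of what a single service ended up bound to.
struct Bound {
    /// Address the TCP listener bound to (the port may be OS-selected).
    tcp: Option<SocketAddr>,
    /// Path of the unix-socket, if one was created.
    unix: Option<PathBuf>,
}

/// Derives the address to advertise for one service.
/// Returns `None` if the listener required by the mode is missing.
fn advertised_address(
    mode: ListenMode,
    bound: &Bound,
    detected_host: &str,
    host_override: Option<&str>,
) -> Option<String> {
    match mode {
        // unix-only: advertise the socket path with a `unix:` prefix.
        ListenMode::Unix => bound
            .unix
            .as_ref()
            .map(|path| format!("unix:{}", path.display())),
        // tcp (or both): keep the bound port, replace only the host part.
        ListenMode::Tcp | ListenMode::All => {
            let port = bound.tcp?.port();
            let host = host_override.unwrap_or(detected_host);
            Some(format!("http://{host}:{port}/"))
        }
    }
}
```

The point of the sketch is that a host override never touches the port, so it composes with random ports.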
These changes open the door (not implemented) to adding support for `restate-server --peer=<address>`, which would allow restate nodes to join a cluster and bootstrap completely by connecting to a known peer address. This will let a node figure out its own correct advertised address and its metadata configuration without passing a configuration file.

## Misc

- New config options (global and with per-service override): `bind-port`, `bind-ip`, `advertised-host`, `use-random-ports`, and `listen-mode`.
- `restate-hyper-uds` crate to support using unix-sockets with hyper clients. We should have a **config-gated** option to allow invoking deployments via unix-sockets too (any takers?).
- Port numbers, unix socket names, and service names are defined in a set of zero-cost types in `restate-types::net::address`.
- Type-checked usage of addresses across the codebase to denote which service they refer to. This avoids confusion where a type like `AdvertisedAddress` didn't make it clear which service it was for. You’ll see types like `AdvertisedAddress<AdminPort>` everywhere now.
- Documentation and configuration json schema express the service name and defaults according to the `ListenerPort` type parameter.

# What did we lose?

- For simplicity and to reduce confusion, unix-socket paths are no longer configurable through `bind-address`. They will now always be created under the `restate-data` directory (fabric.sock, ingress.sock, admin.sock, and tokio.sock, if enabled). The socket files are deleted on process shutdown. The benefit is that their locations are predictable for tools, users, and system operators: `restatectl -s unix:restate-data/admin.sock status`.
- Unix socket names have a limit of ~108 bytes on most unix systems, which puts a limit on the path length of restate-data. I've included a small optimization that converts the path into a relative one if CWD is a prefix of the data-dir, but this is not a guaranteed solution. I'd say we evaluate how much of a problem this is going to be in practice, and we can provide a configurable base directory for unix-sockets via config and env variable as needed. It's literally a single variable in `AddressBook`.
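For illustration, the relative-path optimization mentioned in the last bullet could look roughly like this. The function names and the 107-byte budget (108 bytes minus a trailing NUL, as on Linux) are assumptions of this sketch, not the code in the PR.

```rust
use std::path::{Path, PathBuf};

/// Assumed budget: `sun_path` is 108 bytes on Linux, one of which is the NUL terminator.
const MAX_SOCKET_PATH_BYTES: usize = 107;

/// Returns the socket path relative to `cwd` when the data dir lives under it,
/// otherwise the unmodified path under `data_dir`. (Hypothetical helper.)
fn shorten_socket_path(cwd: &Path, data_dir: &Path, socket_name: &str) -> PathBuf {
    let full = data_dir.join(socket_name);
    match full.strip_prefix(cwd) {
        Ok(relative) => relative.to_path_buf(),
        Err(_) => full,
    }
}

/// Checks whether a path is short enough to be used as a unix-socket address.
fn fits_in_sun_path(path: &Path) -> bool {
    path.as_os_str().len() <= MAX_SOCKET_PATH_BYTES
}
```

As the bullet notes, this only helps when the data directory is below the current working directory; a deeply nested data-dir elsewhere can still exceed the limit.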
Adds bound_addresses and advertised_addresses to GetIdent responses to enable tools and automations to extract that information for future use.
Note to reviewers: this is a PR stack. Select the latest commit only to see the changes relevant to this particular PR.
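To make the socket-activation part of the stack more concrete, here is a minimal sketch of how a child process can work out which file descriptors it inherited under the systemd `LISTEN_FDS` convention (passed descriptors start at fd 3). The function name is hypothetical, and accepting a missing `LISTEN_PID` is an assumption chosen to match the `systemfd --no-pid` example; it is not taken from restate-server's code.

```rust
use std::os::unix::io::RawFd;

/// First passed descriptor in the systemd convention (`SD_LISTEN_FDS_START`).
const SD_LISTEN_FDS_START: RawFd = 3;

/// Computes the inherited listener descriptors from the raw values of
/// `LISTEN_PID` and `LISTEN_FDS`. (Hypothetical helper for illustration.)
fn inherited_listener_fds(
    listen_pid: Option<&str>,
    listen_fds: Option<&str>,
    our_pid: u32,
) -> Vec<RawFd> {
    // If LISTEN_PID is present, it must name this process; otherwise the
    // descriptors were meant for someone else.
    if let Some(pid) = listen_pid {
        if pid.trim().parse::<u32>().ok() != Some(our_pid) {
            return Vec::new();
        }
    }
    // A missing or malformed LISTEN_FDS means nothing was passed.
    let count: u16 = listen_fds
        .and_then(|n| n.trim().parse().ok())
        .unwrap_or(0);
    (SD_LISTEN_FDS_START..SD_LISTEN_FDS_START + RawFd::from(count)).collect()
}
```

In a real process the inputs would come from `std::env::var("LISTEN_PID")`, `std::env::var("LISTEN_FDS")`, and `std::process::id()`; mapping each descriptor to a service (ingress, admin, and so on) is the part where restate applies its own ordering.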
I love the idea, I had something like that in my list for some time already.
I would love to fit this in the bigger picture of the quickstart. For doing that, i would do the following (stuff that i can take over after this PR):
- Ship this in the templates, as `npm run restate-dev` i guess should run this `restate dev` to let you play around.
- Remove the server download from the quickstart, no need for that anymore!
- Integrate some form of auto-registration (needs some changes in the SDKs too). The starting point is the template, you write two commands in terminal, one `restate up` and one `npm run dev`, poof ready to send requests. In that case, I guess the auto registered counter service maybe is not needed?
- Change some of the conf defaults to be more suitable for debugging. For example, one thing I've found very useful is increasing to 1 day both abort and inactivity timeout, to avoid triggering disconnections/suspensions during debugging sessions.
- Also logging, I think it does make sense that we show them, but only the ones related to invocation. This is important because it gives users a sense that something is going on. What I think we can do there is simply to tune the default RUST_LOG filters, to show only things we care about.
- (moving forward) Some basic shortcuts while the dev command is running (like `worker dev` does), to do common things in development (kill all invocations for example, clear all states).
pub async fn run(State(_env): State<CliEnv>, opts: &Dev) -> Result<()> {
    let cancellation = CancellationToken::new();
    let temp_dir = tempfile::tempdir()?;
    let data_dir = temp_dir.path().to_path_buf();
in the IDE plugins i'm always using `{project root dir}/.restate/dev-cluster`.
Maybe here it makes sense to use `{cwd}/.restate/dev-cluster`.
maybe this can be an option if we do `--retain`? I'm not sure if I'd expect us to delete the data on stop from the dot directory.
i think the opposite. By default retain, and if the user wants, clean it up on start using `wipe` or smth like that. This is what i personally found more useful.
Also `temp_dir`s might be problematic on some locked down machines (we had this problem with some of our bank customers), so `$cwd/.restate/dev-cluster` might work better.
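A rough sketch of the retain-by-default behaviour suggested in this thread, assuming a hypothetical `wipe` flag and hypothetical helper names (nothing here is in the PR):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Persistent dev data directory under a base dir, usually the CWD.
/// (Hypothetical helper; the layout follows the `.restate/dev-cluster` suggestion.)
fn dev_data_dir(base: &Path) -> PathBuf {
    base.join(".restate").join("dev-cluster")
}

/// Retains existing data by default; deletes it first only when `wipe` is set.
fn prepare_data_dir(dir: &Path, wipe: bool) -> io::Result<()> {
    if wipe && dir.exists() {
        fs::remove_dir_all(dir)?;
    }
    fs::create_dir_all(dir)
}
```

With this shape, a plain start reuses whatever state the last session left behind, and cleaning up becomes an explicit opt-in on start rather than a side effect of stopping.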
cli/src/commands/dev.rs
Outdated
// register mock service
discover_deployment(&admin_uds, format!("http://{mock_svc_addr}/")).await?;
Where is this deployment?
magic! :)
it's also running in-process on a random port.
yes what i meant is where is the code for that?
Which SDK is that using?
This introduces `restate-lite`, a crate that provides restate core functionality as a library. The library is intended to be used in development or testing use cases. Therefore, it uses defaults tuned for that purpose.
A developer-focused version of restate that's embedded into the restate CLI. It starts restate in an ephemeral temporary directory that's auto-deleted after Ctrl+C.

1. Supports `--use-random-ports`.
2. Emits very clean output: it doesn't show the server log, just a table of addresses.
3. Opens the admin UI automatically in the browser on startup.
4. Runs the Counter service on a random port and auto-registers it by default, so you can play with the UI immediately with that service.
5. Supports `--retain` to persist the temporary directory (meant to be used in debugging). It currently doesn't support choosing your own directory.
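As a small illustration of what "a random port" means here: binding to port `0` asks the OS for a free port, and the chosen address can be read back from the listener. The helper name is hypothetical, and this is a standard-library sketch rather than the code that runs the Counter service.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Binds a loopback listener on an OS-selected port and returns it together
/// with the address that was actually chosen. (Hypothetical helper.)
fn bind_random_port() -> io::Result<(TcpListener, SocketAddr)> {
    // Port 0 means "pick any free port".
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Because the listener is already bound when the address is read, the port can be printed or registered (for example as `http://{addr}/`) without a window in which another process could grab it.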
Stack created with Sapling. Best reviewed with ReviewStack.