Support URLs with origins and path prefixes#11951
Support URLs with origins and path prefixes#11951methylDragon wants to merge 2 commits intorerun-io:mainfrom
Conversation
There was a problem hiding this comment.
Hi! Thanks for opening this pull request.
Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.
There was a problem hiding this comment.
Overall this sounds like a reasonable feature—thank you for opening the PR!
As a quick fix, you could now already host Rerun under a subdomain and it should work out of the box. But I can see how this can sometimes could be too limiting. I'm curious: Is there anything preventing you from doing this that motivates this PR?
Important
The way we currently use paths with GRPC endpoints is already a bit of a hack, so I'm a bit hesitant complicating things even more at the current point in time.
Code-wise, I think there are also some changes that we need to make if we go down that path (pun intended).
- We should separate the path handling + the origin into a new object to avoid having to add logic to every new
*Urivariant. - This new object could then replace the existing
originfield in those structs. - Given that
path_prefixis more of a niche feature, I'd make it anOptiontoo, and use the builder pattern to add it. This is also motivated by the following: - Finally, we need to make sure that the implementation is robust against leading and trailing slashes. A method like
with_path_prefixcould be used to validate the inputs. We should also test those edge cases too.
The way we're hosting Rerun atm is:
The number of users and orgs are relatively unbounded for us, and make using a subdomain pretty tricky. The URL we end up with is something like: |
emilk
left a comment
There was a problem hiding this comment.
Seems like adding the path-prefix to the struct Origin would make the code simpler, and less error-prone (though the name Origin would be a bit misleading in that case)
There was a problem hiding this comment.
This is a bad change - using explicit destruction is preferred, as it forces us to consider new additions as they are made
Signed-off-by: methylDragon <methylDragon@intrinsic.ai>
Signed-off-by: methylDragon <methylDragon@intrinsic.ai>
fa20320 to
e44fee4
Compare
Added a new struct |
| pub struct EndpointAddr { | ||
| pub origin: Origin, | ||
|
|
||
| /// An optional path prefix, e.g. `/my/prefix`. | ||
| /// | ||
| /// The prefix is guaranteed to start with a slash if it is not empty, | ||
| /// and guaranteed not to end with a slash. | ||
| #[serde(default, skip_serializing_if = "Option::is_none")] | ||
| pub path_prefix: Option<String>, | ||
| } |
There was a problem hiding this comment.
Shouldn't this just be a url::Url?
I remember there being some shenanigans around default ports though:
There was a problem hiding this comment.
Technically speaking a URI is: <scheme>://<origin>/<subpath>/<endpoint>
I was taking the EndpointAddr to be <origin>/<subpath>, since it isn't including the final endpoint segment.
There was a problem hiding this comment.
Given that ^, what do you think? I'm trying to scope down the change as much as possible (I'm treating this as an extension to the origin, mostly)
There was a problem hiding this comment.
Our Origin contains the scheme as well. So my idea was to use an URL to represent everything up to endpoint. Maybe we could therefore use a simple wrapper struct around url::Url.
There was a problem hiding this comment.
By replacing Origin, we are also ensuring that we catch all instances where the path segment functionality needs to be added.
There was a problem hiding this comment.
If we want to land this change, we should also make sure that we add the prefix to all our SDKs, as handling otherwise becomes inconsistent.
The partition_url in the Python SDK is just one such example. There is also ConnectionHandle and probably more places.
Apologies, I'm a little unfamiliar with the code base, what do you mean by this? |
|
Sorry, was just about to clarify my comment. What I meant is: We basically would need to start using the new This looks like a pretty big task, so I wonder what @emilk's thoughts are here? |
|
Sorry for the slow response, most of us have been on out over the break. The tricky thing here is that we need to ensure to add this functionality to all places where we currently use If these above points are addressed (also commented above), I think this would be a very nice addition from a technical point of view. |
What
This PR updates the URL parser to support path prefixes. This allows Rerun instances to be hosted at non-root paths (e.g., behind a reverse proxy).
Before: The parser assumed Rerun endpoints existed at the root:
http://example.com/cataloghttp://example.com/proxyAfter: The parser now accepts arbitrary sub-paths:
http://example.com/custom/prefix/cataloghttp://example.com/custom/prefix/proxyMotivation
Currently, hosting Rerun behind a reverse proxy at a specific sub-path triggers a "Failed to parse URL" error.
For example, if
example.comhosts a Rerun instance atexample.com/hosted_rerun/, the current parser cannot handle the gRPC proxy link:You get a "Failed to parse URL error"! Hence motivating this PR.
I am trying to host a Rerun web viewer behind such a reverse proxy and getting this issue.
Additional Concerns
Some of the URL parsing logic needs to search for keywords to then extract arguments from.
Consider:
Should this be an "entry" or a "dataset" URL?
I decided it would be a "dataset" URL, to support cases where an external page has a really long, accidentally clobbering path prefix, by having the last occurence of a keyword be what determines what kind of page it is e.g.:
If we searched the first, the chance of an unintentional collision is higher.
Tests
I added more unit tests and adjusted the pre-existing one.