Support URLs with origins and path prefixes #11951

Open
methylDragon wants to merge 2 commits into rerun-io:main from methylDragon:ch3/support-external-path-prefixes


@methylDragon methylDragon commented Nov 23, 2025

What

This PR updates the URL parser to support path prefixes. This allows Rerun instances to be hosted at non-root paths (e.g., behind a reverse proxy).

Before: The parser assumed Rerun endpoints existed at the root:

  • http://example.com/catalog
  • http://example.com/proxy

After: The parser now accepts arbitrary sub-paths:

  • http://example.com/custom/prefix/catalog
  • http://example.com/custom/prefix/proxy
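
As a rough illustration of the before/after behaviour, the sketch below splits a URL path into a prefix and a trailing endpoint. It is not the code from this PR: `split_prefix` and `ENDPOINTS` are hypothetical names, and the endpoint list contains only the two endpoints named in the examples above.

```rust
/// Endpoint names taken from the examples above (assumption: the real parser knows more).
const ENDPOINTS: [&str; 2] = ["catalog", "proxy"];

/// Splits a URL path into `(path_prefix, endpoint)`.
///
/// Returns `None` if the final path segment is not a known endpoint.
/// An empty prefix means the endpoint is hosted at the root.
fn split_prefix(path: &str) -> Option<(&str, &str)> {
    // Tolerate a single or repeated trailing slash, e.g. `/prefix/proxy/`.
    let trimmed = path.trim_end_matches('/');
    // Everything before the last `/` is the prefix, the rest is the endpoint.
    let (prefix, endpoint) = trimmed.rsplit_once('/')?;
    ENDPOINTS.contains(&endpoint).then_some((prefix, endpoint))
}
```

With this sketch, `/catalog` yields an empty prefix (the "before" case), while `/custom/prefix/proxy` yields the prefix `/custom/prefix`.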

Motivation

Currently, hosting Rerun behind a reverse proxy at a specific sub-path triggers a "Failed to parse URL" error.

For example, if example.com hosts a Rerun instance at example.com/hosted_rerun/, the current parser cannot handle the gRPC proxy link:

https://example.com/hosted_rerun/url?=rerun%2Bhttps://example.com/hosted_rerun_grpc_data/proxy

This produces a "Failed to parse URL" error, which is what motivates this PR.

I am trying to host a Rerun web viewer behind such a reverse proxy and am running into this issue.

Additional Concerns

Some of the URL parsing logic has to search the path for keywords and then extract the arguments that follow them. With arbitrary prefixes, a keyword can now appear more than once.

Consider:

http://example.com/sub/path/entry/entry_id/dataset/dataset_id

Should this be an "entry" or a "dataset" URL?

I decided it should be a "dataset" URL: the last occurrence of a keyword determines what kind of page it is. This supports cases where an external page has a really long path prefix that accidentally contains a keyword, e.g.:

http://example.com/path/that/contains/example/dataset/and/then/hosts/rerun/dataset/dataset_id

If we matched on the first occurrence instead, the chance of an unintentional collision with the prefix would be higher.
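
The last-occurrence rule can be sketched as follows. This is a minimal illustration, not the parser from this PR: `Kind` and `classify` are hypothetical names, and only the two keywords from the example above are handled.

```rust
/// Which kind of page a path refers to (only the two kinds from the example above).
#[derive(Debug, PartialEq)]
enum Kind {
    Entry,
    Dataset,
}

/// Classifies a path by the LAST keyword segment and returns
/// `(kind, path_prefix, id)`, where `id` is the segment right after the keyword.
fn classify(path: &str) -> Option<(Kind, String, String)> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    // `rposition` scans from the end, so earlier (accidental) keywords stay in the prefix.
    let idx = segments
        .iter()
        .rposition(|s| matches!(*s, "entry" | "dataset"))?;
    let kind = if segments[idx] == "entry" {
        Kind::Entry
    } else {
        Kind::Dataset
    };
    // The keyword must be followed by an id segment.
    let id = segments.get(idx + 1)?.to_string();
    // Re-join everything before the keyword as the prefix.
    let prefix: String = segments[..idx].iter().map(|s| format!("/{s}")).collect();
    Some((kind, prefix, id))
}
```

One limitation of this sketch: an id that is itself spelled like a keyword (for example `/dataset/dataset`) is taken as the keyword and rejected for lacking an id, so a real implementation would need to decide how to treat that case.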

Tests

I added more unit tests and adjusted the pre-existing one.


@github-actions github-actions bot left a comment

Hi! Thanks for opening this pull request.

Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.

@grtlr grtlr self-requested a review November 24, 2025 08:14
Member

@grtlr grtlr left a comment

Overall this sounds like a reasonable feature—thank you for opening the PR!

As a quick fix, you can already host Rerun under a subdomain, and it should work out of the box. But I can see how this could sometimes be too limiting. I'm curious: is there anything preventing you from doing this that motivates this PR?

Important

The way we currently use paths with gRPC endpoints is already a bit of a hack, so I'm a bit hesitant to complicate things even more at this point in time.

Code-wise, I think there are also some changes that we need to make if we go down that path (pun intended).

  • We should separate the path handling + the origin into a new object to avoid having to add logic to every new *Uri variant.
  • This new object could then replace the existing origin field in those structs.
  • Given that path_prefix is more of a niche feature, I'd make it an Option too, and use the builder pattern to add it. This is also motivated by the following:
  • Finally, we need to make sure that the implementation is robust against leading and trailing slashes. A method like with_path_prefix could be used to validate the inputs. We should test those edge cases too.
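
To make the suggestions in this list concrete, here is a minimal sketch of an object that carries the origin plus an optional prefix and normalizes slashes in a builder-style `with_path_prefix`. The type name `Endpoint` and the plain `String` origin are assumptions for illustration; they are not Rerun's actual types.

```rust
/// Hypothetical stand-in for the "new object" suggested above; not Rerun's actual type.
#[derive(Debug, Clone, PartialEq)]
struct Endpoint {
    /// Scheme plus host and port, e.g. `https://example.com:443`.
    origin: String,
    /// `None` when hosted at the root; otherwise starts with `/` and never ends with `/`.
    path_prefix: Option<String>,
}

impl Endpoint {
    fn new(origin: impl Into<String>) -> Self {
        Self {
            origin: origin.into(),
            path_prefix: None,
        }
    }

    /// Builder-style setter that normalizes leading and trailing slashes.
    /// As a side effect of splitting on `/`, repeated slashes are collapsed too.
    fn with_path_prefix(mut self, prefix: &str) -> Self {
        let segments: Vec<&str> = prefix.split('/').filter(|s| !s.is_empty()).collect();
        self.path_prefix = if segments.is_empty() {
            // `""` and `"/"` both mean "no prefix".
            None
        } else {
            Some(format!("/{}", segments.join("/")))
        };
        self
    }
}
```

For example, `Endpoint::new("https://example.com").with_path_prefix("my/prefix/")` stores `/my/prefix`, and passing `"/"` leaves the prefix unset. Whether invalid input should be normalized silently, as here, or rejected with an error is a design choice the thread leaves open.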

Author

methylDragon commented Nov 24, 2025

Overall this sounds like a reasonable feature—thank you for opening the PR!

As a quick fix, you can already host Rerun under a subdomain, and it should work out of the box. But I can see how this could sometimes be too limiting. I'm curious: is there anything preventing you from doing this that motivates this PR?

The way we're hosting Rerun at the moment is:

  • We're spinning up a different VM for each user who requests visualization of a file they have (MCAP). These VMs are ephemeral and separately authed.
  • We also have different "organizations" that the users belong to.

The number of users and orgs is essentially unbounded for us, which makes using subdomains pretty tricky. The URL we end up with is something like https://our_site.com/org/user/vm/rerun, which is why we want to use subpaths. (The same goes for the gRPC server.)

Member

@emilk emilk left a comment

Seems like adding the path prefix to the struct Origin would make the code simpler and less error-prone (though the name Origin would be a bit misleading in that case).

Member

This is a bad change: explicit destructuring is preferred, as it forces us to consider new additions as they are made.
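
The diff this comment refers to is not shown here, so the following is only a generic illustration of the point, with a hypothetical `Addr` struct. Because the pattern lists every field and has no `..`, adding a field to the struct turns this line into a compile error, which forces whoever adds the field to revisit the function.

```rust
/// Hypothetical struct, used only to illustrate exhaustive destructuring.
struct Addr {
    origin: String,
    path_prefix: Option<String>,
}

fn describe(addr: &Addr) -> String {
    // Exhaustive pattern (no `..`): a new field on `Addr` breaks this line at compile time.
    let Addr {
        origin,
        path_prefix,
    } = addr;
    format!("{origin}{}", path_prefix.as_deref().unwrap_or(""))
}
```

Accessing `addr.origin` and `addr.path_prefix` directly would compile unchanged after a new field is added, which is how a new field can silently go unhandled.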

Author

@methylDragon methylDragon Nov 27, 2025

Signed-off-by: methylDragon <methylDragon@intrinsic.ai>
Signed-off-by: methylDragon <methylDragon@intrinsic.ai>
@methylDragon methylDragon force-pushed the ch3/support-external-path-prefixes branch from fa20320 to e44fee4 Compare November 27, 2025 02:43
Author

Seems like adding the path-prefix to the struct Origin would make the code simpler, and less error-prone (though the name Origin would be a bit misleading in that case)

Added a new struct, EndpointAddr, which composes Origin, and used it.

@methylDragon methylDragon requested review from emilk and grtlr November 27, 2025 02:46
Comment on lines +7 to +16
pub struct EndpointAddr {
    pub origin: Origin,

    /// An optional path prefix, e.g. `/my/prefix`.
    ///
    /// The prefix is guaranteed to start with a slash if it is not empty,
    /// and guaranteed not to end with a slash.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub path_prefix: Option<String>,
}
Member

@grtlr grtlr Nov 27, 2025

Shouldn't this just be a url::Url?

I remember there being some shenanigans around default ports though:

Author

Technically speaking, a URI is: <scheme>://<origin>/<subpath>/<endpoint>

I was taking EndpointAddr to be <origin>/<subpath>, since it doesn't include the final endpoint segment.
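
The decomposition described in this comment can be sketched as a simple join. The function name `endpoint_url` and its parameters are hypothetical; `host` plays the role of `<origin>` in the comment's notation, and the prefix is assumed to be already normalized as documented on `path_prefix` (leading slash, no trailing slash).

```rust
/// Joins the parts into `<scheme>://<host>[<path_prefix>]/<endpoint>`.
///
/// Assumes `path_prefix`, when present, starts with `/` and does not end with `/`.
fn endpoint_url(scheme: &str, host: &str, path_prefix: Option<&str>, endpoint: &str) -> String {
    format!("{scheme}://{host}{}/{endpoint}", path_prefix.unwrap_or(""))
}
```

Under that invariant, no slash handling is needed at the join site, which is one argument for validating the prefix once when it is set.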

Author

Given that ^, what do you think? I'm trying to scope down the change as much as possible (I'm treating this as an extension to the origin, mostly)

Member

Our Origin contains the scheme as well. So my idea was to use a URL to represent everything up to the endpoint. Maybe we could therefore use a simple wrapper struct around url::Url.

Member

By replacing Origin, we are also ensuring that we catch all instances where the path segment functionality needs to be added.

Member

@grtlr grtlr left a comment

If we want to land this change, we should also make sure that we add the prefix to all our SDKs, as handling otherwise becomes inconsistent.

The partition_url in the Python SDK is just one such example. There is also ConnectionHandle and probably more places.

Author

If we want to land this change, we should also make sure that we add the prefix to all our SDKs, as handling otherwise becomes inconsistent.

The partition_url in the Python SDK is one such example.

Apologies, I'm a little unfamiliar with the codebase. What do you mean by this?

Member

grtlr commented Nov 27, 2025

Sorry, was just about to clarify my comment. What I meant is: We basically would need to start using the new EndpointAddr instead of Origin in many places to have consistency across all SDKs and use cases.

This looks like a pretty big task, so I wonder what @emilk's thoughts are here?

Author

methylDragon commented Jan 7, 2026

Bump on this @grtlr / @emilk

Alternatively, is there work done/underway for supporting hosted instances of Rerun, pointing to gRPC data located behind proxies? This issue is unfortunately preventing us from upgrading from Rerun v0.22.X

Member

grtlr commented Jan 8, 2026

Sorry for the slow response, most of us have been out over the break.

The tricky thing here is that we need to make sure we add this functionality everywhere we currently use Origin. That means all SDKs and the link sharing in the browser, so this is a big undertaking. @abey79 has also been refactoring the API over the last couple of weeks, so he might have opinions too.

If the points above are addressed (see also my earlier comments), I think this would be a very nice addition from a technical point of view.
