[RFC] config: Support provider randomization #49
abhinav wants to merge 3 commits into facebook:main
|
I'm not sure how I feel about this implementation. As it stands, this will solve your use case (load-balanced mirror selection), but it doesn't really help with more complicated cases that could require "weights" or "priority". What about a system that allows setting the "priority" of each provider and randomly selecting a given entry if priority is set? This would allow, e.g., hitting several mirrors first before trying the "backend" option with lower priority. But also, can you help me understand why you need this? Why not just use DNS or a load-balancing front-end instead of doing the load-balancing client side? |
|
Happy to make the suggested changes once we agree on whether this change is desirable.
That's possible within a single organization or when one entity controls the infrastructure, but it doesn't fit with distributed ownership and hosting. As with the example in the PR description: the Zig compiler is provided as a download on the official website, plus a number of third-party mirrors. The infrastructure is distributed and not controlled by a single entity, so there's no putting a load balancer in front without adding centrally controlled infrastructure. I guess one question to consider here is whether more chaotic usage patterns are something y'all intend for dotslash to support, or whether it's expected mostly for use in-organization, where someone can add some uniformity to consumption patterns. |
|
Dotslash was originally built with the needs of Meta infra (single entity, simple distribution mechanisms) and likely fits well with other single-entity usage. But, we want to make sure it works for Open Source environments so I want to solve this problem. So what do you think about weights? Overkill or a good idea? |
|
The weights idea is great! I considered other selection strategies, but decided to omit them to avoid adding undue complexity to the schema before I even had approval that this direction was desirable. Are you thinking of something like this, perhaps? |
|
that seems like a good idea to me, but can we make weight optional to allow backwards compatibility? |
|
Oh, of course. I assumed that was a given. I guess in schema-ish terms: (Not sure if that's valid TypeScript, but I think it conveys the message.) I'll follow up on this in the coming days. |
**Proposed change**
This changes the dotslash file schema to add a new optional field:
```diff
{
// ...
providers: [ ... ]
+ providers_order: "sequential" | "random",
}
```
The default ("sequential") will try the providers in the order
they are specified in the 'providers' list.
This is how dotslash behaves today.
If the value is "random",
the order in which providers are tried will be randomized.
**Why**
This will result in better usage patterns
when there are multiple providers for an executable.
As an example, for the [Zig compiler](https://ziglang.org/),
while there's an officially hosted source,
it's preferred to use third-party mirrors
(https://github.com/mlugg/setup-zig/blob/153c8d5202cbb8c7e10831110a3afd27593eb960/mirrors.json)
to avoid hammering the official server.
**Binary size**
This adds ~16 KB to the output of `cargo build --release`:
| | Size |
|---|---|
| Before | 1103376 bytes |
| After | 1119904 bytes |
| Change | 16528 bytes |
Resolves facebook#33
|
Updated with weighted randomization, @bigfootjon. |
|
I'll likely have to make some changes after I import this to Meta, but lgtm! |
|
@bigfootjon has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@bigfootjon merged this pull request in ac37a15. |
**Proposed change**
This changes the dotslash file schema to add a new optional field:
```diff
{
// ...
providers: [ ... ]
+ providers_order: "sequential" | "weighted-random",
}
```
The default ("sequential") will try the providers in the order
they are specified in the 'providers' list.
This is how dotslash behaves today.
If the value is "weighted-random",
the order in which providers are tried will be randomized.
By default, all providers will be weighted equally.
This can be changed by adding a
"weight" value to the provider configuration:
```json
{"url": "https://primary.example.com/hermes.tar.gz", "weight": 3},
{"url": "https://mirror1.example.com/hermes.tar.gz", "weight": 1},
{"url": "https://mirror2.example.com/hermes.tar.gz", "weight": 1},
{"url": "https://mirror3.example.com/hermes.tar.gz", "weight": 1}
```
Weight must be an integer >=1.
Zero ("don't select") is not permitted.
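The weight rules above (optional, defaulting to equal weighting, and never zero) could be checked roughly as follows. This is a minimal sketch, not the code in this PR: `effective_weight` is a hypothetical helper name, and the choice of `u32` and a `String` error is an assumption made for illustration.

```rust
/// Hypothetical helper (not dotslash's actual code): resolve the optional
/// `weight` field of a provider into the weight used for shuffling.
fn effective_weight(weight: Option<u32>) -> Result<u32, String> {
    match weight {
        // An omitted weight defaults to 1, so all providers are weighted equally.
        None => Ok(1),
        // Zero would mean "don't select", which the schema does not permit.
        Some(0) => Err("weight must be an integer >= 1".to_string()),
        // Any other positive integer is used as-is.
        Some(w) => Ok(w),
    }
}
```

Treating a missing weight as 1 keeps existing dotslash files valid, which is the backwards-compatibility point raised earlier in the thread.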
To shuffle providers in weighted order, they are sampled without replacement.
That is, even if the weight of a provider would present it twice before another option,
it is not re-added to the final order of providers—it is tried only once.
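Weighted sampling without replacement, as described above, can be sketched like this. It is an illustration under stated assumptions, not the implementation in this PR: `weighted_order` and `XorShift64` are hypothetical names, the tiny xorshift generator is only there so the sketch needs no external crates, and every weight is assumed to be >= 1.

```rust
/// Tiny xorshift64 generator so the sketch has no dependencies.
/// (Assumption: real code would use a proper random number generator.)
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..bound (slightly biased by the modulo; fine for a sketch).
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Returns provider indices in weighted-random order.
/// Each index appears exactly once. Assumes every weight is >= 1.
fn weighted_order(weights: &[u32], rng: &mut XorShift64) -> Vec<usize> {
    let mut remaining: Vec<usize> = (0..weights.len()).collect();
    let mut order = Vec::with_capacity(weights.len());
    while !remaining.is_empty() {
        // Total weight of the providers that have not been picked yet.
        let total: u64 = remaining.iter().map(|&i| u64::from(weights[i])).sum();
        let mut ticket = rng.below(total);
        // Walk the remaining providers until the ticket falls in one's share.
        let pos = remaining
            .iter()
            .position(|&i| {
                let w = u64::from(weights[i]);
                if ticket < w {
                    true
                } else {
                    ticket -= w;
                    false
                }
            })
            .expect("ticket is below the total remaining weight");
        // Removing the pick is what makes this "without replacement".
        order.push(remaining.remove(pos));
    }
    order
}
```

With the weights from the example (3, 1, 1, 1), this scheme picks the primary first with probability 3/6 = 1/2. Once picked, it is removed from the pool, so the three mirrors then share the remaining positions with equal chances.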
**Why**
This will result in better usage patterns
when there are multiple providers for an executable.
As an example, for the Zig compiler,
while there's an officially hosted source,
it's preferred to use third-party mirrors
(https://github.com/mlugg/setup-zig/blob/153c8d5202cbb8c7e10831110a3afd27593eb960/mirrors.json)
to avoid hammering the official server.
**Binary size**
This adds ~16 KB to the output of
`cargo build --release` on macOS ARM64.
Resolves #33