-
Notifications
You must be signed in to change notification settings - Fork 109
Deployments improvements #3859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deployments improvements #3859
Conversation
* Add `breaking` flag. This is needed when you want people to avoid users overwriting by mistake a deployment, but they still need to allow breaking changes (e.g. changing service type). `force` remains same as before, with same behavior. * Change default for `force` to `false`. * When registering the same service, without any flags set, 200 is returned to signal the request was idempotent (no change done). When force is set, 200 means overwrite. * These changes are part of `AdminAPIVersion::V3` (to make sure CLI still behaves as before with /v2 prefix) * Encapsulated the business logic of "semantic equality" between two deployments inside `Deployment::semantic_eq` and `DeploymentMetadata::semantic_eq` * Apply the new experience to the CLI.
* This allows adding arbitrary metadata to deployments during registration, for observability. * Auto-resolve the environment info for github actions. This should be enough to link the deployment back to the repository and related commit.
This redesign is meant to cover a bunch of use cases/user flows: * The common scenario of token/assume_role arn changed/expired, and all the invocations to a deployment are stuck. With this API, we can expose in the UI an edit box to update the deployment, changing the additional_headers/assume_role_arn * The less common scenario when a user wants to update the address of an endpoint. This used to be your "only way out" using Lambda with stuck invocations before 1.5. From 1.5, with pause, we can resume on a different endpoint, so this user flow is now less relevant but we still support it here. * The probably useless scenario where the user wants to forcefully overwrite the schema of an old deployment. This is still possible, but behind an opt-in flag.
For whoever reviews this, the context is this: https://www.notion.so/restatedev/Deployment-flows-272bfec2629180dc922ddd67acd54ccb?source=copy_link |
Tested manually, all good |
The e2e test rightfully fails, because it's depending on the old behavior of update_deployment. Will update that. |
Fixing the e2e tests here restatedev/e2e#395 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @slinkydeveloper for this PR. I left few questions
} | ||
|
||
/// This returns true if the two deployments are to be considered the "same". | ||
pub fn semantic_eq_with_address_and_headers(&self, other_addess: &DeploymentAddress) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two types DeploymentAddress
and DeploymentType
are suspiciously similar. Is this part of the planned clean up you are intending to do?
Also maybe a goo idea if DeploymentType
has method address()
which returns DeploymentAddress
then implementation basically can use the same implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two types DeploymentAddress and DeploymentType are suspiciously similar. Is this part of the planned clean up you are intending to do?
They are similar, not exactly the same. But indeed I want to clean this up in the next iteration of cleanups with the upcoming PRs.
Ok(schema_updater.into_inner()) | ||
} | ||
|
||
pub fn update_and_return<T, E>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully get the reason for this pattern. Why instead of passing an update_fn we instead do something like
let updater = schema.into_updater();
// do update operations
// here ...
let schema = updater.commit()?;
// .end() or .finish() but it only return an updated version of the schema with correct version.
Is there any particular reason it's done this way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the lambda approach as it's more self contained, and exposes less methods outside. The .commit()
approach to me makes sense if a commit actually happens at the end, in this case no commit happens, we just return the updated data structure.
Will hide this anyway behind SchemaRegistry
, once i finish the next PR.
&service_name, | ||
&new_service, | ||
previous_service_revision, | ||
RemovedHandlerBehavior::Warn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's okay to change the service type here or remove a handler but not when doing add_deployment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's all a matter of defaulting and optimizing the common use case:
- 99% of people using the update_deployment API won't use
overwrite: true
in the first place, as they just need to update the api token, or the assume_role. And the default reflects that:overwrite = false
. - That 1% of people using that, will use it as an escape hatch in case you make some mistakes between registrations etc. In that case, it makes sense to do
overwrite = true
and you just take your risks.
In add_deployment
, defaults are again leaning on safety first:
- Most people doing properly immutable deployments want these safety checks -> by default those are enabled
- When the safety check doesn't pass, and the user knows what is doing, they can provide
--breaking
- When the user does mutable deployments, they take their risks and go with
--force
/// Declared SDK during discovery | ||
pub sdk_version: Option<String>, | ||
pub created_at: MillisSinceEpoch, | ||
pub metadata: HashMap<String, String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find Metadata
to have a field metadata
is confusing. Maybe call it extra
or user_defined
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to flatten this DeploymentMetadata
into Deployment
directly in the subsequent PRs. Also naming here is meant to align with service metadata and handler metadata.
@muhamadazmy I applied the feedback, PTAL |
Addressing the issues described here #3144