FIP-0088: Add support for upgradable actors #873

Closed
wants to merge 4 commits into from
157 changes: 157 additions & 0 deletions FIPS/fip-actor-upgrades.md
@@ -0,0 +1,157 @@
---
fip: "<to be assigned>" <!--keep the quotes around the fip number, i.e: `fip: "0001"`-->
title: Add support for upgradable actors
author: Fridrik Asmundsson (@fridrik01), Steven Allen (@stebalien)
discussions-to: https://github.com/filecoin-project/FIPs/discussions/396
status: Draft
type: Technical
category: Core
created: 2023-11-27
---

## Simple Summary

This FIP introduces support for upgradable actors, enabling deployed actors to update their code while retaining their address, state, and balance. This feature is currently limited to use by built-in actors, and as of now, no built-in actor has been updated to become upgradable.

## Abstract

This FIP proposes the integration of upgradable actors into the Filecoin network through the introduction of a new `upgrade_actor` syscall and an optional `upgrade` WebAssembly (Wasm) entrypoint.

Upgradable actors provide a framework for replacing deployed actor code in place, which significantly improves the experience of updating deployed actors.

## Change Motivation

Currently, the code associated with all actors on the Filecoin Network is immutable once deployed. To modify the actor code, such as fixing a security bug, the following steps are required:
1. Deploy a new actor with the corrected code.
2. Migrate all state from the previous actor to the new one.
3. Update all other actors interacting with the old actor to use the new actor.

By adding support for upgradable actors, deployed actors can easily upgrade their code and no longer need to go through the series of steps mentioned above.

This FIP is also motivated by the `f4` extensible address class which was introduced in [FIP-0048] and required special "placeholder" actors to support interactions with addresses that do not yet exist on-chain. With upgradable actors we can simplify this address class and remove these placeholder actors completely. We will be able to deploy real actors and upgrade their code on first send.

Furthermore, this FIP paves the way for moving more network upgrade logic on-chain in the future, enabling a more seamless process for implementing critical updates and ensuring the continuous improvement of the Filecoin Network.

## Specification

Introducing support for actor upgrades involves the following changes to the FVM:

1. Adding a new `upgrade` Wasm entrypoint, which actors must implement in order to be a valid upgrade target.
2. Adding a new `upgrade_actor` syscall, enabling actors to upgrade themselves.

These changes are discussed in detail in the following sections.

### New upgrade Wasm Entrypoint

We introduce a new optional `upgrade` Wasm entrypoint. Deployed actors must implement this entrypoint to be a valid upgrade target. It is defined as follows:

Review comment (Member): Why must deployed actors implement it? It's not invoked on deployed actors, it's invoked on the new code. The algorithm below does not specify any check that the deployed actors implement this entry point.


```rust
pub fn upgrade(params_id: u32, upgrade_info_id: u32) -> u32
```

Parameters:
- `params_id`: An IPLD block handle provided by the caller and sent to the upgrade receiver, or `0` for none.
- `upgrade_info_id`: An IPLD block handle for an `UpgradeInfo` struct provided by the FVM runtime (defined below).

The single `u32` return value is an IPLD block handle, or `0` for none.

The `UpgradeInfo` struct is defined as follows:

```rust
#[derive(Clone, Debug, Copy, PartialEq, Eq, Serialize_tuple, Deserialize_tuple)]
pub struct UpgradeInfo {
    // The code CID of the old code that the actor is upgrading from.
    pub old_code_cid: Cid,
}
```

Review comment (Member), on the `#[derive(...)]` line: Nit: I don't think these Rust codegen things are relevant to the FIP.

When a target actor's `upgrade` Wasm entrypoint is called, it can make necessary state tree changes from the calling if needed to its actor code. The `UpgradeInfo` struct provided by the FVM runtime can be used to check what code CID it's upgrading from. A successful return from the `upgrade` entrypoint instructs the FVM that it should proceed with the upgrade. The target actor can reject the upgrade by calling `sdk::vm::exit()` before returning from the upgrade entrypoint.

Review comment (Member): This first sentence is a bit unclear.

Suggested change:

When the `upgrade` entry point is invoked in the new code, it can make any necessary state tree changes required by that new code. The `UpgradeInfo` struct provided by the FVM runtime can be used to check what code CID it's upgrading from. A successful return from the `upgrade` entrypoint instructs the FVM that it should proceed with the upgrade. The new code can abort the upgrade operation by calling `sdk::vm::exit()`.
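
The accept-or-reject behaviour of the `upgrade` entrypoint can be illustrated with a small, self-contained sketch. It does not use the FVM SDK: `Cid` is a plain string stand-in, `StateV1`/`StateV2`, `KNOWN_V1_CODE`, and `EXIT_UNSUPPORTED_SOURCE` are hypothetical names invented for this example, and returning `Err(code)` merely models a call to `sdk::vm::exit()`.

```rust
// Toy stand-ins for illustration only; none of these are FVM SDK types.
type Cid = &'static str;

#[derive(Debug, PartialEq)]
struct UpgradeInfo {
    // The code CID the actor is upgrading from.
    old_code_cid: Cid,
}

// Hypothetical state layouts for two versions of the same actor.
#[derive(Debug, PartialEq)]
struct StateV1 {
    counter: u64,
}

#[derive(Debug, PartialEq)]
struct StateV2 {
    counter: u64,
    paused: bool,
}

// Arbitrary, made-up exit code used by the new code to reject an upgrade.
const EXIT_UNSUPPORTED_SOURCE: u32 = 32;

// Made-up CID of the only old version the new code knows how to migrate from.
const KNOWN_V1_CODE: Cid = "bafy-actor-v1";

// Models the body of the new code's `upgrade` entrypoint.
// `Ok(state)` stands for a successful return (the upgrade proceeds);
// `Err(code)` stands for aborting via an exit call (the upgrade is rejected).
fn upgrade(info: &UpgradeInfo, old_state: StateV1) -> Result<StateV2, u32> {
    if info.old_code_cid != KNOWN_V1_CODE {
        return Err(EXIT_UNSUPPORTED_SOURCE);
    }
    // Migrate the old state layout to the layout the new code expects.
    Ok(StateV2 {
        counter: old_state.counter,
        paused: false,
    })
}
```

In this model the new code, not the old code, decides whether the migration is possible, because only the new code knows both state layouts.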


### New upgrade_actor syscall

Review comment (Member): Nit: the entry point and syscall are presented in the opposite order to which they happen in during an upgrade, which makes grokking the flow a bit harder. I drew a completely wrong idea about the upgrade method until I finished reading the syscall spec.


We introduce a new `upgrade_actor` syscall which calls the `upgrade` Wasm entrypoint of the calling actor and then atomically replaces the code CID of the calling actor with the provided code CID, and returns the exit code and block of the return. It is defined as follows:

Review comment (Member): The algorithm below specifies the opposite order of things: the code is changed then upgrade is invoked (in the new code). The opposite order makes sense, since the old code can't possibly know what state schema is required by the new code, while the new code can know both schemas.


```rust
pub fn upgrade_actor(
    new_code_cid_off: *const u8,
    params: u32,
) -> Result<Send>;
```

Parameters:
- `new_code_cid_off`: A pointer to the code CID that the calling actor's code should be replaced with.
- `params`: The IPLD block handle of the parameters passed to the `upgrade` entrypoint, or `0` for none.

The `Send` struct is defined as follows:

```rust
pub struct Send {
    // Exit code returned by the upgrade entrypoint.
    pub exit_code: u32,
    // The block id/codec/size returned by the upgrade entrypoint, or 0 if no block was returned.
    pub return_id: BlockId,
    pub return_codec: u64,
    pub return_size: u32,
}
```

On successful upgrade, this syscall will not return. Instead, the current invocation will "complete" and the return value will be the block returned by the new code's `upgrade` entrypoint. If the new code rejects the upgrade (calls `sdk::vm::exit()`) or performs an illegal operation, this syscall will return the exit code plus the error returned by the upgrade entrypoint.

This syscall will:
1. Validate that the pointers passed to the syscall are in-bounds.
2. Validate that `new_code_cid_off` is a valid code CID.
3. Validate that the calling actor is not currently executing in "read-only" mode. If it is, the syscall fails with a "ReadOnly" (13) syscall error.
4. Check whether the calling actor is already on the call stack where it has previously been called on its `invoke` entrypoint (note that calling `upgrade` recursively is allowed). If so, the syscall fails with a "Forbidden" (11) syscall error. For example, if an actor A has a call stack `A (upgrade -> upgrade -> upgrade)`, that is allowed, while the call stack `A -> B -> A (upgrade)` would be rejected.
5. Check that there is space for storing the return block. If not, the syscall fails with a "LimitExceeded" (3) syscall error.
6. Start a new Call Manager transaction:
    1. Validate that the calling actor has not been deleted. If it has, the syscall fails with an "IllegalOperation" (2) syscall error.
    2. Update the actor in the state tree with the new `new_code_cid`, keeping the same `state`, `sequence` and `balance`.
    3. Invoke the target actor's `upgrade` entrypoint.
    4. If the target actor does not implement the `upgrade` entrypoint, the syscall fails with an `ExitCode::SYS_INVALID_RECEIVER` exit code.
    5. If the target actor aborts the `upgrade` entrypoint by calling `sdk::vm::exit()`, the syscall fails with the provided exit code.
7. Apply the transaction, committing changes.
8. Abort the calling actor and return the IPLD block from the `upgrade` entrypoint.

Review comment (Member), on step 6.3: Please be very clear which code is invoked here. Since the prior step changed the code CID, I'm assuming it's the new code, but "target actor" also makes it sound like the old code.
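
The control flow of steps 3, 4, 6, 7, and 8 can be modelled with a toy state tree. This is only a sketch under stated assumptions, not the ref-fvm implementation: all types and names (`ActorState`, `UpgradeError`, `Entrypoint`, the `caller_reentered` flag) are hypothetical, pointer and CID validation (steps 1 and 2) and the return-block space check (step 5) are omitted, and the Call Manager transaction is approximated by staging a copy of the actor and committing it only on success.

```rust
use std::collections::HashMap;

// Toy model only: these types are illustrative, not the ref-fvm API.
type Cid = &'static str;
type ActorId = u64;

#[derive(Clone, Debug, PartialEq)]
struct ActorState {
    code: Cid,
    state: Vec<u8>,
    sequence: u64,
    balance: u128,
}

#[derive(Debug, PartialEq)]
enum UpgradeError {
    ReadOnly,         // syscall error 13
    Forbidden,        // syscall error 11
    IllegalOperation, // syscall error 2
    InvalidReceiver,  // the new code has no `upgrade` entrypoint
    Exit(u32),        // the new code aborted with this exit code
}

// What the new code's `upgrade` entrypoint does in this toy model.
enum Entrypoint {
    Missing,
    Exits(u32),
    Returns(Vec<u8>),
}

fn upgrade_actor(
    tree: &mut HashMap<ActorId, ActorState>,
    caller: ActorId,
    new_code: Cid,
    read_only: bool,
    caller_reentered: bool,
    new_code_behaviour: Entrypoint,
) -> Result<Vec<u8>, UpgradeError> {
    // Step 3: upgrades are not allowed in read-only mode.
    if read_only {
        return Err(UpgradeError::ReadOnly);
    }
    // Step 4: reject re-entrant callers (the stack check itself is not modelled here).
    if caller_reentered {
        return Err(UpgradeError::Forbidden);
    }
    // Step 6.1: the caller must still exist. Staging a copy models the transaction.
    let mut staged = tree
        .get(&caller)
        .cloned()
        .ok_or(UpgradeError::IllegalOperation)?;
    // Step 6.2: swap the code CID; state, sequence and balance are untouched.
    staged.code = new_code;
    // Steps 6.3 to 6.5: run the new code's `upgrade` entrypoint.
    let block = match new_code_behaviour {
        Entrypoint::Missing => return Err(UpgradeError::InvalidReceiver),
        Entrypoint::Exits(code) => return Err(UpgradeError::Exit(code)),
        Entrypoint::Returns(block) => block,
    };
    // Step 7: commit the staged change.
    tree.insert(caller, staged);
    // Step 8: the returned block becomes the result of the calling invocation.
    Ok(block)
}
```

Because the staged copy is written back only after the entrypoint returns successfully, every failure path in this model leaves the actor's original code CID in place.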

## Design Rationale

### Additional metadata syscall

We considered adding a new `get_old_code_cid` syscall to get the calling actor's code CID. That has the benefit of keeping the `upgrade` entrypoint signature consistent with the `invoke` signature. However, we rejected that option because we felt the benefit didn't outweigh the overhead of adding a new syscall. Furthermore, it did not provide the flexibility of passing in an IPLD handle for an `UpgradeInfo` struct, to which we can easily add more fields if required.

## Backwards Compatibility

Full backwards compatibility is expected.

## Test Cases

Detailed test cases are provided with the implementation.

## Security Considerations

Upgradable actors pose potential security risks, as users can replace deployed actors' code. However, measures are in place to minimize these risks:

- Upgradable actors are opt-in, ensuring no impact on currently deployed actors.
- Actors can only upgrade themselves, preventing one actor from upgrading another actor to a new version.
- We reject re-entrant `upgrade_actor` syscalls, i.e., if some actor `A` is already on the call stack, no "deeper" instance of `A` should be able to call the upgrade syscall.

Review comment (Member), on the opt-in bullet: I don't understand from the text above how this is enforced.

Review comment (Member), follow-up: Update: the text above doesn't say this, but I now do understand. It's opt-in because an actor code must have some path that invokes the upgrade syscall in order to be upgradeable. The stuff about requiring the upgrade entrypoint still seems wrong, but the fact that actors can only upgrade themselves means that an actor can be determined to be non-upgradeable by lack of such a call (up to malicious obfuscation, etc).

I suggest this mechanism and property be spelled out more clearly.

Detailed tests cover these security considerations and edge cases.
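
As an illustration of the re-entrancy rule, the following sketch shows one possible way to express the check. It is an assumption chosen to be consistent with the two examples in the specification (`A (upgrade -> upgrade -> upgrade)` allowed, `A -> B -> A (upgrade)` rejected), not a description of the actual implementation. The `Entry` enum, the `(ActorId, Entry)` frame representation, and the `may_upgrade` function are hypothetical.

```rust
// Hypothetical model of a call stack frame: which actor, entered through which entrypoint.
type ActorId = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Entry {
    Invoke,
    Upgrade,
}

// Returns true if `caller` may call the upgrade syscall given the current call stack
// (outermost frame first). Assumed rule: after the caller's first `invoke` frame,
// every deeper frame must be the caller's own nested `upgrade` frame.
fn may_upgrade(call_stack: &[(ActorId, Entry)], caller: ActorId) -> bool {
    let mut frames = call_stack.iter();
    // `any` consumes frames up to and including the caller's first `invoke` frame.
    if frames.any(|&frame| frame == (caller, Entry::Invoke)) {
        // The remaining (deeper) frames must all be recursive upgrades of the caller.
        frames.all(|&(id, entry)| id == caller && entry == Entry::Upgrade)
    } else {
        // The caller has no `invoke` frame on the stack, so nothing is re-entered.
        true
    }
}
```

Under this reading, an actor that is called by another actor and then upgrades itself is still allowed, because no other frame sits between its first `invoke` frame and the upgrade call.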

## Incentive Considerations

This FIP does not materially impact incentives in any way.

Review comment (@kaitlin-beegle, Contributor, Jan 4, 2024): Is this true? I'm curious how this change may impact the logic around timing and resource use for network upgrades, as well as the prioritization of certain changes.

If actors' code can be much more efficiently upgraded, shouldn't we technically be able to push more updates/small changes more quickly?

This change may not affect cryptoeconomics, but I might request we add a line about affecting the logic of what gets included for upgrades. Ultimately a small nit; the approval of the draft shouldn't be gated on this topic.

Review comment (Member): @kaitlin-beegle I don't think I get why those are incentive considerations rather than product considerations.

## Product Considerations

This FIP makes it possible to upgrade deployed actors, for example when a bug or security concern is identified in the deployed code. This provides a simple, safe way to address such issues and significantly improves the user experience compared to today.

## Implementation

https://github.com/filecoin-project/ref-fvm/pull/1866

## TODO

N/A

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).