
Fix array assignment and deletion at storage slot overflow boundary #15984


Draft

wants to merge 11 commits into develop from storageOverflowDeleteBug

Conversation

r0qs
Member

@r0qs r0qs commented Apr 7, 2025

Fixes #15587

@r0qs r0qs marked this pull request as draft April 7, 2025 09:37
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 2 times, most recently from cbd97cc to b14ee0a Compare April 7, 2025 10:02
@r0qs r0qs added this to the 0.8.30 milestone Apr 7, 2025
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 10 times, most recently from 550b207 to 1614491 Compare April 11, 2025 14:37
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from 1614491 to 0d84691 Compare April 14, 2025 15:34
Member

@cameel cameel left a comment

First batch of review comments. Mostly about making sure the test coverage is adequate.

@cameel

This comment was marked as resolved.

@r0qs r0qs changed the title Storage overflow delete bug Fix array assignment and deletion at storage slot overflow boundary Apr 23, 2025
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch 3 times, most recently from b57d7a2 to 47471c6 Compare April 23, 2025 20:47
Comment on lines 670 to +654
if (_type.storageStride() < 32)
clearStorageLoop(TypeProvider::uint256());
clearStorageLoop(TypeProvider::uint256(), /* _canOverflow */ false);
else
clearStorageLoop(_type.baseType());
clearStorageLoop(_type.baseType(), /* _canOverflow */ false);
Member

Looking at the docs for delete, it does not even say that the elements of dynamic arrays are necessarily zeroed. It only says this:

delete a assigns the initial value for the type to a. I.e. for integers it is equivalent to a = 0, but it can also be used on arrays, where it assigns a dynamic array of length zero or a static array of the same length with all elements set to their initial value.

The way it's written, it rather sounds like delete is equivalent to an assignment and does not reset the elements. It's an important detail, because in storage this cleanup is expensive and the user should be aware of it.

It's not strictly related to the bug (other than that one could argue any change in dynamic array cleanup is not a bug, since cleanup is not a documented feature), but we should take the opportunity to correct that and state that the elements do get cleared. Either here or in a separate PR.

@clonker clonker force-pushed the storageOverflowDeleteBug branch from 7b9d321 to 459e0a0 Compare May 6, 2025 15:29
@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from 459e0a0 to c724157 Compare May 6, 2025 15:33
Comment on lines 291 to +298
// stack: target_ref target_data_end target_data_pos_updated
if (targetBaseType->storageBytes() < 32)
utils.clearStorageLoop(TypeProvider::uint256());
{
if (!targetType->isDynamicallySized() && !sourceType->isDynamicallySized())
if (auto const* targetArrayType = dynamic_cast<ArrayType const*>(targetType))
{
auto const* sourceArrayType = dynamic_cast<ArrayType const*>(sourceType);
solAssert(sourceArrayType);
Member

Is this bit added here because it is not guaranteed that when there's nothing to clear the position is equal to the end? If so, it needs a comment providing that context. Otherwise it looks just like an optimization (which could be done just as well in a separate PR).

Member

The whole bit is added so that we don't end up with a target position that is bigger than the target end position, which can happen for packed arrays. If that were to happen, we'd run into out-of-gas, because the EQ check in the clearing loop is never taken. It is also an optimization in that sense, because we can skip the clearing entirely in such cases.

Member

You should put that in a comment.

@r0qs r0qs force-pushed the storageOverflowDeleteBug branch from c724157 to 935a52f Compare May 6, 2025 16:06
clonker and others added 4 commits May 6, 2025 18:46
Also sometimes cleanup can be skipped when copying from a smaller source array into a larger target array
@clonker clonker force-pushed the storageOverflowDeleteBug branch from 935a52f to ba4853a Compare May 6, 2025 16:54
Member

@cameel cameel left a comment

I haven't found anything outright wrong so far, but I did not manage to review it to the extent that would make me confident enough there are no issues either. The PR is small but it requires digging into the code that surrounds the changes and thinking about a lot of corner cases. For now I'm not comfortable approving it and including it in tomorrow's release. If @ekpyron did, I'd probably be fine with it, but if I have to make the call myself, I'll need more time with the PR.

EDIT: Wait, actually one thing is wrong: #15984 (comment).

What I have now are just some general suggestions. Especially one regarding the abstractions we have around cleanup - maybe we'd be better off removing them and just making the cases that mess with EQ impossible? It seems to me that StorageItem::setToZero() is forcing us to make the code more defensive than it has to be.

auto const* sourceArrayType = dynamic_cast<ArrayType const*>(sourceType);
solAssert(sourceArrayType);

auto const numSourceItemsPerSlot = 32 / sourceArrayType->baseType()->storageBytes();
Member

I'd assert that storageBytes() is <= 32. Otherwise this code may not fail and instead just give weird results.

Member

Also, is it guaranteed that baseType() here is a value type and not e.g. another static array type? Note that for the latter storageBytes() will give you 32, not the actual size of the inner type in bytes.

This function is pretty badly named TBH.

Member

Yeah, the assert is a good thing. How does that work for multidimensional static arrays, actually? Is each element in the outer array a pointer to an inner array, or is it stored contiguously?


auto const numSourceItemsPerSlot = 32 / sourceArrayType->baseType()->storageBytes();
auto const paddedLength = numSourceItemsPerSlot * sourceArrayType->storageSize();
auto const paddedTargetType = TypeProvider::array(targetArrayType->location(), targetArrayType->baseType(), paddedLength);
Member

I don't understand why the padding is necessary here. Wouldn't you get the same storageSize() with less hassle if you used sourceArrayType->length() instead of paddedLength?

Member

@clonker clonker May 7, 2025

The point is that:

  • we have a copying loop over the source array that loops while source_pos <= source_end,
  • in case of a packed source array, there are byte offsets inside each slot and we call incrementByteOffset on them, which essentially increases a byte offset until it exceeds 32 bytes, at which point it is reset to zero and source_pos is incremented,
  • due to that, if there is padding in the last slot of the source array, we loop beyond the actual last source item and 'copy' whatever is left in that slot over to the target as well,
  • this can lead to situations in which target_pos exceeds target_end, which then leads to problems with the EQ check :)

So here the check is:

  • let source_type[source_len] be the source array type
  • let target_type[target_len] be the target array type
  • if nslots(target_type[source_len + padding]) > nslots(target_type[target_len]), we abort because we exceed the final slot of the target array

Just taking the source array length doesn't help, because the length counts elements, while what we need is the position in terms of slots. Those can be vastly different in cases where the source type is tightly packed and the target type isn't.

Example:
Take bytes1 as source type and bytes31 as target type. If the last source slot holds only one element, we invoke the copy loop 31 more times. That potentially shifts the end position 31 slots beyond the target array's length. If the target array is long enough to begin with and accommodates this offset, then sure, fine, but that's not a guarantee. You could also copy bytes1[1] into bytes31[1].
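The slot arithmetic behind this argument can be modeled in a short sketch (illustrative Python, not the compiler's code; `slots_needed` and `copy_may_overshoot` are hypothetical helper names):

```python
def slots_needed(element_bytes: int, length: int) -> int:
    """Number of 32-byte storage slots a packed array of `length` elements occupies."""
    assert 0 < element_bytes <= 32
    items_per_slot = 32 // element_bytes
    # ceiling division: ceil(length / items_per_slot)
    return -(-length // items_per_slot)

def copy_may_overshoot(src_bytes: int, src_len: int, dst_bytes: int, dst_len: int) -> bool:
    """True if copying the source padded to whole slots, element by element,
    could advance the target position past the target array's last slot."""
    items_per_src_slot = 32 // src_bytes
    padded_src_len = items_per_src_slot * slots_needed(src_bytes, src_len)
    return slots_needed(dst_bytes, padded_src_len) > slots_needed(dst_bytes, dst_len)

# bytes1[1] -> bytes31[1]: one source slot holds 32 bytes1 items (1 real, 31 padding),
# so the copy loop runs 32 times and overshoots the single target slot.
print(copy_may_overshoot(1, 1, 31, 1))   # True
# bytes1[32] -> bytes1[32]: padding equals the real length, no overshoot.
print(copy_may_overshoot(1, 32, 1, 32))  # False
```

This matches the abort condition described above: compare the slot count of the padded source length, interpreted in the target type, against the target's own slot count.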

return;
}
}
utils.clearStorageLoop(TypeProvider::uint256(), !targetType->isDynamicallySized());
Member

@cameel cameel May 6, 2025

I think we should have some comments explaining why we are excluding dynamic arrays.

I guess the idea is that getting one to land exactly at the storage boundary when it's small enough to be cleared/overwritten without going out of gas is statistically impossible. And that a typical contract with storage variables at 0 already strongly depends on it not happening, so there is no additional harm in assuming that it won't ever happen.

I'm still wondering if it wouldn't be better to just use EQ for them where we can. E.g. here we clear slot by slot anyway so we're not risking any issues with it. The only downside I guess would be that we'd not be using it in all cases and the behavior would be inconsistent?

Comment on lines +815 to +818
if (_canOverflow)
_context << Instruction::EQ;
else
_context << Instruction::GT << Instruction::ISZERO;
Member

TBH the fact that we have to assume whether an array can wrap around bothers me a bit. It's easy to make a mistake. It seems to me that EQ is the right choice in all cases and resorting to GT sometimes is a weird quirk.

If I understand the motivation correctly, we're doing it like this because there's a risk that we won't hit the end exactly when array elements take more than one slot. But why don't we just always clear slot by slot? We use StorageItem::setToZero(), which abstracts the cleanup, but this abstraction seems to be driving us to this weird solution, so maybe we could just drop it? In the end there is no situation where clearing means anything other than zeroing slot by slot. This abstraction is already a fiction to some extent, because for things smaller than one slot we use TypeProvider::uint256(), i.e. we already assume that you can just clear the whole slot and there is no type-specific behavior that needs to be accounted for.

@ekpyron What do you think?

Member

The problem is that we call these functions for array resizes, no matter whether the array is lengthened or shortened. So we can call this with start > end intentionally (when lengthening an array), expecting the clearing to be a no-op due to this comparison, while with an EQ it would instead run amok and attempt to clear almost all of storage. Conversely, that situation presumably only happens in cases (we need to strongly make sure that's true) in which the <= comparison cannot lead to bugs.

Now, we could add another check before calling into this, e.g. to distinguish lengthening from shortening arrays, and only call into here when needed (and when EQ is fine). This will very likely incur additional cost in otherwise safe cases, though, and in cases that will actually happen in real-world code. It may be worthwhile to check how much that costs and consider whether it's the safer option after all. But that's the reason for the current version to distinguish between those cases here.
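A toy model of the two termination checks makes the trade-off concrete (illustrative Python; `clear_loop` is a hypothetical name, and the `max_iters` cap stands in for running out of gas):

```python
def clear_loop(start: int, end: int, use_eq: bool, max_iters: int = 1000):
    """Model of the storage-clearing loop's termination check.
    Returns the number of slots 'cleared', or None if the loop runs away
    (which in the real EVM would mean out-of-gas)."""
    pos = start
    cleared = 0
    while cleared < max_iters:
        # EQ: stop only on exact equality; GT/ISZERO: stop once pos >= end.
        done = (pos == end) if use_eq else (pos >= end)
        if done:
            return cleared
        pos += 1
        cleared += 1
    return None  # runaway loop

# Lengthening an array calls the loop with start > end, expecting a no-op:
print(clear_loop(10, 5, use_eq=False))  # 0: the GT-based check exits immediately
print(clear_loop(10, 5, use_eq=True))   # None: EQ is never taken, the loop runs away
```

Both checks behave identically for a well-formed range (start <= end, stepping one slot at a time); they only diverge in the start > end case described above.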

Comment on lines 75 to +79
/// Appends a loop that clears a sequence of storage slots of the given type (excluding end).
/// @param _canOverflow whether the storage is treated as circular when clearing.
/// Stack pre: end_ref start_ref
/// Stack post: end_ref
void clearStorageLoop(Type const* _type) const;
void clearStorageLoop(Type const* _type, bool _canOverflow) const;
Member

This is a bit off-topic, but reviewing this PR made me realize what the reason for the limitation from #15911 may be. For a type that takes up whole storage it is not possible to store its size in u256/uint. It would wrap around to zero. Similarly in the start-end representation we have here we'd end up with an empty range - an attempt to clear an array of this size would unexpectedly not run out of gas because it would not clear anything.
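The wrap-around for a type spanning all of storage can be sketched quickly (illustrative Python; `clear_range_size` is a hypothetical name):

```python
MOD = 2**256  # u256 arithmetic is modulo 2**256

def clear_range_size(start: int, size: int) -> int:
    """Size of the half-open slot range [start, start + size) in u256 arithmetic."""
    end = (start + size) % MOD
    return (end - start) % MOD

# A type occupying all 2**256 slots wraps `end` back onto `start`:
print(clear_range_size(7, 2**256))  # 0: the range looks empty, nothing gets cleared
print(clear_range_size(7, 10))      # 10: ordinary sizes are unaffected
```

So an attempt to clear a full-storage type would silently become a no-op rather than running out of gas, consistent with the limitation mentioned above.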

/// Stack pre: end_ref start_ref
/// Stack post: end_ref
void clearStorageLoop(Type const* _type) const;
void clearStorageLoop(Type const* _type, bool _canOverflow) const;
Member

I'll have more concrete suggestions for the bug description and naming later, but one thing I'd like to point out already is that "overflow" sounds a bit ambiguous to me in the context of this bug. It could mean many things. I think that "wrap around" would generally be clearer.

@ekpyron
Member

ekpyron commented May 7, 2025

If @ekpyron did, I'd probably be fine with it, but if I have to make the call myself, I'll need more time with the PR.

Take your time to make sure this is sound - it's rather delicate and we should make sure that we don't introduce new (and if we're unlucky even more likely) issues here in the attempt to fix an old corner-case.

Development

Successfully merging this pull request may close these issues.

Inconsistent Treatment of Storage Arrays on the Slot Overflow Boundary
4 participants