Introduce subassembly offset output artifact #15710
base: develop
Force-pushed from d086ad9 to e4f8896.
There are some bugs, some missing parts and the overall implementation could be done more robustly.
I'm also not sure if the whole design actually accomplishes our goal. Only giving sub locations may not be enough to locate metadata without heuristics because data objects are not placed in separate subs at evmasm level.
See comments below for details.
Please add the feature to the CLI too. We should keep them at parity (and it's a pain for testing/development if the only way to access a feature is through StandardJSON).
Also, Yul compilation is not covered. Aside from general inconsistency, this makes two-step compilation less powerful, which may be a problem for future parallelization.
This needs to be implemented for EOF as well (i.e. in `assembleEOF()`, or, if possible, just in `assemble()`, covering both with the same code).
Since we're at the stage where EOF is passing semantic tests (and they will be enabled by default quite soon), we should start requiring all new features to work on EOF as well.
libevmasm/Assembly.cpp
Outdated
{
	for (auto& subAssembly: _subAssemblies)
	{
		subAssembly.start = _currentBytecodeSize - subAssembly.length;
I'm either missing something or this assumes that all subassemblies overlap and extend to the end of their parent assembly. Does it even work with more than one subassembly? If you have two subassemblies of the same length then you will end up with the same start location for both. And even if they're of different lengths it will be wrong.
EDIT: Yeah, you even have a test showing this overlap (`standard_subassembly_offsets/`):
{
"isCreation": false,
"length": 780,
"start": 1007
},
{
"isCreation": true,
"length": 130,
"start": 1657,
"subs": [
You should also have some asserts here. At the very least that a subassembly does not stick outside of its parent assembly.
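To make the overlap concrete: in the quoted test output, 1007 + 780 and 1657 + 130 both equal 1787, so both subassemblies are reported as ending at the same byte. The sketch below is a simplified model and not the PR's code. `SubInfo`, `totalSize`, `startsFromEnd` and `startsFromCursor` are hypothetical names, and it assumes subassemblies are appended back to back after the parent's own code. It contrasts deriving `start` from the final size with recording the write position at the moment each sub is appended, and includes the kind of containment assert requested above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the per-subassembly record.
struct SubInfo
{
	std::size_t start;
	std::size_t length;
};

// Size of the parent code plus all subs (toy model: nothing else is appended).
std::size_t totalSize(std::size_t _parentCodeSize, std::vector<std::size_t> const& _subLengths)
{
	std::size_t total = _parentCodeSize;
	for (std::size_t length: _subLengths)
		total += length;
	return total;
}

// Modeled loosely on the reviewed line: start = final size - own length.
// Every sub is then reported as ending at the end of the parent.
std::vector<SubInfo> startsFromEnd(std::size_t _parentCodeSize, std::vector<std::size_t> const& _subLengths)
{
	std::size_t const total = totalSize(_parentCodeSize, _subLengths);
	std::vector<SubInfo> result;
	for (std::size_t length: _subLengths)
		result.push_back({total - length, length});
	return result;
}

// Alternative: remember the write position when each sub is appended.
std::vector<SubInfo> startsFromCursor(std::size_t _parentCodeSize, std::vector<std::size_t> const& _subLengths)
{
	std::size_t const total = totalSize(_parentCodeSize, _subLengths);
	std::size_t cursor = _parentCodeSize;
	std::vector<SubInfo> result;
	for (std::size_t length: _subLengths)
	{
		// A sub must never stick outside of its parent. This holds by
		// construction in the toy model; in real code it would guard
		// against bookkeeping errors.
		assert(length <= total - cursor);
		result.push_back({cursor, length});
		cursor += length;
	}
	return result;
}
```

With a 100-byte parent and two 50-byte subs, `startsFromEnd` reports the same start for both, while `startsFromCursor` places them one after the other.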
"subAssemblyOffsets": {
    "subs": [
        {
            "isCreation": true,
            "length": 130,
            "start": 0,
            "subs": [
                {
                    "isCreation": false,
                    "length": 104,
                    "start": 26
                }
            ]
        }
    ]
}
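A structure like the one above can also be sanity-checked mechanically, which is roughly what the requested asserts would enforce. The following is a sketch under stated assumptions, not code from the PR. `SubRange` and `subsAreWellFormed` are hypothetical names, offsets are assumed to be absolute positions in the top-level bytecode (the sample output does not settle whether nested offsets are absolute or relative), and siblings are assumed to be listed in increasing order of `start`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record mirroring the "start", "length" and "subs" fields.
struct SubRange
{
	std::size_t start;
	std::size_t length;
	std::vector<SubRange> subs;
};

// Returns true if every sub lies within [_parentStart, _parentStart + _parentLength)
// and no sub overlaps the sibling listed before it. Checks nested subs recursively.
bool subsAreWellFormed(std::size_t _parentStart, std::size_t _parentLength, std::vector<SubRange> const& _subs)
{
	std::size_t const parentEnd = _parentStart + _parentLength;
	std::size_t previousEnd = _parentStart;
	for (SubRange const& sub: _subs)
	{
		// Starts before the parent, inside the previous sibling, or past the parent's end.
		if (sub.start < previousEnd || sub.start > parentEnd)
			return false;
		// Sticks out of the parent (written to avoid unsigned overflow).
		if (sub.length > parentEnd - sub.start)
			return false;
		if (!subsAreWellFormed(sub.start, sub.length, sub.subs))
			return false;
		previousEnd = sub.start + sub.length;
	}
	return true;
}
```

Under these assumptions the nested pair from the output above (0/130 containing 26/104) passes, while the two sibling entries from the earlier overlapping output (1007/780 followed by 1657/130) are rejected.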
Actually, is Sourcify ok with getting only subassembly locations? I've always been thinking about metadata as a separate subassembly myself, because it's separate data `Object` in Yul, but looking at the PR I remembered that it's actually not the case at evmasm level. Metadata goes into `Assembly::m_auxiliaryData` and logically becomes a part of each assembly's bytecode, not a separate object. It's simply appended after all the subs and other data. I think that due to this it will still be necessary to use heuristics to fish out its location within the assembly, even knowing location of all subassemblies.
I think we may need to include the start and length of `Assembly::m_auxiliaryData` separately for each sub (when it's non-empty). For completeness we may want to simply list all the data chunks (that would actually have been nice to have for compiler debugging in some cases).
CC @kuzdogan.
By the way, we should have some tests that include `data` objects between assemblies.
Also, if we had the exact location of metadata, we could create a more robust, Boost-based test that would get the structure info and the bytecode and diff the CBOR bit. It's not easy to spot a problem just looking at the values in command-line tests and we're not even shown the bytecode.
Sorry I missed this, I'll have a look at this tomorrow
I talked about it later with @nikola-matic and he said that the justification for not adding data locations was that Sourcify has no problem detecting metadata at the top level and it's only the nested contracts that cause problems so adding information about where whole contracts start and end is enough to extend the existing mechanism to cover them.
I still think we should include the exact location though. We do have it and it's very easy to add, I see no reason to force tools to use heuristics to find it.
One thing we agreed on though was that the info about data locations does not have to be a part of this very PR. This will still be a working feature without it and the extra fields can be added on top of it as an extension.
@cameel Isn't it safe to assume the CBOR will be at the end of all of the assemblies that are not `creation: true`? So we can just look at all of them, get the last two bytes and decode. So it's technically not a heuristic but a rule?
Of course I wouldn't say no to this and it would make our lives easier.
Ah, thanks for pointing that out. I completely forgot that CBOR is not the only thing there and that we actually also add the length ourselves. You're right. With that the check should be reliable, at least when we're talking about the contracts you compile yourself and can be sure that the metadata is supposed to be present.
Ok then, I guess it's not strictly necessary for Sourcify in that case.
I still don't see much downside in providing that information though :)
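The rule discussed above (the last two bytes of a non-creation assembly give the length of the CBOR payload that precedes them) could be applied by a tool roughly as follows. This is a minimal sketch, not Sourcify's or the compiler's code. `locateCbor` is a hypothetical helper, the two-byte length is read as big-endian, and, as noted in the thread, the result is only meaningful for contracts known to carry metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Returns {offset, size} of the CBOR payload inside _bytecode for the assembly
// occupying [_start, _start + _length), or std::nullopt if the range or the
// trailing length marker is implausible. Requires C++17 for std::optional.
std::optional<std::pair<std::size_t, std::size_t>> locateCbor(
	std::vector<std::uint8_t> const& _bytecode,
	std::size_t _start,
	std::size_t _length
)
{
	// The assembly must fit in the bytecode and hold at least the 2-byte marker.
	if (_length < 2 || _start > _bytecode.size() || _length > _bytecode.size() - _start)
		return std::nullopt;

	std::size_t const end = _start + _length;
	// Last two bytes: big-endian size of the CBOR payload right before them.
	std::size_t const cborSize =
		(static_cast<std::size_t>(_bytecode[end - 2]) << 8) | _bytecode[end - 1];
	// The payload cannot be larger than the rest of the assembly.
	if (cborSize > _length - 2)
		return std::nullopt;

	return std::make_pair(end - 2 - cborSize, cborSize);
}
```

The `_start` and `_length` arguments would come from an entry with `"isCreation": false` in the new output; the returned range can then be handed to a CBOR decoder.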
"subAssemblyOffsets": {
    "subs": [
        {
            "isCreation": false,
            "length": 87,
            "start": 0
        }
    ]
}
The input file has two assemblies but the output shows only one. Is this a bug in your feature or an instance of #15725 (because the nested assembly is unreferenced)?
"subAssemblyOffsets": {
    "subs": [
        {
            "isCreation": false,
By the way, I wonder why this isn't `true`. Does exporting and reimporting asm JSON lose the creation status or is it just because of how the artificial input was crafted? If the status gets lost, this would be a bug (and should be reported).
Oh, we also need some coverage for optimized compilation.
Some more thoughts after today's call:
Force-pushed from 939e863 to a624455.
Another thing missing here are docs. The new option should at the very least be mentioned in the Standard JSON input description. It would also be good to have a paragraph on the page about Metadata explaining that nested contracts can have metadata as well and that the output of this option (combined with the length marker at the end of metadata) can be used to locate it.
Force-pushed from 6726747 to 8733981.
Force-pushed from 21e2558 to 1eebef2.
Force-pushed from 1af3027 to 218f4d7.
This pull request is stale because it has been open for 14 days with no activity.
Force-pushed from c3c04f6 to 8846e75.
@@ -137,6 +137,7 @@ static std::string const g_strSrcMapRuntime = "srcmap-runtime";
static std::string const g_strStorageLayout = "storage-layout";
static std::string const g_strTransientStorageLayout = "transient-storage-layout";
static std::string const g_strVersion = "version";
static std::string const g_strAssemblyStructure = "assembly-structure";
nit: options are in alphabetical order here.
@@ -89,6 +89,15 @@ struct LinkerObject
	/// Bytecode offsets of named tags like function entry points.
	std::map<std::string, FunctionDebugData> functionDebugData;

	struct Structure {
Maybe `AssemblyStructure` is better?
Resolves #14827.
bytecode
string fromSubAssembly
`isCreation` is preserved during import