-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(consume): allow geth to validate eof test vectors via direct (#1232) #1246
base: main
Are you sure you want to change the base?
Conversation
52140ef
to
d08cdd7
Compare
0724176
to
5e0bd80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @felix314159!
This needs to be tested against some test fixtures that are expected to fail, as
INFO [02-25|10:26:56.473] Executed tests passed=3 "total executed"=4
still leads to all fixtures in the file passing the test.
Two (obvious) comments:
- We're forced into hacky methods as the
eofparse
tool doesn't provide any structured output. Perhaps we should add this to geth'seofparse
instead of writing brittle code in EEST. I looked at Besu's tool yesterday and they do return structured output. We could adopt this:
{
"fileName": "/home/dtopz/.cache/ethereum-execution-spec-tests/cached_downloads/v4.0.0/fixtures_eip7692/fixtures/eof_tests/osaka/eip7692_eof_v1/eip3540_eof_v1/eof_example/eof_example_custom_fields.json",
"group": "tests/osaka/eip7692_eof_v1/eip3540_eof_v1/test_eof_example.py::test_eof_example_custom_fields[fork_Osaka-eof_test]",
"name": "0",
"fork": "Osaka",
"pass": true,
"expectedError": null,
"actualError": null
}
- Regardless of whether we improve
eofparse
or not, better testing ofconsume direct
is a must. We can usepytester
-level tests (which can test the behavior of entire pytest test sessions) to catch fails like the one above. I can start adding some in a separate PR.
Let's discuss 1. in our meeting today.
): | ||
"""Geth's implementation of the fixture consumer.""" | ||
|
||
# ------------------------------------ state test --------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to remove these separators. Perhaps we need to organize the code into more logical groups if this class is getting out of hand.
# ------------------------------------ state test --------------------------------------------- |
): | ||
"""Geth's implementation of the fixture consumer.""" | ||
|
||
# ------------------------------------ state test --------------------------------------------- | ||
@cache # noqa: B019 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the memory leaks should not be a problem for consume
since we have a single instance per run as you mention, and also consume
is not a long-running process anyway, so it should be ok for the sake of code simplicity.
@cache # noqa | ||
def consume_state_test_file( | ||
# ------------------------------------ eof test ----------------------------------------------- | ||
def extract_int_from_subprocess_string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about?
def extract_int_from_subprocess_string( | |
def extract_test_count_from_eofparse_summary( |
def extract_int_from_subprocess_string( | ||
self, s: str, substring_of_interest_prefix: str | ||
) -> int | None: | ||
"""Hacky method to extract relevant substring from string returned by subprocess execution.""" # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically one-liners should be concise and fit within our ruff-line-length (99). If clarification is needed, then it should be added as a separate block below. I think this is easy to fix in this case :)
"""Hacky method to extract relevant substring from string returned by subprocess execution.""" # noqa: E501 | |
"""Hacky method to extract test counts from `eofparse` console output.""" |
) -> int | None: | ||
"""Hacky method to extract relevant substring from string returned by subprocess execution.""" # noqa: E501 | ||
# get everything after this substring occurrence, then remove everything not in [0,9] | ||
relevant_substring = re.sub(r"\D", "", s.split('total executed"=', 1)[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to raise an exception if the Executed tests
line isn't found in the eofparse
output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function could be made more explicit by re-writing it to match 2 regex groups, one for the "passed" and one for the "total" count and return them both in a single call.
all_tests_passed: bool = False | ||
if total_tests_amount is not None and passed_tests_amount is not None: | ||
if total_tests_amount == passed_tests_amount: | ||
all_tests_passed = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will result in fixture file level pass/fail resolution and not give a detailed per test case (test fixture) resolution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments, but looking good overall 👍
): | ||
"""Geth's implementation of the fixture consumer.""" | ||
|
||
# ------------------------------------ state test --------------------------------------------- | ||
@cache # noqa: B019 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the memory leaks should not be a problem for consume
since we have a single instance per run as you mention, and also consume
is not a long-running process anyway, so it should be ok for the sake of code simplicity.
) -> int | None: | ||
"""Hacky method to extract relevant substring from string returned by subprocess execution.""" # noqa: E501 | ||
# get everything after this substring occurrence, then remove everything not in [0,9] | ||
relevant_substring = re.sub(r"\D", "", s.split('total executed"=', 1)[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be:
relevant_substring = re.sub(r"\D", "", s.split('total executed"=', 1)[1]) | |
relevant_substring = re.sub(r"\D", "", s.split(substring_of_interest_prefix, 1)[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is failing after the change for me:
For s
equal to 'INFO [03-17|17:33:22.741] Executed tests passed=4 "total executed"=4'
, relevant_substring
becomes 44
when substring_of_interest_prefix=='tests passed='
.
I think this function can be replaced with:
m = re.search(r'tests passed=(\d+) "total executed"=(\d+)', s)
and then tests passed is int(m.group(1))
and tests executed is int(m.group(2))
.
def extract_int_from_subprocess_string( | ||
self, s: str, substring_of_interest_prefix: str | ||
) -> int | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def extract_int_from_subprocess_string( | |
self, s: str, substring_of_interest_prefix: str | |
) -> int | None: | |
@staticmethod | |
def extract_int_from_subprocess_string( | |
s: str, substring_of_interest_prefix: str | |
) -> int | None: |
This should be static since it doesn't access any instance properties, then you can call it with GethFixtureConsumer.extract_int_from_subprocess_string
.
def eofparse_subprocess_output_to_dict( | ||
self, process_result: str, fixture_name: str | None, fork_name: str | ||
) -> Dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def eofparse_subprocess_output_to_dict( | |
self, process_result: str, fixture_name: str | None, fork_name: str | |
) -> Dict: | |
@staticmethod | |
def eofparse_subprocess_output_to_dict( | |
process_result: str, fixture_name: str | None, fork_name: str | |
) -> Dict: |
Also can be an static method, because self.extract_int_from_subprocess_string
can become GethFixtureConsumer.extract_int_from_subprocess_string
if that method is also static, and then to call this one we would do GethFixtureConsumer.eofparse_subprocess_output_to_dict
.
🗒️ Description
Geth supports eofparse and can be used to validate a given eof test (code valid yes/no without running the code). The command I used for fill was
uv run fill -vv -m "not slow" -m eof_test --fork=osaka --evm-bin=$HOME/Documents/evmone/build/bin/evmone-t8n
and the consume command I tested the code with wasuv run consume direct --input=./fixtures/eof_tests --bin=$HOME/Documents/go-ethereum/build/bin/evm -s
(I added-s
to see the output of some print statements I put intest_via_direct.py
to see more detailed info about what is going on). It shows all 3895 tests passed in less than a minute, lmk what can be improved. Next I would try to do the same for consume direct via evmone.🔗 Related Issues
✅ Checklist
mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.