Skip to content

Markdown to SpecObject refactoring #4330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: dev
Choose a base branch
from

Conversation

leolara
Copy link
Member

@leolara leolara commented May 19, 2025

Summary

This pull request introduces a refactoring effort, focusing on modularizing and improving the maintainability of the specification parsing logic. The changes include updates to the Makefile, the addition of a new Python module (mk_to_spec.py). The primary goal is to replace the get_spec function with the new MarkdownToSpec class, ensuring functional equivalence during the transition phase.

Questions

  1. I see that we categorise classes if they have @dataclass in one category, and if they don't have @dataclass and inherit from Container in another one. Error in any other case. For the second category wouldn't it make sense to belong there if you inherit from Container? Regardless if you have @dataclass or not?
  2. In the current code, if a python block starts with a function or class but contains more stuff, all the source code gets indexed by the first element of the code block. With the new use of AST right now, it would be trivial to process each element independently, even if they are in the same code block. Should I do this? I think it gives more flexibility.
  3. I detected that in at least one case we were adding a dataclass to the dataclass dictionary with None as index. I assumed this was a bug and I fixed it.
  4. ConfigVars are typed as config_vars: Dict[str, VariableDefinition] but we are not using them that way, I guess I should change the typing to reflect how it is actually used?

Changes

1. New Module: mk_to_spec.py

  • Added a new Python module mk_to_spec.py under the pysetup directory.
  • This module encapsulates the logic for parsing markdown specification files and converting them into structured Python objects (SpecObject).
  • Full type hinting.
  • Use more extensive of AST library instead of string manipulation for parsing Python code.
  • Key features include:
    • Parsing markdown tables, code blocks, and headings.
    • Handling custom types, constants, presets, and configuration variables.
    • Support for list-of-records tables and HTML comments for skip logic.
    • Fixed a bug in the get_spec function where dataclass names were not being parsed correctly.

2. Makefile Updates

  • Modified the test target to include the tests/infra directory in the test suite.
  • Updated the clean target to only remove files that are generated by the build, and not remove all the files that are not being tracked.

3. Dependency Updates

  • Added deepdiff==8.5.0 to the pyproject.toml file for deep comparison of Python objects.

4. Enhancements to setup.py

  • Integrated the new MarkdownToSpec class from mk_to_spec.py.
  • Added a new function get_spec_new to utilize the MarkdownToSpec class for generating specification objects.
  • Updated the build_spec function to compare the new and old specification generation methods using DeepDiff to ensure consistency.

Refactoring Details

  • The MarkdownToSpec class is a refactored and enhanced version of the get_spec function, offering better modularity, maintainability, and feature support.
  • I have tried to make MarkdownToSpec identical in input/output to get_spec with the only difference that was previously adding classes with None index to the dataclass dictionary, and I think this was a bug.
  • The get_spec function remains in place temporarily to validate the functional equivalence of the new implementation.
  • The build_spec function includes assertions to compare the outputs of get_spec and MarkdownToSpec using DeepDiff.
  • Once the MarkdownToSpec class is reviewed and validated, the get_spec function will be removed, as well as auxiliary functions copied to pysetup/mk_to_spec.py and DeedDiff dependency.

Testing

  • Verified that the new MarkdownToSpec class produces consistent results with the existing get_spec function.
  • Added assertions in setup.py to ensure that the new and old methods produce identical outputs.
  • Updated the test suite to include the tests/infra directory. And added there test for the new MarkdownToSpec class.

Impact

  • Improves the maintainability of the codebase by modularizing the specification parsing logic.
  • Adds unit testing to this part of the code

@leolara leolara marked this pull request as ready for review May 20, 2025 12:30
Copy link
Member

@jtraglia jtraglia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @leolara, thanks for cleaning this up. I agree this is better than the current code. I've left some comments with various changes I'd like to see.

Comment on lines 45 to 57
def run(self) -> SpecObject:
"""
Orchestrates the parsing and processing of the markdown spec file.
- Calls _parse_document()
- Iterates over self.document.children and processes each child
- Calls _finalize_types() and _build_spec_object() after processing
Returns:
SpecObject: The constructed specification object.
"""
while (child := self._get_next_element()) is not None:
self._process_child(child)
self._finalize_types()
return self._build_spec_object()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of comments which just outline what the function does. Especially when it's wrong... For example, this function does not call _parse_document(). Also the list items are missing punctuation. Let's just simplify this (and remove the fancy "orchestrates" word, did AI write this function?). Maybe something like:

"""
Returns parsed markdown specification.
"""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it was the original implementation that was calling _parse_document, I will review the comments

Comment on lines 613 to 614
def _has_decorator(decorateable: ast.ClassDef | ast.FunctionDef, name: str) -> bool:
return any(_is_decorator(d, name) for d in decorateable.decorator_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function outside of the class? It's only used in the class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this is a good idea, thanks. Reminder to rename the file.

@jtraglia
Copy link
Member

jtraglia commented May 20, 2025

  • I see that we categorise classes if they have @dataclass in one category, and if they don't have @dataclass and inherit from Container in another one. Error in any other case. For the second category wouldn't it make sense to belong there if you inherit from Container? Regardless if you have @dataclass or not?

Hmm interesting question. I don't believe there should be a situation in which a class is both. Is there?

  • In the current code, if a python block starts with a function or class but contains more stuff, all the source code gets indexed by the first element of the code block. With the new use of AST right now, it would be trivial to process each element independently, even if they are in the same code block. Should I do this? I think it gives more flexibility.

Oh. Yes, it would be both fixing this. But in a follow up PR.

  • I detected that in at least one case we were adding a dataclass to the dataclass dictionary with None as index. I assumed this was a bug and I fixed it.

Was that the OptimisticStore dataclass? If so, I had been meaning to fix this. Thank you!

image
  • ConfigVars are typed as config_vars: Dict[str, VariableDefinition] but we are not using them that way, I guess I should change the typing to reflect how it is actually used?

I'm pretty sure this is due to our recent "list of records" support, right? Yeah, I'd either remove the type hint or correct it in some way; maybe Dict[str, Any] would be sufficient here. You can make that change here if you want.

@jtraglia
Copy link
Member

Hey @leolara, I've merged the following PR. Could you pull in these changes & run make lint?

@jtraglia
Copy link
Member

  • Updated the clean target to remove additional directories and files, such as venv, .mypy_cache, and pysetup/__pycache__.

Also regarding this, the current version of make clean will remove these. For example:

$ make clean
Removing .mypy_cache/
Removing build/
Removing eth2spec.egg-info/
Removing tests/core/pyspec/eth2spec/altair/
Removing tests/core/pyspec/eth2spec/bellatrix/
Removing tests/core/pyspec/eth2spec/capella/
Removing tests/core/pyspec/eth2spec/deneb/
Removing tests/core/pyspec/eth2spec/eip6800/
Removing tests/core/pyspec/eth2spec/eip7441/
Removing tests/core/pyspec/eth2spec/eip7732/mainnet.py
Removing tests/core/pyspec/eth2spec/eip7732/minimal.py
Removing tests/core/pyspec/eth2spec/eip7805/
Removing tests/core/pyspec/eth2spec/electra/
Removing tests/core/pyspec/eth2spec/fulu/
Removing tests/core/pyspec/eth2spec/phase0/
Removing venv/

@leolara
Copy link
Member Author

leolara commented May 21, 2025

  • Updated the clean target to only remove files that are generated by the build, and not remove all the files that are not being tracked.

I updated the comment because it was explained wrong:

Updated the clean target to only remove files that are generated by the build, and not remove all the files that are not being tracked.

@leolara leolara force-pushed the leolara/mk_to_spec-refactor branch from 5df98f3 to af4b2b4 Compare May 21, 2025 16:54
@jtraglia
Copy link
Member

Updated the clean target to only remove files that are generated by the build, and not remove all the files that are not being tracked.

Ah but this is intentional. I want this to happen 🙂

@leolara
Copy link
Member Author

leolara commented May 22, 2025

Current status:

  • Split the Makefile test line into two lines.
  • Sort the list of dependencies alphabetically in pyproject.toml.
  • Rename pysetup/mk_to_spec.py to md_to_spec.py to match the class and file purpose.
  • Sort dictionary initialization in MarkdownToSpec.
  • Run linting
  • Group imports at the top
  • Simplify the docstring for the run method in MarkdownToSpec
  • Remove extra blank lines at the end of setup.py

To be discussed:

  • Revert the change to the clean target in the Makefile.
  • Remove blank lines in pysetup/md_to_spec.py.
  • Use pseudo values in code examples for list-of-records (e.g., fruit table) instead of real ones.

After review:

  • Remove the DeepDiff check and dependency, remove the old get_spec function, and rename get_spec_new to get_spec in setup.py.

@leolara leolara closed this May 22, 2025
@leolara leolara reopened this May 22, 2025
@leolara
Copy link
Member Author

leolara commented May 22, 2025

  • I detected that in at least one case we were adding a dataclass to the dataclass dictionary with None as index. I assumed this was a bug and I fixed it.

Was that the OptimisticStore dataclass? If so, I had been meaning to fix this. Thank you!

Yes, that one, it is solved now because it uses the AST to get the name of the class

@leolara
Copy link
Member Author

leolara commented May 22, 2025

  • ConfigVars are typed as config_vars: Dict[str, VariableDefinition] but we are not using them that way, I guess I should change the typing to reflect how it is actually used?

I'm pretty sure this is due to our recent "list of records" support, right? Yeah, I'd either remove the type hint or correct it in some way; maybe Dict[str, Any] would be sufficient here. You can make that change here if you want.

I changed it to dict[str, str | list[dict[str, str]]] which correspond with the data. I think Any should be avoided if there is a posibility.

@leolara
Copy link
Member Author

leolara commented May 22, 2025

  • I see that we categorise classes if they have @dataclass in one category, and if they don't have @dataclass and inherit from Container in another one. Error in any other case. For the second category wouldn't it make sense to belong there if you inherit from Container? Regardless if you have @dataclass or not?

Hmm interesting question. I don't believe there should be a situation in which a class is both. Is there?

I understand there are two categories, what I am saying that perhaps the rule for which category could be simpler. Like Container is one category and not Container is another one, regardless of @dataclass. For example.

Currently the rule is:

  • Has @dataclass -> dataclass
  • Has not @dataclass and inherit Container -> szz_object
  • Any other class -> error

Should it be like this or is there a simplified version that doesn't have error? Like:

  • Inherit Container -> szz_class
  • otherwise -> class

@jtraglia
Copy link
Member

  • I see that we categorise classes if they have @dataclass in one category, and if they don't have @dataclass and inherit from Container in another one. Error in any other case. For the second category wouldn't it make sense to belong there if you inherit from Container? Regardless if you have @dataclass or not?

Hmm interesting question. I don't believe there should be a situation in which a class is both. Is there?

I understand there are two categories, what I am saying that perhaps the rule for which category could be simpler. Like Container is one category and not Container is another one, regardless of @dataclass.

There's sort of an implicit check here. I don't think we want to allow classes that aren't one of those categories. I get what you're suggesting, I'm just not sure if it's a good idea. Might be best to save this for later, so we can think more about it.

@leolara
Copy link
Member Author

leolara commented May 27, 2025

I think I have done everything you mentioned @jtraglia ; have another look and then I will remove the old implementation

@jtraglia
Copy link
Member

I think I have done everything you mentioned @jtraglia ; have another look and then I will remove the old implementation

Yup, it looks great 😄 let's proceed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants