Skip to content

Updated parsing of problem.yaml by using checks as described in the specification #295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

Zazmuz
Copy link
Contributor

@Zazmuz Zazmuz commented Apr 10, 2025

Added checks for legacy and 2023-07 for parsing problem.yaml, making sure its correct and defaulting values.
Made sure to keep the code itself simple so that eventual changes and upkeep to it will be easy to go through with.
It has been tested on most PO problems from the last 5 years as well as islandic problems.

For the new format it has not been tested as much (due to lack of available configs), only a few edgecases and few self made configs.

If there is anything that looks weird or any questions I will be available.

Zazmuz added 7 commits April 10, 2025 10:36
…m.yaml.

Co-authored-by: square-cylinder <[email protected]>

Cleaned up and added more checks for problem.yaml

Co-authored-by: square-cylinder <[email protected]>

detect unknown fields

f-strings compatible with older python versions

fix bug
made it more obvious if old folder name is used, or folder is misspelled

made check error messages clearer for people newer to problemtools

closest word functionality

bugfix index error
readded comment stating weird it is

Name change and pytest fix
Merged conflict in hello test
Copy link
Contributor

@gkreitz gkreitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started looking at this PR, and I think it needs relatively large amount of work. I don't understand several of the approach changes from #294 at all (not trying to merge the data into a common structure, putting all code in verifyproblem instead of a separate module, not having any test cases).

I ran out of time while reviewing this, but setting this as request changes to not delay giving you the feedback I had for the part I had time to review.

@@ -262,7 +269,7 @@ def check(self, context: Context) -> bool:
self.check_size_limits(self.ansfile)
self._problem.getProblemPart(InputValidators).validate(self)
anssize = os.path.getsize(self.ansfile) / 1024.0 / 1024.0
outputlim = self._problem.get(ProblemConfig)['limits']['output']
outputlim = self._problem.get(ProblemConfigLegacy)['limits']['output']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing ProblemConfig to ProblemConfigLegacy like this feels dangerous, and makes a reader worry about how this works when reading in a new problem config. I'd advice at the very least signaling some intent here by accessing via ProblemConfigBase for fields that are the same for both formats.

A better solution would be to ensure the config as read by the rest of the program looks the same (has the same keys, and the same types) regardless of what version was used. I believe you did this in #294.

# Some deprecated properties are inherited from problem config during a transition period
problem_grading = problem.get(ProblemConfig)['grading']
problem_grading = problem.get(ProblemConfigLegacy)['grading']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work with a new config file, which lacks the grading key? (This question applies to many places where you access keys which are named differently, or have different value types in the two versions.)

self.error(f'License needs to be one of: {self._VALID_LICENSES}')
if self._data['license'] == 'unknown':
self.warning("License is 'unknown'")
def fix_config_structure(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like ProblemConfigBase is an abstract base class, and that this is an abstract method? If so, consider inheriting from ABC and use the decorator @abstractmethod here.



def check(self, context: Context) -> bool:
if self._check_res is True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you rewrite the memoization of _check_res this way? It looks like the memoization is now broken.

return self._check_res
elif self._check_res is not False:
self._check_res = True
to_check = [prop for prop in dir(self) if prop.startswith('_check_') and callable(getattr(self, prop))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of trying to find all methods with a special name and running them like this. Why was this approach taken? Do you end up with so many check methods that you can't cleanly list them?

self._check_res = True
to_check = [prop for prop in dir(self) if prop.startswith('_check_') and callable(getattr(self, prop))]
for prop in to_check:
self.info(f'Checking "{prop}"')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal methods called should not be logged at info level.

return best

class ProblemConfigLegacy(ProblemConfigBase):
DEFAULT_LIMITS = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defaults used to be specified in a problem.yaml config file (and user-configurable, as is documented in README.md). Does the fact that you list the defaults here imply that you dropped support for having the defaults configurable via config file (and user-configurable via a config file in the home directory)? If so, it feels like a big step backwards to hardcode these values rather than have them in a config file.


def fix_config_structure(self):
self._data.setdefault('problem_format_version', formatversion.VERSION_LEGACY)
if self._data.get('problem_format_version') != formatversion.VERSION_LEGACY:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks (checking that something is one of a few allowed values) get very repetitive. It looks like you ened a helper function, something like
check_allowed_values(self, key: str, values: list, error_fn = self.error)


if 'name' in self._data:
val = self._data['name']
if type(val) is not str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, here, you should have a check_type() function to make the code less repetitive.

Alternatively, both such check functions can probably be implemented using json-schemas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants