Skip to content

New config system #294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 32 commits into
base: develop
Choose a base branch
from

Conversation

square-cylinder
Copy link
Contributor

New system for problem config

A new system for handling loading of configs has been implemented. For a specification document of how it works, see problemtools/config_parser/spec.md. The overview is that you can now create a yaml file that will specify all types, defaults, and some requirements for a problem-format. This system was inspired by json-schemas for verifying the structure of json documents, although pretty specialized for configs for problem-formats. The thought is that this system will also later be used to handle the testdata.yaml and so on later on.

Many aspects of the system have been verified with tests, and for legacy we have extensively verified problems from many competitions to see that it works. Some more testing will be required when it comes to the new format however, as the only real config that has been loaded is a very minimal one.

I think all the checks that happened previously still happen for the legacy format, and every type and structure is now being tested which I don't think was the case previously.

Code overview

I will present a brief overview of the different parts that make up the new system.

Metadata class

This class represents a config that can be loaded. The constructor takes one specification dictionary, and the functions that are supposed to be used are load_config which basically take the data and some data to "inject". This means that the data will be thrown into the loaded config which is meant to be use for copying standard values. Also a check_config function is supposed to be used to detect some additional errors like incompatible alternatives.

Path class

This class is meant to be used to index multiple layers of dictionaries and lists. Maybe some dependency could be added to handle this functionality. These functions were just implemented as pretty basic stuff is required.

AlternativeMatcher class

It was required that we could match strings, ints, floats and bools for various purposes by providing strings. How these match-strings look is described in the spec.md. AlternativeMatch is a baseclass for all the matcher classes, and provides a factory to get the corresponding one based on a type string.

Parser class

To handle many peculiarities of the problem format, custom parsing was added, so a lot of parsers were implemented for different things to get them into a easier-to-work-with format. For example legacy validation is a string with space-separated keywords, which would be much easier to work with if it was an object with the corresponding properties as bools, so a parser "legacy-validation" converts this string to this object representation. Also for weird resolutions of copying values like rights_holder a parsing-rule looked at if the license was public domain and if so did not copy this value from author/source (if not specified) otherwise it does. Parsing rules are pretty powerful as they can list what paths should be resolved before they are enacted. The parser class provides a factory to get the corresponding parsing rule based on its name and type.

Quirks of specification

Because the specification has some quirks like discussed in section about parsers, we took some liberties to convert everything to a more standardized way after loading in the config. This does not mean it will misinterpret the config as it is specified, but rather that after the config is loaded it might not look exactly the same as in the specification. For example for "grading" in the legacy format (which was deprecated from legacy but should stay I was told), the format specified some floats as type "string", which we just interpret as floats instead. Also, after having parsed the config the layout of the config will be entirely standardized, and a given path into the dictionary will always give the same type no matter the input data. All values should be defaulted to something after config is loaded, so as long as the path exists, it should not give a indexing-error.

square-cylinder and others added 30 commits March 25, 2025 21:59
standardise casing on bools
Co-authored-by: Zazmuz <[email protected]>
Co-authored-by: Zazmuz <[email protected]>
Co-authored-by: Zazmuz <[email protected]>
@Zazmuz
Copy link
Contributor

Zazmuz commented Mar 25, 2025

This might be a lot to bite into, @gkreitz, if you wonder anything along the way, just ask us, we are happy to help!

@gkreitz
Copy link
Contributor

gkreitz commented Mar 26, 2025

We've had a slack discussion on the general approach taken here. I think the conclusion is that a different approach using json-schemas will be attempted, to see if that leads to a simpler code base. So we'll "pause" this PR until we see how that plays out.

Thanks @sjoqvist for having written an example schema in #285. Its existence was helpful in informing our discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants