Description
I've encountered a recurring parsing error (KeyError: ';' or KeyError: '<point_name>;') when defining problems in my_problems.txt for AlphaGeometry, run via Google Colab. The error suggests that the parser in problem.py (specifically around Clause.from_txt and Construction.from_string during the translate method) sometimes misinterprets a semicolon or a point name immediately followed by a semicolon as part of a point's identifier, leading to a failed lookup in the mapping dictionary.
Problem Details:
The issue appears even when the problem definition string looks syntactically correct according to the AlphaGeometry input format. I've confirmed the following:
The my_problems.txt file is formatted correctly with the problem name on the first line and the definition string on the second. This was verified using !cat in the Colab environment.
Semicolons are used to separate constructions, and a semicolon precedes the final goal statement (e.g., ...; m = midpoint m p q; ? coll a b m).
The point names themselves do not intrinsically contain semicolons.
Example of a problematic (or previously problematic) line structure where errors like KeyError: ';' or KeyError: 'q;' occurred:
problem_name_here
a b = segment a b; k1 = free k1; c = on_circum c a b k1; o1 = circumcenter o1 a b c; p = intersection_tt p b o1 b c o1 c; k2 = free k2; o2 = circumcenter o2 a b k2; c_refl = reflect c_refl c a b; d = intersection_lc d a c_refl o2 a; q = intersection_tt q b o2 b d o2 d; m = midpoint m p q; ? coll a b m
The error often pointed to the last few constructions or the transition to the goal statement.
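One way a key such as 'q;' could arise is sketched below. This assumes the parser splits on the exact literal strings ' ? ' and '; '; that is an assumption about problem.py, not a quote from it, and naive_tokens is a hypothetical helper written only for this illustration.

```python
def naive_tokens(definition: str) -> list[list[str]]:
    """Tokenize the premises the way a literal-delimiter parser might (assumed behavior)."""
    # Separate the premises from the goal on the exact string ' ? '.
    premises, _, _goal = definition.partition(' ? ')
    # Split premises on the exact string '; ', then each clause on single spaces.
    return [clause.split(' ') for clause in premises.split('; ')]


with_semicolon = 'p q = segment p q; m = midpoint m p q; ? coll p q m'
without_semicolon = 'p q = segment p q; m = midpoint m p q ? coll p q m'

last_bad = naive_tokens(with_semicolon)[-1]
last_ok = naive_tokens(without_semicolon)[-1]
print(last_bad)  # the final argument keeps the ';' attached
print(last_ok)   # the final argument is a clean point name
```

Under that assumption, a semicolon placed directly before the goal stays attached to the last argument, which would be consistent with KeyError: 'q;'. A space before that semicolon would instead leave a bare ';' token.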
Key Observation & Sensitivity:
The most puzzling aspect is the intermittent nature and high sensitivity of this parsing error.
A problem definition string that was previously working might suddenly start throwing this KeyError after minor, seemingly innocuous edits (even if those edits are later reverted).
Conversely, a string that was causing the error might start working after some edits, or after reverting to a previous version that looks identical and was known to work.
This suggests the parser might be sensitive to:
Extremely subtle variations in whitespace (e.g., different types of spaces, or trailing spaces that are hard to see).
Non-printing characters potentially introduced during copy-pasting or editing within the Google Colab %%writefile cell.
The specific sequence of characters around semicolons when splitting constructions or arguments.
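A quick way to test the first two hypotheses is to list every character in the definition that is not printable ASCII or a newline. The helper below is a minimal sketch; find_suspicious_chars and the sample string are made up for illustration.

```python
import unicodedata


def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, repr, Unicode name) for each character outside printable ASCII and '\\n'."""
    hits = []
    for i, ch in enumerate(text):
        if ch == '\n' or ' ' <= ch <= '~':
            continue  # ordinary printable ASCII or a plain newline
        # Control characters have no Unicode name, so fall back to 'UNKNOWN'.
        hits.append((i, repr(ch), unicodedata.name(ch, 'UNKNOWN')))
    return hits


# Sample containing a no-break space, a tab, and a carriage return.
sample = 'm = midpoint m p\u00a0q;\t? coll a b m\r\n'
hits = find_suspicious_chars(sample)
for hit in hits:
    print(hit)
```

In Colab, the same function could be applied to the result of open('my_problems.txt', encoding='utf-8').read(), since !cat does not make such characters visible.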
Debugging Attempts:
I tried various permutations:
Ensuring strict two-line formatting for %%writefile.
Adding/removing spaces around semicolons and point names.
Verifying file content with !cat.
Simplifying the problem to a minimal case (which also sometimes exhibited similar parsing fragility).
The fact that reverting to a "known good" version of what looks like the same problem string can resolve the error suggests that the two versions differ in some hard-to-detect way (for example, an invisible character or a different kind of space) and that the parser is sensitive to that difference.
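To check whether a "known good" string and a failing one really are identical, they can be compared character by character. The sketch below uses a hypothetical helper, first_difference, and made-up sample strings.

```python
from itertools import zip_longest


def first_difference(good: str, bad: str):
    """Return (index, good_char, bad_char) at the first mismatch, or None if the strings are equal."""
    # zip_longest pads the shorter string with None, so a length difference is also reported.
    for i, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return i, g, b
    return None


good = 'm = midpoint m p q'
bad = 'm = midpoint m p\u00a0q'  # looks the same, but contains a no-break space

diff = first_difference(good, bad)
print(repr(diff))
```

If this returns None for two strings that behave differently, the difference lies elsewhere, for example in the file's line endings or in how the file is written.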
Expected Behavior:
The parser in problem.py should be robust to common whitespace variations around valid tokens (point names, constructor names, operators like =, and delimiters like ;). It should not interpret a correctly placed semicolon delimiter as part of a point's name.
Request:
Could the parsing logic in problem.py (particularly how it splits by ; and then parses individual constructions and their arguments) be reviewed for potential brittleness? It might be beneficial to:
Ensure more robust trimming of whitespace around tokens.
Explicitly check for and handle or warn about unexpected characters within or adjacent to identifiers/delimiters.
Provide more detailed parsing error messages that pinpoint the exact character position or token causing the issue, rather than just the KeyError.
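As a rough illustration of the three suggestions above, here is a minimal sketch of a tolerant tokenizer. It is not the code in problem.py; tokenize_definition and the allowed-character set are assumptions made for this example.

```python
import re

# Assumed token alphabet for this sketch: letters, digits, '_', '=', '?' and ','.
_ALLOWED = re.compile(r"[A-Za-z0-9_=?,]+")


def tokenize_definition(definition: str) -> list[list[str]]:
    """Split on ';' regardless of surrounding whitespace and report where a bad token sits."""
    clauses = []
    pos = 0  # offset of the current clause within the full string
    for raw in definition.split(';'):
        tokens = []
        # \S+ skips any run of whitespace, including tabs and no-break spaces.
        for match in re.finditer(r"\S+", raw):
            token = match.group()
            if not _ALLOWED.fullmatch(token):
                raise ValueError(
                    f"unexpected token {token!r} at character {pos + match.start()}"
                )
            tokens.append(token)
        if tokens:
            clauses.append(tokens)
        pos += len(raw) + 1  # +1 for the ';' consumed by split
    return clauses


print(tokenize_definition('a b = segment a b ;  m = midpoint m a b; ? coll a b m'))
```

With this approach, a stray zero-width space after a point name would raise a ValueError naming the token and its character offset instead of surfacing later as a KeyError.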
This issue can be a significant time sink during problem definition, as it's hard to diagnose when the input looks perfectly valid.
Environment:
AlphaGeometry (latest commit from GitHub, or specify if using a particular version)
Google Colab
Python 3.10 (as per Colab default)