Skip to content

Expressions as Countable #13218

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 1, 2025
Merged

Conversation

SomeTroglodyte
Copy link
Collaborator

...Not ChatGPT generated...

;
abstract fun isOK(strict: Boolean): Boolean
companion object {
operator fun invoke(bool: Boolean?) = when(bool) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just call this 'from' without operator overloading.
When you're young it's nice to experiment, it's fine to be operator-curious, but I think we're both past that phase :)

@yairm210
Copy link
Owner

And why, indeed, NOT use Keval?

@SomeTroglodyte
Copy link
Collaborator Author

And why, indeed, NOT use Keval?

Already mentioned some reasons. Foremost maybe: the tokenizer isn't customizable when consumed as library and won't suit our needs. Well, maybe it is by bypassing their String.toAST... No, even more importantly is we need domain lookup for our countables, and Keval is, as I said, highly customizable only in its Grammar - with builders for Constants, Unary operators, Binary operators and functions - and none fits. Constants need their result up-front, while functions need the identifier(vararg term) syntax. So the notation to fetch a Countable, done with the unmodified library, could be countable([Your] Cities) - not [[Your] Cities]. Most elegant we could do would be to fork it and provide a sibling to KevalConstant such as internal data class KevalDynamicValue(val delegate: (context: ?) -> Double) : KevalOperator, but then the problem immediately presents itself how to pass context through - Keval doesn't cater for one. Same problem for the function approach. Drill it open to add that concept, and what you get is more complex than this.

@yairm210
Copy link
Owner

For once we have a need that is actually generic, and even for that we can't use a generic solution?
Changing countable to a function call sounds A-OK to me, certainly less work than rolling our own parser, no?

@yairm210
Copy link
Owner

I mean with getplaceholders + placeholdertext, generating a countable() version of the user text sounds simple to me
The question is, does Keval accept string functions, or is it all numbers...

@SomeTroglodyte
Copy link
Collaborator Author

less work than rolling our own parser

Except it's done and already passes quite a load of unit tests

@yairm210
Copy link
Owner

That doesn't super help when what I'm trying to avoid is having to maintain this in the future.
Sounds to me like Keval, which does accept multi-input functions, is a better fit.
But even more so, if exp4j DOES allow for functions

At times like this I have to ask myself, how did we get here? I thought I was making a strategy game, and now I need to maintain a tokenizer/parser for generic expressions?
Why the hell are we allowing Pi and e if we want to get countables, which are Ints?
And why on earth would we want sin and cos? Surely if there were ANY function a modder would want, it would be min or max between two ints?

Most of all, this sounds like solving a solved problem.
If we take exp4j as an example, it actually solves all the problems - allows parsing prior to eval, setting variables, etc.
I don't care that it's unmaintained, it solves the problem we have.

And if not? We should be making a new library. Or changing an existing one - make Keval accept parse/eval distinctions, make it accept parameters.
The "worst of both worlds" is one where I have all the hassle of figuring out modder's problems, but even if I put in the work it only helps people with this specific demand modding this game. That's not great open source.

I think the root of Keval's problems is that it doesn't make the parse/eval distinction (outside - inside it exists). That's why it only allows constants (there are no runtime parameters since there is no runtime separate from parsing). The author is active so this definitely seems doable to me.

@yairm210
Copy link
Owner

Another thing real parsers have is "where is the problem", start to finish. I see Keval has "position" which is a start (ba-dum tish). This is important since we need to tell the user "your problem is HERE--> <--" and that requires positions. Start position will let you insert a ❌ emoji at the location, which is Good Enough ™️

@SomeTroglodyte
Copy link
Collaborator Author

🤷

@yairm210
Copy link
Owner

It's 2 in the morning here, maybe I'm overreacting. I'll see what I think of this tomorrow. Maybe we can treat this as just a fun coding exercise, in which case we can do what we want but modders should be aware this is breakable

@yairm210 yairm210 reopened this Apr 16, 2025
@SomeTroglodyte
Copy link
Collaborator Author

overreacting

No I get it. Without checking it out and playing a bit with it (e.g. setting breakpoints in the unit tests and inspecting an AST) it would be hard to get a real feel for it.

And the maintainability burden is of course very real, which is why it must be readable, and once adopted it should be possible to treat is as "black box", meaning it works, period, and changes in other parts of the project shouldn't be able to break it - or at least any cross-influence should be caught by the tests. In that, I'm pretty confident, though of course you're right about the "fun coding exercise", which may have clouded my vision.

As for "where is the problem", yes that was an aspect of Keval I consciously dropped - partially. The tokenizer has the full info so it can point to character indices, but the AST - why should it maintain the original input? When the AST parser needs to complain, the token string should be enough as hint, and when something goes wrong at eval time, each Node can reconstruct a logical "view" of its part of the original expression, which should likewise be enough for messages. Keval kept the original expression so the AST parser would "consume" already tokenized tokens one by one, string matching something already matched previously. I deemed it unnecessary. The integration in the validator is so far rudimentary in the sense I have refrained from patching up the validator code itself, but if more detail is needed it's certainly possible.

@touhidurrr
Copy link
Contributor

touhidurrr commented Apr 17, 2025

When did you guys write a parser by the way. The first time I saw the find-replace code for properties parser while writing my Number formatter pr, I was honestly like: what is this. Why do we not have an parser already if the usage of regex is pushed that far. But it is also true that writing a parser is way too much work and have way to many edge cases. Remember the time when I wrote a json parser myself that was actually in production for quite a bit of time and worked in 99% of the time. (The only exception I remember when user inputted strings inside json caused some unexpected issues)

But anyways, when did you guys write an actual parser for this. And maybe write up a libray now naming it UncivProperties. Lol. The description will be something like this:
The usal properties was not enough for our game and we had to write our own language to have a proper properties syntax.
And yair would write yet another cheesy blog post about it. Not a bad idea.

@yairm210
Copy link
Owner

This satisfies most of my basic requirements. All errors now indicate position, we can compile prior to evaluation so we can indicate parsing errors to users.

However

  • I don't believe in ETAs for open source projects. Who, exactly, is guaranteeing this? No one, so the date can easily roll around with no changes

  • I like the caching of parse results 👍🏿. I'm not sure how I feel about the fact that it's global. Theoretically you're right that the same text should parse to the same tree always... but using globals to share cache results from different rulesets feels wrong... in this case I think it's ok but I'm not super comfortable with it, don't have a better idea though

  • Explainability is still lacking. It's not enough to know that it "doesn't match", it needs to be exact.

@yairm210
Copy link
Owner

image Not great but better than before

@yairm210
Copy link
Owner

BTW 2 *** 3 doesn't show as an error, that's a bug :)

@yairm210
Copy link
Owner

@touhidurrr
I don't remember there being anything particularly complicated in the properties find/replace? All the translation file stuff is pretty benign from what I recall.
You mean like line 223?

There's a lot going on in that file for sure, but each part is self contained and there's no complex sequence. Unlike here, where there IS a complex sequence: Text to tokens, tokens to AST, AST to result, so to get any value at all out of this you need to go though all stages

@yairm210
Copy link
Owner

Found the problem...
"2 *** 3" is one of the parameters.
getErrorSeverity on MatchResult Maybe, returns a PossibleFilteringUnique.
So the mod checker looks at it, says "ok that could be a filtering unique", checks and indeed it's one of the parameters in another unique ("happens" to be this exact one) so it says "OK filtering it is"...

This is because Expression.matches("2 *** 3", ruleset) returns Maybe. This is a bug, since this is obviously NOT a valid expression.
THIS is because SyntaxError is subclassed to exceptions like: UnmatchedParentheses, MissingOperand, InvalidConstant, which are NOT "maybe this is a problem" but "this is definitely a problem".

TL;DR, in the current setup there can be no "maybe this is an expression", since all "maybe" turns into "yes", so I'm removing that option entirely so we get user visibility on expression parsing for all failed countables

@touhidurrr
Copy link
Contributor

touhidurrr commented Apr 17, 2025

? This is not about translations at all, in fact, the problem of internationalized presentation is knowingly ignored/postponed

Opps. I meant unique but said translations instead. Or maybe thought both are similar. How are we handling uniques currently by the way. More regex?

Always wanted to contribute uniques code in Unciv. Where to start?

@yairm210
Copy link
Owner

@touhidurrr
Uniques parsing is done via separating out the conditionals (<...>) and taking the base uniques, taking out the placeholders and matching both the base unique and the conditionals to unique types.
There are a few regex-y parts but it's mostly string parsing (see String.getPlaceholderParams)
Regarding work, there's nothing missing from that system currently, it Just Works ™️

@SomeTroglodyte
I see why you introduced the concept of Maybe, it's so the parser could determine if a countable within the expression is bad, right? Unfortunately that doesn't actually give us what we want, since any text can theoretically be a resource name, so with "ruleset-less" parsing we'd always get "any conditional is a potential resource name".

Instead, we should return to the way it used to work - with only booleans. Either it matches or it doesn't.
How should the parser deal with this? By accepting that "this thing wrapped in square brackets is a countable token".
Is that countable actually acceptable? That's only knowable on a ruleset basis, therefore: NOT MY BUSINESS!

So when should we actually validate that? Only when checking the unique parameters, at which point we DO have a ruleset.
So basically:

  • The tokenizer does not accept a Ruleset. It just knows "looks like countable or not"
  • Expressions.parse() therefore catches all errors except for ruleset-specific ones (countables are ok)
  • Expressions.getErrorSeverity(), which ALWAYS gets a ruleset, is an if: if it's unparseable, bad; Else, if any countable bad, bad; Else, good.

This means that apart from eval() which gives a double, Node also needs to have another overridable function for validity checking which accepts a ruleset. This is DFS-y in nature which is super simple since you have numeric constant (true), unary (validity = operand.validity), binary (validity = left.validity && right.validity), and countable (here's where the magic happens).

@yairm210
Copy link
Owner

Now that I think of it this would also resolve the reservations I have with intra-ruleset caching
If the parsing is entirely ruleset-agnostic and rulesets are only relevant when validating uniques and evaluating, then the cache is AOK :)

@SomeTroglodyte
Copy link
Collaborator Author

This is a bug, since this is obviously NOT a valid expression.

Very true 👀

in the current setup there can be no "maybe this is an expression"

Doesn't MalformedCountable still count as "maybe"? 🤔

I see why you introduced the concept of Maybe

...and I introduced it in a PR named "RFC" because I fully expected the names might need to change. I introduced it so a Countable like FilteredStuff when presented with "[crap] Stuff" could say "well YES the pattern is one of mine BUT it's either badly parameterized or I can't check parameters right now". So.... The AST parser might need to postpone the error to eval time? Or would we need 4 levels and split the enum into "NotMine, MineButBad, OKButCantCkeckWithoutRuleset, Perfect"? No, was that why I chose to already pass the ruleset into the tokenizer? NO, that was purely to accomodate that 🤬 patternless TileResources. A conundrum.

rulesets are only relevant when validating uniques and evaluating

Sounds about right as I envisioned it - except the TileResources thing.

since any text can theoretically be a resource name

Ah yes exactly, you nailed it. I can see more clearly now the rain is gone 🎶 .

DFS-y in nature

DFS as in Distributed file system? Dynamic frequency scaling?

...
But that begins to smell like a two-level cache: Caching an AST with all Countable terms only having a generic representation, not already a reference to the responsible instance, would still help, but for performance we would also want an AST cached with all handlers as resolved as possible. First full validation or eval could replace the cache entry?

Anyway, good thinking. For me to implement these ideas, however, you'd have to holler and/or beg, at the moment I'm more driven to clean up my "how I want to play" branch in another project or work on my unseen movies backlog.

To think I only dove into an actual implementation 'cuz some outsider (in precisely that "other project") needled me into trying ChatGPT for my first time, then half-earnestly I tried to ask it about evaluators and after a few back and forths I recognized the code it presented verbatim... I should have resisted maybe.

@yairm210
Copy link
Owner

This can be "pending" potentially forever, don't change your plans
As for double cache, I think that's overthinking for now. It's not like we cache Countable resolution currently. This sounds like more of the same of what we have.

@SomeTroglodyte
Copy link
Collaborator Author

not like we cache Countable resolution currently

Does. The AST stores a Node.Countable which already knows which instance handles eval - its countable field is strongly typed and non-nullable...

@yairm210
Copy link
Owner

I mean "currently for countables in main branch".
So I'm ok with not having resolution in the tree as well

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Copy link

Conflicts have been resolved.

@yairm210
Copy link
Owner

So, what do we have now?

  • Countables are detected at runtime, not compile time eval time, not parse time
  • This means an AST is ruleset-independent and can be shared across rulesets, so the cache is fine - same text will always parse to the same AST
  • If you don't use any countables you can eval as-is, if you do use countables you need a context. Luckily, there's already a state, which - look at that - contains the game, contains the ruleset! Golly gee willikers
    • I think I did make a mistake regarding eval - if you don't provide a ruleset, the answer should be 0, not exception
  • doubles are not countables - by definition a countable gives you an Int. This is non-negotiable :D

So what's actually missing? Checking viability, as I said previously - scanning the tree until the leaf nodes to see if any returns "I can't match this to a countable".
From a performance point of view, I would like to cache matching countables in the tree - most countables are not like resources, they don't require a ruleset in order to match, so those should be ruleset-independent as well and we can catch that at parse time also.
Stay Tuned, hopefully

@yairm210
Copy link
Owner

OK, errors are wired up!

There are 2 kinds of errors: "We couldn't parse the AST"...
image

And "The AST is fine, but this ruleset can't provide meaning to countables"
image

Not all is good, this seems to break subtly in this case:
image

image Apparently there was a flaw in the placeholderText function :| Well, now we can test it until it's solved :D

@yairm210
Copy link
Owner

Ohhh I see...

image

Since we replaced all [Iron] with [], we can now no longer find [3 * 2 + [Iron] + [bob]] to replace with []...hrm

@yairm210
Copy link
Owner

So, since we found the placeholders in the first place in sequential order, the solution is to do .replaceFirst() instead of .replace(), and everything works :D

@yairm210
Copy link
Owner

image Perfect ;)

@yairm210
Copy link
Owner

The user will only see the red error, but the mod checker shows the yellow error, which is the "subtype"

@yairm210
Copy link
Owner

And with that... I think it's just about ready 0_0
I'm a bit scared it will break in subtle ways but I've already handled all the types of ways I can think of... the only next move is to start using it and see how else it can break 😅

@yairm210
Copy link
Owner

yairm210 commented May 1, 2025

There are actually 2 other issues we need to deal with before this goes live

  • Performance - currently every single countable undergoes expression parsing. This is Obviously Wrong, and it stems from another wrong part of the design - getMatching. This should obviously be getFirstMatching, having possible conflicts between countables is silly, and that way if it matches other countables. we won't need to parse
  • Documentation - I'll expand it now and we'll see from users what's missing

@yairm210
Copy link
Owner

yairm210 commented May 1, 2025

OK, I can't think of anything else
So time to take the plunge

@yairm210 yairm210 merged commit e38e76b into yairm210:master May 1, 2025
4 checks passed
@SomeTroglodyte SomeTroglodyte deleted the Expressions1 branch June 11, 2025 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants