Expressions as Countable #13218

SomeTroglodyte · 2025-04-16T18:22:29Z

...Not ChatGPT generated...

yairm210 · 2025-04-16T19:25:03Z

core/src/com/unciv/models/ruleset/unique/Countables.kt

+        ;
+        abstract fun isOK(strict: Boolean): Boolean
+        companion object {
+            operator fun invoke(bool: Boolean?) = when(bool) {


Please just call this 'from' without operator overloading.
When you're young it's nice to experiment, it's fine to be operator-curious, but I think we're both past that phase :)
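A minimal sketch of what that suggestion could look like. Only `isOK(strict: Boolean)` and the `Boolean?` parameter appear in the diff above; the enum name, the entry names `No`/`Maybe`/`Yes`, and the mapping from `Boolean?` are assumptions made for illustration.

```kotlin
// Hypothetical three-state result; entry names and the Boolean? mapping are assumed.
enum class MatchResult {
    No { override fun isOK(strict: Boolean) = false },
    Maybe { override fun isOK(strict: Boolean) = !strict },
    Yes { override fun isOK(strict: Boolean) = true },
    ;

    abstract fun isOK(strict: Boolean): Boolean

    companion object {
        // A named factory instead of `operator fun invoke`:
        // call sites read MatchResult.from(x) rather than MatchResult(x).
        fun from(bool: Boolean?): MatchResult = when (bool) {
            true -> Yes
            false -> No
            null -> Maybe
        }
    }
}
```

With a named factory the call site makes clear that a conversion is happening, whereas `MatchResult(x)` reads like a constructor call that an enum cannot have.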

yairm210 · 2025-04-16T20:25:42Z

And why, indeed, NOT use Keval?

SomeTroglodyte · 2025-04-16T20:50:11Z

> And why, indeed, NOT use Keval?

Already mentioned some reasons. Foremost maybe: the tokenizer isn't customizable when consumed as a library and won't suit our needs. Well, maybe it is, by bypassing their String.toAST... No, even more important is that we need domain lookup for our countables, and Keval is, as I said, highly customizable only in its Grammar - with builders for Constants, Unary operators, Binary operators and functions - and none fits. Constants need their result up-front, while functions need the identifier(vararg term) syntax. So the notation to fetch a Countable, done with the unmodified library, could be countable([Your] Cities) - not [[Your] Cities].

The most elegant thing we could do would be to fork it and provide a sibling to KevalConstant such as internal data class KevalDynamicValue(val delegate: (context: ?) -> Double) : KevalOperator, but then the problem immediately presents itself of how to pass context through - Keval doesn't cater for one. Same problem for the function approach. Drill it open to add that concept, and what you get is more complex than this.

yairm210 · 2025-04-16T21:06:52Z

For once we have a need that is actually generic, and even for that we can't use a generic solution?
Changing countable to a function call sounds A-OK to me, certainly less work than rolling our own parser, no?

yairm210 · 2025-04-16T21:33:52Z

I mean with getplaceholders + placeholdertext, generating a countable() version of the user text sounds simple to me
The question is, does Keval accept string functions, or is it all numbers...

SomeTroglodyte · 2025-04-16T21:56:05Z

> less work than rolling our own parser

Except it's done and already passes quite a load of unit tests

yairm210 · 2025-04-16T23:11:00Z

That doesn't super help when what I'm trying to avoid is having to maintain this in the future.
Sounds to me like Keval, which does accept multi-input functions, is a better fit.
But even more so, if exp4j DOES allow for functions

At times like this I have to ask myself, how did we get here? I thought I was making a strategy game, and now I need to maintain a tokenizer/parser for generic expressions?
Why the hell are we allowing Pi and e if we want to get countables, which are Ints?
And why on earth would we want sin and cos? Surely if there were ANY function a modder would want, it would be min or max between two ints?

Most of all, this sounds like solving a solved problem.
If we take exp4j as an example, it actually solves all the problems - allows parsing prior to eval, setting variables, etc.
I don't care that it's unmaintained, it solves the problem we have.

And if not? We should be making a new library. Or changing an existing one - make Keval accept parse/eval distinctions, make it accept parameters.
The "worst of both worlds" is one where I have all the hassle of figuring out modder's problems, but even if I put in the work it only helps people with this specific demand modding this game. That's not great open source.

I think the root of Keval's problems is that it doesn't make the parse/eval distinction (outside - inside it exists). That's why it only allows constants (there are no runtime parameters since there is no runtime separate from parsing). The author is active so this definitely seems doable to me.

yairm210 · 2025-04-16T23:13:57Z

Another thing real parsers have is "where is the problem", start to finish. I see Keval has "position" which is a start (ba-dum tish). This is important since we need to tell the user "your problem is HERE--> <--" and that requires positions. Start position will let you insert a ❌ emoji at the location, which is Good Enough ™️
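To illustrate the "insert a ❌ at the location" idea, here is a small sketch. The helper name `markErrorPosition` and its parameters are hypothetical; it only assumes the parser can report a zero-based start position.

```kotlin
// Hypothetical helper: marks the reported error position in the text shown to the modder.
fun markErrorPosition(expression: String, position: Int, marker: String = "❌"): String {
    // Clamp so an out-of-range position still yields a usable message
    val index = position.coerceIn(0, expression.length)
    return expression.substring(0, index) + marker + expression.substring(index)
}
```

For example, a parser reporting position 3 for "2 *** 3" would yield "2 *❌** 3".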

SomeTroglodyte · 2025-04-16T23:19:32Z

🤷

yairm210 · 2025-04-16T23:20:13Z

It's 2 in the morning here, maybe I'm overreacting. I'll see what I think of this tomorrow. Maybe we can treat this as just a fun coding exercise, in which case we can do what we want but modders should be aware this is breakable

SomeTroglodyte · 2025-04-16T23:45:53Z

> overreacting

No I get it. Without checking it out and playing a bit with it (e.g. setting breakpoints in the unit tests and inspecting an AST) it would be hard to get a real feel for it.

And the maintainability burden is of course very real, which is why it must be readable, and once adopted it should be possible to treat it as a "black box", meaning it works, period, and changes in other parts of the project shouldn't be able to break it - or at least any cross-influence should be caught by the tests. In that, I'm pretty confident, though of course you're right about the "fun coding exercise", which may have clouded my vision.

As for "where is the problem", yes that was an aspect of Keval I consciously dropped - partially. The tokenizer has the full info so it can point to character indices, but the AST - why should it maintain the original input? When the AST parser needs to complain, the token string should be enough as hint, and when something goes wrong at eval time, each Node can reconstruct a logical "view" of its part of the original expression, which should likewise be enough for messages. Keval kept the original expression so the AST parser would "consume" already tokenized tokens one by one, string matching something already matched previously. I deemed it unnecessary. The integration in the validator is so far rudimentary in the sense I have refrained from patching up the validator code itself, but if more detail is needed it's certainly possible.

touhidurrr · 2025-04-17T11:54:21Z

When did you guys write a parser, by the way? The first time I saw the find-replace code for the properties parser while writing my Number formatter PR, I was honestly like: what is this? Why do we not have a parser already if the usage of regex is pushed that far? But it is also true that writing a parser is way too much work and has way too many edge cases. I remember the time when I wrote a JSON parser myself that was actually in production for quite a bit of time and worked 99% of the time. (The only exception I remember was when user-inputted strings inside JSON caused some unexpected issues.)

But anyway, when did you guys write an actual parser for this? And maybe write up a library now, naming it UncivProperties. Lol. The description will be something like this:
The usual properties were not enough for our game and we had to write our own language to have a proper properties syntax.
And yair would write yet another cheesy blog post about it. Not a bad idea.

yairm210 · 2025-04-17T11:57:23Z

This satisfies most of my basic requirements. All errors now indicate position, we can compile prior to evaluation so we can indicate parsing errors to users.

However

- I don't believe in ETAs for open source projects. Who, exactly, is guaranteeing this? No one, so the date can easily roll around with no changes
- I like the caching of parse results 👍🏿. I'm not sure how I feel about the fact that it's global. Theoretically you're right that the same text should parse to the same tree always... but using globals to share cache results from different rulesets feels wrong... in this case I think it's ok but I'm not super comfortable with it, don't have a better idea though
- Explainability is still lacking. It's not enough to know that it "doesn't match", it needs to be exact.

yairm210 · 2025-04-17T12:26:22Z

Not great but better than before

yairm210 · 2025-04-17T12:29:33Z

BTW 2 *** 3 doesn't show as an error, that's a bug :)

yairm210 · 2025-04-17T12:40:17Z

@touhidurrr
I don't remember there being anything particularly complicated in the properties find/replace? All the translation file stuff is pretty benign from what I recall.
You mean like line 223?

There's a lot going on in that file for sure, but each part is self-contained and there's no complex sequence. Unlike here, where there IS a complex sequence: text to tokens, tokens to AST, AST to result, so to get any value at all out of this you need to go through all stages

yairm210 · 2025-04-17T19:30:32Z

Found the problem...
"2 *** 3" is one of the parameters.
getErrorSeverity on MatchResult Maybe, returns a PossibleFilteringUnique.
So the mod checker looks at it, says "ok that could be a filtering unique", checks and indeed it's one of the parameters in another unique ("happens" to be this exact one) so it says "OK filtering it is"...

This is because Expression.matches("2 *** 3", ruleset) returns Maybe. This is a bug, since this is obviously NOT a valid expression.
THIS is because SyntaxError is subclassed to exceptions like: UnmatchedParentheses, MissingOperand, InvalidConstant, which are NOT "maybe this is a problem" but "this is definitely a problem".

TL;DR, in the current setup there can be no "maybe this is an expression", since all "maybe" turns into "yes", so I'm removing that option entirely so we get user visibility on expression parsing for all failed countables
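A rough sketch of the resulting "yes or no" behaviour. The exception names come from the comment above; their constructors, the `position` field, and the toy `checkSyntax` function (which only checks operand/operator alternation and parentheses, with no unary operators or countables) are assumptions standing in for the real parser.

```kotlin
// Exception names are from the discussion; constructors and the checker below are assumed.
open class SyntaxError(message: String, val position: Int) : Exception(message)
class MissingOperand(position: Int) : SyntaxError("Missing operand", position)
class UnmatchedParentheses(position: Int) : SyntaxError("Unmatched parentheses", position)

// Toy stand-in for the real parser: throws a SyntaxError subclass on definite problems.
fun checkSyntax(text: String) {
    var expectOperand = true
    val openParens = mutableListOf<Int>()
    var i = 0
    while (i < text.length) {
        val ch = text[i]
        when {
            ch.isWhitespace() -> i++
            ch.isDigit() -> {
                if (!expectOperand) throw SyntaxError("Unexpected number", i)
                while (i < text.length && text[i].isDigit()) i++
                expectOperand = false
            }
            ch == '(' -> {
                if (!expectOperand) throw SyntaxError("Unexpected '('", i)
                openParens.add(i)
                i++
            }
            ch == ')' -> {
                if (expectOperand) throw MissingOperand(i)
                if (openParens.isEmpty()) throw UnmatchedParentheses(i)
                openParens.removeAt(openParens.lastIndex)
                i++
            }
            ch in "+-*/" -> {
                // Two operators in a row means the operand between them is missing
                if (expectOperand) throw MissingOperand(i)
                expectOperand = true
                i++
            }
            else -> throw SyntaxError("Unexpected character '$ch'", i)
        }
    }
    if (expectOperand) throw MissingOperand(text.length)
    if (openParens.isNotEmpty()) throw UnmatchedParentheses(openParens.last())
}

// Every SyntaxError is a definite "no"; there is no "maybe" left to report.
fun matches(text: String): Boolean =
    try { checkSyntax(text); true } catch (e: SyntaxError) { false }
```

In this sketch "2 *** 3" fails with MissingOperand at position 3, so `matches` returns false instead of an ambiguous result.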

touhidurrr · 2025-04-17T19:35:12Z

> ? This is not about translations at all, in fact, the problem of internationalized presentation is knowingly ignored/postponed

Oops. I meant uniques but said translations instead. Or maybe thought both are similar. How are we handling uniques currently, by the way? More regex?

Always wanted to contribute uniques code in Unciv. Where to start?

yairm210 · 2025-04-17T19:58:12Z

@touhidurrr
Uniques parsing is done via separating out the conditionals (<...>) and taking the base uniques, taking out the placeholders and matching both the base unique and the conditionals to unique types.
There are a few regex-y parts but it's mostly string parsing (see String.getPlaceholderParams)
Regarding work, there's nothing missing from that system currently, it Just Works ™️

@SomeTroglodyte
I see why you introduced the concept of Maybe, it's so the parser could determine if a countable within the expression is bad, right? Unfortunately that doesn't actually give us what we want, since any text can theoretically be a resource name, so with "ruleset-less" parsing we'd always get "any conditional is a potential resource name".

Instead, we should return to the way it used to work - with only booleans. Either it matches or it doesn't.
How should the parser deal with this? By accepting that "this thing wrapped in square brackets is a countable token".
Is that countable actually acceptable? That's only knowable on a ruleset basis, therefore: NOT MY BUSINESS!

So when should we actually validate that? Only when checking the unique parameters, at which point we DO have a ruleset.
So basically:

- The tokenizer does not accept a Ruleset. It just knows "looks like countable or not"
- Expressions.parse() therefore catches all errors except for ruleset-specific ones (countables are ok)
- Expressions.getErrorSeverity(), which ALWAYS gets a ruleset, is an if: if it's unparseable, bad; else, if any countable is bad, bad; else, good.

This means that apart from eval() which gives a double, Node also needs to have another overridable function for validity checking which accepts a ruleset. This is DFS-y in nature which is super simple since you have numeric constant (true), unary (validity = operand.validity), binary (validity = left.validity && right.validity), and countable (here's where the magic happens).
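The depth-first validity check described above could be sketched as follows. The four node kinds follow the comment; the `Ruleset` stand-in (just a map of countable values), the class shapes, and the choice to evaluate an unknown countable as 0.0 are assumptions for illustration, not the PR's actual code.

```kotlin
// Stand-in for the real Ruleset: only the countable values it can resolve (illustrative).
class Ruleset(val countableValues: Map<String, Int>)

sealed interface Node {
    fun eval(ruleset: Ruleset): Double
    // Ruleset-aware validity, computed by walking the tree depth-first
    fun isValid(ruleset: Ruleset): Boolean

    class Constant(private val value: Double) : Node {
        override fun eval(ruleset: Ruleset): Double = value
        override fun isValid(ruleset: Ruleset): Boolean = true
    }

    class Unary(private val op: (Double) -> Double, private val operand: Node) : Node {
        override fun eval(ruleset: Ruleset): Double = op(operand.eval(ruleset))
        override fun isValid(ruleset: Ruleset): Boolean = operand.isValid(ruleset)
    }

    class Binary(
        private val op: (Double, Double) -> Double,
        private val left: Node,
        private val right: Node,
    ) : Node {
        override fun eval(ruleset: Ruleset): Double = op(left.eval(ruleset), right.eval(ruleset))
        override fun isValid(ruleset: Ruleset): Boolean =
            left.isValid(ruleset) && right.isValid(ruleset)
    }

    // The only node whose validity actually depends on the ruleset
    class Countable(private val text: String) : Node {
        override fun eval(ruleset: Ruleset): Double =
            ruleset.countableValues[text]?.toDouble() ?: 0.0
        override fun isValid(ruleset: Ruleset): Boolean = text in ruleset.countableValues
    }
}
```

Parsing needs no ruleset at all here; the same tree can be validated against any number of rulesets after the fact.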

yairm210 · 2025-04-17T21:35:35Z

Now that I think of it this would also resolve the reservations I have with intra-ruleset caching
If the parsing is entirely ruleset-agnostic and rulesets are only relevant when validating uniques and evaluating, then the cache is AOK :)
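A sketch of why a global cache becomes unproblematic once parsing is ruleset-agnostic: the only key is the expression text. The `Ast` placeholder, the `ExpressionCache` object, and the `parse` parameter are hypothetical names, not the PR's API.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Placeholder for a parsed tree; the real AST type is not shown in this thread.
class Ast(val source: String)

object ExpressionCache {
    // Keyed by text only: no ruleset takes part in parsing, so sharing across rulesets is safe
    private val cache = ConcurrentHashMap<String, Ast>()

    fun getOrParse(text: String, parse: (String) -> Ast): Ast =
        cache.getOrPut(text) { parse(text) }
}
```

Two rulesets that use the same expression text get the same tree back; anything ruleset-specific happens later, at validation or evaluation time.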

SomeTroglodyte · 2025-04-17T22:05:33Z

> This is a bug, since this is obviously NOT a valid expression.

Very true 👀

> in the current setup there can be no "maybe this is an expression"

Doesn't MalformedCountable still count as "maybe"? 🤔

> I see why you introduced the concept of Maybe

...and I introduced it in a PR named "RFC" because I fully expected the names might need to change. I introduced it so a Countable like FilteredStuff when presented with "[crap] Stuff" could say "well YES the pattern is one of mine BUT it's either badly parameterized or I can't check parameters right now". So.... The AST parser might need to postpone the error to eval time? Or would we need 4 levels and split the enum into "NotMine, MineButBad, OKButCantCheckWithoutRuleset, Perfect"? No, was that why I chose to already pass the ruleset into the tokenizer? NO, that was purely to accommodate that 🤬 patternless TileResources. A conundrum.

> rulesets are only relevant when validating uniques and evaluating

Sounds about right as I envisioned it - except the TileResources thing.

> since any text can theoretically be a resource name

Ah yes exactly, you nailed it. I can see more clearly now the rain is gone 🎶 .

> DFS-y in nature

DFS as in Distributed file system? Dynamic frequency scaling?

...
But that begins to smell like a two-level cache: Caching an AST with all Countable terms only having a generic representation, not already a reference to the responsible instance, would still help, but for performance we would also want an AST cached with all handlers as resolved as possible. First full validation or eval could replace the cache entry?

Anyway, good thinking. For me to implement these ideas, however, you'd have to holler and/or beg, at the moment I'm more driven to clean up my "how I want to play" branch in another project or work on my unseen movies backlog.

To think I only dove into an actual implementation 'cuz some outsider (in precisely that "other project") needled me into trying ChatGPT for my first time, then half-earnestly I tried to ask it about evaluators and after a few back and forths I recognized the code it presented verbatim... I should have resisted maybe.

yairm210 · 2025-04-18T04:37:26Z

This can be "pending" potentially forever, don't change your plans
As for double cache, I think that's overthinking for now. It's not like we cache Countable resolution currently. This sounds like more of the same of what we have.

SomeTroglodyte · 2025-04-18T04:43:33Z

> not like we cache Countable resolution currently

Does. The AST stores a Node.Countable which already knows which instance handles eval - its countable field is strongly typed and non-nullable...

yairm210 · 2025-04-18T06:22:09Z

I mean "currently for countables in main branch".
So I'm ok with not having resolution in the tree as well

github-actions · 2025-04-18T09:10:38Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-04-29T20:09:34Z

Conflicts have been resolved.

yairm210 · 2025-04-29T20:56:41Z

So, what do we have now?

- Countables are detected at ~~runtime, not compile time~~ eval time, not parse time
- This means an AST is ruleset-independent and can be shared across rulesets, so the cache is fine - same text will always parse to the same AST
- If you don't use any countables you can eval as-is, if you do use countables you need a context. Luckily, there's already a state, which - look at that - contains the game, contains the ruleset! Golly gee willikers
  - I think I did make a mistake regarding eval - if you don't provide a ruleset, the answer should be 0, not exception
- doubles are not countables - by definition a countable gives you an Int. This is non-negotiable :D
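One possible reading of the points above, as a sketch: evaluation takes an optional context, a countable without context contributes 0 instead of throwing, and the final result is an Int. The names `EvalContext`, `Expr` and `evalAsCountable` are hypothetical, and truncating with `toInt()` is an assumption; the thread does not say how the Double result is converted.

```kotlin
// Hypothetical stand-in for the game state that carries the ruleset
class EvalContext(val countableValues: Map<String, Int>)

sealed interface Expr {
    fun eval(context: EvalContext?): Double

    class Constant(private val value: Double) : Expr {
        override fun eval(context: EvalContext?): Double = value
    }

    class Sum(private val left: Expr, private val right: Expr) : Expr {
        override fun eval(context: EvalContext?): Double =
            left.eval(context) + right.eval(context)
    }

    class CountableRef(private val text: String) : Expr {
        // No context (or an unknown name) yields 0 rather than an exception
        override fun eval(context: EvalContext?): Double =
            context?.countableValues?.get(text)?.toDouble() ?: 0.0
    }
}

// Assumption: truncation toward zero; the real conversion rule is not stated in the thread.
fun evalAsCountable(expr: Expr, context: EvalContext? = null): Int =
    expr.eval(context).toInt()
```

An expression made only of constants evaluates the same with or without a context; only countable references need one.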

So what's actually missing? Checking viability, as I said previously - scanning the tree until the leaf nodes to see if any returns "I can't match this to a countable".
From a performance point of view, I would like to cache matching countables in the tree - most countables are not like resources, they don't require a ruleset in order to match, so those should be ruleset-independent as well and we can catch that at parse time also.
Stay Tuned, hopefully

yairm210 · 2025-04-30T12:17:10Z

OK, errors are wired up!

There are 2 kinds of errors: "We couldn't parse the AST"...

And "The AST is fine, but this ruleset can't provide meaning to countables"

Not all is good, this seems to break subtly in this case:

Apparently there was a flaw in the placeholderText function :| Well, now we can test it until it's solved :D

yairm210 · 2025-04-30T12:21:56Z

Ohhh I see...

Since we replaced all [Iron] with [], we can now no longer find [3 * 2 + [Iron] + [bob]] to replace with []...hrm

yairm210 · 2025-04-30T12:31:36Z

So, since we found the placeholders in the first place in sequential order, the solution is to do .replaceFirst() instead of .replace(), and everything works :D
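The difference can be reproduced with a small sketch. The functions below are simplified stand-ins, not Unciv's actual placeholder code: `topLevelParams` is a hypothetical helper that collects the outermost bracketed parameters in order, and the two `placeholderText*` variants differ only in `replace` versus `replaceFirst`.

```kotlin
// Hypothetical helper: returns the contents of each outermost [...] in order of appearance.
fun topLevelParams(text: String): List<String> {
    val params = mutableListOf<String>()
    var depth = 0
    var start = -1
    for ((i, ch) in text.withIndex()) {
        when (ch) {
            '[' -> {
                if (depth == 0) start = i + 1
                depth++
            }
            ']' -> if (depth > 0) {
                depth--
                if (depth == 0) params += text.substring(start, i)
            }
        }
    }
    return params
}

// Buggy variant: replacing ALL "[Iron]" also rewrites the one nested inside the expression,
// so the later search for the full expression parameter no longer finds anything.
fun placeholderTextBuggy(text: String): String =
    topLevelParams(text).fold(text) { acc, param -> acc.replace("[$param]", "[]") }

// Fixed variant: parameters were found in sequential order, so replacing only the first
// occurrence each time leaves later (nested) occurrences intact until their turn.
fun placeholderTextFixed(text: String): String =
    topLevelParams(text).fold(text) { acc, param -> acc.replaceFirst("[$param]", "[]") }
```

For the text "Gain [Iron] when above [3 * 2 + [Iron] + [bob]]", the buggy variant leaves "[3 * 2 + [] + [bob]]" behind, while the fixed variant produces "Gain [] when above []".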

yairm210 · 2025-04-30T12:32:49Z

Perfect ;)

yairm210 · 2025-04-30T12:35:30Z

The user will only see the red error, but the mod checker shows the yellow error, which is the "subtype"

yairm210 · 2025-04-30T12:36:37Z

And with that... I think it's just about ready 0_0
I'm a bit scared it will break in subtle ways but I've already handled all the types of ways I can think of... the only next move is to start using it and see how else it can break 😅

yairm210 · 2025-05-01T07:15:01Z

There are actually 2 other issues we need to deal with before this goes live

- Performance - currently every single countable undergoes expression parsing. This is Obviously Wrong, and it stems from another wrong part of the design - getMatching. This should obviously be getFirstMatching: having possible conflicts between countables is silly, and that way, if the text matches another countable, we won't need to parse it as an expression
- Documentation - I'll expand it now and we'll see from users what's missing
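A sketch of the "first match wins" idea from the performance point. `getFirstMatching` is the name proposed above; the `CountableKind` class, the example kinds, and the counter are hypothetical and exist only to show that the expensive expression check is skipped when a simpler countable already matches.

```kotlin
// Hypothetical matcher shape; the real Countables enum is not shown in this thread.
class CountableKind(val name: String, val matches: (String) -> Boolean)

// Counter used only to make the skipped work observable in this sketch
var expressionParseAttempts = 0

// Toy stand-in for real expression parsing
fun looksLikeExpression(text: String): Boolean {
    expressionParseAttempts++
    return text.any { it in "+-*/" }
}

val countableKinds = listOf(
    CountableKind("Integer") { it.toIntOrNull() != null },
    CountableKind("Turns") { it == "turns" },
    // Most expensive check goes last, so it only runs when nothing simpler matched
    CountableKind("Expression") { looksLikeExpression(it) },
)

// firstOrNull stops at the first hit, so later (costlier) matchers are never consulted
fun getFirstMatching(text: String): CountableKind? =
    countableKinds.firstOrNull { it.matches(text) }
```

With this ordering, a plain "42" is resolved by the cheap matcher and never reaches the expression parser.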

yairm210 · 2025-05-01T17:26:18Z

OK, I can't think of anything else
So time to take the plunge

SomeTroglodyte added 6 commits April 14, 2025 01:43

Redefine ICountable.matches, getDeprecationAnnotation is not part of …

cb6f247

…the "engine" character of that interface

Add a way to mark not-yet-finished Countables

c9f27b8

Add an empty framework for @AutumnPizazz's Expression engine

6d95ea3

Make "which does not fit parameter type" constant for pattern match r…

eee7dd0

…obustness

New expression evaluator engine

adfa8bf

Fix countable tests: correct expected failures

dd46c40

SomeTroglodyte mentioned this pull request Apr 16, 2025

support +-*/^% and log (maybe) #13200

Closed

yairm210 reviewed Apr 16, 2025

SomeTroglodyte closed this Apr 16, 2025

yairm210 reopened this Apr 16, 2025

yairm210 added 4 commits April 17, 2025 13:01

All syntax errors MUST indicate position.

9b6161f

Otherwise, modders can't actually debug.

Better positions + documentation

4ad2cb7

Get rid of functions that modders *should not* be using

b5bf57a

Fix tests given the new changes

d52c277

Add modder-visible expression parsing errors

442600c

github-actions bot added the Conflicts label Apr 18, 2025

Revert Countables to current state

5bd7727

github-actions bot removed the Conflicts label Apr 29, 2025

yairm210 added 2 commits April 29, 2025 23:22

Fix the rest of the damn owl - compilation, not tests

eb9d6b9

Fix all tests as well

f934f36

Detect countables at parse; Don't crash

e5ad620

Add error detection for all types of expression errors

2fad8c9

yairm210 added 2 commits May 1, 2025 10:16

Better documentation

0aac347

Countables catches first matching, to avoid parsing known countables …

5ceb830

…as Expressions

yairm210 merged commit e38e76b into yairm210:master May 1, 2025
4 checks passed

SomeTroglodyte deleted the Expressions1 branch June 11, 2025 21:13
