Skip to content

Expressions as Countable #13218

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

SomeTroglodyte
Copy link
Collaborator

...Not ChatGPT generated...

;
abstract fun isOK(strict: Boolean): Boolean
companion object {
operator fun invoke(bool: Boolean?) = when(bool) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just call this 'from' without operator overloading.
When you're young it's nice to experiment, it's fine to be operator-curious, but I think we're both past that phase :)

@yairm210
Copy link
Owner

And why, indeed, NOT use Keval?

@SomeTroglodyte
Copy link
Collaborator Author

And why, indeed, NOT use Keval?

Already mentioned some reasons. Foremost maybe: the tokenizer isn't customizable when consumed as library and won't suit our needs. Well, maybe it is by bypassing their String.toAST... No, even more importantly is we need domain lookup for our countables, and Keval is, as I said, highly customizable only in its Grammar - with builders for Constants, Unary operators, Binary operators and functions - and none fits. Constants need their result up-front, while functions need the identifier(vararg term) syntax. So the notation to fetch a Countable, done with the unmodified library, could be countable([Your] Cities) - not [[Your] Cities]. Most elegant we could do would be to fork it and provide a sibling to KevalConstant such as internal data class KevalDynamicValue(val delegate: (context: ?) -> Double) : KevalOperator, but then the problem immediately presents itself how to pass context through - Keval doesn't cater for one. Same problem for the function approach. Drill it open to add that concept, and what you get is more complex than this.

@yairm210
Copy link
Owner

For once we have a need that is actually generic, and even for that we can't use a generic solution?
Changing countable to a function call sounds A-OK to me, certainly less work than rolling our own parser, no?

@yairm210
Copy link
Owner

I mean with getplaceholders + placeholdertext, generating a countable() version of the user text sounds simple to me
The question is, does Keval accept string functions, or is it all numbers...

@SomeTroglodyte
Copy link
Collaborator Author

less work than rolling our own parser

Except it's done and already passes quite a load of unit tests

@yairm210
Copy link
Owner

That doesn't super help when what I'm trying to avoid is having to maintain this in the future.
Sounds to me like Keval, which does accept multi-input functions, is a better fit.
But even more so, if exp4j DOES allow for functions

At times like this I have to ask myself, how did we get here? I thought I was making a strategy game, and now I need to maintain a tokenizer/parser for generic expressions?
Why the hell are we allowing Pi and e if we want to get countables, which are Ints?
And why on earth would we want sin and cos? Surely if there were ANY function a modder would want, it would be min or max between two ints?

Most of all, this sounds like solving a solved problem.
If we take exp4j as an example, it actually solves all the problems - allows parsing prior to eval, setting variables, etc.
I don't care that it's unmaintained, it solves the problem we have.

And if not? We should be making a new library. Or changing an existing one - make Keval accept parse/eval distinctions, make it accept parameters.
The "worst of both worlds" is one where I have all the hassle of figuring out modder's problems, but even if I put in the work it only helps people with this specific demand modding this game. That's not great open source.

I think the root of Keval's problems is that it doesn't make the parse/eval distinction (outside - inside it exists). That's why it only allows constants (there are no runtime parameters since there is no runtime separate from parsing). The author is active so this definitely seems doable to me.

@yairm210
Copy link
Owner

Another thing real parsers have is "where is the problem", start to finish. I see Keval has "position" which is a start (ba-dum tish). This is important since we need to tell the user "your problem is HERE--> <--" and that requires positions. Start position will let you insert a ❌ emoji at the location, which is Good Enough ™️

@SomeTroglodyte
Copy link
Collaborator Author

🤷

@yairm210
Copy link
Owner

It's 2 in the morning here, maybe I'm overreacting. I'll see what I think of this tomorrow. Maybe we can treat this as just a fun coding exercise, in which case we can do what we want but modders should be aware this is breakable

@yairm210 yairm210 reopened this Apr 16, 2025
@SomeTroglodyte
Copy link
Collaborator Author

overreacting

No I get it. Without checking it out and playing a bit with it (e.g. setting breakpoints in the unit tests and inspecting an AST) it would be hard to get a real feel for it.

And the maintainability burden is of course very real, which is why it must be readable, and once adopted it should be possible to treat is as "black box", meaning it works, period, and changes in other parts of the project shouldn't be able to break it - or at least any cross-influence should be caught by the tests. In that, I'm pretty confident, though of course you're right about the "fun coding exercise", which may have clouded my vision.

As for "where is the problem", yes that was an aspect of Keval I consciously dropped - partially. The tokenizer has the full info so it can point to character indices, but the AST - why should it maintain the original input? When the AST parser needs to complain, the token string should be enough as hint, and when something goes wrong at eval time, each Node can reconstruct a logical "view" of its part of the original expression, which should likewise be enough for messages. Keval kept the original expression so the AST parser would "consume" already tokenized tokens one by one, string matching something already matched previously. I deemed it unnecessary. The integration in the validator is so far rudimentary in the sense I have refrained from patching up the validator code itself, but if more detail is needed it's certainly possible.

@touhidurrr
Copy link
Contributor

touhidurrr commented Apr 17, 2025

When did you guys write a parser by the way. The first time I saw the find-replace code for properties parser while writing my Number formatter pr, I was honestly like: what is this. Why do we not have an parser already if the usage of regex is pushed that far. But it is also true that writing a parser is way too much work and have way to many edge cases. Remember the time when I wrote a json parser myself that was actually in production for quite a bit of time and worked in 99% of the time. (The only exception I remember when user inputted strings inside json caused some unexpected issues)

But anyways, when did you guys write an actual parser for this. And maybe write up a libray now naming it UncivProperties. Lol. The description will be something like this:
The usal properties was not enough for our game and we had to write our own language to have a proper properties syntax.
And yair would write yet another cheesy blog post about it. Not a bad idea.

@yairm210
Copy link
Owner

This satisfies most of my basic requirements. All errors now indicate position, we can compile prior to evaluation so we can indicate parsing errors to users.

However

  • I don't believe in ETAs for open source projects. Who, exactly, is guaranteeing this? No one, so the date can easily roll around with no changes

  • I like the caching of parse results 👍🏿. I'm not sure how I feel about the fact that it's global. Theoretically you're right that the same text should parse to the same tree always... but using globals to share cache results from different rulesets feels wrong... in this case I think it's ok but I'm not super comfortable with it, don't have a better idea though

  • Explainability is still lacking. It's not enough to know that it "doesn't match", it needs to be exact.

@yairm210
Copy link
Owner

image Not great but better than before

@yairm210
Copy link
Owner

BTW 2 *** 3 doesn't show as an error, that's a bug :)

@yairm210
Copy link
Owner

@touhidurrr
I don't remember there being anything particularly complicated in the properties find/replace? All the translation file stuff is pretty benign from what I recall.
You mean like line 223?

There's a lot going on in that file for sure, but each part is self contained and there's no complex sequence. Unlike here, where there IS a complex sequence: Text to tokens, tokens to AST, AST to result, so to get any value at all out of this you need to go though all stages

@touhidurrr
Copy link
Contributor

touhidurrr commented Apr 17, 2025

less work than rolling our own parser

Except it's done and already passes quite a load of unit tests

regex is pushed that far

there's no complex sequence

complex or not why not make it a package @GGGuenni.

@SomeTroglodyte
Copy link
Collaborator Author

I don't believe in ETAs for open source projects

That's because this PR includes commit Add a way to mark not-yet-finished Countables which was offered as not-quite-serious element in Another Countables Pass - RFC. I couldn't base it on master 'cuz I needed commits 1 and 3 from that.

As you said - a cute exercise in coding, fun, a demo "we could if we wanted to", but ultimately of minor use: The idea was to later make a feature available to users as opt-in and the opt-in would automatically expire unless the experiment progresses. As in, modders would be the ones to stress-test the feature, but would need to do so knowing we still expect trouble. Or somesuch vague idea.

@SomeTroglodyte
Copy link
Collaborator Author

global caching

Yes very much so. Needs an Eureka. I'd prefer it on GameInfo even - pass a gameInfo even at parse time just for that? That was a 'code easy way now, forget to revisit later' thing.

@SomeTroglodyte
Copy link
Collaborator Author

But anyways, when did you guys write an actual parser for this. And maybe write up a libray now naming it UncivProperties

? This is not about translations at all, in fact, the problem of internationalized presentation is knowingly ignored/postponed

@SomeTroglodyte
Copy link
Collaborator Author

BTW 2 *** 3 doesn't show as an error, that's a bug :)

How do you figure?

Index: tests/src/com/unciv/uniques/ExpressionTests.kt
<+>UTF-8
===================================================================
diff --git a/tests/src/com/unciv/uniques/ExpressionTests.kt b/tests/src/com/unciv/uniques/ExpressionTests.kt
--- a/tests/src/com/unciv/uniques/ExpressionTests.kt	(revision 442600c53f781e4c5f2d0447ad10dd1996d2e2cf)
+++ b/tests/src/com/unciv/uniques/ExpressionTests.kt	(date 1744912221824)
@@ -55,6 +55,7 @@
     @Test
     fun testInvalidExpressions() {
         val input = listOf(
+            "2 *** 3" to Parser.MissingOperand::class,
             "fake_function(2)" to Parser.UnknownIdentifier::class,
             "[fake countable]" to Parser.UnknownCountable::class,
             "98.234.792.374" to Parser.InvalidConstant::class,

...passes. (Did so too before your improvements).

BTW, good job extending the ParsingError exception - I must have had exactly that in mind at some point.

@SomeTroglodyte
Copy link
Collaborator Author

Needs an Eureka

Maybe - warning: a demo not production code

Index: core/src/com/unciv/models/ruleset/unique/expressions/Expressions.kt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/core/src/com/unciv/models/ruleset/unique/expressions/Expressions.kt b/core/src/com/unciv/models/ruleset/unique/expressions/Expressions.kt
--- a/core/src/com/unciv/models/ruleset/unique/expressions/Expressions.kt	(revision 442600c53f781e4c5f2d0447ad10dd1996d2e2cf)
+++ b/core/src/com/unciv/models/ruleset/unique/expressions/Expressions.kt	(date 1744912139184)
@@ -1,5 +1,6 @@
 package com.unciv.models.ruleset.unique.expressions
 
+import com.unciv.UncivGame
 import com.unciv.models.ruleset.Ruleset
 import com.unciv.models.ruleset.unique.ICountable
 import com.unciv.models.ruleset.unique.StateForConditionals
@@ -25,18 +26,64 @@
     private data class ParseResult(val severity: ICountable.MatchResult, val node: Node?, val exception: Parser.ParsingError?)
 
     companion object {
-        private val cache: MutableMap<String, ParseResult> = mutableMapOf()
+        private val cache = ExpressionCache.getInstance(ExpressionCacheType.ClearingPerGameInfo)
+
+        private fun parse(parameterText: String, ruleset: Ruleset?): ParseResult = cache.getOrPut(parameterText, ruleset)
+
+        fun getParsingError(parameterText: String, ruleset: Ruleset?): Parser.ParsingError? = 
+            parse(parameterText, ruleset).exception
+
+    }
 
-        private fun parse(parameterText: String, ruleset: Ruleset?): ParseResult = cache.getOrPut(parameterText) {
-            try {
-                val node = Parser.parse(parameterText, ruleset)
-                ParseResult(ICountable.MatchResult.Yes, node, null)
-            } catch (ex: Parser.ParsingError) {
-                ParseResult(ex.severity, null, ex)
+    private enum class ExpressionCacheType { Global, PerGameInfo, ClearingPerGameInfo}
+
+    private abstract class ExpressionCache {
+        abstract fun getOrPut(parameterText: String, ruleset: Ruleset?): ParseResult
+        companion object {
+            fun getInstance(type: ExpressionCacheType): ExpressionCache = when (type) {
+                ExpressionCacheType.Global -> ExpressionCacheGlobal()
+                ExpressionCacheType.PerGameInfo -> ExpressionCachePerGameInfo()
+                ExpressionCacheType.ClearingPerGameInfo -> ExpressionCacheClearingPerGameInfo()
+            }
+        }
+        fun MutableMap<String, ParseResult>.defaultGetOrPut(parameterText: String, ruleset: Ruleset?): ParseResult {
+            return getOrPut(parameterText) {
+                try {
+                    val node = Parser.parse(parameterText, ruleset)
+                    ParseResult(ICountable.MatchResult.Yes, node, null)
+                } catch (ex: Parser.ParsingError) {
+                    ParseResult(ex.severity, null, ex)
+                }
             }
         }
-        
-        fun getParsingError(parameterText: String, ruleset: Ruleset?): Parser.ParsingError? = 
-            parse(parameterText, ruleset).exception
+    }
+
+    private class ExpressionCacheGlobal : ExpressionCache() {
+        private val cache = mutableMapOf<String, ParseResult>()
+        override fun getOrPut(parameterText: String, ruleset: Ruleset?) = cache.defaultGetOrPut(parameterText, ruleset)
+    }
+
+    private class ExpressionCachePerGameInfo : ExpressionCache() {
+        private val caches = LinkedHashMap<Int, MutableMap<String, ParseResult>>(1)
+
+        override fun getOrPut(parameterText: String, ruleset: Ruleset?): ParseResult {
+            val key = UncivGame.getGameInfoOrNull().hashCode()
+            val cache = caches.getOrPut(key) { mutableMapOf() }
+            return cache.defaultGetOrPut(parameterText, ruleset)
+        }
+    }
+
+    private class ExpressionCacheClearingPerGameInfo : ExpressionCache() {
+        private val cache = mutableMapOf<String, ParseResult>()
+        private var key = 0
+
+        override fun getOrPut(parameterText: String, ruleset: Ruleset?): ParseResult {
+            val newKey = UncivGame.getGameInfoOrNull().hashCode()
+            if (newKey != key) {
+                key = newKey
+                cache.clear()
+            }
+            return cache.defaultGetOrPut(parameterText, ruleset)
+        }
     }
 }
... just clear the cache when the loaded game changes? Giving the cache its own file with a similar interface so the implementation detail is encapsulated could be good anyway, even if the implementation stays trivial? Might be a challenge to **not** expose the `Node` class...

@yairm210
Copy link
Owner

Found the problem...
"2 *** 3" is one of the parameters.
getErrorSeverity on MatchResult Maybe, returns a PossibleFilteringUnique.
So the mod checker looks at it, says "ok that could be a filtering unique", checks and indeed it's one of the parameters in another unique ("happens" to be this exact one) so it says "OK filtering it is"...

This is because Expression.matches("2 *** 3", ruleset) returns Maybe. This is a bug, since this is obviously NOT a valid expression.
THIS is because SyntaxError is subclassed to exceptions like: UnmatchedParentheses, MissingOperand, InvalidConstant, which are NOT "maybe this is a problem" but "this is definitely a problem".

TL;DR, in the current setup there can be no "maybe this is an expression", since all "maybe" turns into "yes", so I'm removing that option entirely so we get user visibility on expression parsing for all failed countables

@touhidurrr
Copy link
Contributor

touhidurrr commented Apr 17, 2025

? This is not about translations at all, in fact, the problem of internationalized presentation is knowingly ignored/postponed

Opps. I meant unique but said translations instead. Or maybe thought both are similar. How are we handling uniques currently by the way. More regex?

Always wanted to contribute uniques code in Unciv. Where to start?

@yairm210
Copy link
Owner

@touhidurrr
Uniques parsing is done via separating out the conditionals (<...>) and taking the base uniques, taking out the placeholders and matching both the base unique and the conditionals to unique types.
There are a few regex-y parts but it's mostly string parsing (see String.getPlaceholderParams)
Regarding work, there's nothing missing from that system currently, it Just Works ™️

@SomeTroglodyte
I see why you introduced the concept of Maybe, it's so the parser could determine if a countable within the expression is bad, right? Unfortunately that doesn't actually give us what we want, since any text can theoretically be a resource name, so with "ruleset-less" parsing we'd always get "any conditional is a potential resource name".

Instead, we should return to the way it used to work - with only booleans. Either it matches or it doesn't.
How should the parser deal with this? By accepting that "this thing wrapped in square brackets is a countable token".
Is that countable actually acceptable? That's only knowable on a ruleset basis, therefore: NOT MY BUSINESS!

So when should we actually validate that? Only when checking the unique parameters, at which point we DO have a ruleset.
So basically:

  • The tokenizer does not accept a Ruleset. It just knows "looks like countable or not"
  • Expressions.parse() therefore catches all errors except for ruleset-specific ones (countables are ok)
  • Expressions.getErrorSeverity(), which ALWAYS gets a ruleset, is an if: if it's unparseable, bad; Else, if any countable bad, bad; Else, good.

This means that apart from eval() which gives a double, Node also needs to have another overridable function for validity checking which accepts a ruleset. This is DFS-y in nature which is super simple since you have numeric constant (true), unary (validity = operand.validity), binary (validity = left.validity && right.validity), and countable (here's where the magic happens).

@yairm210
Copy link
Owner

Now that I think of it this would also resolve the reservations I have with intra-ruleset caching
If the parsing is entirely ruleset-agnostic and rulesets are only relevant when validating uniques and evaluating, then the cache is AOK :)

@SomeTroglodyte
Copy link
Collaborator Author

This is a bug, since this is obviously NOT a valid expression.

Very true 👀

in the current setup there can be no "maybe this is an expression"

Doesn't MalformedCountable still count as "maybe"? 🤔

I see why you introduced the concept of Maybe

...and I introduced it in a PR named "RFC" because I fully expected the names might need to change. I introduced it so a Countable like FilteredStuff when presented with "[crap] Stuff" could say "well YES the pattern is one of mine BUT it's either badly parameterized or I can't check parameters right now". So.... The AST parser might need to postpone the error to eval time? Or would we need 4 levels and split the enum into "NotMine, MineButBad, OKButCantCkeckWithoutRuleset, Perfect"? No, was that why I chose to already pass the ruleset into the tokenizer? NO, that was purely to accomodate that 🤬 patternless TileResources. A conundrum.

rulesets are only relevant when validating uniques and evaluating

Sounds about right as I envisioned it - except the TileResources thing.

since any text can theoretically be a resource name

Ah yes exactly, you nailed it. I can see more clearly now the rain is gone 🎶 .

DFS-y in nature

DFS as in Distributed file system? Dynamic frequency scaling?

...
But that begins to smell like a two-level cache: Caching an AST with all Countable terms only having a generic representation, not already a reference to the responsible instance, would still help, but for performance we would also want an AST cached with all handlers as resolved as possible. First full validation or eval could replace the cache entry?

Anyway, good thinking. For me to implement these ideas, however, you'd have to holler and/or beg, at the moment I'm more driven to clean up my "how I want to play" branch in another project or work on my unseen movies backlog.

To think I only dove into an actual implementation 'cuz some outsider (in precisely that "other project") needled me into trying ChatGPT for my first time, then half-earnestly I tried to ask it about evaluators and after a few back and forths I recognized the code it presented verbatim... I should have resisted maybe.

@yairm210
Copy link
Owner

This can be "pending" potentially forever, don't change your plans
As for double cache, I think that's overthinking for now. It's not like we cache Countable resolution currently. This sounds like more of the same of what we have.

@SomeTroglodyte
Copy link
Collaborator Author

not like we cache Countable resolution currently

Does. The AST stores a Node.Countable which already knows which instance handles eval - its countable field is strongly typed and non-nullable...

@yairm210
Copy link
Owner

I mean "currently for countables in main branch".
So I'm ok with not having resolution in the tree as well

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants