Skip to content

Support for more string operations#60

Open
jurajsic wants to merge 20 commits into
mainfrom
more_functions
Open

Support for more string operations#60
jurajsic wants to merge 20 commits into
mainfrom
more_functions

Conversation

@jurajsic

@jurajsic jurajsic commented May 14, 2026

Copy link
Copy Markdown
Member

This PR adds support for the string functions trim, trim_left, trim_right, upper, lower. It also fixes the behaviour of substring for the case that length arg is negative and removes the handling of concat as it behaves like join which we currently cannot handle.

To be able to handle the new functions, I had to extend the structure Function by a member that takes the resulting "call" of the SMT function and returns extra axioms (asserts, fresh vars declarations). Furthermore, I added function GetBuiltinDecls() that returns the declarations/definitions of the mapped SMT functions.

I also added a new option for smt converter, -case-unicode, if passed to the converter, it will handle upper/lower function for the whole Unicode, otherwise only for ASCII.

@jurajsic jurajsic marked this pull request as ready for review May 14, 2026 12:26
@jurajsic jurajsic requested review from Copilot and vhavlena May 14, 2026 12:26
@jurajsic jurajsic requested a review from hohran May 14, 2026 12:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends SMT translation and type handling for additional Rego string operations, including trim variants, case conversion, and substring negative-length behavior, while removing direct concat handling.

Changes:

  • Adds SMT mappings/declarations and constraints for trim, trim_left, trim_right, substring, upper, and lower.
  • Adds -case-unicode CLI support for Unicode-aware case conversion.
  • Adds end-to-end tests for new string operation behavior and updates related type/inference tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/smt/main.go Adds CLI flag for Unicode case handling.
pkg/analysis/string_operations.go Tracks trim-left/right as string operations.
pkg/infer/infer_test.go Updates builtin hint expectations after concat removal.
pkg/smt/e2e_inline_test.go Adds end-to-end tests for string operations.
pkg/smt/exprs.go Routes builtin calls through constraint-aware handling.
pkg/smt/functions.go Adds SMT function mappings, declarations, and constraints for string operations.
pkg/smt/representation.go Updates SMT string literal escaping for Unicode.
pkg/smt/run_policy_test.go Adds test helper for Unicode case mode.
pkg/smt/translator.go Initializes translators with builtin SMT declarations.
pkg/smt/translator_test.go Adjusts test setup for builtin declarations.
pkg/types/predef_functions.go Updates predefined function typing for string builtins.
pkg/types/type_analysis_test.go Removes obsolete concat-related type test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/smt/functions.go
Comment on lines +14 to +20
// UseUnicodeCase controls whether to_lower/to_upper should operate on full Unicode
// or only ASCII. Default is false (ASCII-only behavior).
var UseUnicodeCase bool = false

// SetUseUnicodeCase sets the package-wide flag controlling case handling.
func SetUseUnicodeCase(v bool) {
UseUnicodeCase = v

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it after #54 so I can use the translator options.

Comment thread pkg/smt/functions.go Outdated
Comment thread pkg/smt/functions.go
chore: typo

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

@hohran hohran left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR refines the handling of multiple string operations. Most notably, i find the lower and upper functions to work as expected in all tests I tried.
For the other functions, specifically trim variants and substring, I found them working only when having all arguments constant. Introducing a variable argument most usually resulted in them behaving as uninterpreted functions.
I know you somehow mentioned this, I just wanted to bring it up, because I did not see any hints at that in the code (so that we do not forget about it later).

Comment thread pkg/smt/exprs.go
value SmtValue
}

func (et *ExprTranslator) callBuiltin(op *Function, params []*SmtValue) (*SmtValue, error) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what i see, this function also possibly handles user-defined functions. Is there any reason to have it named this way? Do you plan on handling user-defined functions differently?

Comment thread pkg/smt/functions.go
lenS := fmt.Sprintf("(str.len %s)", s.String())
correctNegative := fmt.Sprintf("(str.substr %s %s %s)", s.String(), i.String(), lenS)
correctNonNeg := fmt.Sprintf("(str.substr %s %s %s)", s.String(), i.String(), j.String())
iteExpr := fmt.Sprintf("(ite (< %s 0) %s %s)", j.String(), correctNegative, correctNonNeg)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer using the SmtValue functions, such as Ite, or Equals

@vhavlena vhavlena linked an issue May 15, 2026 that may be closed by this pull request
@jurajsic

Copy link
Copy Markdown
Member Author

This PR refines the handling of multiple string operations. Most notably, i find the lower and upper functions to work as expected in all tests I tried. For the other functions, specifically trim variants and substring, I found them working only when having all arguments constant. Introducing a variable argument most usually resulted in them behaving as uninterpreted functions. I know you somehow mentioned this, I just wanted to bring it up, because I did not see any hints at that in the code (so that we do not forget about it later).

Other than the second argument of trim (and its variants), it all should work with variables. Do you have some examples where it does not work? I will also play around more with it, I tested mostly with constants.

@hohran

hohran commented May 15, 2026

Copy link
Copy Markdown
Collaborator
package example

x := " - admin"
trimmed := trim_left(x, " -")
substr := substring(x, 0, 3)

trimmed is evaluated to "a", or sometimes empty string
substr is empty string

@jurajsic

Copy link
Copy Markdown
Member Author

That is a bug related to #61, x turns into the variable __lv0, but it is handled trough let for the assert coming from the equation while it is "the global __lv0 for the asserts created for trim_left:

(assert (str.in_re (__trim_left (str __lv1) " -") (re.union (re.* (re.diff re.allchar (re.union (str.to_re " ") (str.to_re "-")))) (re.++ (re.diff re.allchar (re.union (str.to_re " ") (str.to_re "-"))) re.all))))
(assert (= trimmed (let ((__lv1 x)) (let ((__lv0 (OString (__trim_left (str __lv1) " -")))) __lv0))))

This is a problem, the local variables have to be handled trough let, so I will have to come up with a different way to handle these functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend SMT translation to cover more built-in Rego functions

3 participants