
@chewbum
Contributor

@chewbum chewbum commented Dec 26, 2025

Why I'm doing:

regexp_position is supported by Trino but not by StarRocks.
In my company's scenario, queries containing this function call are passed to StarRocks, where they fail with an error.

What I'm doing:

Implement regexp_position, which returns the position of the specified occurrence of a regular expression pattern in a subject string.

Fixes #67246

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This PR needs auto-generated documentation
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Introduces regexp_position to return the 1-based position of the Nth regex match in a string, with UTF-8 correctness and error handling.

  • BE: Implement StringFunctions::regexp_position with prepare/close hooks, a constant-pattern optimization, UTF-8 code-point indexing, and support for start_pos and occurrence; it returns -1 when no match is found and surfaces errors for invalid regexes.
  • FE: Add REGEXP_POSITION to FunctionSet; the analyzer fills in default arguments (start_pos=1, occurrence=1) and resolves the builtin via getRegexpPositionFunction.
  • Registration: Register the function in gensrc/script/functions.py with an INT return type and prepare/close hooks.
  • Tests: Add BE unit tests and SQL tests (including invalid regex and multibyte cases).

Written by Cursor Bugbot for commit 75eb4c1.
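To make the semantics summarized above concrete, here is a minimal C++ sketch. It is not the PR's BE code: it uses std::regex as a stand-in matching engine (the real implementation may use a different one), models the FE-filled defaults (start_pos=1, occurrence=1) as C++ default arguments, and, as an assumption, rejects start_pos or occurrence values below 1. The names is_lead_byte, code_point_to_byte, and byte_to_code_point are hypothetical helpers introduced for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <stdexcept>
#include <string>

// True if byte c starts a UTF-8 code point (is not a 10xxxxxx continuation byte).
static bool is_lead_byte(unsigned char c) { return (c & 0xC0) != 0x80; }

// Byte offset of the 1-based code point `pos`. One past the last code point
// maps to s.size(); anything further maps to npos.
static std::size_t code_point_to_byte(const std::string& s, int64_t pos) {
    int64_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_lead_byte(static_cast<unsigned char>(s[i])) && ++seen == pos) return i;
    }
    return (seen + 1 == pos) ? s.size() : std::string::npos;
}

// 1-based code-point position of the code point starting at byte offset `byte`.
static int64_t byte_to_code_point(const std::string& s, std::size_t byte) {
    int64_t count = 0;
    for (std::size_t i = 0; i < byte; ++i) {
        if (is_lead_byte(static_cast<unsigned char>(s[i]))) ++count;
    }
    return count + 1;
}

// Returns the 1-based code-point position of the `occurrence`-th match found at
// or after code point `start_pos`, or -1 if there is no such match.
int64_t regexp_position(const std::string& subject, const std::string& pattern,
                        int64_t start_pos = 1, int64_t occurrence = 1) {
    if (start_pos < 1 || occurrence < 1) {
        // Assumption: non-positive arguments are rejected; the PR may differ.
        throw std::invalid_argument("start_pos and occurrence must be >= 1");
    }
    const std::regex re(pattern);  // throws std::regex_error for an invalid pattern
    const std::size_t offset = code_point_to_byte(subject, start_pos);
    if (offset == std::string::npos) return -1;  // start_pos is past the end

    int64_t seen = 0;
    const auto search_begin = subject.begin() + static_cast<std::ptrdiff_t>(offset);
    for (std::sregex_iterator it(search_begin, subject.end(), re), end; it != end; ++it) {
        if (++seen == occurrence) {
            // Convert the match's byte offset in `subject` to a code-point position.
            const auto byte = static_cast<std::size_t>((*it)[0].first - subject.begin());
            return byte_to_code_point(subject, byte);
        }
    }
    return -1;
}
```

Under these assumptions, regexp_position("a.b:c;d", "[.]") returns 2, and in "café au lait" the second match of "a" is reported at code point 6 rather than at 1-based byte position 7. Because std::regex matches bytes, a pattern such as "." can match part of a multibyte character; an engine with native UTF-8 support avoids that.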

@chewbum chewbum requested review from a team as code owners December 26, 2025 03:37
@chewbum chewbum changed the title Implement Regexp_Position in C++ mode [Feature] Implement Regexp_Position in C++ mode Dec 26, 2025
@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata
Contributor

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@alvin-celerdata
Contributor

@mergify rebase

@mergify
Contributor

mergify bot commented Dec 27, 2025

rebase

✅ Branch has been successfully rebased

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

context->set_function_state(scope, state);

// check if pattern is constant
if (context->is_constant_column(1)) {
Contributor

Suggested change
- if (context->is_constant_column(1)) {
+ if (!context->is_constant_column(1)) {
+     return Status::OK();
+ }
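The suggested change turns the constant-pattern check into a guard clause. The sketch below illustrates that shape in isolation under simplified assumptions: FakeFunctionContext, RegexpState, prepare, and pattern_for_row are hypothetical stand-ins invented for this example, not the StarRocks FunctionContext or Status API, and std::regex stands in for whatever regex engine the BE actually uses.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for the function context: it only records which
// arguments are constant and, for those, their value.
struct FakeFunctionContext {
    std::vector<std::optional<std::string>> constant_args;  // nullopt = not constant
    bool is_constant_column(std::size_t i) const {
        return i < constant_args.size() && constant_args[i].has_value();
    }
};

// Hypothetical per-function state created during prepare().
struct RegexpState {
    std::unique_ptr<std::regex> const_pattern;  // set only when the pattern is constant
};

// Guard-clause shape from the suggested change: return early when argument 1
// (the pattern) is not constant, then do the constant-pattern setup unnested.
// Returns false and fills `error` if the constant pattern fails to compile.
bool prepare(const FakeFunctionContext& ctx, RegexpState& state, std::string& error) {
    if (!ctx.is_constant_column(1)) {
        return true;  // general path: patterns are compiled per row later
    }
    try {
        state.const_pattern = std::make_unique<std::regex>(*ctx.constant_args[1]);
    } catch (const std::regex_error& e) {
        error = std::string("invalid regex: ") + e.what();
        return false;
    }
    return true;
}

// Picks the regex for one row: the cached constant pattern when prepare()
// built one, otherwise the row's own pattern compiled into `scratch`.
const std::regex& pattern_for_row(const RegexpState& state, const std::string& row_pattern,
                                  std::regex& scratch) {
    if (state.const_pattern) return *state.const_pattern;
    scratch = std::regex(row_pattern);  // throws std::regex_error if invalid
    return scratch;
}
```

Keeping the non-constant case as an early return leaves the constant-pattern setup unindented, and the state object then tells the evaluation loop which path to take.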


-- name: test_regexp_position

SELECT regexp_position('a.b:c;d', '[.]');
Contributor

All the cases test the optimized (constant-pattern) path. Please add some cases that exercise the general path, and some cases with invalid inputs, such as NULL.
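A test matrix along the lines the reviewer asks for could look like the following sketch. It assumes that a NULL subject or pattern yields NULL (the usual SQL convention; the PR's actual NULL handling is not shown in this thread) and models the general path as compiling the pattern once per row. Row, eval_general, and code_points_before are hypothetical names, std::regex is a stand-in engine, and only the first occurrence is handled to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical test row: SQL NULL is modeled as std::nullopt.
struct Row {
    std::optional<std::string> subject;
    std::optional<std::string> pattern;
};

// Number of UTF-8 code points that start within the first `bytes` bytes of `s`.
static int64_t code_points_before(const std::string& s, std::size_t bytes) {
    int64_t n = 0;
    for (std::size_t i = 0; i < bytes; ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;  // skip continuation bytes
    }
    return n;
}

// General path: the pattern differs per row, so it is compiled for each row.
// Returns the 1-based code-point position of the first match, -1 if there is
// no match, and nullopt (NULL) if either input is NULL (assumed semantics).
// Throws std::regex_error if a row's pattern is invalid.
std::vector<std::optional<int64_t>> eval_general(const std::vector<Row>& rows) {
    std::vector<std::optional<int64_t>> out;
    out.reserve(rows.size());
    for (const Row& r : rows) {
        if (!r.subject || !r.pattern) {
            out.push_back(std::nullopt);
            continue;
        }
        const std::regex re(*r.pattern);
        std::smatch m;
        if (std::regex_search(*r.subject, m, re)) {
            const auto byte = static_cast<std::size_t>(m.position(0));
            out.push_back(code_points_before(*r.subject, byte) + 1);
        } else {
            out.push_back(-1);
        }
    }
    return out;
}
```

Because nothing is cached across rows here, every case exercises the general path, including NULL inputs, a non-matching pattern, and a multibyte subject.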


Development

Successfully merging this pull request may close these issues.

[Feature] Add Regexp_Position functionality to StarRocks

2 participants