Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support index pattern selector syntax in SQL #120845

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

jbaiera
Copy link
Member

@jbaiera jbaiera commented Jan 24, 2025

Selector syntax (#118614) was introduced as part of the work to support implementing failure stores in data streams. This is a new feature of index patterns which allows a user to specify which indices inside of a data stream should be used in an action. Selectors are denoted by using the :: separator between a data stream name and the component of the data stream the user wants to target for an operation.

To search a data stream's backing indices, the ::data selector is used:

SELECT * FROM "my-data-stream::data"

To search a data stream's failure indices, the ::failures selector is used:

SELECT * FROM "my-data-stream::failures

By default, when a data stream has no selector specified, the ::data selector is implied to maintain backwards compatibility with current search functionality. The ::data selector primarily exists as a way to explicitly select the backing indices, but is not required for normal usage.

This PR updates the SQL grammar to include the selector portion of an index pattern. The qualifiedIndex() has been updated to include selectors in the resulting expression. Underlying search operations should already support this functionality, so this is primarily wiring it up where needed.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jan 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

*/
@Override
public void visitErrorNode(ErrorNode node) {}
/**
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire file was regenerated with inconsistent formatting. Is this something that should be checked in as part of this PR? Does this get regenerated as needed during the build?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generated by ANTLR, I typically run a spotlessApply to fix it

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jbaiera, I had a look and left a first round of comments.
I think this needs more strict validation at parse time, to make sure that we only accept valid selectors.

I also tried to run queries with normal indices (non-datastreams), and the behavior is not what I expected:

  • with select * from person::data everything works fine
  • with select * from person::failures I still get the data. I expected no such index [person::failures] as in _search, or at least no data
  • with select * from person::foo I still get the data. I expected a validation exception like in _search: Invalid index name [person::foo], invalid usage of :: separator, [foo] is not a recognized selector

@@ -342,6 +342,8 @@ identifier
tableIdentifier
: (catalog=identifier ':')? TABLE_IDENTIFIER
| (catalog=identifier ':')? name=identifier
| TABLE_IDENTIFIER ('::' selector=identifier)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we also want an option that includes the cluster name, ie. <cluster>:<table>::<selector>.
As long as we don't support it at runtime, we should return a meaningful error message


public void testIndexNameClusterSeletorCombined() {
ParsingException e = expectThrows(ParsingException.class, () -> parseStatement("SELECT * FROM cluster:foo::failures"));
assertThat(e.getMessage(), containsString("mismatched input '::' expecting {"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should parse it and return a meaningful error message, as we do in search, eg.

Invalid index name [foo:bar::data], Selectors are not yet supported on remote cluster patterns

@@ -21,15 +21,15 @@
public class UnresolvedRelationTests extends ESTestCase {
public void testEqualsAndHashCode() {
Source source = new Source(between(1, 1000), between(1, 1000), randomAlphaOfLength(16));
TableIdentifier table = new TableIdentifier(source, randomAlphaOfLength(5), randomAlphaOfLength(5));
TableIdentifier table = new TableIdentifier(source, randomAlphaOfLength(5), randomAlphaOfLength(5), randomAlphaOfLength(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a test where we also pass null as selector

@@ -391,6 +392,31 @@ public void testQuotedIndexNameWithQuotedCluster() {
assertEquals("elastic", relation.table().cluster());
}

public void testIndexNameDataSelector() {
Project plan = project(parseStatement("SELECT * FROM foo::data"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tests with invalid selectors as well, and expect a meaningful parse time error, like

Invalid index name [foo::bar], invalid usage of :: separator, [bar] is not a recognized selector

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying :Data Management/Data streams Data streams and their lifecycles >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Data Management Meta label for data/management team v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants