Skip to content

[Feature][connector-elasticsearch] elasticsearch source support PIT #9150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 16, 2025

Conversation

CosmosNi
Copy link
Contributor

elasticsearch source support PIT

close #9149

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@hailin0 hailin0 requested a review from Copilot April 11, 2025 12:25
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • seatunnel-e2e/seatunnel-connector-v2-e2e/connector-elasticsearch-e2e/src/test/resources/elasticsearch/elasticsearch_source_with_pit.conf: Language not supported

Comment on lines 984 to 986
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));
Copy link
Preview

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'sortField' is assigned but never used; consider removing it to clean up the code.

Suggested change
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hailin0
Copy link
Member

hailin0 commented Apr 11, 2025

Do you have more information on PIT vs scroll?

@CosmosNi
Copy link
Contributor Author

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.

  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.

  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

@@ -87,4 +88,25 @@ public class ElasticsearchSourceOptions extends ElasticsearchBaseOptions {
Collections.singletonMap("match_all", new HashMap<String, String>()))
.withDescription(
"Elasticsearch query language. You can control the range of data read");

public static final Option<Boolean> USE_PIT =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add new enum type named search_api_type? Support scorll and pit and set scroll by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get

@hailin0
Copy link
Member

hailin0 commented Apr 12, 2025

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.
  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.
  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

Which versions of elasticsearch can be used?

@CosmosNi
Copy link
Contributor Author

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.
  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.
  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

Which versions of elasticsearch can be used?

7.10

@@ -30,7 +30,7 @@ support version >= 2.x and <= 8.x.
| index_list | array | no | used to define a multiple table task |
| source | array | no | - |
| query | json | no | {"match_all": {}} |
| search_type | json | no | Search method,sql or dsl,default dsl |
| search_type | json | no | Search method,SQL、PIT、SCROLL,default SCROLL |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_type and search_api_type not same.
search_type : SQL, DSL
search_api_type: SCROLL, PIT

@CosmosNi CosmosNi requested a review from Hisoka-X April 14, 2025 06:01
Comment on lines 984 to 986
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +68 to +69
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
SEARCH_API_TYPE,

Comment on lines 116 to 119
}
// DSL query
else {
// Check if we should use PIT API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// DSL query
else {
// Check if we should use PIT API
} else {
// Check if we should use PIT API

Comment on lines 123 to 125
}
// Default scroll API
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// Default scroll API
else {
} else {

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CosmosNi . LGTM if ci passes.

@hailin0 hailin0 merged commit 948d588 into apache:dev Apr 16, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][connector-elasticsearch] elasticsearch source support PIT
3 participants