
Commit 945f77c

chore: Standardise on queryAll operation in Bulk API 2.0 implementation (#77)
## Description

Currently, the Bulk API 2.0 implementation uses `query` operation while both REST API and Bulk API 1.0 use `queryAll`. This inconsistency means that Bulk API 2.0 might miss deleted and archived records that are captured by the other API types.

## Current Behavior

- REST API uses `queryAll`
- Bulk API 1.0 uses `queryAll`
- Bulk API 2.0 uses `query`

## Expected Behavior

All API types should use `queryAll` to ensure consistent behavior and data completeness across different API implementations.

## Technical Details

The change required is in `tap_salesforce/salesforce/bulk2.py`, updating the operation from "query" to "queryAll" in the `_create_job` method.

## References

- [Salesforce REST API queryAll Documentation](https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_queryall.htm)
- [Salesforce Bulk API Documentation](https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/query_create_job.htm)

## Impact

This change will ensure that Bulk API 2.0 returns the same set of records as the other API types, including deleted and archived records.

## Type

- [x] Chore
- [x] Documentation

---------

Co-authored-by: Edgar Ramírez-Mondragón <edgarrm358@gmail.com>
1 parent 2b3fac1 commit 945f77c

3 files changed

Lines changed: 3 additions & 3 deletions


README.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ The `client_id` and `client_secret` keys are your OAuth Salesforce App secrets.

 The `start_date` is used by the tap as a bound on SOQL queries when searching for records. This should be an [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) formatted date-time, like "2018-01-08T00:00:00Z". For more details, see the [Singer best practices for dates](https://github.com/singer-io/getting-started/blob/master/BEST_PRACTICES.md#dates).

-The `api_type` is used to switch the behavior of the tap between using Salesforce's "REST", "BULK" and "BULK 2.0" APIs. When new fields are discovered in Salesforce objects, the `select_fields_by_default` key describes whether or not the tap will select those fields by default.
+The `api_type` is used to switch the behavior of the tap between using Salesforce's "REST", "BULK" and "BULK 2.0" APIs (each using the `queryAll` operation to include deleted and archived records). When new fields are discovered in Salesforce objects, the `select_fields_by_default` key describes whether or not the tap will select those fields by default.

 The `state_message_threshold` is used to throttle how often STATE messages are generated when the tap is using the "REST" API. This is a balance between not slowing down execution due to too many STATE messages produced and how many records must be fetched again if a tap fails unexpectedly. Defaults to 1000 (generate a STATE message every 1000 records).
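The `start_date` format described in the README context above can be produced programmatically. The following is a minimal sketch, not code from the tap: `to_rfc3339` is a hypothetical helper name, and it assumes the caller passes a timezone-aware `datetime`.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp.

    Hypothetical helper for illustration; not part of tap-salesforce.
    """
    # Normalise to UTC first so the literal "Z" suffix is accurate.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start_date = to_rfc3339(datetime(2018, 1, 8, tzinfo=timezone.utc))
print(start_date)  # 2018-01-08T00:00:00Z
```

The resulting string matches the example value quoted in the README and could be used as the `start_date` config value.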

setup.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@

 setup(
     name="tap-salesforce",
-    version="1.8.0",
+    version="1.9.0",
     description="Singer.io tap for extracting data from the Salesforce API",
     author="Stitch",
     url="https://singer.io",

tap_salesforce/salesforce/bulk2.py

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ def _create_job(self, catalog_entry, state):
         query = self.sf._build_query_string(catalog_entry, start_date, order_by_clause=False)

         body = {
-            "operation": "query",
+            "operation": "queryAll",
             "query": query,
         }
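To illustrate what the one-line change means for the request payload, here is a minimal sketch that builds the JSON body for a Bulk API 2.0 query job. `build_query_job_body` and its `include_deleted` flag are hypothetical names introduced for illustration; the tap's `_create_job` has no such switch and simply sends `"queryAll"` after this commit.

```python
import json


def build_query_job_body(soql: str, include_deleted: bool = True) -> dict:
    """Return the request body for creating a Bulk API 2.0 query job.

    Hypothetical helper for illustration; not part of tap-salesforce.
    """
    # "queryAll" also returns deleted and archived records; "query" omits them.
    operation = "queryAll" if include_deleted else "query"
    return {"operation": operation, "query": soql}


body = build_query_job_body("SELECT Id, IsDeleted FROM Account")
print(json.dumps(body, indent=2))
```

Selecting `IsDeleted` in the SOQL, as in this example, is one way a downstream consumer could tell deleted rows apart once `queryAll` starts returning them.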
