Our source Salesforce has a number of __History objects that track changes to individual columns in the actual object e.g. "MyObject__c" has a "MyObject__History".
The columns in all the __History objects are the same:
| Column Name |
Data type |
| Id |
id |
| IsDeleted |
boolean |
| ParentId |
reference |
| CreatedById |
reference |
| CreatedDate |
datetime |
| Field |
picklist |
| DataType |
picklist |
| OldValue |
anyType |
| NewValue |
anyType |
The issue is that the Old and New values are derived from all the columns and their datatypes from the base object, so the source data can be strings, ints, floats, booleans, dates etc.
The Old and New Values are sent to Varchar fields in the target (Snowflake).
We have some fields in our objects which are text, but contain numeric values e.g. "000356", "1234_1" etc.
In sync.py of tap-salesforce, the last method is fix_record_anytype() which reformats the data for anyType fields using this code (some noise removed for brevity):
(Sorry - the code block isn't working)
`
if schema['properties'][k].get("type") is None:
val = v
val = try_cast(v, int)
val = try_cast(v, float)
if v in ["true", "false"]:
val = (v == "true")
if v == "":
val = None
rec[k] = val
`
This has the following results:
- Booleans are sent as "True" or "False" rather than "true" or "false" respectively;
- Integers are sent with trailing .0 e.g. 1234 arrives as 1234.0 (note that the try_cast(v,int) is redundant)
- If the field is numeric but has leading zeroes, they are stripped and .0 is added e.g. 001234 arrives as 1234.0
- If the field has single underscores embedded but is otherwise numeric, the underscores are removed and .0 is appended e.g. 2435_23 arrives as 243523.0
- If the field is null it arrives as the string "None"
All of these transformations are incorrect, at least for our use case - we need the data exactly as it is in Salesforce - we reconcile each field for each row. The transformations make the data impossible to reconcile (some can be worked around but still...)
Our solution is to remove the calls to the method above, which produces the proper results by taking the value from the Salesforce CSV file, which by definition arrives as a string, and sending it direct to the target.
(Note that we have our own variation of the tap for Oauth2 reasons...)
A second note is that we have to add a stream-map to get the anyType fields to be transmitted (note that they are renamed from OldValue to OldFieldValue etc).
`
mappers:
- name: meltano-map-transformer
variant: meltano
pip_url: <whatever - we have a clone of the repo>
namespace: meltano-map-transform
executable: meltano-map-transform
mappings:
- name: stream-maps
config:
stream_maps:
ObjectName:
OldValue:
NewValue:
OldFieldValue: str(OldValue)
NewFieldValue: str(NewValue)
SecondObjectName:
OldValue:
NewValue:
OldFieldValue: str(OldValue)
NewFieldValue: str(NewValue)
`
Our source Salesforce has a number of __History objects that track changes to individual columns in the actual object e.g. "MyObject__c" has a "MyObject__History".
The columns in all the __History objects are the same:
The issue is that the Old and New values are derived from all the columns and their datatypes from the base object, so the source data can be strings, ints, floats, booleans, dates etc.
The Old and New Values are sent to Varchar fields in the target (Snowflake).
We have some fields in our objects which are text, but contain numeric values e.g. "000356", "1234_1" etc.
In sync.py of tap-salesforce, the last method is fix_record_anytype() which reformats the data for anyType fields using this code (some noise removed for brevity):
(Sorry - the code block isn't working)
`
if schema['properties'][k].get("type") is None:
val = v
val = try_cast(v, int)
val = try_cast(v, float)
if v in ["true", "false"]:
val = (v == "true")
`
This has the following results:
All of these transformations are incorrect, at least for our use case - we need the data exactly as it is in Salesforce - we reconcile each field for each row. The transformations make the data impossible to reconcile (some can be worked around but still...)
Our solution is to remove the calls to the method above, which produces the proper results by taking the value from the Salesforce CSV file, which by definition arrives as a string, and sending it direct to the target.
(Note that we have our own variation of the tap for Oauth2 reasons...)
A second note is that we have to add a stream-map to get the anyType fields to be transmitted (note that they are renamed from OldValue to OldFieldValue etc).
`
mappers:
variant: meltano
pip_url: <whatever - we have a clone of the repo>
namespace: meltano-map-transform
executable: meltano-map-transform
mappings:
config:
stream_maps:
ObjectName:
OldValue:
NewValue:
OldFieldValue: str(OldValue)
NewFieldValue: str(NewValue)
SecondObjectName:
OldValue:
NewValue:
OldFieldValue: str(OldValue)
NewFieldValue: str(NewValue)
`