Skip to content

Unnecessary mangling of anyType columns #88

@AdamantiumKiwi

Description

@AdamantiumKiwi

Our source Salesforce has a number of __History objects that track changes to individual columns in the actual object e.g. "MyObject__c" has a "MyObject__History".

The columns in all the __History objects are the same:

Column Name Data type
Id id
IsDeleted boolean
ParentId reference
CreatedById reference
CreatedDate datetime
Field picklist
DataType picklist
OldValue anyType
NewValue anyType

The issue is that the Old and New values are derived from all the columns and their datatypes from the base object, so the source data can be strings, ints, floats, booleans, dates etc.

The Old and New Values are sent to Varchar fields in the target (Snowflake).

We have some fields in our objects which are text, but contain numeric values e.g. "000356", "1234_1" etc.

In sync.py of tap-salesforce, the last method is fix_record_anytype() which reformats the data for anyType fields using this code (some noise removed for brevity):
(Sorry - the code block isn't working)

`
if schema['properties'][k].get("type") is None:
val = v
val = try_cast(v, int)
val = try_cast(v, float)
if v in ["true", "false"]:
val = (v == "true")

        if v == "":
            val = None

        rec[k] = val

`
This has the following results:

  1. Booleans are sent as "True" or "False" rather than "true" or "false" respectively;
  2. Integers are sent with trailing .0 e.g. 1234 arrives as 1234.0 (note that the try_cast(v,int) is redundant)
  3. If the field is numeric but has leading zeroes, they are stripped and .0 is added e.g. 001234 arrives as 1234.0
  4. If the field has single underscores embedded but is otherwise numeric, the underscores are removed and .0 is appended e.g. 2435_23 arrives as 243523.0
  5. If the field is null it arrives as the string "None"

All of these transformations are incorrect, at least for our use case - we need the data exactly as it is in Salesforce - we reconcile each field for each row. The transformations make the data impossible to reconcile (some can be worked around but still...)

Our solution is to remove the calls to the method above, which produces the proper results by taking the value from the Salesforce CSV file, which by definition arrives as a string, and sending it direct to the target.

(Note that we have our own variation of the tap for Oauth2 reasons...)

A second note is that we have to add a stream-map to get the anyType fields to be transmitted (note that they are renamed from OldValue to OldFieldValue etc).

`
mappers:

  • name: meltano-map-transformer
    variant: meltano
    pip_url: <whatever - we have a clone of the repo>
    namespace: meltano-map-transform
    executable: meltano-map-transform
    mappings:
    • name: stream-maps
      config:
      stream_maps:
      ObjectName:
      OldValue:
      NewValue:
      OldFieldValue: str(OldValue)
      NewFieldValue: str(NewValue)
      SecondObjectName:
      OldValue:
      NewValue:
      OldFieldValue: str(OldValue)
      NewFieldValue: str(NewValue)
      `

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions